nltk.pos_tag performance #1110
@stevenbird, I think the simplest option to "reimplement" with top-quality accuracy is actually honnibal's perceptron tagger. Next, the simplest to implement, though with unknown accuracy, is n-gram taggers chained with backoff n-gram taggers, trained on the existing tagged corpora in NLTK or shipped as a redistributable model that one can build with LDC's corpora (something like Spaghetti tagger). There's also one hand-crafted tagger, with unknown accuracy, in … My last suggestion is a pre-trained Brill tagger; pattern has a list of rules: https://github.com/clips/pattern/blob/master/pattern/text/en/en-context.txt
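The n-gram-with-backoff idea can be sketched in a few lines. This is a self-contained toy, not NLTK's API (NLTK ships its own `DefaultTagger`, `UnigramTagger`, `BigramTagger` and `TrigramTagger` classes that accept a `backoff=` argument); the class names `FixedTagger` and `BackoffNgramTagger`, the three-sentence training set and the fallback tag `NN` are all hypothetical. Each tagger looks up the current word together with the previous n-1 predicted tags and defers to its backoff tagger when that context never appeared in training.

```python
from collections import Counter, defaultdict


class FixedTagger:
    """Hypothetical last-resort tagger: assigns the same tag to every token."""

    def __init__(self, tag):
        self.fixed_tag = tag

    def choose(self, tokens, i, history):
        return self.fixed_tag


class BackoffNgramTagger:
    """Hypothetical toy n-gram tagger: predicts the most frequent tag for
    (previous n-1 tags, current word) and defers to `backoff` on unseen contexts."""

    def __init__(self, n, tagged_sents, backoff):
        self.n, self.backoff = n, backoff
        counts = defaultdict(Counter)
        for sent in tagged_sents:
            gold_tags = [t for _, t in sent]
            for i, (word, tag) in enumerate(sent):
                counts[self._context(word, gold_tags, i)][tag] += 1
        # keep only the most frequent tag seen for each context
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def _context(self, word, tags, i):
        # up to n-1 preceding tags (fewer at the start of a sentence) plus the word
        return tuple(tags[max(0, i - self.n + 1):i]) + (word,)

    def choose(self, tokens, i, history):
        ctx = self._context(tokens[i], history, i)
        if ctx in self.table:
            return self.table[ctx]
        return self.backoff.choose(tokens, i, history)

    def tag(self, tokens):
        history = []  # tags predicted so far, left to right
        for i in range(len(tokens)):
            history.append(self.choose(tokens, i, history))
        return list(zip(tokens, history))


TOY_TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

# bigram -> unigram -> fixed "NN": a typical backoff chain
unigram = BackoffNgramTagger(1, TOY_TRAIN, backoff=FixedTagger("NN"))
bigram = BackoffNgramTagger(2, TOY_TRAIN, backoff=unigram)

print(bigram.tag(["the", "bird", "barks"]))
# [('the', 'DT'), ('bird', 'NN'), ('barks', 'VBZ')]
```

The unseen word "bird" falls through both n-gram tables to the fixed tag, while "barks" is recovered from the (previous tag, word) context. A real setup would train on a full tagged corpus, where the fallback tag and the order of the chain noticeably affect accuracy.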
@alvations: thanks for those suggestions. It looks like Honnibal's tagger uses an MIT license. @syllog1sm, is there any objection to including this tagger in NLTK?
Sure, go ahead. You might want my transition-based parser too, which is also under MIT, I think.
@honnibal: great, thanks very much.
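Honnibal's tagger is a greedy averaged perceptron: each token is scored with a linear model over sparse features (the word, its suffix, the previously predicted tag, and so on). During training the weights are nudged towards the gold tag whenever the prediction is wrong, and the final weights are averaged over all training steps. The sketch below only illustrates that training scheme; it is not Honnibal's code or NLTK's `PerceptronTagger`. The four-feature set, the fixed data order (real trainers shuffle between iterations), the helper names and the toy data are assumptions.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over all steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(lambda: defaultdict(float))  # feature -> tag -> weight
        self.totals = defaultdict(float)  # (feature, tag) -> sum of weight * steps held
        self.stamps = defaultdict(int)    # (feature, tag) -> step of last change
        self.step = 0

    def predict(self, features):
        scores = defaultdict(float)
        for f in features:
            for tag, w in self.weights.get(f, {}).items():
                scores[tag] += w
        # highest score wins; ties are broken by tag name so results are deterministic
        return max(self.classes, key=lambda t: (scores[t], t))

    def update(self, truth, guess, features):
        self.step += 1
        if truth == guess:
            return
        for f in features:
            for tag, delta in ((truth, 1.0), (guess, -1.0)):
                key = (f, tag)
                w = self.weights[f][tag]
                # credit the old weight for every step it was in force
                self.totals[key] += (self.step - self.stamps[key]) * w
                self.stamps[key] = self.step
                self.weights[f][tag] = w + delta

    def average(self):
        if self.step == 0:
            return
        for f, tag_weights in self.weights.items():
            for tag, w in tag_weights.items():
                key = (f, tag)
                total = self.totals[key] + (self.step - self.stamps[key]) * w
                tag_weights[tag] = total / self.step


def features(words, i, prev_tag):
    # hypothetical minimal feature set
    w = words[i].lower()
    return ["bias", "word=" + w, "suffix3=" + w[-3:], "prev_tag=" + prev_tag]


def train_perceptron(tagged_sents, n_iter=5):
    model = AveragedPerceptron({t for sent in tagged_sents for _, t in sent})
    for _ in range(n_iter):
        for sent in tagged_sents:
            words = [w for w, _ in sent]
            prev = "-START-"
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev)
                guess = model.predict(feats)
                model.update(gold, guess, feats)
                prev = guess  # greedy: condition on our own prediction, as at test time
    model.average()
    return model


def perceptron_tag(model, words):
    prev, out = "-START-", []
    for i in range(len(words)):
        prev = model.predict(features(words, i, prev))
        out.append((words[i], prev))
    return out


PERCEPTRON_TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_perceptron(PERCEPTRON_TRAIN)
print(perceptron_tag(model, ["a", "cat", "barks"]))
# [('a', 'DT'), ('cat', 'NN'), ('barks', 'VBZ')]
```

Averaging is what makes this simple learner competitive: weights that only briefly took extreme values during training contribute little to the final model.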
Another example of bad behaviour: http://stackoverflow.com/questions/32571486/nltk-red-not-recognised-as-an-adjective
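The pre-trained Brill tagger suggested earlier in the thread runs an initial tagger and then applies an ordered list of learned transformation rules of the form "change tag A to tag B when some contextual condition holds". As a hypothetical illustration of how one such rule could repair an adjective mis-tagged as a noun, here is a minimal sketch. The sentence, the rule, the trigger names and the `(from_tag, to_tag, trigger, arg)` layout are assumptions loosely modelled on Brill-style contextual rules; they are not the exact format of pattern's rule file.

```python
def apply_rules(tagged, rules):
    """Apply Brill-style transformation rules in order.

    Each rule is (from_tag, to_tag, trigger, arg); this toy sketch supports
    only the hypothetical triggers PREVTAG and NEXTTAG.
    """
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    for from_tag, to_tag, trigger, arg in rules:
        new_tags = list(tags)  # each rule sees the tags as they were before it ran
        for i, t in enumerate(tags):
            if t != from_tag:
                continue
            if trigger == "PREVTAG" and i > 0 and tags[i - 1] == arg:
                new_tags[i] = to_tag
            elif trigger == "NEXTTAG" and i + 1 < len(tags) and tags[i + 1] == arg:
                new_tags[i] = to_tag
        tags = new_tags
    return list(zip(words, tags))


initial = [("the", "DT"), ("red", "NN"), ("car", "NN")]  # hypothetical initial tagger output
rules = [("NN", "JJ", "NEXTTAG", "NN")]  # hypothetical: a noun directly before a noun becomes JJ
print(apply_rules(initial, rules))
# [('the', 'DT'), ('red', 'JJ'), ('car', 'NN')]
```

A rule this blunt would also turn a compound noun such as "bus stop" into JJ + NN. This is why real rule sets are learned from a tagged corpus, each rule being kept only if it fixes more errors than it introduces, and why rules can also condition on the words themselves.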
Resolved by #1122
NLTK's built-in POS tagger doesn't perform very well.
We need a better built-in model.
cf. http://stackoverflow.com/questions/30821188/python-nltk-pos-tag-not-returning-the-correct-part-of-speech-tag
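To compare candidate taggers, including the ones described earlier in the thread as having unknown accuracy, one can measure per-token accuracy against a held-out, hand-tagged test set: the fraction of tokens whose predicted tag matches the gold tag. Below is a minimal sketch, assuming both inputs are lists of sentences of `(word, tag)` pairs over identical tokens. The function name is hypothetical, and the two-sentence example is made up; a meaningful evaluation would use thousands of held-out sentences.

```python
def tagging_accuracy(predicted, gold):
    """Per-token accuracy of predicted tags against gold tags."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold have different numbers of sentences")
    correct = total = 0
    for pred_sent, gold_sent in zip(predicted, gold):
        if [w for w, _ in pred_sent] != [w for w, _ in gold_sent]:
            raise ValueError("predicted and gold sentences differ in tokens")
        for (_, p_tag), (_, g_tag) in zip(pred_sent, gold_sent):
            correct += p_tag == g_tag
            total += 1
    return correct / total if total else 0.0


gold = [[("the", "DT"), ("red", "JJ"), ("car", "NN")],
        [("dogs", "NNS"), ("bark", "VBP")]]
predicted = [[("the", "DT"), ("red", "NN"), ("car", "NN")],
             [("dogs", "NNS"), ("bark", "VBP")]]
print(tagging_accuracy(predicted, gold))  # 4 of 5 tokens correct -> 0.8
```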