If a word does not have a POS (specifically xpos) value what should I do? #1366
Comments
This is kinda funny. The tokenizer has misinterpreted To answer the question regarding not having an Going through the English words ending with "nna", here are some broken examples:
Correctly not split:
Correctly split, but the lemmatizer gets it wrong - presumably no training data available -
should split to "do not" Then there's the ambiguous case of Heh, THREE Star Trek references in one GitHub response. I'm becoming more efficient!
Thank you for the quick response. That fixed my issue!
Without knowing your particular application, I would be surprised if it's happy with
Alright, I added a few more sentences with henna to the training data until it stopped splitting that word for no reason. The new models should be automatically downloaded by v1.8.1
This is now part of the 1.8.2 release.
(I apologize if this has already been asked; I tried for a long time to find an existing answer.)
I am running into an error where I need the XPOS value for a word, but the word doesn't have any POS value.
I have provided an extremely simplified version of my code below.
These are the word values for the last two words:
I am using the XPOS value of each word in the text. The string "joanna" does not have any POS data. Is the reason for this that the word doesn't exist in the vocabulary where the POS data is sourced? It is a name so it makes sense that every name wouldn't be added to a vocabulary. Should I manually state in my code that the XPOS is PROPN? I am working with a large amount of text that has many potential instances of unique names. Is there a better way to handle such instances, like an existing library of uncommon names?
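For readers who hit the same thing, one defensive pattern is sketched below. It is only a sketch under assumptions: the entry layout is modeled on the list of dicts that Stanza's `to_dict()` returns, where a multi-word token appears as a range entry with no POS fields followed by its individual words. The helper name `words_with_xpos`, the `fallback` parameter, and the sample data are hypothetical and hand-written, not real Stanza output. Note also that `PROPN` is the UPOS tag for proper nouns; in the Penn Treebank tagset that the default English models are assumed to use for XPOS, the proper-noun tag is `NNP`.

```python
def words_with_xpos(entries, fallback="NNP"):
    """Return (text, xpos) pairs for word entries only.

    Assumption: `entries` looks like one sentence from Stanza's to_dict(),
    where a multi-word token is a range entry (id is a tuple or list with
    more than one index, or a string such as "2-3") carrying no POS fields.
    """
    pairs = []
    for entry in entries:
        entry_id = entry.get("id")
        is_range = (
            isinstance(entry_id, (tuple, list)) and len(entry_id) > 1
        ) or (isinstance(entry_id, str) and "-" in entry_id)
        if is_range:
            # The range entry is only a container; its words follow it.
            continue
        # Hypothetical fallback for a word that still has no XPOS.
        pairs.append((entry["text"], entry.get("xpos") or fallback))
    return pairs


# Hand-written sample for illustration only (not real model output).
sample = [
    {"id": 1, "text": "I", "xpos": "PRP"},
    {"id": (2, 3), "text": "gonna"},       # range entry, no POS fields
    {"id": 2, "text": "gon", "xpos": "VBG"},
    {"id": 3, "text": "na", "xpos": "TO"},
    {"id": 4, "text": "Zorblat"},          # word with no XPOS at all
]

result = words_with_xpos(sample)
```

Skipping the range entry rather than tagging it avoids assigning a made-up tag to something that is only a container for its words. Whether a blanket `NNP` fallback is appropriate depends on the application.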
Thank you for your help. I'm sorry if there is an obvious solution to this; I am a complete Stanza novice.