Configure special treatment of prefixes and suffixes #26
If I modify the conversion of a token string to […], then in this case I'll have to be careful where I use […].
The smarter option would be to analyze the grammatical structure of the word (e.g. via a linguistics API) in order to find prefixes, suffixes, particles, etc. and remove them from […]. A simpler approach would be to define static lists of custom prefix and suffix patterns, provided by the user and/or included within quizgen. But this would still be prone to many false positives, excluding characters from the ends of words that are not actually grammatical prefixes or suffixes.
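The static-list approach described above could be sketched roughly as follows. All names here are hypothetical (this is not quizgen's actual API), and the suffix list reuses the Korean particles mentioned later in this issue:

```python
# Hypothetical static suffix list; in practice this would be user-provided
# and/or bundled with quizgen.
SUFFIXES = ['들', '에', '가', '는']

def strip_suffixes(token, suffixes=SUFFIXES):
    """Remove the first matching suffix pattern from the end of a token.

    Longest patterns are tried first. Note the false-positive risk the
    issue describes: any token that merely *ends* with the same characters
    as a listed suffix gets truncated, whether or not those characters are
    actually a grammatical suffix.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Require the token to be longer than the suffix so a bare
        # particle is not reduced to an empty string.
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token
```

For example, `strip_suffixes('학생들')` returns `'학생'`, while a token that consists only of a listed suffix, like `'는'`, is left unchanged.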
English
POS tagging works, but only for English; the POS tag API result object uses Penn Treebank part-of-speech tags. https://gist.github.com/ogallagher/5be9bfe5c1ef757cf4faccaac3dc7a55
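Given Penn Treebank tags, suffix handling could be restricted to content words. The `(token, tag)` pair shape below mirrors typical tagger output (e.g. `nltk.pos_tag`); the filtering itself is only a sketch, not quizgen's actual logic:

```python
# Penn Treebank tag prefixes covering nouns (NN*), verbs (VB*),
# adjectives (JJ*), and adverbs (RB*).
CONTENT_TAG_PREFIXES = ('NN', 'VB', 'JJ', 'RB')

def content_words(tagged_tokens):
    """Keep tokens whose Penn Treebank tag marks a content word.

    tagged_tokens is a list of (token, tag) pairs as produced by a POS
    tagger; function words (determiners, prepositions, etc.) are dropped.
    """
    return [tok for tok, tag in tagged_tokens
            if tag.startswith(CONTENT_TAG_PREFIXES)]
```

For example, `content_words([('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')])` returns `['cat', 'sat']`.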
Korean
Includes runtime performance analysis of different underlying models/engines. I will probably use the 꼬꼬마/kkma engine, which seems most accurate according to the konlpy docs, provided the input string tokens are correctly delimited with spaces.
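With kkma, particle removal could operate on the analyzer's morpheme output rather than on raw character suffixes. In the kkma tagset, particle (조사) tags begin with `J` (e.g. `JKS` subject, `JKO` object, `JX` auxiliary). The hardcoded pair list below stands in for the output of `kkma.pos(...)` so the sketch runs without konlpy installed; the exact pairs are illustrative, not an executed kkma result:

```python
def drop_particles(morphemes):
    """Remove particle morphemes from (morpheme, tag) pairs.

    Assumes kkma-style tags, where every particle tag starts with 'J'.
    Working on analyzed morphemes avoids the false positives of plain
    character-suffix stripping.
    """
    return [(m, t) for m, t in morphemes if not t.startswith('J')]

# Illustrative stand-in for kkma.pos('나는 학교에 간다') output.
example = [('나', 'NP'), ('는', 'JX'), ('학교', 'NNG'),
           ('에', 'JKM'), ('가', 'VV'), ('ㄴ다', 'EFN')]
```

Here `drop_particles(example)` keeps only the pronoun, noun, verb stem, and ending, discarding the two particles.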
I'm not sure what to do about them, but in cases where grammatical prefixes (e.g. pre-, re-, un-) and suffixes (e.g. -들, -에, -가, -는) occur frequently, I've noticed some unusual results.