I pushed a commit for v0.7.0 that does normalization by default, where normalization means downcasing plus NFKD-based diacritic removal. It cannot be configured through the convenience functions wn.words(), wn.synsets(), etc., but it can be configured when creating a wn.Wordnet object via the normalize parameter. I think morphological analysis, such as Morphy, would be a separate process, perhaps configurable via a lemmatize parameter, since its purpose is to find a lemmatic form.
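For concreteness, here is a minimal sketch of that kind of normalizer, assuming "NFKD-based diacritic removal" means decomposing the string and dropping combining marks; the function and its details are illustrative, not the actual v0.7.0 implementation:

```python
import unicodedata

def normalize(form: str) -> str:
    """Downcase, then strip diacritics via NFKD decomposition."""
    decomposed = unicodedata.normalize('NFKD', form.lower())
    # NFKD splits accented characters into a base character plus
    # combining marks; dropping the marks removes the diacritics.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

normalize('Résumé')  # -> 'resume'
```

A function like this is what would be passed as the normalize argument when constructing a wn.Wordnet object.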
If pos is None, the lemmatizer yields values for all parts of speech. The function also does not need to yield only valid lemmatic forms: the results are used in queries against the database, and those queries act as the filter, so there is no point in filtering twice. Something that requires a lexicon, such as Morphy's exception lists, can be instantiated to make use of a wordnet and still follow the signature, as in the sketch below.
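A sketch of what that might look like, assuming the proposed signature is a callable taking a form and an optional pos and yielding candidate forms (the Morphy class and its internals here are illustrative, not the library's implementation):

```python
from typing import Iterator, Optional

import wn

def lemmatize(form: str, pos: Optional[str] = None) -> Iterator[str]:
    """Yield candidate forms to query; with pos=None, yield candidates
    for all parts of speech. Invalid forms are fine: the database query
    itself acts as the filter."""
    yield form

class Morphy:
    """A lemmatizer that needs a lexicon (e.g., for exception lists)."""

    def __init__(self, wordnet: wn.Wordnet) -> None:
        self._wn = wordnet  # consulted for exception lists, etc.

    def __call__(self, form: str, pos: Optional[str] = None) -> Iterator[str]:
        # Detachment rules and exception-list lookups would go here;
        # an instance still matches the lemmatize signature above.
        yield form
```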
I'm not sure whether lemmatize should take the output of the normalize function, or vice versa; I can see arguments both ways. Maybe it needs a preprocessing pipeline instead: preprocessor=[normalize, lemmatize].
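As a sketch of that pipeline idea, assuming for simplicity that each step maps a string to a string (a lemmatizer that yields multiple candidates would complicate the chaining, which is part of the ordering question):

```python
from typing import Callable, Iterable

def make_preprocessor(
    steps: Iterable[Callable[[str], str]],
) -> Callable[[str], str]:
    """Chain preprocessing steps, applied left to right."""
    def preprocess(form: str) -> str:
        for step in steps:
            form = step(form)
        return form
    return preprocess

# e.g., preprocessor=[normalize, lemmatize] would amount to:
# preprocess = make_preprocessor([normalize, lemmatize])
```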
There needs to be a strategy for morphological normalization, for instance the use of Morphy for English-language lookups.