It seems the token 's' can sometimes be lemmatized to the empty string.
>>> from spacy.en import English
>>> nlp = English()
>>> my_string = """s..."""
>>> tokens = nlp(my_string)
>>> for t in tokens:
...     print("token: %s, lemma: %s" % (t, t.lemma_))
...
token: s, lemma:
token: ..., lemma: ...
In contrast, this does not happen if the string 's' is not followed by an ellipsis:
>>> my_string = """s"""
>>> tokens = nlp(my_string)
>>> for t in tokens:
...     print("token: %s, lemma: %s" % (t, t.lemma_))
...
token: s, lemma: s
This behaviour might cause unexpected results in downstream applications. One consequence is that textacy sometimes extracts the empty string as a keyword.
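Until the underlying issue is fixed, downstream code can guard against empty lemmas by falling back to the token's surface text. This is only a hypothetical workaround sketch, not part of spaCy itself; the `safe_lemma` helper and the hard-coded pairs below are assumptions for illustration, mimicking the (token, lemma) output shown above.

```python
def safe_lemma(text, lemma):
    """Return the lemma, falling back to the raw token text if the
    lemma is empty or whitespace-only (hypothetical guard)."""
    return lemma if lemma.strip() else text

# Simulated (token text, lemma) pairs from the repro above:
# 's' lemmatizes to '' before an ellipsis, '...' lemmatizes to '...'.
pairs = [("s", ""), ("...", "...")]
print([safe_lemma(t, l) for t, l in pairs])  # ['s', '...']
```

In real code the same guard would be applied as `safe_lemma(t.text, t.lemma_)` inside the token loop, so keyword extractors never see an empty string.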
Environment