Tokens can be lemmatized into the empty string #719

Closed
mlehl88 opened this issue Jan 4, 2017 · 1 comment
Labels
bug (Bugs and behaviour differing from documentation), lang / en (English language data and models)

Comments

mlehl88 commented Jan 4, 2017

The token 's' can sometimes be lemmatized into the empty string, for example when it is followed by an ellipsis:

>>> from spacy.en import English
>>> nlp = English()
>>> my_string = """s..."""
>>> tokens = nlp(my_string)
>>> for t in tokens:
...     print("token: %s, lemma: %s" % (t, t.lemma_))
... 
token: s, lemma: 
token: ..., lemma: ...

In contrast, this does not happen when 's' is not followed by an ellipsis:

>>> my_string = """s"""
>>> tokens = nlp(my_string)
>>> for t in tokens:
...     print("token: %s, lemma: %s" % (t, t.lemma_))
... 
token: s, lemma: s

This behaviour might cause unexpected results in downstream applications. One consequence is that textacy sometimes extracts the empty string as a keyword.
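
Until the lemmatizer itself is fixed, a downstream application could guard against empty lemmas. The following is only a minimal sketch of such a workaround, not spaCy's fix: the helper names `safe_lemma` and `safe_lemmas` are hypothetical, and falling back to the lowercased token text is an assumption about what a caller would want.

```python
def safe_lemma(text, lemma):
    """Return the lemma, or the lowercased token text if the lemma is empty."""
    # Treat whitespace-only lemmas as empty as well (an assumption of this sketch).
    return lemma if lemma.strip() else text.lower()


def safe_lemmas(pairs):
    """Map (token text, lemma) pairs to lemmas that are never empty."""
    return [safe_lemma(text, lemma) for text, lemma in pairs]


# The (token, lemma) pairs reported above for the input "s...".
pairs = [("s", ""), ("...", "...")]
print(safe_lemmas(pairs))
```

With spaCy, the pairs could be built as `[(t.text, t.lemma_) for t in nlp(my_string)]`, so that keyword extraction never sees an empty string.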

Environment

  • Operating System: Mac OS X Version 10.10.5
  • Python Version Used: 3.6.0
  • spaCy Version Used: 1.5
  • Environment Information: Anaconda virtual environment
  • spaCy model: en-1.1.0

ines added the bug and lang / en labels on Jan 8, 2017
ines added this to the Update lemmatizer and morphology milestone on Feb 18, 2017
ines added a commit that referenced this issue on Mar 13, 2017

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators on May 9, 2018