Unicode trouble with lemma_ still not fixed #51

NSchrading · 2015-04-13T19:40:20Z

As discussed in issue #32, printing the lemma of a word containing non-ASCII (Unicode) characters causes spaCy to crash. I believe that issue was closed because rsomeon did not want to contribute their patch and preferred that you make your own changes. I don't think the problem has been fixed: I just tested v0.81 and still get a crash.

from spacy.en import English
s = "Fiancé"
nlp = English()
tok = nlp(s)
print(tok[0].lemma_)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spacy/tokens.pyx", line 585, in spacy.tokens.Token.lemma_.__get__ (spacy/tokens.cpp:10941)
  File "spacy/strings.pyx", line 73, in spacy.strings.StringStore.__getitem__ (spacy/strings.cpp:1671)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 5: unexpected end of data
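
The error message above is the kind Python raises when a UTF-8 byte string is cut off in the middle of a multi-byte character. The sketch below is only an illustration in plain Python, not spaCy code. It assumes, purely for demonstration, that the buffer was truncated to the character count instead of the byte count; the thread does not confirm that this is what happened inside `StringStore`. The helper `try_decode` is a hypothetical name introduced here.

```python
# Hypothetical illustration in plain Python 3; this is NOT spaCy's code.
word = "Fianc\u00e9"                # "Fiancé": 6 characters
encoded = word.encode("utf-8")      # 7 bytes, because "é" is 0xc3 0xa9

# Assumption for the demo: the buffer is cut to the character count (6),
# which keeps the lead byte 0xc3 but drops the continuation byte 0xa9.
truncated = encoded[:len(word)]

def try_decode(data):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)

print(try_decode(encoded))    # the full buffer decodes cleanly
message = try_decode(truncated)
print(message)                # reports byte 0xc3 at position 5
```

Slicing by byte length (`len(encoded)`) instead of character length avoids the error, which is why pure-ASCII words, where the two lengths are equal, never trigger it.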

honnibal · 2015-04-13T21:26:03Z

Thanks, I pushed a fix in 0.82.

NSchrading · 2015-04-14T01:18:42Z

Yep, looks like it's working now. Thanks!

lock · 2018-05-09T18:31:24Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal pushed a commit that referenced this issue Apr 13, 2015

* Fix Issue #51: Handle non-ascii lemmas correctly

c670777

NSchrading closed this as completed Apr 14, 2015

lock bot locked as resolved and limited conversation to collaborators May 9, 2018
