* Fix Issue #51: Handle non-ascii lemmas correctly
syllog1sm committed Apr 13, 2015
1 parent bf0aff5 commit c670777
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion spacy/en/pos.pyx
@@ -331,7 +331,8 @@ cdef class EnPosTagger:
         cdef unicode lemma_string
         lemma_strings = self.lemmatizer(py_string, pos)
         lemma_string = sorted(lemma_strings)[0]
-        lemma = self.strings.intern(lemma_string.encode('utf8'), len(lemma_string)).i
+        bytes_string = lemma_string.encode('utf8')
+        lemma = self.strings.intern(bytes_string, len(bytes_string)).i
         return lemma
 
     def load_morph_exceptions(self, dict exc):
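
Note on the change: the old line passed the character count of the lemma as the length of its UTF-8 encoding. The two counts agree for ASCII-only lemmas and differ as soon as a character needs more than one byte. A minimal Python sketch of the mismatch, independent of spaCy; the lemma value is a made-up example, and the slice only imitates what a length-based reader would see, which is an assumption about how `intern` uses its length argument.

```python
# Hypothetical lemma, chosen only to illustrate the mismatch.
lemma_string = u"caf\u00e9"
bytes_string = lemma_string.encode("utf8")

char_count = len(lemma_string)  # number of code points
byte_count = len(bytes_string)  # number of UTF-8 bytes

# Using the code-point count as a byte length cuts the buffer short,
# here in the middle of the two-byte encoding of the last character.
truncated = bytes_string[:char_count]

# Using the byte count covers the whole encoded lemma.
complete = bytes_string[:byte_count]

print(char_count, byte_count, truncated, complete)
```

With the byte count, the full encoded lemma is covered and decodes back to the original string.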