de_dep_news_trf uses 1990s Spelling Convention in Lemmatization #9799
Labels
feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
lang / de (German language data and models)
How to reproduce the behaviour
This prints "isst" where "essen" would be expected.

Looks like the model just uses a lookup table that doesn't contain the 1996 changes to German spelling conventions. The same effect is observable for frißt/frisst.

Your Environment
de-dep-news-trf @ https://github.com/explosion/spacy-models/releases/download/de_dep_news_trf-3.2.0/de_dep_news_trf-3.2.0-py3-none-any.whl
spacy==3.2.0
spacy-alignments==0.8.4
spacy-legacy==3.0.8
spacy-loggers==1.0.1
spacy-transformers==1.1.2
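
To illustrate the lookup-table hypothesis from the report, here is a minimal sketch in plain Python. It does not use spaCy or the actual German lookup data: the table contents and the helper name lookup_lemma are hypothetical, chosen only to show how a lookup that falls back to the surface form would produce the reported output if the table held only pre-1996 spellings.

```python
# Hypothetical lookup table holding only pre-1996 spellings. The entries are
# illustrative and are NOT copied from the real German lookup data.
LEMMA_LOOKUP = {
    "ißt": "essen",
    "frißt": "fressen",
}


def lookup_lemma(token, table=LEMMA_LOOKUP):
    """Return the table entry for token, or the token itself if it is missing."""
    return table.get(token, token)


for form in ("ißt", "isst", "frißt", "frisst"):
    print(form, "->", lookup_lemma(form))
```

Under these assumptions the old spellings resolve to their infinitives, while "isst" and "frisst" come back unchanged, which matches the behaviour described above. Adding the post-reform forms as extra keys, for example lookup_lemma("isst", {**LEMMA_LOOKUP, "isst": "essen"}), would be one way to cover both spellings.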