After upgrading stanza from version 1.0.1 to 1.2.1, we noticed strange behavior with German NER: performance drops significantly, many entities are missed completely or in part, and others are misclassified. As a concrete example, things like this happen throughout our data:
Version 1.0.1, working well:
>>> import stanza
>>> nlp = stanza.Pipeline('de', processors='tokenize,mwt,ner', dir='/usr/lib/stanza', use_gpu=True)
2021-07-01 18:13:16 INFO: Loading these models for language: de (German):
=======================
| Processor | Package |
-----------------------
| tokenize | gsd |
| mwt | gsd |
| ner | conll03 |
=======================
2021-07-01 18:13:18 INFO: Use device: gpu
2021-07-01 18:13:18 INFO: Loading: tokenize
2021-07-01 18:13:20 INFO: Loading: mwt
2021-07-01 18:13:20 INFO: Loading: ner
2021-07-01 18:13:22 INFO: Done loading processors!
>>> doc = nlp("Wir haben uns bei Herrn Giscard d'Estaing beschwert und um Akteneinsicht in die Arbeit des Präsidiums gebeten.")
>>> doc.entities
[{
"text": "Giscard d'Estaing",
"type": "PER",
"start_char": 24,
"end_char": 41
}]
Exact same code/input, only with version 1.2.1:
>>> doc.entities
[]
The environment is the same in both cases (including the models); only the stanza module was updated.
English does not seem to show this behavior; we have not (yet) tested other languages.
This is on CentOS 7.9 with the default Python 3.6.8, though I'm not sure that plays any role here.
Any advice on how to fix this, or on how to track down the cause, would be greatly appreciated.
Unfortunately, this is a known issue in 1.2.1. A code change that improved
some models inadvertently hurt others. We will probably have a new release
in the next few days to address this. In the meantime, you can try this:
https://test.pypi.org/project/stanza/1.3.0rc1/
>>> stanza.__version__
'1.2.2'
>>> doc = nlp("Wir haben uns bei Herrn Giscard d'Estaing beschwert und um Akteneinsicht in die Arbeit des Präsidiums gebeten.")
>>> doc.entities
[{
"text": "Giscard d'Estaing",
"type": "PER",
"start_char": 24,
"end_char": 41
}]
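To check whether a given stanza version restores the old behavior across more than one sentence, it may help to diff the entity lists from two runs. The following is only a minimal sketch, not part of stanza: `diff_entities` and `span_key` are hypothetical helper names, and it assumes each entity has already been converted to a plain dict with the `text`, `type`, `start_char` and `end_char` keys shown in the output above (for example by saving the results of each run to JSON).

```python
from typing import Dict, List, Tuple

# Assumed shape: one dict per entity, with the keys printed by doc.entities above.
Entity = Dict[str, object]


def span_key(ent: Entity) -> Tuple[object, object]:
    """Identify an entity by its character span (hypothetical helper)."""
    return (ent["start_char"], ent["end_char"])


def diff_entities(old: List[Entity], new: List[Entity]) -> Dict[str, list]:
    """Compare entity lists produced by two stanza versions for the same text.

    Returns the entities that disappeared, the ones that newly appeared,
    and (old, new) pairs that share a span but changed type.
    """
    old_by_span = {span_key(e): e for e in old}
    new_by_span = {span_key(e): e for e in new}

    missing = [e for k, e in old_by_span.items() if k not in new_by_span]
    added = [e for k, e in new_by_span.items() if k not in old_by_span]
    relabeled = [
        (old_by_span[k], new_by_span[k])
        for k in old_by_span
        if k in new_by_span and old_by_span[k]["type"] != new_by_span[k]["type"]
    ]
    return {"missing": missing, "added": added, "relabeled": relabeled}


if __name__ == "__main__":
    # Entity lists copied from the two runs reported in this issue.
    v101 = [{"text": "Giscard d'Estaing", "type": "PER",
             "start_char": 24, "end_char": 41}]
    v121 = []
    report = diff_entities(v101, v121)
    print(len(report["missing"]), "entity missing in 1.2.1")
```

Matching on exact character spans means a partially recognized entity shows up as one `missing` and one `added` entry rather than as a relabel, which seems a reasonable way to surface the "missed in part" cases described above.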