German NER performance drops significantly after upgrade from 1.0.1 to 1.2.1 #739

oraveczcsaba · 2021-07-01T16:54:35Z

After upgrading stanza from version 1.0.1 to 1.2.1, we've noticed some strange behavior with German NER: performance drops significantly, many entities are missed completely or in part, and others are misclassified. Here is a concrete example of the kind of thing that happens throughout:
Version 1.0.1, working well:

>>> import stanza
>>> nlp = stanza.Pipeline('de', processors='tokenize,mwt,ner', dir='/usr/lib/stanza', use_gpu=True)
2021-07-01 18:13:16 INFO: Loading these models for language: de (German):
=======================
| Processor | Package |
-----------------------
| tokenize  | gsd     |
| mwt       | gsd     |
| ner       | conll03 |
=======================

2021-07-01 18:13:18 INFO: Use device: gpu
2021-07-01 18:13:18 INFO: Loading: tokenize
2021-07-01 18:13:20 INFO: Loading: mwt
2021-07-01 18:13:20 INFO: Loading: ner
2021-07-01 18:13:22 INFO: Done loading processors!
>>> doc = nlp("Wir haben uns bei Herrn Giscard d'Estaing beschwert und um Akteneinsicht in die Arbeit des Präsidiums gebeten.")
>>> doc.entities
[{
  "text": "Giscard d'Estaing",
  "type": "PER",
  "start_char": 24,
  "end_char": 41
}]

Exact same code/input, only with version 1.2.1:

>>> doc.entities
[]

The environment is the same in both cases (including the models); only the stanza module was updated.
English does not seem to show this behavior; we haven't tested other languages yet.
This is on CentOS 7.9 with the default Python 3.6.8, though I'm not sure that plays any role here.
Any advice on how to fix this, or how to track down the cause, is greatly appreciated.
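To go beyond a single sentence, one way to quantify what changed between versions is to run the same texts through both installs, store each entity as a dict with the keys printed above (`text`, `type`, `start_char`, `end_char`), for example built from each span's attributes of the same names, and diff the two lists. The helper below is a hypothetical sketch, not part of stanza: it sorts baseline entities into missed, partially matched, and retyped, and lists spans the newer version produced that the baseline did not.

```python
from typing import Any, Dict, List

# Hypothetical helper (not part of stanza). Each entity is a dict with the keys
# printed by doc.entities in this thread: text, type, start_char, end_char.
Entity = Dict[str, Any]


def compare_entities(baseline: List[Entity], candidate: List[Entity]) -> Dict[str, list]:
    """Diff two NER outputs for the same text, keyed by character span."""
    cand_by_span = {(e["start_char"], e["end_char"]): e for e in candidate}
    base_spans = {(e["start_char"], e["end_char"]) for e in baseline}

    missed, partial, retyped = [], [], []
    for e in baseline:
        span = (e["start_char"], e["end_char"])
        if span in cand_by_span:
            # Same span found; flag it only if the label changed (e.g. PER -> ORG).
            if cand_by_span[span]["type"] != e["type"]:
                retyped.append((e["text"], e["type"], cand_by_span[span]["type"]))
            continue
        # No exact match: any overlapping candidate span counts as a partial hit.
        overlaps = [
            c["text"]
            for c in candidate
            if c["start_char"] < e["end_char"] and e["start_char"] < c["end_char"]
        ]
        if overlaps:
            partial.append((e["text"], overlaps))
        else:
            missed.append(e["text"])

    # Spans produced by the candidate version that are absent from the baseline.
    new_spans = [
        e["text"] for e in candidate if (e["start_char"], e["end_char"]) not in base_spans
    ]
    return {"missed": missed, "partial": partial, "retyped": retyped, "new_spans": new_spans}


# Entities reported above: 1.0.1 as the baseline, 1.2.1 returned an empty list.
baseline = [{"text": "Giscard d'Estaing", "type": "PER", "start_char": 24, "end_char": 41}]
print(compare_entities(baseline, []))
```

Matching on character offsets rather than on text keeps the comparison independent of tokenization, so a truncated span such as "Giscard" shows up as a partial hit instead of an unrelated miss.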


AngledLuffa · 2021-07-01T17:41:24Z

Unfortunately, this is a known issue in 1.2.1. A code change that improved some models inadvertently hurt others. We will probably have a new release in the next few days to address this; in the meantime, you can try this release candidate: https://test.pypi.org/project/stanza/1.3.0rc1/

AngledLuffa · 2021-07-13T04:44:59Z

Version 1.2.2 is pushed and should have this fix. Thanks for the report.
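Since 1.2.1 is the only release confirmed affected in this thread and 1.2.2 carries the fix, a deployment could refuse to run German NER on the affected version. The sketch below is a hypothetical guard, not a stanza API; it assumes the version string comes from `stanza.__version__` and ignores pre-release suffixes such as `rc1`.

```python
import re
from typing import Tuple

# Only 1.2.1 is confirmed affected in this thread; the fix shipped in 1.2.2.
AFFECTED = {(1, 2, 1)}


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components: '1.2.2' -> (1, 2, 2), '1.3.0rc1' -> (1, 3, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at a suffix such as 'rc1'
            break
    return tuple(parts)


def has_german_ner_regression(version: str) -> bool:
    """True if `version` is the release reported in this issue."""
    return parse_version(version) in AFFECTED


# Hypothetical usage:
#   import stanza
#   if has_german_ner_regression(stanza.__version__):
#       raise RuntimeError("upgrade stanza to 1.2.2 or later")
```

Checking against an explicit set of known-bad releases, rather than a minimum version, avoids rejecting 1.0.1, which the original report shows working correctly.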

AngledLuffa · 2021-07-13T04:48:35Z

>>> stanza.__version__
'1.2.2'
>>> doc = nlp("Wir haben uns bei Herrn Giscard d'Estaing beschwert und um Akteneinsicht in die Arbeit des Präsidiums gebeten.")
>>> doc.entities
[{
  "text": "Giscard d'Estaing",
  "type": "PER",
  "start_char": 24,
  "end_char": 41
}]

oraveczcsaba · 2021-07-13T07:21:49Z

Thanks!

oraveczcsaba added the bug label Jul 1, 2021

manning closed this as completed Jul 14, 2021
