German NER performance drops significantly after upgrade from 1.0.1 to 1.2.1 #739

Closed
oraveczcsaba opened this issue Jul 1, 2021 · 4 comments

@oraveczcsaba

After upgrading Stanza from version 1.0.1 to 1.2.1, we noticed that German NER performance drops significantly: many entities are missed completely or in part, and others are misclassified. Here is a concrete example of what happens throughout our data.
Version 1.0.1, working well:

>>> import stanza
>>> nlp = stanza.Pipeline('de', processors='tokenize,mwt,ner', dir='/usr/lib/stanza', use_gpu=True)
2021-07-01 18:13:16 INFO: Loading these models for language: de (German):
=======================
| Processor | Package |
-----------------------
| tokenize  | gsd     |
| mwt       | gsd     |
| ner       | conll03 |
=======================

2021-07-01 18:13:18 INFO: Use device: gpu
2021-07-01 18:13:18 INFO: Loading: tokenize
2021-07-01 18:13:20 INFO: Loading: mwt
2021-07-01 18:13:20 INFO: Loading: ner
2021-07-01 18:13:22 INFO: Done loading processors!
>>> doc = nlp("Wir haben uns bei Herrn Giscard d'Estaing beschwert und um Akteneinsicht in die Arbeit des Präsidiums gebeten.")
>>> doc.entities
[{
  "text": "Giscard d'Estaing",
  "type": "PER",
  "start_char": 24,
  "end_char": 41
}]

The exact same code and input, only with version 1.2.1:

>>> doc.entities
[]

The environment is the same in both cases (including the models); only the stanza module was updated.
English does not seem to show this behavior; we have not yet tested other languages.
This is on CentOS 7.9 with the default Python 3.6.8, though I'm not sure that plays any role here.
Any advice on how to fix this or how to track down the cause would be greatly appreciated.

@AngledLuffa
Collaborator

AngledLuffa commented Jul 1, 2021 via email

@AngledLuffa
Collaborator

Version 1.2.2 is pushed and should have this fix. Thanks for the report.
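
For anyone who wants to guard against running an affected release, here is a minimal sketch of a version check. The helper names `version_tuple` and `is_at_least_fixed_release` are hypothetical and not part of Stanza. The only facts taken from this thread are that 1.2.1 showed the problem and 1.2.2 should contain the fix; the check says nothing about older releases such as 1.0.1, which worked in the report above.

```python
import re

# Release named in this thread as containing the fix.
FIXED_VERSION = (1, 2, 2)


def version_tuple(version):
    """Turn a version string such as '1.2.1' into a tuple of ints, (1, 2, 1).

    Parsing stops at the first component that does not start with a digit,
    and any non-digit suffix on a component (e.g. 'rc1') is ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least_fixed_release(version):
    """Return True if `version` is 1.2.2 or newer (hypothetical helper)."""
    return version_tuple(version) >= FIXED_VERSION
```

With Stanza installed, this could be called as `is_at_least_fixed_release(stanza.__version__)` before building the pipeline.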

@AngledLuffa
Collaborator

>>> stanza.__version__
'1.2.2'
>>> doc = nlp("Wir haben uns bei Herrn Giscard d'Estaing beschwert und um Akteneinsicht in die Arbeit des Präsidiums gebeten.")
>>> doc.entities
[{
  "text": "Giscard d'Estaing",
  "type": "PER",
  "start_char": 24,
  "end_char": 41
}]
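
A comparison like the one above can also be automated when checking many sentences across versions. The following is only a sketch: `entity_key` and `diff_entities` are hypothetical helpers, and the input is assumed to be a list of plain dicts with the same keys as the printed output (`text`, `type`, `start_char`, `end_char`), for example saved as JSON from each environment.

```python
def entity_key(entity):
    """Reduce an entity dict to a hashable, sortable tuple."""
    return (entity["start_char"], entity["end_char"], entity["type"], entity["text"])


def diff_entities(baseline, candidate):
    """Compare two entity lists for the same input text.

    Returns the entities present only in `baseline` ("missing") and the
    entities present only in `candidate` ("unexpected").
    """
    base = {entity_key(e) for e in baseline}
    cand = {entity_key(e) for e in candidate}
    return {"missing": sorted(base - cand), "unexpected": sorted(cand - base)}


# Entity list as printed in this thread for 1.0.1 and 1.2.2.
baseline = [
    {"text": "Giscard d'Estaing", "type": "PER", "start_char": 24, "end_char": 41}
]
# Entity list as printed in this thread for 1.2.1.
regressed = []

print(diff_entities(baseline, regressed))
```

An empty `missing` and `unexpected` list for every test sentence would indicate that the two versions agree on that input.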

@oraveczcsaba
Author

Thanks!

@manning manning closed this as completed Jul 14, 2021