Performance of the CoNLL 2003 NER pretrained model? #390
Hello @nreimers thanks very much for reporting this. When running your script I found that the final evaluation numbers of the serialized NER model differ with every run, i.e. predictions are currently not entirely deterministic. I checked a more recently trained model and its predictions are deterministic. With this model, your script gives the following results:

processed 46666 tokens with 5648 phrases; found: 5683 phrases; correct: 5269.
accuracy: 98.43%; precision: 92.72%; recall: 93.29%; FB1: 93.00
LOC: precision: 93.79%; recall: 94.24%; FB1: 94.02 1676
MISC: precision: 82.84%; recall: 83.19%; FB1: 83.01 705
ORG: precision: 91.32%; recall: 92.47%; FB1: 91.89 1682
PER: precision: 97.35%; recall: 97.53%; FB1: 97.44 1620

What this means from our side is that we need to do the following:
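As a quick sanity check on the numbers above: conlleval's FB1 is simply the harmonic mean of precision and recall, and overall precision/recall follow from the phrase counts it reports. A minimal sketch in plain Python (not part of flair or conlleval; it only recomputes the figures quoted above):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (what conlleval calls FB1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Phrase counts from the conlleval output above:
found, gold, correct = 5683, 5648, 5269
precision = 100 * correct / found   # -> 92.72%
recall = 100 * correct / gold       # -> 93.29%
print(round(f1(precision, recall), 2))  # 93.0, matching FB1: 93.00
```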
Hi @alanakbik
Hello @alanakbik I also tried to reproduce your results but failed... However, I guess it is related to this issue. Thanks.
@yahshibu thanks for your interest - the next release will probably happen at the end of next week!
@alanakbik Thank you for letting me know! I'm so pumped!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi,
thank you for open-sourcing this nice library for sequence tagging, great work!
I tried to evaluate the pre-trained CoNLL 2003 English-NER model, but I get slightly different results.
Output from the official Perl evaluation script of the pretrained NER model on the CoNLL 2003 NER test set:
According to the conlleval script, the performance on the test set is 92.64, but I would have expected this model to achieve 93.09 or something close to that. Am I doing something wrong?
My code:
It would be really great if you could also publish the performance scores of your pretrained models.
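The evaluation script itself is not preserved in this thread. As an illustration of what the conlleval comparison does, here is a minimal sketch in plain Python (hypothetical toy data, not the original code): it extracts (type, start, end) entity spans from BIO tag sequences and scores predictions by exact span match, which is how conlleval arrives at its precision/recall/FB1 numbers.

```python
def spans(tags):
    """Extract a set of (type, start, end) entity spans from BIO tags."""
    out, start, typ = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("B-") or tag == "O" or (typ and tag[2:] != typ):
            if typ is not None:
                out.append((typ, start, i))
            typ, start = (tag[2:], i) if tag != "O" else (None, None)
        elif tag.startswith("I-") and typ is None:
            typ, start = tag[2:], i  # tolerate an I- tag without a preceding B-
    return set(out)

def prf1(gold, pred):
    """Exact-span precision, recall and F1, as conlleval reports them."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    prec = correct / len(p) if p else 0.0
    rec = correct / len(g) if g else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

# Hypothetical toy example: one entity matched, one missed, one spurious.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "O", "B-ORG"]
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Small differences against a published score, as reported above, can come from tokenization or dataset-version mismatches rather than the scoring itself, since exact span matching penalizes any boundary shift as both a false positive and a false negative.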