Information of the datasets #15

Closed
CamilleSchr opened this issue May 18, 2021 · 1 comment

Comments

@CamilleSchr

Hello,

Thanks for sharing these datasets!
I'm trying to find some more specific information about them; for instance, how many tweets, comments, or news articles are in WNUT17 and in CoNLL 2003?

Thanks,
Cheers,
Camille

@juand-r
Owner

juand-r commented May 18, 2021

Hi Camille,

It's probably best to find these statistics either in the associated papers or by looking at the datasets themselves. However, I do have some statistics (number of tokens, sentences, and annotated entities by entity type) for these two:

https://github.com/juand-r/entity-recognition-datasets/blob/master/data/WNUT17/CONLL-format/docs/entity-list.txt
https://github.com/juand-r/entity-recognition-datasets/blob/master/data/conll2003/CONLL-format/docs/entity-list.txt
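For anyone who wants to compute similar counts directly from the data, here is a minimal sketch. It is not the script used to produce the files linked above. It assumes a CoNLL-style layout: one token per line, whitespace-separated columns with the entity tag in the last column, blank lines between sentences, `-DOCSTART-` marker lines, and BIO-style tags. The function name `conll_stats` is hypothetical.

```python
from collections import Counter


def conll_stats(lines):
    """Count sentences, tokens, and entities in CoNLL-style lines.

    Assumptions (not confirmed by the thread): the entity tag is the last
    whitespace-separated column, sentences are separated by blank lines,
    and tags follow a BIO-style scheme such as B-PER / I-PER / O.
    """
    n_sentences = 0
    n_tokens = 0
    entities = Counter()
    in_sentence = False
    prev_tag = "O"

    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            in_sentence = False
            prev_tag = "O"
            continue

        if not in_sentence:
            n_sentences += 1
            in_sentence = True

        tag = line.split()[-1]
        n_tokens += 1

        if tag.startswith("B-"):
            # An explicit begin tag always opens a new entity.
            entities[tag[2:]] += 1
        elif tag.startswith("I-") and prev_tag[2:] != tag[2:]:
            # An I- tag with no matching predecessor also opens an entity
            # (this covers files that mark entity starts with I-).
            entities[tag[2:]] += 1
        prev_tag = tag

    return n_sentences, n_tokens, entities


sample = [
    "-DOCSTART- -X- O O",
    "",
    "EU NNP I-NP B-ORG",
    "rejects VBZ I-VP O",
    "German JJ I-NP B-MISC",
    "call NN I-NP O",
    "",
    "Peter NNP I-NP B-PER",
    "Blackburn NNP I-NP I-PER",
]
n_sentences, n_tokens, entities = conll_stats(sample)
print(n_sentences, n_tokens, dict(entities))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `conll_stats(open(path, encoding="utf-8"))`, where `path` points to one of the CoNLL-format files in the repository. Note that this counts sentences and tokens, not source documents such as tweets or news articles.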

@juand-r juand-r closed this as completed May 18, 2021