Document update
lfoppiano committed Aug 31, 2016
1 parent 98c398d commit 325a568
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion grobid-ner/doc/training-ner-model.md
@@ -2,13 +2,19 @@

### Datasets

The Grobid NER has been trained on several different datasets:
Grobid NER has been trained on several different datasets:

- Reuters NER [CONLL 2003](http://www.cnts.ua.ac.be/conll2003/ner/) manually annotated training data (10k words, 26 classes). This dataset is not public, so not shipped with the code. In order to obtain it,

- Manually annotated extract from the Wikipedia article on World War 1 (approximately 10k words, 26 classes)

The datasets distributed with this project are publicly available under the following licences:

- [Wikipedia](http://www.wikipedia.org) data is available under the [Creative Commons Attribution-ShareAlike License](https://creativecommons.org/licenses/by-sa/3.0/).

- [EHRI](https://portal.ehri-project.eu) data from the research portal is openly available, as stated in the EHRI [data policy](https://portal.ehri-project.eu/data-policy).


The following datasets have been used as training data, but are not distributed with the project:

- Reuters corpus, not publicly available. To obtain it, contact [NIST](http://trec.nist.gov/data/reuters/reuters.html).
