Updating a bit the documentation

kermitt2 · Aug 29, 2016 · 7dd5480
1 parent eed6b73
commit 7dd5480
Showing 4 changed files with 20 additions and 18 deletions.
diff --git a/Readme.md b/Readme.md
@@ -1,13 +1,13 @@
 # grobid-ner
 
-[![Documentation Status](https://readthedocs.org/projects/grobid-ner/badge/?version=latest)](https://readthedocs.org/projects/grobid-ner/?badge=latest)
+[![Documentation Status](https://readthedocs.org/projects/grobid-ner/badge/?version=latest)](http://grobid-ner.readthedocs.io/en/latest/)
 [![License](http://img.shields.io/:license-apache-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)
 
 ## Purpose
 
 GROBID NER is a Named-Entity Recogniser based on the GROBID library ([grobid](https://raw.github.com/kermitt2/grobid)), a text mining tool exploiting CRF. The installation of GROBID is necessary.  
 
-Grobid NER has been developed more specifically for the purpose of supporting disambiguation and resolution of the entities against knowledge bases such as Wikipedia. For a description of the NER, installation, usage and other technical features, see the [documentation](https://readthedocs.org/projects/grobid-ner/?badge=latest). 
+Grobid NER has been developed more specifically for the purpose of supporting disambiguation and resolution of the entities against knowledge bases such as Wikipedia. For a description of the NER, installation, usage and other technical features, see the [documentation](http://grobid-ner.readthedocs.io/en/latest/). 
 
 ## License
 

diff --git a/grobid-ner/doc/build-and-install.md b/grobid-ner/doc/build-and-install.md
@@ -1,8 +1,8 @@
-Grobid NER is a module of [Grobid](https://github.com/kermitt2/grobid) . 
+GROBID NER is a module of [Grobid](https://github.com/kermitt2/grobid) . 
 
 ## Grobid Installation
 
-Grobid is library for extracting bibliographical information from technical and scientific documents. 
+GROBID is library for extracting bibliographical information from technical and scientific documents. 
 The tool offers a convenient environment for creating efficient text mining tool based on CRF.
 
 Clone source code from github:
@@ -29,15 +29,15 @@ Clone source code from github:
 Or download directly the zip file:
 > https://github.com/kermitt2/grobid/zipball/master
 
-Grobid NER is actually a sub-project of Grobid. 
-Although Grobid NER will be merged with Grobid in the future, at this point the Grobid NER sub-module simply need to added manually. 
-In the main directory of Grobid NER:
+GROBID NER is actually a sub-project of GROBID. 
+Although GROBID NER might be merged with GROBID in the future, at this point the GROBID NER sub-module simply need to added manually. 
+In the main directory of GROBID NER:
 
 > cp -r grobid-ner /path/to/grobid/
 
 > cp -r grobid-home/models/* /path/to/grobid/grobid-home/
 
-Then build the Grobid NER subproject:
+Then build the GROBID NER subproject:
 
 > cd /path/to/grobid/grobid-ner
 

diff --git a/grobid-ner/doc/class-and-senses.md b/grobid-ner/doc/class-and-senses.md
@@ -1,6 +1,6 @@
-Grobid NER identifies named-entities and classifies them in 26 classes, as compared to the 4-classes or 7-classes model of most of the existing NER open source tools (usually using the Reuters/CoNLL 2003 annotated corpus, or the MUC annotated corpus). 
+GROBID NER identifies named-entities and classifies them in 26 classes, as compared to the 4-classes or 7-classes model of most of the existing NER open source tools (usually using the Reuters/CoNLL 2003 annotated corpus, or the MUC annotated corpus). 
 
-In addition the entities are often enriched with WordNet sense annotations to help further disambiguation and resolution of the entity. Grobid NER has been developed for the purposed of disambiguating and resolving entities against knowledge bases such as Wikipedia and FreeBase. Sense information can help to disambiguate the entity, because they refine based on contextual clues the entity class.
+In addition the entities are often enriched with WordNet sense annotations to help further disambiguation and resolution of the entity. GROBID NER has been developed for the purposed of disambiguating and resolving entities against knowledge bases such as Wikipedia and FreeBase. Sense information can help to disambiguate the entity, because they refine based on contextual clues the entity class.
 
 ## Named entity classes
 
@@ -37,7 +37,7 @@ The following table describes the 26 named entity classes produced by the model.
 					
 ## Conventions
 
-For the class assignation to entities, Grobid NER follows the longest match convention. For instance, the entity _University of Minnesota_ as a whole (longest match) will belong to the class INSTITUTION. Its component _Minnesota_ is a LOCATION, but as it is part of a larger entity chunk, it will not be identified. 
+For the class assignation to entities, GROBID NER follows the longest match convention. For instance, the entity _University of Minnesota_ as a whole (longest match) will belong to the class INSTITUTION. Its component _Minnesota_ is a LOCATION, but as it is part of a larger entity chunk, it will not be identified. 
 
 
 ## Sense information

diff --git a/grobid-ner/doc/index.md b/grobid-ner/doc/index.md
@@ -1,15 +1,17 @@
 # GROBID Named Entity Recognition Documentation
 
-## Purpose
+## Purposes
 
-Grobid NER is a Named-Entity Recogniser module for [GROBID](https://raw.github.com/kermitt2/grobid), a text mining tool exploiting CRF.
-Grobid NER has been developed more specifically for the purpose of supporting disambiguation and resolution of entities against knowledge bases such as Wikipedia.
+GROBID NER is a Named-Entity Recogniser module for [GROBID](https://raw.github.com/kermitt2/grobid), a tool based on CRF.
+GROBID NER has been developed more specifically for the purpose of further supporting post disambiguation and resolution of entities against knowledge bases such as Wikipedia.
 
-The models supplied with the source have been trained using the following dataset: 
-- [CONLL 2003](http://www.cnts.ua.ac.be/conll2003/ner/) Manually annotated training data (20k words, 4 classes)
-- Wikipedia semi-automatic generated data (approximately 10k words, 26 classes)
+The current models shipped with the source uses 26 Named Entity classes and have been trained using the following dataset: 
+- Reuters NER [CONLL 2003](http://www.cnts.ua.ac.be/conll2003/ner/) partially manually annotated training data (10k words)
+- Manually annotated extract from the Wikipedia article on World War 1 (approximately 10k words)
 
-Training data and annotation work will be always welcomed, if you like to contribute, you can contact us via email or by opening an issue in the GitHUB project.
+The training has been completed with a very large semi-supervised training based on the Wikipedia Idilia data set. 
+
+Annotated data will be always welcomed, if you like to contribute, you can contact us via email or by opening an issue in the GitHub project.
 
 ## About