Migration of the technical documentation to readthedocs.io #32

lfoppiano · Aug 11, 2017 · dac12cc
1 parent f3e3ac4
commit dac12cc

Showing 4 changed files with 100 additions and 78 deletions.
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@
 [![License](http://img.shields.io/:license-apache-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)
 <!-- [![Build Status](https://travis-ci.org/kermitt2/grobid-quantities.svg?branch=master)](https://travis-ci.org/kermitt2/grobid-quantities) -->
 <!-- [![Coverage Status](https://coveralls.io/repos/kermitt2/grobid-quantities/badge.svg)](https://coveralls.io/r/kermitt2/grobid-quantities) -->
-<!-- [![Documentation Status](https://readthedocs.org/projects/grobid-quantities/badge/?version=latest)](https://readthedocs.org/projects/grobid-quantities/?badge=latest) -->
+[![Documentation Status](https://readthedocs.org/projects/grobid-quantities/badge/?version=latest)](https://readthedocs.org/projects/grobid-quantities/?badge=latest)
 
 __Work in progress.__
 
@@ -17,84 +17,9 @@ One additional goal is also to identify and attached to the measurements the "qu
 
 As the other GROBID models, the module relies only on machine learning and uses linear CRF. 
 
-## Install, build, run
+## Documentation
 
-Building grobid-quantities requires maven and JDK 1.8.  
-
-First install the latest development version of GROBID as explained by the [documentation](http://grobid.readthedocs.org).
-
-Copy the module quantities as sibling sub-project to grobid-core, grobid-trainer, etc.:
-> cp -r grobid-quantities grobid/
-
-Try compiling everything with:
-> cd PATH-TO-GROBID/grobid/
-
-> mvn -Dmaven.test.skip=true clean install
-
-Run some test: 
-> cd PATH-TO-GROBID/grobid/grobid-quantities
-
-> mvn compile test
-
-**The models have to be trained before running the tests!**
-
-## Training
-
-For training the quantity model:
-> cd PATH-TO-GROBID/grobid/grobid-quantities
-
-> mvn generate-resources -Ptrain_quantities
-
-For training the unit model:
-
-> mvn generate-resources -Ptrain_units
-
-For the moment, the default training stop criteria are used. So, the training can be stopped manually after 1000 iterations, simply do a "control-C" to stop the training and save the model produced in the latest iteration. 1000 iterations are largely enough. Otherwise, the training will continue beyond several thousand iterations before stopping. 
-The models will be saved under ```grobid-home/models/quantities``` and ```grobid-home/models/units``` respectively.
-
-## Training data
-
-As the rest of GROBID, the training data is encoded following the [TEI P5](http://www.tei-c.org/Guidelines/P5). See the GROBID quantities [annotation guidelines page](doc/Annotation-Guidelines.md) for detailed explanations and examples.  
-
-## Generation of training data
-
-Training data generation works the same as in GROBID, with executable name ```createTrainingQuantities```, for example:
-
-> java -jar target/grobid-quantities-0.4.0-SNAPSHOT.one-jar.jar -gH ../grobid-home/ -gP ../grobid-home/config/grobid.properties -dIn ~/grobid/grobid-quantities/src/test/resources/ -dOut ~/test/ -exe createTrainingQuantities
-
-The input directory can contain PDF (.pdf, scientific articles only), XML/TEI (.xml or .tei, for patents and scientific articles) and text files (.txt).
-
-For the unit model the training data cannot be generated automatically from PDF. The overall effort is similar to create the training data from scratch manually.
-**Advanced**: There is the possibility to generate a simple unit training data file (covering mostly all the unit once, and the combiation between SI base units and prefixes). This generator uses the file lexicon file information (notation, inflections and so on, e.g. resources/en/units.json).
-
-To generate the data:
-
- ```bash
- > java -jar target/grobid-quantities-0.4.0-SNAPSHOT.one-jar.jar -gH ../grobid-home/ -gP ../grobid-home/config/grobid.properties
- -dIn input/resources -dOut /tmp/ -exe generateTrainingUnits
- ```
-
-The input directory should be the directory containing prefixes.txt and units.json (normally by language) (e.g. of input/resources /~/grobid-quantities/src/main/resources/en)
-
-
-## Start the service
-
-> mvn -Dmaven.test.skip=true jetty:run-war
-
-Demo/console web app is then accessible at ```http://localhost:8060```
-
-Using ```curl``` POST/GET requests:
-
-
-```
-curl -X POST -d "text=I've lost one minute." localhost:8060/processQuantityText
-```
-
-```
-curl -GET --data-urlencode "text=I've lost one minute." localhost:8060/processQuantityText
-```
-
-Note that the model is designed and trained to work at __paragraph level__. It means that, for the moment, the expected input to the parser is a paragraph or a text segment of similar size, not a complete document. In case you have a long textual document, it is better either to exploit existing structures (e.g. XML/HTML elements) to segment it initially into paragraphs or sentences, or to apply an automatic paragraph/sentence segmentation, and then send separately to grobid-quantities the equivalent of a paragraph-size texts to be processed. 
+You can find the latest documentation [here](http://grobid-quantities.readthedocs.io). 
 
 ## License
 

diff --git a/doc/gettingStarted.rst b/doc/gettingStarted.rst
@@ -0,0 +1,69 @@
+Getting started
+===============
+
+Building grobid-quantities requires *maven* and *JDK 1.8*.
+
+Build and install
+~~~~~~~~~~~~~~~~~
+
+First install the latest development version of GROBID as explained in the `documentation <http://grobid.readthedocs.org>`_.
+
+Copy the grobid-quantities module as a sibling sub-project of grobid-core, grobid-trainer, etc.::
+
+   cp -r grobid-quantities grobid/
+
+Try compiling everything with::
+
+   cd PATH-TO-GROBID/grobid/
+
+   mvn -Dmaven.test.skip=true clean install
+
+Run the tests::
+
+   cd PATH-TO-GROBID/grobid/grobid-quantities
+
+   mvn compile test
+
+**The models have to be trained before running the tests!**
+
+Training
+~~~~~~~~
+
+To train the quantity model::
+
+  cd PATH-TO-GROBID/grobid/grobid-quantities
+
+  mvn generate-resources -Ptrain_quantities
+
+To train the unit model::
+
+  mvn generate-resources -Ptrain_units
+
+For the moment, the default training stop criteria are used. The training can be stopped manually after 1000 iterations: press Control-C to stop
+the training and save the model produced in the latest iteration. 1000 iterations are more than enough.
+
+Otherwise, the training will continue for several thousand iterations before stopping.
+
+The models will be saved under ``grobid-home/models/quantities`` and ``grobid-home/models/units`` respectively.
+
+
+Start the service
+~~~~~~~~~~~~~~~~~
+
+grobid-quantities can be run as a service using Jetty::
+
+  mvn -Dmaven.test.skip=true jetty:run-war
+
+The demo/console web app is then accessible at ``http://localhost:8060``.
+
+Using ``curl`` POST/GET requests::
+
+  curl -X POST -d "text=I've lost one minute." localhost:8060/processQuantityText
+
+  curl -G --data-urlencode "text=I've lost one minute." localhost:8060/processQuantityText
+
+Note that the model is designed and trained to work at *paragraph level*.
+This means that, for the moment, the expected input to the parser is a paragraph or a text segment of similar size, not a complete document.
+If you have a long textual document, it is better either to exploit existing structures (e.g. XML/HTML elements) to segment it
+initially into paragraphs or sentences, or to apply an automatic paragraph/sentence segmentation, and then to send each
+paragraph-sized text to grobid-quantities separately.
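
The paragraph-level note above could be illustrated with a small client. The following is only a sketch, not part of this commit: it assumes the service is running at the address used in the curl examples, that blank lines are an acceptable paragraph boundary, and that the raw response body is all the caller needs. The function names (`split_paragraphs`, `post_paragraph`, `process_document`) are hypothetical.

```python
import re
import urllib.parse
import urllib.request

# Endpoint taken from the curl examples; adjust host and port to your setup.
SERVICE_URL = "http://localhost:8060/processQuantityText"


def split_paragraphs(text):
    """Split raw text into paragraphs on blank lines (a naive, assumed heuristic)."""
    blocks = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and drop empty blocks.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def post_paragraph(paragraph, url=SERVICE_URL):
    """POST one paragraph as the form field ``text`` and return the raw response body."""
    data = urllib.parse.urlencode({"text": paragraph}).encode("utf-8")
    with urllib.request.urlopen(url, data=data) as response:
        return response.read().decode("utf-8")


def process_document(text, send=post_paragraph):
    """Send each paragraph separately; ``send`` is injectable for offline testing."""
    return [send(paragraph) for paragraph in split_paragraphs(text)]
```

Passing a different `send` callable makes it possible to try the segmentation without a running service. For XML/HTML input, splitting on the existing structural elements would likely be a better choice than blank lines.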
diff --git a/doc/index.rst b/doc/index.rst
@@ -9,5 +9,7 @@ Welcome to Grobid-quantities's documentation!
 .. toctree::
   :maxdepth: 2
 
+  gettingStarted.rst
   guidelines.rst
+  trainingData.rst
 
diff --git a/doc/trainingData.rst b/doc/trainingData.rst
@@ -0,0 +1,26 @@
+Training data
+=============
+
+As in the rest of GROBID, the training data is encoded following the `TEI P5 <http://www.tei-c.org/Guidelines/P5>`_ guidelines.
+See :doc:`guidelines` for detailed explanations and examples.
+
+Generation of training data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Training data generation works the same as in GROBID, with executable name ``createTrainingQuantities``, for example::
+
+   java -jar target/grobid-quantities-0.4.0-SNAPSHOT.one-jar.jar -gH ../grobid-home/ -gP ../grobid-home/config/grobid.properties \
+   -dIn ~/grobid/grobid-quantities/src/test/resources/ -dOut ~/test/ -exe createTrainingQuantities
+
+The input directory can contain PDF (.pdf, scientific articles only), XML/TEI (.xml or .tei, for patents and scientific articles) and text files (.txt).
+
+For the unit model, the training data cannot be generated automatically from PDF. The overall effort is similar to creating the training data manually from scratch.
+
+**Advanced**: It is possible to generate a simple unit training data file (covering almost all the units once, and the combinations of SI base units and prefixes). This generator uses the information in the lexicon files (notation, inflections and so on, e.g. resources/en/units.json).
+
+To generate the data::
+
+  java -jar target/grobid-quantities-0.4.0-SNAPSHOT.one-jar.jar -gH ../grobid-home/ -gP ../grobid-home/config/grobid.properties \
+  -dIn input/resources -dOut /tmp/ -exe generateTrainingUnits
+
+The input directory should be the directory containing prefixes.txt and units.json (normally organized by language), e.g. for input/resources: /~/grobid-quantities/src/main/resources/en
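
As a possible companion to the unit-data generation step, the sketch below checks that the input directory holds the two lexicon files named above and assembles the same command line as an argument list. It is an illustration under assumptions, not project code: the jar name and relative paths are copied from the command shown in this file, and the helper names (`missing_lexicon_files`, `build_command`) are hypothetical.

```python
from pathlib import Path

# Jar name and relative paths are copied from the documented command; adjust to your checkout.
JAR = "target/grobid-quantities-0.4.0-SNAPSHOT.one-jar.jar"
REQUIRED_FILES = ("prefixes.txt", "units.json")


def missing_lexicon_files(input_dir):
    """Return the required lexicon files that are absent from input_dir."""
    directory = Path(input_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


def build_command(input_dir, output_dir, grobid_home="../grobid-home/"):
    """Assemble the generateTrainingUnits invocation as an argument list."""
    home = Path(grobid_home)
    return [
        "java", "-jar", JAR,
        "-gH", str(home),
        "-gP", str(home / "config" / "grobid.properties"),
        "-dIn", str(input_dir),
        "-dOut", str(output_dir),
        "-exe", "generateTrainingUnits",
    ]
```

The resulting list can be handed to `subprocess.run(..., check=True)` from the grobid-quantities directory once `missing_lexicon_files` returns an empty list.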