Commit

update doc
kermitt2 committed Aug 12, 2020
1 parent aeb0db8 commit 7ddbd46
Showing 2 changed files with 2 additions and 3 deletions.
2 changes: 1 addition & 1 deletion doc/End-to-end-evaluation.md
@@ -2,7 +2,7 @@

Individual models can be evaluated as explained in [Training the different models of Grobid](Training-the-models-of-Grobid.md).

- For an end-to-end evaluation, covering the whole extraction process from the parsing of PDF to the end result of the cascading of several CRF models, GROBID includes two possible evaluation progresses:
+ For an end-to-end evaluation, covering the whole extraction process from the parsing of PDF to the end result of the cascading of several sequence labelling models, GROBID includes two possible evaluation progresses:

* against a set of JATS-encoded (NLM) articles, such as [PubMed Central](http://www.ncbi.nlm.nih.gov/pmc) or [bioRxiv](https://www.biorxiv.org). For its publications, PubMed Central provides both PDF and fulltext XML files in the [NLM](http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html) format. Keeping in mind some limits described below, it is possible to estimate the ability of Grobid to extract and normalize the content of the PDF documents for matching the quality of the NLM file. bioRxiv is used in Grobid to evaluate more precisely performance on preprint articles.

3 changes: 1 addition & 2 deletions doc/Training-the-models-of-Grobid.md
@@ -2,7 +2,7 @@

## Models

- Grobid uses different CRF models depending on the labeling task to be realized. For a complex extraction and parsing tasks (for instance header extraction and parsing), several models are used in cascade. The current models are the following ones:
+ Grobid uses different sequence labelling models depending on the labeling task to be realized. For a complex extraction and parsing tasks (for instance header extraction and parsing), several models are used in cascade. The current models are the following ones:

* affiliation-address

@@ -129,4 +129,3 @@ If you wish to maintain the training corpus as gold standard, these automaticall
## Training guidelines

Annotation guidelines for creating the training data corresponding to the different GROBID models are available from the [following page](training/General-principles.md).
