Merge pull request #168 from lfoppiano/update-training-data
Update training data, update models, fix docker, update e2e eval, new documentation with Markdown
lfoppiano committed Mar 27, 2024
2 parents 5d475a8 + 8eb8830 commit caacf7d
Showing 142 changed files with 1,334,313 additions and 847,015 deletions.
12 changes: 6 additions & 6 deletions .readthedocs.yml
@@ -9,10 +9,10 @@ build:
 submodules:
   exclude: all

-#python:
-# install:
-# - requirements: doc/requirements.txt
+python:
+  install:
+    - requirements: doc/requirements.txt

-#mkdocs:
-#  configuration: mkdocs.yml
-#  fail_on_warning: false
+mkdocs:
+  configuration: doc/mkdocs.yml
+  fail_on_warning: false
2 changes: 1 addition & 1 deletion Dockerfile.local
@@ -59,7 +59,7 @@ WORKDIR /opt
 # build runtime image
 # -------------------

-FROM grobid/grobid:0.8.0 as runtime
+FROM lfoppiano/grobid:0.8.0-full-slim as runtime

 # setting locale is likely useless but to be sure
 ENV LANG C.UTF-8
2 changes: 1 addition & 1 deletion build.gradle
@@ -374,7 +374,7 @@ task copyModels(type: Copy) {
 task downloadTransformers(dependsOn: copyModels) {
     doLast {
         download {
-            src "https://transformers-data.s3.eu-central-1.amazonaws.com/quantities-transformers.zip"
+            src "https://transformers-data.s3.eu-central-1.amazonaws.com/quantities-transformers-240226.zip"
             dest "${grobidHome}/models/quantities-transformers.zip"
             overwrite false
             print "Download bulky transformers files under grobid-home: ${grobidHome}"
20 changes: 0 additions & 20 deletions doc/Makefile

This file was deleted.

181 changes: 0 additions & 181 deletions doc/conf.py

This file was deleted.

125 changes: 125 additions & 0 deletions doc/evaluation-scores.md
@@ -0,0 +1,125 @@
# Evaluation scores

## End-to-end evaluation

The end-to-end evaluation was performed with the [MeasEval dataset](https://github.com/harperco/MeasEval) (SemEval-2021
Task 8).
The scores in the following table are micro averages. MeasEval annotations allow approximated entities, which
are not supported in grobid-quantities.

| Type (Ref) | Matching method | Precision | Recall | F1-score | Support |
|---------------------------|------------------|-----------|--------|----------|---------|
| Quantities (QUANT) | strict | 54.09 | 54.47 | 54.28 | 1137 |
| Quantities (QUANT) | soft | 67.02 | 67.49 | 67.26 | 1137 |
| Quantified substance (ME) | strict | 13.82 | 9.67 | 11.38 | 615 |
| Quantified substance (ME) | soft | 21.63 | 15.13 | 17.80 | 615 |

Note: ME (Measured Entity) extraction is still experimental in grobid-quantities.

To reproduce the end-to-end evaluation, run the `scripts/measeval_e2e_eval.py` script (use `requirements.txt`
to install the required dependencies).
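
The strict and soft matching methods differ in how a predicted entity is counted as correct: strict requires an exact span match with the reference annotation, while soft accepts a partial overlap. The following Python sketch illustrates the idea on plain character offsets; it is a simplified illustration, not the actual logic of `scripts/measeval_e2e_eval.py`, and the function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def matches(pred: Span, ref: Span, method: str = "strict") -> bool:
    """Strict: identical offsets; soft: any overlap between the two spans."""
    if method == "strict":
        return pred == ref
    return pred[0] < ref[1] and ref[0] < pred[1]


def micro_prf(predicted: List[Span], reference: List[Span], method: str = "strict"):
    """Micro-averaged precision/recall/F1 over all spans (greedy one-to-one matching)."""
    unmatched_refs = list(reference)
    tp = 0
    for pred in predicted:
        for ref in unmatched_refs:
            if matches(pred, ref, method):
                tp += 1
                unmatched_refs.remove(ref)
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    ref = [(10, 15), (40, 52)]
    pred = [(10, 15), (41, 50), (70, 75)]
    print(micro_prf(pred, ref, "strict"))  # only the exact span counts -> (0.33, 0.5, 0.4)
    print(micro_prf(pred, ref, "soft"))    # overlapping spans count    -> (0.67, 1.0, 0.8)
```

As in the table above, the soft scores are always at least as high as the strict ones, since every strict match is also a soft match.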

## Machine Learning Named Entity Recognition Evaluation

The scores (P: precision, R: recall, F1: F1-score) for all models are computed either with 10-fold cross-validation
or on a holdout dataset.
The holdout dataset of grobid-quantities is composed of the following examples:

- Quantities ML: 10 articles
- Units ML: [UNISCOR dataset](references.md) with around 1600 examples
- Values ML: 950 examples

For the deep learning models (BidLSTM_CRF, BidLSTM_CRF_FEATURES, BERT_CRF), we report the average over 5 runs.

The models are organised as follows (a minimal architecture sketch is given after the list):

- BidLSTM_CRF is an RNN model based on the work of Lample et al. (2016), with a CRF layer as the final activation
- BidLSTM_CRF_FEATURES is an extension of BidLSTM_CRF that allows the use of layout features
- BERT_CRF is a BERT-based model obtained by fine-tuning a SciBERT encoder; as in the other models, the final
  activation is a CRF layer
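
As an illustration of the BidLSTM_CRF architecture described above, here is a minimal sketch in PyTorch using the `pytorch-crf` package for the CRF layer. This is only a schematic outline, not the implementation used by grobid-quantities; layer sizes and class names are placeholders.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class BidLSTMCRF(nn.Module):
    """Schematic BidLSTM_CRF: embeddings -> bidirectional LSTM -> linear -> CRF."""

    def __init__(self, vocab_size: int, num_tags: int, embed_dim: int = 100, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.hidden2tag = nn.Linear(2 * hidden_dim, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)              # structured decoding layer

    def forward(self, token_ids, tags=None, mask=None):
        emissions = self.hidden2tag(self.lstm(self.embedding(token_ids))[0])
        if tags is not None:
            # negative log-likelihood of the gold tag sequence, used as training loss
            return -self.crf(emissions, tags, mask=mask)
        # Viterbi decoding at inference time
        return self.crf.decode(emissions, mask=mask)


# Toy usage: a batch of 2 sequences of length 5 over a vocabulary of 1000 tokens
model = BidLSTMCRF(vocab_size=1000, num_tags=9)
tokens = torch.randint(0, 1000, (2, 5))
print(model(tokens))  # list of predicted tag-id sequences
```

In this schematic view, BidLSTM_CRF_FEATURES would additionally concatenate embeddings of discrete layout features to the token embeddings before the LSTM, while BERT_CRF replaces the embedding and LSTM stack with a fine-tuned SciBERT encoder producing the emission scores.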

### Results on the holdout dataset

The evaluation was performed on the holdout dataset of grobid-quantities.
Average values are computed as micro averages.
To reproduce it, see the evaluation documentation.

#### Quantities

| Labels | CRF | | | | BERT_CRF | | | | Support |
|-----------------|-------|-------|-------|--------------|----------|-------|-------|--------|---------|
| Metrics | P | R | F1 | | P | R | F1 | St.dev | |
| `<unitLeft>` | 90.26 | 83.84 | 86.93 | | 93.13 | 89.96 | 91.52 | 0.0086 | 464 |
| `<unitRight>` | 36.36 | 30.77 | 33.33 | | 23.67 | 40.00 | 29.70 | 0.0139 | 13 |
| `<valueAtomic>` | 75.75 | 77.97 | 76.84 | | 85.46 | 87.99 | 86.70 | 0.0041 | 581 |
| `<valueBase>` | 80.77 | 60.00 | 68.85 | | 98.75 | 90.29 | 94.33 | 0.0163 | 35 |
| `<valueLeast>` | 76.24 | 61.11 | 67.84 | | 84.58 | 72.22 | 77.91 | 0.0212 | 126 |
| `<valueList>` | 27.27 | 11.32 | 16.00 | | 61.10 | 39.62 | 47.79 | 0.0262 | 53 |
| `<valueMost>` | 68.35 | 55.67 | 61.36 | | 78.93 | 71.75 | 75.16 | 0.0179 | 97 |
| `<valueRange>` | 91.18 | 88.57 | 89.86 | | 100.00 | 91.43 | 95.52 | 0.0000 | 35 |
| All (micro avg) | 79.49 | 73.72 | 76.5 | | 86.50 | 83.97 | 85.22 | 0.0031 | 1404 |

| Labels | BidLSTM_CRF | | | | | BidLSTM_CRF_FEATURES | | | | Support |
|-----------------|-------------|-------|-------|--------|--------|----------------------|-------|-------|--------|---------|
| Metrics | P | R | F1 | St.dev | | P | R | F1 | St.dev | |
| `<unitLeft>` | 87.58 | 89.96 | 88.75 | 0.0074 | | 86.95 | 89.57 | 88.24 | 0.0097 | 464 |
| `<unitRight>` | 25.01 | 30.77 | 27.50 | 0.0193 | | 23.99 | 30.77 | 26.91 | 0.0146 | 13 |
| `<valueAtomic>` | 79.52 | 85.71 | 82.49 | 0.0044 | | 78.33 | 86.57 | 82.24 | 0.0062 | 581 |
| `<valueBase>` | 83.84 | 97.14 | 89.97 | 0.0185 | | 80.99 | 97.14 | 88.32 | 0.0115 | 35 |
| `<valueLeast>` | 83.79 | 62.38 | 71.45 | 0.0294 | | 84.37 | 60.00 | 70.06 | 0.0335 | 126 |
| `<valueList>` | 80.12 | 13.58 | 23.05 | 0.0326 | | 69.29 | 14.34 | 23.37 | 0.0715 | 53 |
| `<valueMost>` | 75.91 | 70.92 | 73.22 | 0.0311 | | 75.54 | 67.01 | 70.99 | 0.0370 | 97 |
| `<valueRange>` | 92.87 | 94.86 | 93.84 | 0.0783 | | 95.58 | 97.14 | 96.35 | 0.0673 | 35 |
| All (micro avg) | 82.12 | 81.28 | 81.70 | 0.0048 | | 81.26 | 81.11 | 81.19 | 0.0090 | 1404 |

#### Units

Units were evaluated using the UNISCOR dataset. For more information, see the [UNISCOR](references.md#uniscor) section.

| Labels | CRF | | | | BERT_CRF | | | | Support |
|-----------------|-------|-------|-------|-|----------|-------|-------|--------|---------|
| Metrics | P | R | F1 | | P | R | F1 | St.dev | |
| `<base>` | 80.64 | 82.71 | 81.66 | | 73.63 | 76.26 | 74.89 | 0.0231 | 3228 |
| `<pow>` | 71.94 | 74.34 | 73.12 | | 80.20 | 57.35 | 66.75 | 0.0752 | 1773 |
| `<prefix>` | 92.6 | 86.48 | 89.43 | | 77.61 | 88.05 | 82.12 | 0.0338 | 1287 |
| All (micro avg) | 80.39 | 81.12 | 80.76 | | 75.55 | 73.34 | 74.41 | 0.0178 | 6288 |


| Labels | BidLSTM_CRF | | | | | BidLSTM_CRF_FEATURES | | | | Support |
|-----------------|-------------|-------|-------|---------|-|----------------------|---------|--------|--------|---------|
| Metrics | P | R | F1 | St.dev | | P | R | F1 | St.dev | |
| `<base>` | 52.17 | 46.16 | 48.93 | 0.0494 | | 51.99 | 48.00 | 49.88 | 0.0259 | 3228 |
| `<pow>` | 94.25 | 56.89 | 70.94 | 0.0125 | | 94.20 | 56.92 | 70.96 | 0.0062 | 1773 |
| `<prefix>` | 81.36 | 82.88 | 82.01 | 0.0119 | | 82.11 | 82.94 | 82.43 | 0.0201 | 1287 |
| All (micro avg) | 68.12 | 56.70 | 61.85 | 0.0282 | | 67.76 | 57.67 | 62.29 | 0.0173 | 6288 |

#### Values

| Labels          | CRF   |       |       | BERT_CRF |        |        |        | Support |
|-----------------|-------|-------|-------|----------|--------|--------|--------|---------|
| Metrics         | P     | R     | F1    | P        | R      | F1     | St.dev |         |
| `<alpha>` | 96.9 | 99.21 | 98.04 | 99.21 | 99.37 | 99.29 | 0.0017 | 464 |
| `<base>` | 100 | 92.31 | 96 | 100.00 | 100.00 | 100.00 | 0.0000 | 13 |
| `<number>` | 99.14 | 99.63 | 99.38 | 99.43 | 99.46 | 99.44 | 0.0005 | 581 |
| `<pow>` | 100 | 100 | 100 | 100.00 | 100.00 | 100.00 | 0.0000 | 35 |
| All (micro avg) | 98.86 | 99.48 | 99.17 | 99.42 | 99.46 | 99.44 | 0.0004 | 1093 |

| Labels          | BidLSTM_CRF |       |       |        |       | BidLSTM_CRF_FEATURES |       |       |        | Support |
|-----------------|-------------|-------|-------|--------|-------|----------------------|-------|-------|--------|---------|
| Metrics         | P           | R     | F1    | St.dev |       | P                    | R     | F1    | St.dev |         |
| `<alpha>` | 97.82 | 99.53 | 98.66 | 0.0035 | | 93.13 | 89.96 | 91.52 | 0.0086 | 464 |
| `<base>` | 97.78 | 67.69 | 79.46 | 0.0937 | | 23.67 | 40.00 | 29.70 | 0.0139 | 13 |
| `<number>` | 98.92 | 99.33 | 99.13 | 0.0008 | | 85.46 | 87.99 | 86.70 | 0.0041 | 581 |
| `<pow>` | 69.11 | 73.85 | 71.29 | 0.1456 | | 98.75 | 90.29 | 94.33 | 0.0163 | 35 |
| All (micro avg) | 98.34 | 98.59 | 98.47 | 0.0023 | | 86.50 | 83.97 | 85.22 | 0.0031 | 1093 |

### Other published results

> :information_source: The paper "Automatic Identification and Normalisation of Physical Measurements in Scientific
> Literature", published in September 2019, reported macro-averaged evaluation scores.
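
For context, micro-averaged scores aggregate true positives, false positives and false negatives over all labels before computing precision and recall, while macro-averaged scores compute the metric per label and then take an unweighted mean, so the two are not directly comparable. A small illustration with made-up counts (not taken from either evaluation):

```python
# Micro vs. macro averaging on hypothetical per-label counts (tp, fp, fn).
counts = {"<unitLeft>": (450, 30, 40), "<valueList>": (20, 15, 33)}

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Micro: sum the counts over all labels first, then compute a single F1.
micro = f1(*[sum(c[i] for c in counts.values()) for i in range(3)])
# Macro: compute F1 per label, then take the unweighted mean.
macro = sum(f1(*c) for c in counts.values()) / len(counts)

print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
# A frequent, well-predicted label dominates the micro average,
# while the macro average weights every label equally.
```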
