Merge pull request #167 from lfoppiano/evaluation-measeval
Add evaluation scores with MeasEval and update the documentation
lfoppiano committed Feb 4, 2024
2 parents 1e10437 + dfde2d8 commit 1d35623
Showing 10 changed files with 790 additions and 150 deletions.
137 changes: 2 additions & 135 deletions README.md
@@ -37,147 +37,14 @@ Spaces: https://lfoppiano-grobid-quantities.hf.space/

## Latest version

The latest released version of grobid-quantities is [0.7.3](https://github.com/kermitt2/grobid-quantities/releases/tag/v0.7.3). The current development version is 0.7.4-SNAPSHOT.
**Important**: to upgrade, please check [here](https://grobid-quantities.readthedocs.io/gettingStarted.html#upgrade).

### Update from 0.7.2 to 0.7.3

#### Grobid models

In version 0.7.3 we have updated the DeLFT models. The DL models must be updated by running `./gradlew copyModels`.

#### JDK Update

Version 0.7.3 enables support for running with JDK > 11. We recommend running it with JDK 17.
Running grobid-quantities with Gradle (`./gradlew clean run`) is already supported in the `build.gradle`.
Running grobid-quantities via the JAR file requires an additional parameter to set `java.library.path` (an example invocation is sketched after the list below):

- Linux: `-Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep`
- Mac (arm): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac_arm-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`
- Mac (intel): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`
For `MY_VIRTUAL_ENV` I use `/Users/lfoppiano/anaconda3/envs/jep`.
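
As an illustration, here is a minimal sketch of a JAR invocation on Linux combining these flags with the `server` command; the JAR file name and configuration path are assumptions and should be adapted to your build output:

```shell
# Hypothetical example — the JAR name and config path are assumptions;
# check your build output (e.g. build/libs) for the actual file names.
java -Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep \
  -jar grobid-quantities-<version>-onejar.jar \
  server resources/config/config.yml
```

On Mac, use the corresponding `-Djava.library.path` value from the list above together with `--add-opens java.base/java.lang=ALL-UNNAMED`.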

### Update from 0.7.1 to 0.7.2

In version 0.7.2 we have updated the DeLFT models.
The DL models must be updated by running `./gradlew copyModels`.

### Update from 0.7.0 to 0.7.1

Version 0.7.1 uses a new version of DeLFT based on TensorFlow 2.x.
The DL models must be updated by running `./gradlew copyModels`.

### Update from 0.6.0 to 0.7.0

In version 0.7.0 the models have been updated; running `./gradlew copyModels` is therefore required to obtain proper results, especially for unit normalisation.

## Documentation

You can find the latest documentation [here](http://grobid-quantities.readthedocs.io).

## Evaluation

The results (Precision, Recall, F1-score) for all the models have been obtained on a holdout set.
For the DL models we report the average over 5 runs.
Updated on 27/10/2022.

#### Quantities

| | **CRF** | | | **BidLSTM_CRF** | | | **BidLSTM_CRF_FEATURES** | | | **BERT_CRF** | | | **Support** |
|-----------------|---------------|------------|--------------|-----------------|------------|--------------|--------------------------|------------|--------------|---------------|------------|--------------|-------------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | |
| `<unitLeft>` | 88.74 | 83.19 | 85.87 | 88.56 | 92.07 | 90.28 | 88.91 | 92.20 | 90.53 | 93.99 | 90.30 | 92.11 | 464 |
| `<unitRight>` | 30.77 | 30.77 | 30.77 | 24.75 | 30.77 | 27.42 | 21.73 | 30.77 | 25.41 | 21.84 | 36.92 | 27.44 | 13 |
| `<valueAtomic>` | 76.29 | 78.66 | 77.46 | 78.14 | 86.06 | 81.90 | 78.21 | 86.20 | 82.01 | 84.50 | 88.19 | 86.31 | 581 |
| `<valueBase>` | 84.62 | 62.86 | 72.13 | 83.51 | 94.86 | 88.61 | 83.36 | 97.14 | 89.72 | 100.00 | 90.86 | 95.20 | 35 |
| `<valueLeast>` | 77.68 | 69.05 | 73.11 | 82.14 | 60.63 | 69.67 | 80.73 | 60.63 | 69.12 | 81.09 | 71.59 | 76.04 | 126 |
| `<valueList>` | 45.45 | 18.87 | 26.67 | 62.15 | 10.19 | 17.34 | 73.33 | 8.68 | 15.33 | 64.12 | 43.78 | 51.64 | 53 |
| `<valueMost>` | 71.62 | 54.64 | 61.99 | 77.64 | 68.25 | 72.61 | 77.25 | 70.31 | 73.58 | 81.52 | 67.42 | 73.71 | 97 |
| `<valueRange>` | 100.00 | 97.14 | 98.55 | 96.72 | 100.00 | 98.32 | 94.05 | 98.86 | 96.38 | 99.39 | 91.43 | 95.24 | 35 |
| -- | | | | | | | | | | | | | |
| All (micro avg) | 80.08 | 75.00 | 77.45 | 81.81 | 81.73 | 81.76 | 81.76 | 81.94 | 81.85 | 86.24 | 83.96 | 85.08 | |

#### Units

| | **CRF** | | | **BidLSTM_CRF** | | | **BidLSTM_CRF_FEATURES** | | | **BERT_CRF** | | | **Support** |
|-----------------|---------------|------------|--------------|-----------------|------------|--------------|--------------------------|------------|--------------|---------------|------------|--------------|-------------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | |
| `<base>` | 80.57 | 82.34 | 81.45 | 56.01 | 50.34 | 53.02 | 59.98 | 56.33 | 58.09 | 61.41 | 57.08 | 59.16 | 3228 |
| `<pow>` | 72.65 | 74.45 | 73.54 | 93.70 | 62.38 | 74.88 | 93.71 | 68.40 | 78.94 | 91.24 | 64.60 | 75.60 | 1773 |
| `<prefix>` | 93.80 | 84.69 | 89.02 | 80.31 | 85.25 | 82.54 | 83.21 | 83.58 | 83.35 | 82.10 | 85.30 | 83.62 | 1287 |
| -- | | | | | | | | | | | | | |
| All (micro avg) | 80.73 | 80.60 | 80.66 | 70.19 | 60.88 | 65.20 | 73.03 | 65.31 | 68.94 | 73.02 | 64.97 | 68.76 | |

#### Values

| | **CRF** | | | **BidLSTM_CRF** | | | **BidLSTM_CRF_FEATURES** | | | **BERT_CRF** | | | **Support** |
|-----------------|---------------|------------|--------------|-----------------|------------|----------|--------------------------|------------|--------------|---------------|------------|--------------|-------------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | |
| `<alpha>` | 98.06 | 96.03 | 92.02 | 97.67 | 99.53 | 98.58 | 97.82 | 99.53 | 98.66 | 98.59 | 99.53 | 99.05 | 126 |
| `<base>` | 99.91 | 92.31 | 96.00 | 96.92 | 92.31 | 94.52 | 96.92 | 93.85 | 95.32 | 90.40 | 98.46 | 92.88 | 13 |
| `<number>` | 97.50 | 99.88 | 98.36 | 99.24 | 99.34 | 99.29 | 99.21 | 99.38 | 99.30 | 99.48 | 99.31 | 99.40 | 811 |
| `<pow>` | 100.00 | 100.00 | 100.00 | 92.92 | 92.31 | 92.47 | 90.28 | 93.85 | 91.90 | 100.00 | 100.00 | 100.00 | 13 |
| -- | | | | | | | | | | | | | |
| All (micro avg) | 95.79 | 99.27 | 97.50 | 98.90 | 99.17 | 99.03 | 98.86 | 99.25 | 99.05 | 99.13 | 99.33 | 99.23 | |

<details>
<summary>Previous evaluations</summary>

Previous evaluations were performed using 10-fold cross-validation (with average metrics over the 10 folds).

The `CRF` model was evaluated on 30/04/2020.
The `BidLSTM_CRF_FEATURES` model was evaluated on 28/11/2021.

#### Quantities

| | CRF | | | BidLSTM_CRF_FEATURES | | |
|-----------------|---------------|------------|--------------|----------------------|------------|----------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** |
| `<unitLeft>` | 96.45 | 95.06 | 95.74 | 95.17 | 96.67 | 95.91 |
| `<unitRight>` | 88.96 | 68.65 | 75.43 | 92.52 | 83.64 | 87.69 |
| `<valueAtomic>` | 85.75 | 85.35 | 85.49 | 81.74 | 89.21 | 85.30 |
| `<valueBase>` | 73.06 | 66.43 | 68.92 | 100.00 | 75.00 | 85.71 |
| `<valueLeast>` | 85.68 | 79.03 | 82.07 | 89.24 | 82.25 | 85.55 |
| `<valueList>` | 68.38 | 53.31 | 58.94 | 75.27 | 75.33 | 75.12 |
| `<valueMost>` | 83.67 | 75.82 | 79.42 | 89.02 | 81.56 | 85.10 |
| `<valueRange>` | 90.25 | 88.58 | 88.86 | 100.00 | 96.25 | 97.90 |
| -- | | | | | | |
| All (micro avg) | 88.96 | 85.40 | 87.14 | 87.23 | 89.00 | 88.10 |

#### Units

The `CRF` model was updated on 10/02/2021.

| | CRF | | | BidLSTM_CRF_FEATURES | | |
|-----------------|---------------|------------|--------------|----------------------|------------|----------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** |
| `<base>` | 98.82 | 99.14 | 98.98 | 98.26 | 98.52 | 98.39 |
| `<pow>` | 97.62 | 98.56 | 98.08 | 100.00 | 98.57 | 99.28 |
| `<prefix>` | 99.50 | 98.76 | 99.13 | 98.89 | 97.75 | 98.30 |
| -- | | | | | | |
| All (micro avg) | 98.85 | 99.01 | 98.93 | 98.51 | 98.39 | 98.45 |

#### Values

| | CRF | | | BidLSTM_CRF_FEATURES | | |
|-----------------|---------------|------------|--------------|----------------------|------------|----------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** |
| `<alpha>` | 96.90 | 98.84 | 97.85 | 99.41 | 99.55 | 99.48 |
| `<base>` | 85.14 | 74.48 | 79.00 | 96.67 | 100.00 | 98.00 |
| `<number>` | 98.07 | 99.05 | 98.55 | 99.55 | 98.68 | 99.11 |
| `<pow>` | 80.05 | 76.33 | 77.54 | 72.50 | 75.00 | 73.50 |
| `<time>` | 73.07 | 86.82 | 79.26 | 80.84 | 100.00 | 89.28 |
| -- | | | | | | |
| All (micro avg) | 96.15 | 97.95 | 97.40 | 98.49 | 98.66 | 98.57 |

</details>

The current average results are computed as micro averages, which give a more realistic overall picture by weighting each label according to its frequency; both averaging schemes are sketched below.
The [paper](https://hal.inria.fr/hal-02294424) "Automatic Identification and Normalisation of Physical Measurements in Scientific Literature", published in September 2019, reported evaluations based on macro averages.
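
For reference, here are the standard definitions of the two averaging schemes, written for a set of labels $L$ with per-label true positives $TP_\ell$, false positives $FP_\ell$, false negatives $FN_\ell$, and per-label score $F1_\ell$ (these formulas are generic, not specific to grobid-quantities):

```math
P_{micro} = \frac{\sum_{\ell \in L} TP_\ell}{\sum_{\ell \in L} (TP_\ell + FP_\ell)} \qquad R_{micro} = \frac{\sum_{\ell \in L} TP_\ell}{\sum_{\ell \in L} (TP_\ell + FN_\ell)} \qquad F1_{micro} = \frac{2\,P_{micro}\,R_{micro}}{P_{micro} + R_{micro}}
```

```math
F1_{macro} = \frac{1}{|L|} \sum_{\ell \in L} F1_\ell
```

Under micro averaging, high-support labels such as `<number>` dominate the overall score, while macro averaging weights every label equally regardless of support.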

## Acknowledgement

4 changes: 2 additions & 2 deletions doc/conf.py
@@ -103,7 +103,7 @@
# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
-htmlhelp_basename = 'Grobid-quantitiesdoc'
+htmlhelp_basename = 'Grobid-quantities'


# -- Options for LaTeX output ---------------------------------------------
@@ -152,7 +152,7 @@
# dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'Grobid-quantities', 'Grobid-quantities Documentation',
-    author, 'Grobid-quantities', 'One line description of project.',
+    author, 'Grobid-quantities', 'GROBID extension for identifying and normalizing physical quantities.',
     'Miscellaneous'),
]

