Merge pull request #167 from lfoppiano/evaluation-measeval
Add evaluation scores with MeasEval and update the documentation
lfoppiano committed Feb 4, 2024
2 parents 1e10437 + dfde2d8 commit 1d35623
Showing 10 changed files with 790 additions and 150 deletions.
137 changes: 2 additions & 135 deletions README.md
@@ -37,147 +37,14 @@ Spaces: https://lfoppiano-grobid-quantities.hf.space/

## Latest version

The latest released version of grobid-quantities is [0.7.3](https://github.com/kermitt2/grobid-quantities/releases/tag/v0.7.3). The current development version is 0.7.4-SNAPSHOT.
**Important**: to upgrade, please check [here](https://grobid-quantities.readthedocs.io/gettingStarted.html#upgrade).

### Update from 0.7.2 to 0.7.3

#### Grobid models

In version 0.7.3 we have updated the DeLFT models. The DL models must be updated by running `./gradlew copyModels`.

#### JDK Update

Version 0.7.3 enables support for running with JDK > 11. We recommend running it with JDK 17.
Running grobid-quantities with Gradle (`./gradlew clean run`) is already supported in the `build.gradle`.
Running grobid-quantities via the JAR file requires an additional parameter to set `java.library.path` (an example invocation is sketched after the list below):

- Linux: `-Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep`
- Mac (arm): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac_arm-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`
- Mac (intel): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`
For `MY_VIRTUAL_ENV` I use `/Users/lfoppiano/anaconda3/envs/jep`.
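
As an illustration, here is a minimal sketch of a JAR invocation on Linux combining these flags with the `server` command; the JAR file name and configuration path are assumptions and should be adapted to your build output:

```shell
# Hypothetical example — the JAR name and config path are assumptions;
# check your build output (e.g. build/libs) for the actual file names.
java -Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep \
  -jar grobid-quantities-<version>-onejar.jar \
  server resources/config/config.yml
```

On Mac, use the corresponding `-Djava.library.path` value from the list above together with `--add-opens java.base/java.lang=ALL-UNNAMED`.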

### Update from 0.7.1 to 0.7.2

In version 0.7.2 we have updated the DeLFT models.
The DL models must be updated by running `./gradlew copyModels`.

### Update from 0.7.0 to 0.7.1

Version 0.7.1 uses a new version of DeLFT based on TensorFlow 2.x.
The DL models must be updated by running `./gradlew copyModels`.

### Update from 0.6.0 to 0.7.0

In version 0.7.0 the models have been updated; running `./gradlew copyModels` is therefore required to obtain proper results, especially for unit normalisation.

## Documentation

You can find the latest documentation [here](http://grobid-quantities.readthedocs.io).

## Evaluation

The results (Precision, Recall, F1-score) for all the models have been obtained on a holdout set.
For the DL models we report the average over 5 runs.
Updated on 27/10/2022.

#### Quantities

| | **CRF** | | | **BidLSTM_CRF** | | | **BidLSTM_CRF_FEATURES** | | | **BERT_CRF** | | | **Support** |
|-----------------|---------------|------------|--------------|-----------------|------------|--------------|--------------------------|------------|--------------|---------------|------------|--------------|-------------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | |
| `<unitLeft>` | 88.74 | 83.19 | 85.87 | 88.56 | 92.07 | 90.28 | 88.91 | 92.20 | 90.53 | 93.99 | 90.30 | 92.11 | 464 |
| `<unitRight>` | 30.77 | 30.77 | 30.77 | 24.75 | 30.77 | 27.42 | 21.73 | 30.77 | 25.41 | 21.84 | 36.92 | 27.44 | 13 |
| `<valueAtomic>` | 76.29 | 78.66 | 77.46 | 78.14 | 86.06 | 81.90 | 78.21 | 86.20 | 82.01 | 84.50 | 88.19 | 86.31 | 581 |
| `<valueBase>` | 84.62 | 62.86 | 72.13 | 83.51 | 94.86 | 88.61 | 83.36 | 97.14 | 89.72 | 100.00 | 90.86 | 95.20 | 35 |
| `<valueLeast>` | 77.68 | 69.05 | 73.11 | 82.14 | 60.63 | 69.67 | 80.73 | 60.63 | 69.12 | 81.09 | 71.59 | 76.04 | 126 |
| `<valueList>` | 45.45 | 18.87 | 26.67 | 62.15 | 10.19 | 17.34 | 73.33 | 8.68 | 15.33 | 64.12 | 43.78 | 51.64 | 53 |
| `<valueMost>` | 71.62 | 54.64 | 61.99 | 77.64 | 68.25 | 72.61 | 77.25 | 70.31 | 73.58 | 81.52 | 67.42 | 73.71 | 97 |
| `<valueRange>` | 100.00 | 97.14 | 98.55 | 96.72 | 100.00 | 98.32 | 94.05 | 98.86 | 96.38 | 99.39 | 91.43 | 95.24 | 35 |
| -- | | | | | | | | | | | | | |
| All (micro avg) | 80.08 | 75.00 | 77.45 | 81.81 | 81.73 | 81.76 | 81.76 | 81.94 | 81.85 | 86.24 | 83.96 | 85.08 | |

#### Units

| | **CRF** | | | **BidLSTM_CRF** | | | **BidLSTM_CRF_FEATURES** | | | **BERT_CRF** | | | **Support** |
|-----------------|---------------|------------|--------------|-----------------|------------|--------------|--------------------------|------------|--------------|---------------|------------|--------------|-------------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | |
| `<base>` | 80.57 | 82.34 | 81.45 | 56.01 | 50.34 | 53.02 | 59.98 | 56.33 | 58.09 | 61.41 | 57.08 | 59.16 | 3228 |
| `<pow>` | 72.65 | 74.45 | 73.54 | 93.70 | 62.38 | 74.88 | 93.71 | 68.40 | 78.94 | 91.24 | 64.60 | 75.60 | 1773 |
| `<prefix>` | 93.80 | 84.69 | 89.02 | 80.31 | 85.25 | 82.54 | 83.21 | 83.58 | 83.35 | 82.10 | 85.30 | 83.62 | 1287 |
| -- | | | | | | | | | | | | | |
| All (micro avg) | 80.73 | 80.60 | 80.66 | 70.19 | 60.88 | 65.20 | 73.03 | 65.31 | 68.94 | 73.02 | 64.97 | 68.76 | |

#### Values

| | **CRF** | | | **BidLSTM_CRF** | | | **BidLSTM_CRF_FEATURES** | | | **BERT_CRF** | | | **Support** |
|-----------------|---------------|------------|--------------|-----------------|------------|----------|--------------------------|------------|--------------|---------------|------------|--------------|-------------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** | |
| `<alpha>` | 98.06 | 96.03 | 92.02 | 97.67 | 99.53 | 98.58 | 97.82 | 99.53 | 98.66 | 98.59 | 99.53 | 99.05 | 126 |
| `<base>` | 99.91 | 92.31 | 96.00 | 96.92 | 92.31 | 94.52 | 96.92 | 93.85 | 95.32 | 90.40 | 98.46 | 92.88 | 13 |
| `<number>` | 97.50 | 99.88 | 98.36 | 99.24 | 99.34 | 99.29 | 99.21 | 99.38 | 99.30 | 99.48 | 99.31 | 99.40 | 811 |
| `<pow>` | 100.00 | 100.00 | 100.00 | 92.92 | 92.31 | 92.47 | 90.28 | 93.85 | 91.90 | 100.00 | 100.00 | 100.00 | 13 |
| -- | | | | | | | | | | | | | |
| All (micro avg) | 95.79 | 99.27 | 97.50 | 98.90 | 99.17 | 99.03 | 98.86 | 99.25 | 99.05 | 99.13 | 99.33 | 99.23 | |

<details>
<summary>Previous evaluations</summary>

Previous evaluations were performed using 10-fold cross-validation (with average metrics over the 10 folds).

The `CRF` model was evaluated on 30/04/2020.
The `BidLSTM_CRF_FEATURES` model was evaluated on 28/11/2021.

#### Quantities

| | CRF | | | BidLSTM_CRF_FEATURES | | |
|-----------------|---------------|------------|--------------|----------------------|------------|----------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** |
| `<unitLeft>` | 96.45 | 95.06 | 95.74 | 95.17 | 96.67 | 95.91 |
| `<unitRight>` | 88.96 | 68.65 | 75.43 | 92.52 | 83.64 | 87.69 |
| `<valueAtomic>` | 85.75 | 85.35 | 85.49 | 81.74 | 89.21 | 85.30 |
| `<valueBase>` | 73.06 | 66.43 | 68.92 | 100.00 | 75.00 | 85.71 |
| `<valueLeast>` | 85.68 | 79.03 | 82.07 | 89.24 | 82.25 | 85.55 |
| `<valueList>` | 68.38 | 53.31 | 58.94 | 75.27 | 75.33 | 75.12 |
| `<valueMost>` | 83.67 | 75.82 | 79.42 | 89.02 | 81.56 | 85.10 |
| `<valueRange>` | 90.25 | 88.58 | 88.86 | 100.00 | 96.25 | 97.90 |
| -- | | | | | | |
| All (micro avg) | 88.96 | 85.40 | 87.14 | 87.23 | 89.00 | 88.10 |

#### Units

The `CRF` model was updated on 10/02/2021.

| | CRF | | | BidLSTM_CRF_FEATURES | | |
|-----------------|---------------|------------|--------------|----------------------|------------|----------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** |
| `<base>` | 98.82 | 99.14 | 98.98 | 98.26 | 98.52 | 98.39 |
| `<pow>` | 97.62 | 98.56 | 98.08 | 100.00 | 98.57 | 99.28 |
| `<prefix>` | 99.50 | 98.76 | 99.13 | 98.89 | 97.75 | 98.30 |
| -- | | | | | | |
| All (micro avg) | 98.85 | 99.01 | 98.93 | 98.51 | 98.39 | 98.45 |

#### Values

| | CRF | | | BidLSTM_CRF_FEATURES | | |
|-----------------|---------------|------------|--------------|----------------------|------------|----------|
| Labels | **Precision** | **Recall** | **F1-Score** | **Precision** | **Recall** | **F1-Score** |
| `<alpha>` | 96.90 | 98.84 | 97.85 | 99.41 | 99.55 | 99.48 |
| `<base>` | 85.14 | 74.48 | 79.00 | 96.67 | 100.00 | 98.00 |
| `<number>` | 98.07 | 99.05 | 98.55 | 99.55 | 98.68 | 99.11 |
| `<pow>` | 80.05 | 76.33 | 77.54 | 72.50 | 75.00 | 73.50 |
| `<time>` | 73.07 | 86.82 | 79.26 | 80.84 | 100.00 | 89.28 |
| -- | | | | | | |
| All (micro avg) | 96.15 | 97.95 | 97.40 | 98.49 | 98.66 | 98.57 |

</details>

The current average results are computed as micro averages, which give a more realistic overall picture by weighting each label according to its frequency; both averaging schemes are sketched below.
The [paper](https://hal.inria.fr/hal-02294424) "Automatic Identification and Normalisation of Physical Measurements in Scientific Literature", published in September 2019, reported evaluations based on macro averages.
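
For reference, here are the standard definitions of the two averaging schemes, written for a set of labels $L$ with per-label true positives $TP_\ell$, false positives $FP_\ell$, false negatives $FN_\ell$, and per-label score $F1_\ell$ (these formulas are generic, not specific to grobid-quantities):

```math
P_{micro} = \frac{\sum_{\ell \in L} TP_\ell}{\sum_{\ell \in L} (TP_\ell + FP_\ell)} \qquad R_{micro} = \frac{\sum_{\ell \in L} TP_\ell}{\sum_{\ell \in L} (TP_\ell + FN_\ell)} \qquad F1_{micro} = \frac{2\,P_{micro}\,R_{micro}}{P_{micro} + R_{micro}}
```

```math
F1_{macro} = \frac{1}{|L|} \sum_{\ell \in L} F1_\ell
```

Under micro averaging, high-support labels such as `<number>` dominate the overall score, while macro averaging weights every label equally regardless of support.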

## Acknowledgement

4 changes: 2 additions & 2 deletions doc/conf.py
@@ -103,7 +103,7 @@
# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
-htmlhelp_basename = 'Grobid-quantitiesdoc'
+htmlhelp_basename = 'Grobid-quantities'


# -- Options for LaTeX output ---------------------------------------------
@@ -152,7 +152,7 @@
# dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'Grobid-quantities', 'Grobid-quantities Documentation',
-    author, 'Grobid-quantities', 'One line description of project.',
+    author, 'Grobid-quantities', 'GROBID extension for identifying and normalizing physical quantities.',
     'Miscellaneous'),
]

