Work in progress.
The goal of this GROBID module is to recognize any expressions of measurements (e.g. pressure, temperature, etc.) in textual documents, to parse and normalize them, and finally to convert these measurements into SI units. We focus our work on technical and scientific articles (text, XML and PDF input) and patents (text and XML input).
As part of this task, we support the recognition of different value representations: numerical, alphabetical, exponential and date/time expressions.
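To make these value representations concrete, here is a minimal Python sketch that turns numerical, alphabetical and exponential values into a float. It is not grobid-quantities code. The `parse_value` function, the `WORD_VALUES` vocabulary and the exponential pattern are hypothetical, and date/time expressions are left out.

```python
import re

# Hypothetical, deliberately tiny vocabulary of alphabetical values; a real
# system needs a much larger one (and multi-word numbers like "twenty five").
WORD_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "dozen": 12, "hundred": 100, "thousand": 1000,
}

# Exponential forms such as "3.2 x 10^5", "3.2×10^-4" or "1e-3".
EXPONENTIAL = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?)\s*(?:[x×*]\s*10\^?\s*([-+]?\d+)|[eE]([-+]?\d+))\s*$"
)


def parse_value(text: str) -> float:
    """Convert a numerical, alphabetical or exponential value to a float."""
    m = EXPONENTIAL.match(text)
    if m:
        base = float(m.group(1))
        # Group 2 holds the "x 10^n" exponent, group 3 the "e-n" exponent.
        exponent = int(m.group(2) if m.group(2) is not None else m.group(3))
        return base * 10 ** exponent
    word = text.strip().lower()
    if word in WORD_VALUES:
        return float(WORD_VALUES[word])
    # Plain numerical value; assumes "," is a thousands separator ("1,000").
    return float(text.replace(",", "").strip())
```

For example, `parse_value("3.2 x 10^5")` gives `320000.0` and `parse_value("five")` gives `5.0`. In the actual module these representations are recognized by the machine-learning models rather than by hand-written patterns.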
We also support the identification of the "quantified" substance related to the measurement, e.g. silicon nitride powder in
Like the other GROBID models, the module relies only on machine learning and uses linear CRFs. The normalisation is handled by the Java library Units of Measurement.
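grobid-quantities delegates the conversion to SI units to the Units of Measurement library. The Python sketch below does not use that library; it only illustrates the idea under a simplifying assumption: each unit maps to an SI unit through a multiplicative factor plus an optional offset, which is needed for affine scales such as degrees Celsius. The `TO_SI` table, the `Conversion` class and the `to_si` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversion:
    si_unit: str
    factor: float        # multiply the value by this...
    offset: float = 0.0  # ...then add this (non-zero only for affine units)


# Hypothetical hand-written table, for illustration only.
TO_SI = {
    "km": Conversion("m", 1000.0),
    "cm": Conversion("m", 0.01),
    "kPa": Conversion("Pa", 1000.0),
    "bar": Conversion("Pa", 100000.0),
    "g": Conversion("kg", 0.001),
    "°C": Conversion("K", 1.0, 273.15),
    "K": Conversion("K", 1.0),
}


def to_si(value: float, unit: str) -> tuple[float, str]:
    """Return (value expressed in the SI unit, SI unit symbol)."""
    try:
        conv = TO_SI[unit]
    except KeyError:
        raise ValueError(f"no SI conversion known for unit {unit!r}") from None
    return value * conv.factor + conv.offset, conv.si_unit
```

For example, `to_si(100, "kPa")` returns `(100000.0, "Pa")`. A dedicated library handles prefixes and compound units (e.g. m/s²) systematically instead of relying on a hand-written table.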
The latest version (and the first official release) of grobid-quantities is 0.6.0, released on 30/04/2020.
You can find the latest documentation here.
The results (Precision, Recall, F-score) for all the models have been obtained using 10-fold cross-validation (metrics averaged over the 10 folds). We also indicate the best and worst results over the 10 folds on the complete results page.
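The fold averaging can be sketched as follows. This assumes that each fold yields a precision and a recall, that the F-score of a fold is their harmonic mean, and that the reported numbers are plain means over the folds, with best and worst taken on the per-fold F-score. The function names are hypothetical; this is not the project's evaluation code.

```python
from statistics import mean


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize_folds(fold_scores: list[tuple[float, float]]) -> dict:
    """fold_scores holds one (precision, recall) pair per fold."""
    f_scores = [f1(p, r) for p, r in fold_scores]
    return {
        "precision": mean(p for p, _ in fold_scores),
        "recall": mean(r for _, r in fold_scores),
        "f1": mean(f_scores),       # mean of the per-fold F-scores
        "f1_best": max(f_scores),   # best fold
        "f1_worst": min(f_scores),  # worst fold
    }
```

The mean of the per-fold F-scores is generally not equal to the F-score computed from the mean precision and mean recall, so averaged columns need not satisfy F = 2PR/(P+R) exactly.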
Evaluated on 30/04/2020

| | Precision | Recall | F-score |
|---|---|---|---|
| all (micro avg.) | 88.96 | 85.4 | 87.14 |
| all (micro avg.) | 98.7 | 98.89 | 98.8 |
| all (micro avg.) | 96.15 | 97.95 | 97.4 |

The current average results have been calculated using micro averaging, which pools the counts over all labels and therefore weights each label by its frequency; this gives more realistic results. The paper "Automatic Identification and Normalisation of Physical Measurements in Scientific Literature", published in September 2019, reported macro-averaged results, which weight every label equally, so its figures are not directly comparable with these.
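The difference between the two averaging schemes shows up clearly on a toy example. In the Python sketch below (hypothetical function and label names, not the project's evaluation code), the micro average pools true positives, false positives and false negatives across labels before computing precision, recall and F-score, while the macro average computes them per label and takes the unweighted mean. Macro F-score is sometimes defined differently, as the harmonic mean of macro precision and macro recall.

```python
def micro_macro(counts: dict[str, tuple[int, int, int]]) -> dict:
    """counts maps each label to (true positives, false positives, false negatives)."""

    def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Micro average: pool the counts of all labels, then compute P/R/F once,
    # so frequent labels weigh more.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)

    # Macro average: compute P/R/F per label, then take the unweighted mean,
    # so a rare label counts as much as a frequent one.
    per_label = [prf(*c) for c in counts.values()]
    macro = tuple(sum(m[i] for m in per_label) / len(per_label) for i in range(3))
    return {"micro": micro, "macro": macro}
```

With `{"frequent": (90, 10, 10), "rare": (1, 1, 1)}`, the micro F-score is 91/102 ≈ 0.89, dominated by the frequent label, while the macro F-score is (0.9 + 0.5) / 2 = 0.7, pulled down by the rare one.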
GROBID and grobid-quantities are distributed under the Apache 2.0 license.