This repository shows how to fine-tune Flair models on the CO-Fun NER dataset.
The Company Outsourcing in Fund Prospectuses (CO-Fun) dataset consists of 948 sentences with 5,969 named entity annotations, including 2,340 Outsourced Services, 2,024 Companies, 1,594 Locations and 11 Software annotations.
Overall, the following named entities are annotated:

- `Auslagerung` (engl. outsourcing)
- `Unternehmen` (engl. company)
- `Ort` (engl. location)
- `Software`
The CO-Fun NER dataset from the Model Hub is used for fine-tuning Flair models.

The main fine-tuning is done in `experiment.py`. Fine-tuning can be started by calling the `run_experiment()` method and passing a so-called `ExperimentConfiguration`. All necessary hyper-parameters and fine-tuning options are stored in `ExperimentConfiguration`. The interface looks like:
```python
from dataclasses import dataclass

@dataclass
class ExperimentConfiguration:
    batch_size: int
    learning_rate: float
    epoch: int
    context_size: int
    seed: int
    base_model: str
    base_model_short: str
    layers: str = "-1"
    subtoken_pooling: str = "first"
    use_crf: bool = False
    use_tensorboard: bool = True
```
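A call could then look like the following sketch. The dataclass is re-declared here so the snippet is self-contained; in the repository, `ExperimentConfiguration` and `run_experiment()` live in `experiment.py`, and all hyper-parameter values below are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfiguration:  # mirrors the interface shown above
    batch_size: int
    learning_rate: float
    epoch: int
    context_size: int
    seed: int
    base_model: str
    base_model_short: str
    layers: str = "-1"
    subtoken_pooling: str = "first"
    use_crf: bool = False
    use_tensorboard: bool = True

# Illustrative values; in the repository this configuration
# would be passed to run_experiment().
config = ExperimentConfiguration(
    batch_size=8,
    learning_rate=5e-05,
    epoch=10,
    context_size=64,
    seed=1,
    base_model="dbmdz/bert-base-german-cased",
    base_model_short="german_dbmdz_bert_base",
)

print(config.base_model_short, config.layers)
```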
A hyper-parameter search grid is defined in `script.py`. This file is used to start the fine-tuning process. Additionally, the script uploads the Flair model to the Model Hub. The following environment variables should be set:
| Environment Variable | Description |
|----------------------|-------------|
| `CONFIG` | Should point to a configuration file in the `configs` folder, e.g. `configs/german_dbmdz_bert_base.json` |
| `HF_TOKEN` | HF Access Token, which can be found here |
| `HUB_ORG_NAME` | Should point to the user name or organization where the model should be uploaded to |
| `HF_UPLOAD` | If this variable is set, the fine-tuned Flair model won't be uploaded to the Model Hub |
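As a minimal sketch, the variables could be exported like this before starting a run; the token and organization name below are placeholders:

```shell
# Config path matches the example above; token and org name are placeholders.
export CONFIG="configs/german_dbmdz_bert_base.json"
export HF_TOKEN="hf_xxx"          # your Hugging Face access token
export HUB_ORG_NAME="my-org"      # user or organization for the upload
# Leave HF_UPLOAD unset so the fine-tuned model is uploaded to the Model Hub.
```

Afterwards, fine-tuning can be started with `python script.py`.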
In this example the following hyper-parameter search grid is used:

- Batch Sizes = `[8, 16]`
- Learning Rates = `[3e-05, 5e-05]`
- Seeds = `[1, 2, 3, 4, 5]`

This means 20 models will be fine-tuned in total (2 x 2 x 5 = 20).
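The grid expansion above can be sketched in a few lines; the dictionary keys follow the `ExperimentConfiguration` fields shown earlier:

```python
from itertools import product

batch_sizes = [8, 16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]

# Every combination of batch size, learning rate and seed is one fine-tuning run.
grid = [
    {"batch_size": bs, "learning_rate": lr, "seed": seed}
    for bs, lr, seed in product(batch_sizes, learning_rates, seeds)
]

print(len(grid))  # 2 x 2 x 5 = 20 runs
```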
After each model is fine-tuned, it will automatically be uploaded to the Hugging Face Model Hub. The following files are uploaded:

- `pytorch_model.bin`: Flair internally tracks the best model as `best-model.pt` over all epochs. To be compatible with the Model Hub, the `best-model.pt` is renamed automatically to `pytorch_model.bin`.
- `training.log`: Flair stores the training log in `training.log`. This file is later needed to parse the best F1-score on the development set.
- `./runs`: In this folder the TensorBoard logs are stored. This enables a nice display of metrics on the Model Hub.
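The exact layout of Flair's `training.log` is not reproduced here; as a sketch, assuming the per-epoch development score appears in lines like the hypothetical excerpt below, the best score could be extracted with a simple regular expression:

```python
import re

# Hypothetical excerpt from a Flair training.log; the real format may differ.
log_text = """
DEV : loss 0.0451 - f1-score (micro avg) 0.9102
DEV : loss 0.0312 - f1-score (micro avg) 0.9346
DEV : loss 0.0298 - f1-score (micro avg) 0.9291
"""

# Collect every development F1-score and keep the best one.
dev_scores = [float(m) for m in re.findall(r"f1-score \(micro avg\) ([\d.]+)", log_text)]
best_dev_f1 = max(dev_scores)
print(best_dev_f1)  # 0.9346
```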
Additionally, this repository shows how to automatically generate model cards for all uploaded models. This also includes a results overview table with linked models. The `Example.ipynb` notebook gives a detailed overview of all necessary steps.
We perform experiments for three BERT-based German language models. The following overview table shows the best configuration (batch size, learning rate) for each model on the development dataset:
| Model Name | Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
|------------|---------------|--------|--------|--------|--------|--------|---------|
| German BERT | `bs8-e10-lr5e-05` | 0.9346 | 0.9388 | 0.9301 | 0.9291 | 0.9346 | 0.9334 ± 0.0039 |
| German DBMDZ BERT | `bs8-e10-lr5e-05` | 0.9378 | 0.9280 | 0.9383 | 0.9374 | 0.9364 | 0.9356 ± 0.0043 |
| GBERT | `bs8-e10-lr5e-05` | 0.9477 | 0.9350 | 0.9517 | 0.9443 | 0.9342 | 0.9426 ± 0.0077 |
It can be seen that GBERT has the strongest performance, achieving an average F1-score of 94.26% on the development set.
Now, we retrieve the F1-score for the best configuration of each model on the test set:

| Model Name | Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
|------------|---------------|--------|--------|--------|--------|--------|---------|
| German BERT | `bs8-e10-lr5e-05` | 0.9141 | 0.9159 | 0.9121 | 0.9062 | 0.9105 | 0.9118 ± 0.0033 |
| German DBMDZ BERT | `bs8-e10-lr5e-05` | 0.9134 | 0.9076 | 0.9070 | 0.8821 | 0.9091 | 0.9038 ± 0.0111 |
| GBERT | `bs8-e10-lr5e-05` | 0.9180 | 0.9117 | 0.9163 | 0.9155 | 0.9110 | 0.9145 ± 0.0027 |
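The averages in the tables can be reproduced from the per-seed scores. A minimal sketch using the GBERT development-set scores from above; the ± value is the standard deviation across seeds, and the last digit may differ slightly depending on the rounding convention used:

```python
from statistics import mean, stdev

# GBERT development-set F1-scores for seeds 1-5 (from the table above).
scores = [0.9477, 0.9350, 0.9517, 0.9443, 0.9342]

avg = mean(scores)
std = stdev(scores)  # sample standard deviation across the five seeds

print(f"{avg:.4f} ± {std:.4f}")
```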
The GBERT model again has the strongest performance. With 91.45%, our result is close to the reported 92.2% (see Table 1 in the CO-Fun paper).
All fine-tuned models for this repository are available on the Hugging Face Model Hub, including a working inference widget that allows performing NER. All fine-tuned models can be found here. The best models can be found in this collection.
- 28.03.2024: Initial version of this repository.