
CO-Funer

This repository shows how to fine-tune Flair models on the CO-Fun NER dataset.

Dataset

The Company Outsourcing in Fund Prospectuses (CO-Fun) dataset consists of 948 sentences with 5,969 named entity annotations, including 2,340 Outsourced Services, 2,024 Companies, 1,594 Locations and 11 Software annotations.

Overall, the following named entities are annotated:

  • Auslagerung (engl. outsourcing)
  • Unternehmen (engl. company)
  • Ort (engl. location)
  • Software

The CO-Fun NER dataset from the Model Hub is used for fine-tuning Flair models.
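To inspect the data, the dataset can be loaded with the Hugging Face datasets library. A minimal sketch, assuming the dataset is published under the id stefan-it/co-funer (the actual id on the Hub may differ):

from datasets import load_dataset

# Hypothetical dataset id; adjust to the actual CO-Fun NER dataset on the Hub.
dataset = load_dataset("stefan-it/co-funer")
print(dataset["train"][0])  # tokens with their NER tags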

Fine-Tuning

The main fine-tuning is done in experiment.py.

Fine-tuning can be started by calling the run_experiment() method and passing a so-called ExperimentConfiguration, which stores all necessary hyper-parameters and fine-tuning options. The interface looks like:

from dataclasses import dataclass

@dataclass
class ExperimentConfiguration:
    batch_size: int
    learning_rate: float
    epoch: int                       # number of fine-tuning epochs
    context_size: int                # document context size
    seed: int
    base_model: str                  # model id on the Hugging Face Model Hub
    base_model_short: str            # short name used for the uploaded model
    layers: str = "-1"               # transformer layers used for embeddings
    subtoken_pooling: str = "first"  # subtoken pooling strategy
    use_crf: bool = False            # whether to use a CRF on top
    use_tensorboard: bool = True     # whether to write TensorBoard logs
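A minimal usage sketch (the concrete hyper-parameter values here are only illustrative, not the recommended settings):

from experiment import ExperimentConfiguration, run_experiment

config = ExperimentConfiguration(
    batch_size=8,
    learning_rate=5e-05,
    epoch=10,
    context_size=64,
    seed=1,
    base_model="dbmdz/bert-base-german-cased",
    base_model_short="german_dbmdz_bert_base",
)
run_experiment(config)  # fine-tunes one model for this configuration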

A hyper-parameter search grid is defined in script.py. This file is used to start the fine-tuning process. Additionally, the script uploads the fine-tuned Flair model to the Model Hub. The following environment variables should be set:

| Environment Variable | Description |
|----------------------|-------------|
| CONFIG | Should point to a configuration file in the configs folder, e.g. configs/german_dbmdz_bert_base.json |
| HF_TOKEN | Hugging Face access token |
| HUB_ORG_NAME | Should point to the user name or organization the model should be uploaded to |
| HF_UPLOAD | If this variable is set, the fine-tuned Flair model won't be uploaded to the Model Hub |
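A sketch of how script.py might consume these variables (the exact handling in script.py may differ; this only illustrates the contract described above):

import json
import os

with open(os.environ["CONFIG"], "r") as f:
    config = json.load(f)          # e.g. configs/german_dbmdz_bert_base.json

hf_token = os.environ["HF_TOKEN"]          # Hugging Face access token
hub_org_name = os.environ["HUB_ORG_NAME"]  # upload target (user or organization)
skip_upload = "HF_UPLOAD" in os.environ    # if set, the model is not uploaded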

Hyper-Parameter Search

In this example the following hyper-parameter search grid is used:

  • Batch Sizes = [8, 16]
  • Learning Rates = [3e-05, 5e-05]
  • Seeds = [1, 2, 3, 4, 5]

This means 20 models will be fine-tuned in total (2 x 2 x 5 = 20).
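The grid can be expanded into the 20 individual runs, e.g. with itertools.product (a sketch; seed handling and config construction follow the interface shown above):

from itertools import product

batch_sizes = [8, 16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]

# 2 batch sizes x 2 learning rates x 5 seeds = 20 fine-tuning runs
for batch_size, learning_rate, seed in product(batch_sizes, learning_rates, seeds):
    print(f"bs{batch_size}-lr{learning_rate}-seed{seed}")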

Model Upload

After each model is fine-tuned, it will automatically be uploaded to the Hugging Face Model Hub. The following files are uploaded:

  • pytorch_model.bin: Flair internally tracks the best model as best-model.pt over all epochs. To be compatible with the Model Hub, best-model.pt is automatically renamed to pytorch_model.bin
  • training.log: Flair stores the training log in training.log. This file is later needed to parse the best F1-score on the development set
  • ./runs: In this folder the TensorBoard logs are stored. This enables a nice display of metrics on the Model Hub
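A minimal upload sketch using huggingface_hub (the repo id and local paths are placeholders; the actual upload logic lives in script.py):

from pathlib import Path
from huggingface_hub import HfApi

model_dir = Path("resources/taggers/ner")  # hypothetical Flair output folder

# Rename the best checkpoint so the Model Hub picks it up.
(model_dir / "best-model.pt").rename(model_dir / "pytorch_model.bin")

api = HfApi()
api.create_repo("my-org/flair-co-funer-example", exist_ok=True)  # placeholder repo id
api.upload_folder(repo_id="my-org/flair-co-funer-example", folder_path=model_dir)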

Model Card

Additionally, this repository shows how to automatically generate model cards for all uploaded models, including a results overview table with linked models.

The Example.ipynb notebook gives a detailed overview of all necessary steps.
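A minimal sketch of pushing a generated card with huggingface_hub (the card content and repo id are placeholders; Example.ipynb shows the full pipeline, including parsing training.log for the best development F1-score):

from huggingface_hub import ModelCard

content = """
---
language: de
tags:
- flair
- token-classification
---

# Fine-tuned Flair model on CO-Fun NER

Best development F1-score: 0.9426 (placeholder, parsed from training.log).
"""

card = ModelCard(content)
card.push_to_hub("my-org/flair-co-funer-example")  # placeholder repo id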

Results

We perform experiments for three BERT-based German language models: German BERT, German DBMDZ BERT and GBERT.

The following overview table shows the best configuration (batch size, number of epochs, learning rate) for each model, with the per-seed F1-scores on the development set:

| Model Name | Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
|------------|---------------|--------|--------|--------|--------|--------|---------|
| German BERT | bs8-e10-lr5e-05 | 0.9346 | 0.9388 | 0.9301 | 0.9291 | 0.9346 | 0.9334 ± 0.0039 |
| German DBMDZ BERT | bs8-e10-lr5e-05 | 0.9378 | 0.9280 | 0.9383 | 0.9374 | 0.9364 | 0.9356 ± 0.0043 |
| GBERT | bs8-e10-lr5e-05 | 0.9477 | 0.9350 | 0.9517 | 0.9443 | 0.9342 | 0.9426 ± 0.0077 |

It can be seen that GBERT has the strongest performance, achieving an average F1-score of 94.26% on the development set.
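The averages appear to be mean ± sample standard deviation over the five seeds; a quick check for the GBERT row:

from statistics import mean, stdev

scores = [0.9477, 0.9350, 0.9517, 0.9443, 0.9342]  # GBERT, development set
print(f"{mean(scores):.4f} ± {stdev(scores):.4f}")  # 0.9426 ± 0.0077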

Now, we retrieve the F1-Score for the best configuration of each model on the test set:

| Model Name | Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
|------------|---------------|--------|--------|--------|--------|--------|---------|
| German BERT | bs8-e10-lr5e-05 | 0.9141 | 0.9159 | 0.9121 | 0.9062 | 0.9105 | 0.9118 ± 0.0033 |
| German DBMDZ BERT | bs8-e10-lr5e-05 | 0.9134 | 0.9076 | 0.9070 | 0.8821 | 0.9091 | 0.9038 ± 0.0111 |
| GBERT | bs8-e10-lr5e-05 | 0.9180 | 0.9117 | 0.9163 | 0.9155 | 0.9110 | 0.9145 ± 0.0027 |

The GBERT model shows the strongest performance again. With 91.45%, our result is close to the reported 92.2% (see Table 1 in the CO-Fun paper).

Release of Fine-tuned Models

All fine-tuned models for this repository are available on the Hugging Face Model Hub, including a working inference widget that allows performing NER:

[Screenshot: inference widget]

All fine-tuned models can be found here. The best models can be found in this collection.
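Running one of the models locally works with the standard Flair API (a sketch; the model id and the example sentence are placeholders):

from flair.data import Sentence
from flair.models import SequenceTagger

# Placeholder model id; pick one of the fine-tuned models from the collection.
tagger = SequenceTagger.load("stefan-it/flair-co-funer-gbert")

sentence = Sentence("Die Fondsbuchhaltung wurde an die Beispiel GmbH in Frankfurt ausgelagert.")
tagger.predict(sentence)

for entity in sentence.get_spans("ner"):
    print(entity)  # predicted spans with Auslagerung / Unternehmen / Ort / Software tags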

Changelog

  • 28.03.2024: Initial version of this repository.
