# Training with 19 fine-grained labels

# Preprocessing

Download the train, dev and test splits of the [German LER Dataset](https://github.com/elenanereiss/Legal-Entity-Recognition) from GitHub and save them in the `data` folder:

In [None]:
%%bash
wget https://raw.githubusercontent.com/elenanereiss/Legal-Entity-Recognition/master/data/ler_train.conll -P data
wget https://raw.githubusercontent.com/elenanereiss/Legal-Entity-Recognition/master/data/ler_dev.conll -P data
wget https://raw.githubusercontent.com/elenanereiss/Legal-Entity-Recognition/master/data/ler_test.conll -P data
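Judging by the `cut -d " " -f 2` command used later to collect the labels, the LER files use a CoNLL-style layout: one `token label` pair per line, separated by a single space, with blank lines between sentences. A minimal sketch for loading such a file in Python, assuming UTF-8 encoding and exactly two columns. `read_conll` is a hypothetical helper and is not part of the repository:

```python
from typing import Iterable, List, Tuple

# One sentence = list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token label' lines into sentences; a blank line ends a sentence.

    Hypothetical helper: assumes exactly one space-separated label column.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: close the current sentence (repeated blanks are ignored).
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split()[:2]
        current.append((token, label))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences


# Toy input (made up for illustration) in the same layout as ler_*.conll.
sample = ["Das O\n", "BGH B-GRT\n", "entschied O\n", "\n", "§ B-GS\n", "1 I-GS\n"]
sentences = read_conll(sample)
print(len(sentences))  # 2
print(sentences[1])    # [('§', 'B-GS'), ('1', 'I-GS')]
```

For the real files, pass an open file handle, e.g. `with open("data/ler_train.conll", encoding="utf-8") as f: train = read_conll(f)`.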

Next, we define the variables needed for the pre-processing and training steps.
Many pre-trained models on the 🤗 [Hugging Face Hub](https://huggingface.co/models?language=de&pipeline_tag=fill-mask&sort=downloads) can be fine-tuned for this task, for example:
- **BERT multilingual** (`bert-base-multilingual-cased`, `bert-base-multilingual-uncased`)
- **BERT German** (`bert-base-german-cased`, `dbmdz/bert-base-german-uncased`, ...)
- **DistilBERT** (`distilbert-base-german-cased`, `distilbert-base-multilingual-cased`)
- **XLM-RoBERTa** (`xlm-roberta-base`, `xlm-roberta-large`, `facebook/xlm-roberta-xl`, ...)
- **ELECTRA** (`stefan-it/electra-base-gc4-64k-200000-cased-generator`, ...)
- **DeBERTa** (`microsoft/mdeberta-v3-base`)
- ...

We use [bert-base-german-cased](https://huggingface.co/bert-base-german-cased) for training.

In [None]:
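# %env sets os.environ in the notebook kernel, so these values are inherited by
# every later %%bash cell (plain `export` lines inside a %%bash cell are lost
# when that cell's shell exits).
%env DATA_DIR=data
%env MODEL_DIR=models

%env MAX_LENGTH=512
%env BERT_MODEL=bert-base-german-cased
# Used in the output directory name, e.g. models/bert-base-german-cased-fine-512
%env LABEL_TYPE=fine

%env TRAIN=data/ler_train.conll
%env DEV=data/ler_dev.conll
%env TEST=data/ler_test.conll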

Court decisions consist of long sentences that can exceed the model's maximum input length. The `preprocess.py` script therefore splits long sentences into shorter ones whenever the maximum subtoken length (`MAX_LENGTH`) is reached.
Run the pre-processing script on the train, dev and test splits. Note that `run_ner.py` expects the files `train.txt`, `dev.txt` and `test.txt` for training and evaluation. Finally, we collect all labels from the original splits into `labels.txt`.
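The splitting rule can be illustrated with a small sketch. It is not the repository's `src/preprocess.py`: since that script receives the model name, it presumably counts subtokens with that model's tokenizer, whereas here a hypothetical toy tokenizer that cuts words into 4-character pieces stands in. The idea is to keep a running subtoken count per sentence and start a new sentence (emit a blank line) as soon as the next token would push the count past the limit. A real implementation would likely also reserve room for special tokens such as `[CLS]` and `[SEP]`.

```python
from typing import Callable, Iterable, Iterator, List


def split_long_sentences(
    lines: Iterable[str],
    tokenize: Callable[[str], List[str]],
    max_len: int,
) -> Iterator[str]:
    """Re-emit CoNLL lines, inserting a blank line (= sentence break) whenever
    the running subtoken count of the current sentence would exceed max_len."""
    count = 0
    for raw in lines:
        line = raw.rstrip()
        if not line:
            # Original sentence boundary: pass it through and reset the counter.
            count = 0
            yield ""
            continue
        n_sub = len(tokenize(line.split()[0]))
        if n_sub == 0:
            # Tokens that yield no subtokens (e.g. stray control characters) are dropped.
            continue
        if count > 0 and count + n_sub > max_len:
            # Adding this token would overflow: break the sentence here.
            yield ""
            count = 0
        count += n_sub
        yield line


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a WordPiece tokenizer: 4-character pieces."""
    return [word[i:i + 4] for i in range(0, len(word), 4)]


lines = ["Der O", "Bundesgerichtshof B-GRT", "hat O", "entschieden O", "", "Gut O"]
result = list(split_long_sentences(lines, toy_tokenize, max_len=5))
print(result)
# ['Der O', '', 'Bundesgerichtshof B-GRT', '', 'hat O', 'entschieden O', '', 'Gut O']
```

Labels are carried along unchanged, so an entity can in principle be cut in two at a sentence break; with `MAX_LENGTH=512` such breaks should be rare.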

In [None]:
%%bash
python3 src/preprocess.py $TRAIN $BERT_MODEL $MAX_LENGTH > $DATA_DIR/train.txt
python3 src/preprocess.py $DEV $BERT_MODEL $MAX_LENGTH > $DATA_DIR/dev.txt
python3 src/preprocess.py $TEST $BERT_MODEL $MAX_LENGTH > $DATA_DIR/test.txt
cat $DATA_DIR/ler_train.conll $DATA_DIR/ler_dev.conll $DATA_DIR/ler_test.conll | cut -d " " -f 2 | grep -v "^$" | sort | uniq > $DATA_DIR/labels.txt
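For readers who prefer Python, the label-collection pipeline above can be mirrored roughly as follows. This is a sketch; `collect_labels` and `entity_types` are hypothetical helpers. It assumes the LER tags follow the BIO scheme (`B-`/`I-` prefixes plus `O`), in which case stripping the prefixes should yield the 19 fine-grained entity types that appear as rows in the classification reports below.

```python
from typing import Iterable, List


def collect_labels(lines: Iterable[str]) -> List[str]:
    """Roughly: cut -d " " -f 2 | grep -v "^$" | sort | uniq"""
    labels = set()
    for line in lines:
        fields = line.rstrip("\r\n").split(" ")
        # Keep the second space-separated column if it is non-empty.
        if len(fields) >= 2 and fields[1]:
            labels.add(fields[1])
    return sorted(labels)


def entity_types(labels: Iterable[str]) -> List[str]:
    """Drop 'O' and strip the BIO prefix ('B-GS' -> 'GS')."""
    return sorted({lab.partition("-")[2] for lab in labels if "-" in lab})


sample = ["BGH B-GRT\n", "\n", "§ B-GS\n", "1 I-GS\n", "BGB I-GS\n", "vom O\n"]
labels = collect_labels(sample)
print(labels)                # ['B-GRT', 'B-GS', 'I-GS', 'O']
print(entity_types(labels))  # ['GRT', 'GS']
```

Applied to the lines of `data/labels.txt`, `entity_types` should return 19 types if the BIO assumption holds, which would mean 2 × 19 + 1 = 39 labels in the file.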

# Training
## Training with PyTorch

To see what arguments can be set, run `python3 run_ner.py --help`.

The model is fine-tuned on the train split and evaluated on the dev (`--do_eval`) and test (`--do_predict`) splits.

In [None]:
%%bash
python3 run_ner.py --data_dir $DATA_DIR \
    --labels $DATA_DIR/labels.txt \
    --task_type NER \
    --model_name_or_path $BERT_MODEL \
    --output_dir $MODEL_DIR/$BERT_MODEL-$LABEL_TYPE-$MAX_LENGTH \
    --max_seq_length $MAX_LENGTH \
    --num_train_epochs 3 \
    --per_gpu_train_batch_size 12 \
    --learning_rate 1e-5 \
    --save_steps 7500 \
    --seed 1 \
    --do_train \
    --do_eval \
    --do_predict

Results on the dev set:

```
10/30/2022 17:42:03 - INFO - __main__ - ***** Eval results *****
10/30/2022 17:42:03 - INFO - __main__ -   eval_precision = 0.9212203128016991
10/30/2022 17:42:03 - INFO - __main__ -   eval_recall = 0.9458762886597938
10/30/2022 17:42:03 - INFO - __main__ -   eval_f1 = 0.9333855032769246
```

Results on the test set:

```
[INFO|trainer.py:2891] 2022-10-30 17:42:14,836 >> ***** Running Prediction *****
10/30/2022 17:45:18 - INFO - __main__ -   test_precision = 0.9449558173784978
10/30/2022 17:45:18 - INFO - __main__ -   test_recall = 0.9644870349492672
10/30/2022 17:45:18 - INFO - __main__ -   test_f1 = 0.9546215361725869
```
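The reported scores are entity-level: a predicted entity counts as correct only if its type and both boundaries match a gold entity. Assuming `run_ner.py` follows the Hugging Face legacy token-classification example, these numbers come from the `seqeval` library; the plain-Python sketch below re-implements the idea for illustration (conlleval-style span extraction from BIO tags) and is not `seqeval`'s code. `extract_spans` and `entity_prf` are hypothetical names. The last lines check that `eval_f1` in the dev log is the harmonic mean of the logged precision and recall.

```python
from typing import List, Set, Tuple

# (sentence index, entity type, start token, end token exclusive)
Span = Tuple[int, str, int, int]


def extract_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect entity spans from BIO tag sequences (conlleval-style: an I-X
    that does not continue an open X entity starts a new one)."""
    spans: Set[Span] = set()
    for s_idx, tags in enumerate(sentences):
        start, etype = 0, None
        for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open entity
            prefix, _, label = tag.partition("-")
            # The open entity ends at "O", at any "B-", or when the type changes.
            if etype is not None and (prefix != "I" or label != etype):
                spans.add((s_idx, etype, start, i))
                etype = None
            # A new entity starts at "B-X", or at an "I-X" with no open entity.
            if etype is None and prefix in ("B", "I"):
                start, etype = i, label
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 (exact type + boundary match)."""
    g, p = extract_spans(gold), extract_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy example: the PER entity is found, GRT is missed, GS has a wrong end boundary.
gold = [["B-PER", "I-PER", "O", "B-GRT"], ["B-GS", "I-GS", "I-GS"]]
pred = [["B-PER", "I-PER", "O", "O"], ["B-GS", "I-GS", "O"]]
p, r, f = entity_prf(gold, pred)
print(p, round(r, 4), round(f, 4))  # 0.5 0.3333 0.4

# eval_f1 from the dev log is the harmonic mean of eval_precision and eval_recall.
P, R = 0.9212203128016991, 0.9458762886597938
print(round(2 * P * R / (P + R), 6))  # 0.933386
```

The partially matched GS entity counts as both a false positive and a false negative, which is why boundary errors weigh heavily in entity-level scores.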

## Training with TensorFlow

As with PyTorch, run `python3 run_tf_ner.py --help` to see which arguments can be set.

In [None]:
%%bash
python3 run_tf_ner.py --data_dir $DATA_DIR \
    --labels $DATA_DIR/labels.txt \
    --task_type NER \
    --model_name_or_path $BERT_MODEL \
    --output_dir $MODEL_DIR/$BERT_MODEL-$LABEL_TYPE-$MAX_LENGTH \
    --max_seq_length $MAX_LENGTH \
    --num_train_epochs 3 \
    --per_gpu_train_batch_size 12 \
    --learning_rate 1e-5 \
    --save_steps 7500 \
    --seed 1 \
    --do_train \
    --do_eval \
    --do_predict

Results on the dev set:

```
[INFO|trainer_tf.py:320] 2022-11-02 09:04:17,682 >> ***** Running Prediction *****
11/02/2022 09:05:46 - INFO - __main__ -
              precision    recall  f1-score   support

          AN       0.75      0.50      0.60        12
         EUN       0.92      0.93      0.92       116
         GRT       0.95      0.99      0.97       331
          GS       0.98      0.98      0.98      1720
         INN       0.84      0.91      0.88       199
          LD       0.95      0.95      0.95       109
         LDS       0.82      0.43      0.56        21
         LIT       0.88      0.92      0.90       231
         MRK       0.50      0.70      0.58        23
         ORG       0.64      0.71      0.67       103
         PER       0.86      0.93      0.90       186
          RR       0.97      0.98      0.97       144
          RS       0.94      0.95      0.94      1126
          ST       0.91      0.88      0.89        58
         STR       0.29      0.29      0.29         7
          UN       0.81      0.85      0.83       143
          VO       0.76      0.95      0.84        37
          VS       0.62      0.80      0.70        56
          VT       0.87      0.92      0.90       275

   micro avg       0.92      0.94      0.93      4897
   macro avg       0.80      0.82      0.80      4897
weighted avg       0.92      0.94      0.93      4897
```
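The three summary rows combine the per-type scores differently: *micro avg* pools true and false positives over all entities, *macro avg* is the unweighted mean over the 19 types, and *weighted avg* weights each type by its support. Macro and weighted F1 can be approximately recomputed from the rounded dev table above (micro needs the raw counts, which the report does not show). The sketch makes visible why macro F1 (0.80) lags behind: rare types such as STR (7 entities), AN or LDS count as much as GS with 1,720 entities.

```python
# Per-type (F1, support) copied from the TensorFlow dev-set report above.
dev_report = {
    "AN": (0.60, 12), "EUN": (0.92, 116), "GRT": (0.97, 331), "GS": (0.98, 1720),
    "INN": (0.88, 199), "LD": (0.95, 109), "LDS": (0.56, 21), "LIT": (0.90, 231),
    "MRK": (0.58, 23), "ORG": (0.67, 103), "PER": (0.90, 186), "RR": (0.97, 144),
    "RS": (0.94, 1126), "ST": (0.89, 58), "STR": (0.29, 7), "UN": (0.83, 143),
    "VO": (0.84, 37), "VS": (0.70, 56), "VT": (0.90, 275),
}

total_support = sum(s for _, s in dev_report.values())
# Macro: every entity type counts equally.
macro_f1 = sum(f for f, _ in dev_report.values()) / len(dev_report)
# Weighted: each type counts in proportion to its number of gold entities.
weighted_f1 = sum(f * s for f, s in dev_report.values()) / total_support

print(len(dev_report), total_support)       # 19 4897
print(f"{macro_f1:.2f} {weighted_f1:.2f}")  # 0.80 0.93
```

Because the inputs are already rounded to two decimals, the recomputed averages only approximate the logged ones, but they agree at the reported precision.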

Results on the test set:

```
[INFO|trainer_tf.py:320] 2022-11-02 09:11:42,672 >> ***** Running Prediction *****
11/02/2022 09:19:33 - INFO - __main__ -
              precision    recall  f1-score   support

          AN       1.00      0.89      0.94         9
         EUN       0.90      0.97      0.93       150
         GRT       0.98      0.98      0.98       321
          GS       0.98      0.99      0.98      1818
         INN       0.90      0.95      0.92       222
          LD       0.97      0.92      0.94       149
         LDS       0.91      0.45      0.61        22
         LIT       0.92      0.96      0.94       314
         MRK       0.78      0.88      0.82        32
         ORG       0.82      0.88      0.85       113
         PER       0.92      0.88      0.90       173
          RR       0.95      0.99      0.97       142
          RS       0.97      0.98      0.97      1245
          ST       0.79      0.86      0.82        64
         STR       0.75      0.80      0.77        15
          UN       0.90      0.95      0.93       108
          VO       0.80      0.83      0.81        71
          VS       0.73      0.84      0.78        64
          VT       0.93      0.97      0.95       290

   micro avg       0.94      0.96      0.95      5322
   macro avg       0.89      0.89      0.89      5322
weighted avg       0.95      0.96      0.95      5322
```

# Evaluation
A fine-tuned model (e.g. `bert-base-german-cased`, saved in `models/bert-base-german-cased-fine-512`) can be evaluated on the dev and test splits with either PyTorch or TensorFlow:

In [None]:
%%bash
python3 run_ner.py --data_dir data \
    --labels data/labels.txt \
    --task_type NER \
    --model_name_or_path models/bert-base-german-cased-fine-512 \
    --output_dir models/bert-base-german-cased-fine-512 \
    --max_seq_length 512 \
    --per_device_eval_batch_size 16 \
    --do_eval \
    --do_predict

In [None]:
%%bash
python3 run_tf_ner.py --data_dir data \
    --labels data/labels.txt \
    --task_type NER \
    --model_name_or_path models/bert-base-german-cased-fine-512 \
    --output_dir models/bert-base-german-cased-fine-512 \
    --max_seq_length 512 \
    --per_device_eval_batch_size 16 \
    --do_eval \
    --do_predict