# Fine-tune BERT on Italian Data for Named Entity Recognition

This project aims to fine-tune pre-trained BERT models for named-entity recognition (NER) on Italian Data. 

We fine-tune two pre-trained BERT models:
  - [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) 
  - [bert-base-italian-cased](https://huggingface.co/dbmdz/bert-base-italian-cased) 


### The dataset:
[Wikineural IT](https://github.com/Babelscape/wikineural/tree/master/data/wikineural/it) comprises 111k sentences from Wikipedia, tokenized and NER-tagged. The dataset is organized in three splits: train, validation, and test. The sentences are cased and contain punctuation. The **entity categories** are encoded as follows:
```
{'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}
```
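As a small illustration of this encoding, the sketch below inverts the mapping and decodes a sequence of tag ids back into BIO tags. The names `label2id`, `id2label`, and `decode_tags`, as well as the three-token example, are illustrative assumptions and are not part of the dataset.

```python
# Label encoding as listed above.
label2id = {'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4,
            'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}

# Inverse mapping, from integer id back to tag string.
id2label = {idx: tag for tag, idx in label2id.items()}


def decode_tags(tag_ids):
    """Convert a list of integer tag ids into BIO tag strings."""
    return [id2label[i] for i in tag_ids]


# Hypothetical tag ids for a three-token sentence such as "Benvenuto Stracca ."
print(decode_tags([1, 2, 0]))
```

A `B-` tag opens an entity, an `I-` tag continues it, and `O` marks tokens outside any entity.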
### The pre-trained models used in this project: 
  - [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) is pre-trained on the 104 languages with the largest Wikipedias.  
  Evaluated on the Italian Content Annotation Bank (I-CAB) for the NER task, the model scored an average accuracy of 84.69 ± 0.51. [Report by Schweter (2020)](https://github.com/stefan-it/italian-bertelectra)
  - [bert-base-italian-cased](https://huggingface.co/dbmdz/bert-base-italian-cased) is pre-trained on Wikipedia texts and OPUS corpora, for a total corpus size of 13GB. On the same evaluation task as the multilingual version, this model reached an average accuracy of 85.96 ± 0.23.

Both models are case-sensitive.

## Load the Dataset

In [None]:
! nvidia-smi

Wed Jun 29 14:02:15 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%%capture
! pip install datasets transformers seqeval
from datasets import load_dataset, load_metric, concatenate_datasets

datasets = load_dataset("Babelscape/wikineural")

In [None]:
train_dataset = datasets["train_it"]
val_dataset = datasets["val_it"]
test_dataset = datasets["test_it"]

In [None]:
datasets["train_it"]

Dataset({
    features: ['tokens', 'ner_tags', 'lang'],
    num_rows: 88400
})

In [None]:
# NER tags in the dataset
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
labels_vocab = {'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}
labels_vocab_reverse = {v: k for k, v in labels_vocab.items()}

## Example of how the data are prepared
- import the tokenizer of the pre-trained model
- input: a sentence
- output: the encoded sentence as `input_ids`, `token_type_ids`, and `attention_mask`

<a name="s1"></a> 

In [None]:
from transformers import AutoTokenizer
# import the tokenizer
mult_tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased") 
ita_tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-cased") 


In [None]:
import transformers
assert isinstance(mult_tokenizer, transformers.PreTrainedTokenizerFast)
assert isinstance(ita_tokenizer, transformers.PreTrainedTokenizerFast)

In [None]:
print("multilingual bert", mult_tokenizer("la città ha un importante porto sul mar di marmara"))
print("italian bert", ita_tokenizer("la città ha un importante porto sul mar di marmara"))

multilingual bert {'input_ids': [101, 10109, 12870, 10228, 10119, 12596, 36084, 12037, 12318, 10120, 12318, 41244, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
italian bert {'input_ids': [102, 146, 984, 278, 141, 1605, 3446, 340, 882, 120, 882, 17523, 103], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [None]:
print("multilingual bert", mult_tokenizer(["la", "città", "ha", "un", "importante", "porto", "sul", "mar", "di", "marmara"], is_split_into_words=True))
print("italian bert", ita_tokenizer(["la", "città", "ha", "un", "importante", "porto", "sul", "mar", "di", "marmara"], is_split_into_words=True))

multilingual bert {'input_ids': [101, 10109, 12870, 10228, 10119, 12596, 36084, 12037, 12318, 10120, 12318, 41244, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
italian bert {'input_ids': [102, 146, 984, 278, 141, 1605, 3446, 340, 882, 120, 882, 17523, 103], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [None]:
example = datasets["train_it"][5]
tokenized_input = mult_tokenizer(example["tokens"], is_split_into_words=True)
tokens = mult_tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])

print("multilingual bert tokenizer: \n", example["tokens"])
print(tokens)
print(tokenized_input.word_ids())

word_ids = tokenized_input.word_ids()
aligned_labels = [-100 if i is None else example["ner_tags"][i] for i in word_ids] 
print(len(aligned_labels), len(tokenized_input["input_ids"]))    

multilingual bert tokenizer: 
 ['Fondatore', 'del', 'diritto', 'commerciale', 'è', 'considerato', 'il', 'giurista', 'cinquecentesco', 'anconitano', 'Benvenuto', 'Stracca', '.']
['[CLS]', 'Fonda', '##tore', 'del', 'diritto', 'commerciale', 'è', 'considerato', 'il', 'gi', '##uris', '##ta', 'cinque', '##centes', '##co', 'an', '##con', '##itano', 'Ben', '##venu', '##to', 'St', '##rac', '##ca', '.', '[SEP]']
[None, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, None]
26 26


In [None]:
example = datasets["train_it"][5]
tokenized_input = ita_tokenizer(example["tokens"], is_split_into_words=True)
tokens = ita_tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])

print("italian bert tokenizer: \n", example["tokens"])
print(tokens)
print(tokenized_input.word_ids())

word_ids = tokenized_input.word_ids()
aligned_labels = [-100 if i is None else example["ner_tags"][i] for i in word_ids]
print(len(aligned_labels), len(tokenized_input["input_ids"]))

italian bert tokenizer: 
 ['Fondatore', 'del', 'diritto', 'commerciale', 'è', 'considerato', 'il', 'giurista', 'cinquecentesco', 'anconitano', 'Benvenuto', 'Stracca', '.']
['[CLS]', 'Fonda', '##tore', 'del', 'diritto', 'commerciale', 'è', 'considerato', 'il', 'giuris', '##ta', 'cinque', '##centesco', 'an', '##coni', '##tano', 'Benvenuto', 'Stra', '##cca', '.', '[SEP]']
[None, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 9, 10, 11, 11, 12, None]
21 21


BERT uses WordPiece tokenization: each word is split into word pieces, the pieces are converted into their corresponding IDs, and the special tokens `[CLS]` and `[SEP]` are added.

Notice how differently the two models tokenize the same sentence: 26 tokens (multilingual) vs. 21 tokens (Italian).

Example word: cinquecentesco
- multilingual: 'cinque', '##centes', '##co'
- Italian: 'cinque', '##centesco'
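The different splits come from the different vocabularies. WordPiece splits a word greedily, always taking the longest piece found in the vocabulary. The sketch below is a simplified illustration of that matching step; the function `wordpiece` and the two tiny vocabularies are hypothetical and contain only the entries needed for this example, not the real model vocabularies.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabularies (assumption: reduced to the entries relevant here).
multilingual_like = {"cinque", "##centes", "##co"}
italian_like = {"cinque", "##centes", "##co", "##centesco"}

print(wordpiece("cinquecentesco", multilingual_like))
print(wordpiece("cinquecentesco", italian_like))
```

With the longer piece `##centesco` available, the second vocabulary needs two pieces where the first needs three, which mirrors the tokenizer outputs shown above.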

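During data preprocessing we align the labels with the token IDs: the labels of special tokens are set to -100 (the index ignored by the PyTorch loss function). In the examples above, every word piece receives the label of the word it comes from; the preprocessing function below, with `label_all_tokens = False`, labels only the first word piece of each word and sets the remaining pieces to -100.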
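The two alignment strategies can be compared without loading a tokenizer. The sketch below reuses the `word_ids` printed above for the Italian tokenizer; the `ner_tags` list is an assumption (the last two words before the full stop tagged as `B-PER` and `I-PER`), and `align_labels` is a hypothetical helper name.

```python
def align_labels(word_ids, word_labels, label_all_tokens=False):
    """Map word-level labels onto word pieces; -100 marks positions to ignore."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)                  # special token ([CLS], [SEP])
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first piece of a word
        else:
            # Later pieces of the same word: copy the label or ignore them.
            aligned.append(word_labels[word_id] if label_all_tokens else -100)
        previous = word_id
    return aligned


# word_ids as printed above for the Italian tokenizer (21 tokens, 13 words).
word_ids = [None, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 9, 10, 11, 11, 12, None]
# Assumed word-level tags: 'Benvenuto' -> B-PER (1), 'Stracca' -> I-PER (2), rest -> O (0).
ner_tags = [0] * 10 + [1, 2, 0]

first_only = align_labels(word_ids, ner_tags)
all_pieces = align_labels(word_ids, ner_tags, label_all_tokens=True)
print(first_only)
print(all_pieces)
```

Labelling only the first piece means each word contributes exactly one position to the loss and to the metrics, regardless of how many pieces the tokenizer produces.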

## Data Preparation

Prepare the data to fine-tune the pre-trained model.

In [None]:
from transformers import BertTokenizer, BertModel, AutoTokenizer, AutoModel
import torch 
from transformers import pipeline
import pandas as pd

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-cased") 


In [None]:
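# If True, every word piece gets the label of its word; if False, only the first piece does.
label_all_tokens = False

def tokenize_and_align_labels(examples, tokenizer=ita_tokenizer):  # set the tokenizer of the chosen model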
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            # Special tokens have a word id that is None. We set the label to -100 so they are automatically
            # ignored in the loss function.
            if word_idx is None:
                label_ids.append(-100)
            # We set the label for the first token of each word.
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            # For the other tokens in a word, we set the label to either the current label or -100, depending on
            # the label_all_tokens flag.
            else:
                label_ids.append(label[word_idx] if label_all_tokens else -100)
            previous_word_idx = word_idx

        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs

In [None]:
tokenize_and_align_labels(datasets['train_it'][:3])

{'input_ids': [[102, 2141, 2052, 5397, 179, 4393, 8035, 141, 1553, 5241, 30561, 3827, 111, 1953, 3251, 593, 162, 17852, 1307, 417, 175, 6420, 1307, 21079, 207, 955, 677, 13741, 162, 5272, 298, 1553, 202, 179, 202, 158, 24645, 334, 2299, 3606, 9962, 146, 7959, 157, 284, 15331, 285, 1023, 163, 284, 697, 103], [102, 329, 1912, 14323, 212, 529, 710, 482, 14930, 2251, 1307, 1105, 134, 4913, 136, 1953, 13270, 4701, 29736, 30876, 126, 136, 23224, 30876, 1365, 5106, 116, 273, 28254, 697, 103], [102, 654, 288, 1030, 141, 1835, 1259, 120, 4685, 126, 120, 11279, 3785, 3035, 1307, 9388, 273, 13090, 2987, 483, 8356, 348, 30895, 208, 697, 103]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attent

In [None]:
train_tokenized = train_dataset.map(tokenize_and_align_labels, batched=True)
val_tokenized = val_dataset.map(tokenize_and_align_labels, batched=True)
test_tokenized = test_dataset.map(tokenize_and_align_labels, batched=True)




## Load the model 
The same procedure is applied to fine-tune both models; only the model name passed to `from_pretrained` changes.

In [None]:
from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer

# Import the model to fine-tune and specify the number of labels.

#model = AutoModelForTokenClassification.from_pretrained("bert-base-multilingual-cased", num_labels=len(label_list), label2id=labels_vocab, id2label=labels_vocab_reverse)
model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-base-italian-cased", num_labels=len(label_list), label2id=labels_vocab, id2label=labels_vocab_reverse)



Some weights of the model checkpoint at dbmdz/bert-base-italian-cased were not used when initializing BertForTokenClassification: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at

## Training Arguments
The hyperparameters are:
- learning rate: 2e-5
- batch size: 16 (as advised by the official documentation)
- number of epochs: we tried 1 and 2
- weight decay: 0.01
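These settings determine the number of optimization steps. The following back-of-the-envelope calculation assumes a single GPU and no gradient accumulation, and uses the training-split size reported earlier in the notebook; the variable names are illustrative.

```python
import math

num_train_examples = 88400  # rows in the Italian training split
batch_size = 16             # per-device batch size (single device assumed)
num_epochs = 2

# One optimization step per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # 5525 11050
```

With one checkpoint saved per epoch, checkpoints are expected at steps 5525 and 11050.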

In [None]:
# Set the model of choice here.
# model_name = "bert-base-multilingual-cased"
model_name = "dbmdz/bert-base-italian-cased"

args = TrainingArguments(
    "/content/drive/MyDrive/bert-NER-base-ita",  # output directory for checkpoints
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    push_to_hub=False,
    # eval_steps=10000,
    # save_steps=10000,
    save_strategy="epoch",  # save a checkpoint at the end of every epoch
)

## Import the Tokenizer 
The model's tokenizer was imported [here](#s1).

In [None]:
# Set the tokenizer according to the model of choice.
# tokenizer = mult_tokenizer
tokenizer = ita_tokenizer

## Data collator
This object batches the processed samples together and applies padding so that all sequences in a batch have the same length.

In [None]:
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer)
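To make the padding step concrete, here is a simplified pure-Python sketch of what such a collator does to a batch. It is not the library implementation: the real collator returns tensors and takes the padding id and padding side from the tokenizer. The function `pad_batch`, the padding id 0, and the toy token ids are assumptions for illustration.

```python
def pad_batch(features, pad_token_id=0, label_pad_id=-100):
    """Right-pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_tokens = len(f["input_ids"])
        n_pad = max_len - n_tokens
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # The attention mask is 1 for real tokens and 0 for padding.
        batch["attention_mask"].append([1] * n_tokens + [0] * n_pad)
        # Padded label positions get -100 so the loss ignores them.
        batch["labels"].append(f["labels"] + [label_pad_id] * n_pad)
    return batch


# Two toy samples of different length (made-up token ids).
features = [
    {"input_ids": [102, 146, 984, 103], "labels": [-100, 0, 0, -100]},
    {"input_ids": [102, 146, 103], "labels": [-100, 0, -100]},
]
batch = pad_batch(features)
print(batch["input_ids"])
print(batch["attention_mask"])
print(batch["labels"])
```

Padding per batch rather than to a global maximum keeps short batches small, which is why the tokenization step above does not pad.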

## Load the metric [seqeval](https://huggingface.co/spaces/evaluate-metric/seqeval)

In [None]:
metric = load_metric("seqeval")

labels = [label_list[i] for i in example["ner_tags"]]
metric.compute(predictions=[labels], references=[labels])


{'PER': {'f1': 1.0, 'number': 1, 'precision': 1.0, 'recall': 1.0},
 'overall_accuracy': 1.0,
 'overall_f1': 1.0,
 'overall_precision': 1.0,
 'overall_recall': 1.0}
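Apart from `overall_accuracy`, which is computed per token, seqeval scores whole entities: a prediction counts as correct only if both the span and the type match. The sketch below illustrates the idea for a single sentence. It is a simplified stand-in and not seqeval's code; `extract_entities` and `entity_scores` are hypothetical helpers, and the handling of malformed tag sequences may differ from the library.

```python
def extract_entities(tags):
    """Return (type, start, end) spans from a BIO tag sequence (end inclusive)."""
    entities, start, current = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        # A B- tag, or an I- tag of a different type, opens a new entity.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current
        )
        if current is not None and (tag == "O" or starts_new):
            entities.append((current, start, i - 1))
            start, current = None, None
        if starts_new:
            start, current = i, tag[2:]
    return entities


def entity_scores(predicted_tags, gold_tags):
    """Entity-level precision and recall for one sentence."""
    predicted = set(extract_entities(predicted_tags))
    gold = set(extract_entities(gold_tags))
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]  # the location is missed
print(extract_entities(gold))
print(entity_scores(pred, gold))
```

In this toy case the single predicted entity is correct (precision 1.0), but only one of the two gold entities is found (recall 0.5).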

The function below post-processes the model output and computes the metrics:
- convert the predicted index of each token to its label string
- pass the lists of predicted labels and ground-truth labels to the metric, which returns:
  - accuracy
  - precision
  - recall
  - F1 score (harmonic mean of precision and recall)
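As a quick check of the harmonic-mean definition, the snippet below recomputes F1 from the overall precision and recall that appear in the evaluation output of this notebook for the Italian model. The helper name `f1_score` is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the evaluation output of this notebook.
precision = 0.9536664333216661
recall = 0.9560926397052373

f1 = f1_score(precision, recall)
print(f1)
```

The result agrees with the reported `eval_f1` of about 0.9549.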



In [None]:
import numpy as np

def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    # Remove positions labelled -100 (special tokens and ignored word pieces)
    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
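The filtering step can be tried on toy data without NumPy or a trained model. The sketch below mirrors the logic of `compute_metrics` in plain Python; `decode_predictions` and `one_hot` are hypothetical helpers, and the scores are made up.

```python
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']


def decode_predictions(logits, label_ids):
    """Pick the highest-scoring label per token and drop positions labelled -100."""
    predictions, references = [], []
    for token_scores, token_labels in zip(logits, label_ids):
        pred_row, ref_row = [], []
        for scores, gold in zip(token_scores, token_labels):
            if gold == -100:
                continue  # special token or ignored word piece
            best = max(range(len(scores)), key=scores.__getitem__)  # argmax
            pred_row.append(label_list[best])
            ref_row.append(label_list[gold])
        predictions.append(pred_row)
        references.append(ref_row)
    return predictions, references


def one_hot(index, size=9):
    """Made-up score vector whose maximum is at the given index."""
    return [1.0 if j == index else 0.0 for j in range(size)]


# One toy sentence: [CLS], two word tokens, [SEP].
logits = [[one_hot(0), one_hot(1), one_hot(2), one_hot(0)]]
label_ids = [[-100, 1, 2, -100]]

predictions, references = decode_predictions(logits, label_ids)
print(predictions)
print(references)
```

Only the two positions with a real label survive, so the metric never sees the special tokens.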

## Trainer

In [None]:
trainer = Trainer(
    model,
    args,
    train_dataset=train_tokenized,
    eval_dataset=test_tokenized,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

## Start training 

(If training stops, you can restart it from the last checkpoint by using the commented-out line instead.)

In [None]:
trainer.train()
#trainer.train(resume_from_checkpoint = True)

Loading model from /content/drive/MyDrive/MACHINE_LEARNING/SoSe_project/wikineural-ita-ner2/checkpoint-5525.
The following columns in the training set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: tokens, ner_tags, lang. If tokens, ner_tags, lang are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 88400
  Num Epochs = 2
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 11050
  Continuing training from checkpoint, will skip to saved global_step
  Continuing training from epoch 1
  Continuing training from global step 5525
  Will skip the first 1 epochs then the first 0 batches in the first epoch. If this takes a lot of time, you can add the `--ignore_data_skip` flag to your launch command, but you will resume the training

0it [00:00, ?it/s]

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 2 | 0.014 | 0.018308 | 0.953666 | 0.956093 | 0.954878 | 0.994621 |


The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: tokens, ner_tags, lang. If tokens, ner_tags, lang are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 11069
  Batch size = 16
Saving model checkpoint to /content/drive/MyDrive/MACHINE_LEARNING/SoSe_project/wikineural-ita-ner2/checkpoint-11050
Configuration saved in /content/drive/MyDrive/MACHINE_LEARNING/SoSe_project/wikineural-ita-ner2/checkpoint-11050/config.json
Model weights saved in /content/drive/MyDrive/MACHINE_LEARNING/SoSe_project/wikineural-ita-ner2/checkpoint-11050/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/MACHINE_LEARNING/SoSe_project/wikineural-ita-ner2/checkpoint-11050/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/MACHINE_LEARNING/SoSe_project/wikineural-ita-ner2/checkpoint-11050/speci

TrainOutput(global_step=11050, training_loss=0.007097534761169917, metrics={'train_runtime': 1379.1515, 'train_samples_per_second': 128.195, 'train_steps_per_second': 8.012, 'total_flos': 6544852744292352.0, 'train_loss': 0.007097534761169917, 'epoch': 2.0})

## Evaluation
- test the model on the test set
- compute precision, recall, f1 for each category

In [None]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: tokens, ner_tags, lang. If tokens, ner_tags, lang are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 11069
  Batch size = 16


{'epoch': 2.0,
 'eval_accuracy': 0.9946209425488555,
 'eval_f1': 0.9548779953563762,
 'eval_loss': 0.018307985737919807,
 'eval_precision': 0.9536664333216661,
 'eval_recall': 0.9560926397052373,
 'eval_runtime': 64.0628,
 'eval_samples_per_second': 172.784,
 'eval_steps_per_second': 10.802}

In [None]:
predictions, labels, _ = trainer.predict(test_tokenized)
predictions = np.argmax(predictions, axis=2)

# Remove positions labelled -100 (special tokens and ignored word pieces)
true_predictions = [
    [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
    for prediction, label in zip(predictions, labels)
]
true_labels = [
    [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
    for prediction, label in zip(predictions, labels)
]

results = metric.compute(predictions=true_predictions, references=true_labels)
results

The following columns in the test set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: tokens, ner_tags, lang. If tokens, ner_tags, lang are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 11069
  Batch size = 16


{'LOC': {'f1': 0.9661560384752405,
  'number': 9788,
  'precision': 0.9625798600547611,
  'recall': 0.9697588884348182},
 'MISC': {'f1': 0.8500635324015247,
  'number': 2396,
  'precision': 0.8628546861564919,
  'recall': 0.8376460767946577},
 'ORG': {'f1': 0.9300265721877768,
  'number': 2229,
  'precision': 0.9182334936598163,
  'recall': 0.9421265141318977},
 'PER': {'f1': 0.9778732033160374,
  'number': 8385,
  'precision': 0.9780481985206395,
  'recall': 0.9776982707215265},
 'overall_accuracy': 0.9946209425488555,
 'overall_f1': 0.9548779953563762,
 'overall_precision': 0.9536664333216661,
 'overall_recall': 0.9560926397052373}

# Results on the test set

- Bert-base-multilingual-cased fine-tuned for Italian NER


```
{'LOC': {'f1': 0.9706346378950583,
  'number': 9788,
  'precision': 0.9670418821620526,
  'recall': 0.9742541888026155},

 'MISC': {'f1': 0.8807106598984771,
  'number': 2396,
  'precision': 0.8927958833619211,
  'recall': 0.8689482470784641},

 'ORG': {'f1': 0.947860962566845,
  'number': 2229,
  'precision': 0.9415670650730412,
  'recall': 0.9542395693135935},

 'PER': {'f1': 0.9809654513992482,
  'number': 8385,
  'precision': 0.9816097444470981,
  'recall': 0.9803220035778175},

 'overall_accuracy': 0.9956143433014125,
 'overall_f1': 0.9628704190776785,
 'overall_precision': 0.9622798563042145,
 'overall_recall': 0.9634617071672954}
```


- Bert-base-italian-cased fine-tuned for Italian NER

```
{'LOC': {'f1': 0.9661560384752405,
  'number': 9788,
  'precision': 0.9625798600547611,
  'recall': 0.9697588884348182},
 'MISC': {'f1': 0.8500635324015247,
  'number': 2396,
  'precision': 0.8628546861564919,
  'recall': 0.8376460767946577},
 'ORG': {'f1': 0.9300265721877768,
  'number': 2229,
  'precision': 0.9182334936598163,
  'recall': 0.9421265141318977},
 'PER': {'f1': 0.9778732033160374,
  'number': 8385,
  'precision': 0.9780481985206395,
  'recall': 0.9776982707215265},

 'overall_accuracy': 0.9946209425488555,
 'overall_f1': 0.9548779953563762,
 'overall_precision': 0.9536664333216661,
 'overall_recall': 0.9560926397052373}
```
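The two result sets are easier to compare side by side. The snippet below copies the F1 values from the two blocks above and prints the per-category difference; the dictionary and variable names are illustrative.

```python
# F1 scores copied from the two result blocks above.
multilingual = {"LOC": 0.9706346378950583, "MISC": 0.8807106598984771,
                "ORG": 0.947860962566845, "PER": 0.9809654513992482,
                "overall": 0.9628704190776785}
italian = {"LOC": 0.9661560384752405, "MISC": 0.8500635324015247,
           "ORG": 0.9300265721877768, "PER": 0.9778732033160374,
           "overall": 0.9548779953563762}

# Positive differences mean the multilingual model scored higher.
differences = {key: multilingual[key] - italian[key] for key in multilingual}

print(f"{'category':<10}{'multilingual':>14}{'italian':>10}{'diff':>9}")
for key, diff in differences.items():
    print(f"{key:<10}{multilingual[key]:>14.4f}{italian[key]:>10.4f}{diff:>+9.4f}")
```

In this single run the multilingual model has the higher F1 in every category, with the largest gap on MISC and the smallest on PER.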

