# IIC-3670 NLP UC

- Library versions, Python 3.8.10

- numpy 1.20.3
- flair 0.12
- allennlp 0.9.0

## In-class activity

We will fine-tune a NER tagger using the pretrained tagger from **flair**. To do so:

- Import the CONLL_03_SPANISH dataset.
- Build the label dictionary from the corpus.
- Select a pretrained BERT transformer encoder and load it. Justify your choice. Check the list of available models at https://huggingface.co/
- Use the embeddings to fine-tune flair's pretrained tagger.
- Show an example of your NER tagger working.
- When you finish, let me know so I can give you an **L (logrado, "achieved")**.
- Remember that each L counts toward a bonus on the final course grade.


***You have until the end of class.***

In [1]:
from flair.embeddings import TransformerWordEmbeddings
from flair.data import Sentence
import flair.datasets

corpus = flair.datasets.CONLL_03_SPANISH()
print(corpus)

2024-04-17 13:41:59,455 Reading data from /home/marcelo/.flair/datasets/conll_03_spanish
2024-04-17 13:41:59,456 Train: /home/marcelo/.flair/datasets/conll_03_spanish/esp.train
2024-04-17 13:41:59,456 Dev: /home/marcelo/.flair/datasets/conll_03_spanish/esp.testa
2024-04-17 13:41:59,456 Test: /home/marcelo/.flair/datasets/conll_03_spanish/esp.testb
Corpus: 8323 train + 1915 dev + 1517 test sentences


In [2]:
label_dict = corpus.make_label_dictionary(label_type='ner', add_unk=False)
print(label_dict)

2024-04-17 13:42:05,677 Computing label dictionary. Progress:


8323it [00:00, 35528.27it/s]

2024-04-17 13:42:05,937 Dictionary created for label 'ner' with 4 values: ORG (seen 7390 times), LOC (seen 4914 times), PER (seen 4321 times), MISC (seen 2173 times)
Dictionary with 4 tags: ORG, LOC, PER, MISC





In [3]:
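# Multilingual XLM-RoBERTa encoder; fine_tune=True lets its weights update during training.
embeddings = TransformerWordEmbeddings(model='xlm-roberta-base',
                                       layers="-1",               # use only the last transformer layer
                                       layer_mean=False,
                                       subtoken_pooling="first",  # represent each word by its first subtoken
                                       fine_tune=True,
                                       use_context=True,          # add surrounding sentences as context
                                       model_max_length=512,
                                       )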
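
The `subtoken_pooling="first"` setting above decides how several subtoken vectors are turned into one vector per word. The following is a minimal pure-Python sketch of the idea, not flair's implementation: the helper `pool_subtokens`, the toy 2-dimensional vectors, and the word-to-subtoken mapping are all hypothetical.

```python
from typing import Dict, List


def pool_subtokens(vectors: List[List[float]], word_ids: List[int], mode: str = "first") -> List[List[float]]:
    """Collapse subtoken vectors into one vector per word.

    vectors:  one vector per subtoken (hypothetical toy values below)
    word_ids: for each subtoken, the index of the word it belongs to
    mode:     "first" keeps the first subtoken, "mean" averages all of them
    """
    groups: Dict[int, List[List[float]]] = {}
    for vec, wid in zip(vectors, word_ids):
        groups.setdefault(wid, []).append(vec)

    pooled = []
    for wid in sorted(groups):
        vecs = groups[wid]
        if mode == "first":
            pooled.append(list(vecs[0]))
        elif mode == "mean":
            pooled.append([sum(col) / len(vecs) for col in zip(*vecs)])
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
    return pooled


# Assumed toy input: word 0 is split into two subtokens, word 1 into one.
toy_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
toy_word_ids = [0, 0, 1]

first_pooled = pool_subtokens(toy_vectors, toy_word_ids, mode="first")
mean_pooled = pool_subtokens(toy_vectors, toy_word_ids, mode="mean")
print(first_pooled)  # [[1.0, 0.0], [5.0, 5.0]]
print(mean_pooled)   # [[2.0, 1.0], [5.0, 5.0]]
```

With "first" pooling, the word vector ignores every subtoken after the first one, which keeps the output size equal to the encoder's hidden size regardless of how a word is split.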

In [4]:
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# No RNN, CRF, or reprojection: a linear layer on top of the transformer predicts the tags.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type='ner',
    tag_format="BIOES",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)


trainer = ModelTrainer(tagger, corpus)

2024-04-17 13:42:33,511 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-MISC, B-MISC, E-MISC, I-MISC
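
The 17 tags reported above follow from the 4 corpus labels under the BIOES scheme: each label gets S-, B-, E-, and I- variants, plus a single O tag. A minimal sketch of that expansion in plain Python (the helper name `expand_bioes` is hypothetical and this is not flair's own dictionary code):

```python
from typing import List


def expand_bioes(labels: List[str]) -> List[str]:
    """Expand entity labels into BIOES tags.

    S = single-token entity, B = begin, E = end, I = inside, O = outside.
    """
    tags = ["O"]
    for label in labels:
        for prefix in ("S", "B", "E", "I"):
            tags.append(f"{prefix}-{label}")
    return tags


bioes_tags = expand_bioes(["ORG", "LOC", "PER", "MISC"])
print(len(bioes_tags))  # 17
print(bioes_tags[:5])   # ['O', 'S-ORG', 'B-ORG', 'E-ORG', 'I-ORG']
```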


In [5]:
trainer.fine_tune(
    base_path="resources/taggers/ner-xlm-roberta-base",
    train_with_dev=False,
    max_epochs=8,
    learning_rate=2.0e-5,
    mini_batch_size=4,
    shuffle=False,
)


Gathered 16792 of total 250002
Reducing vocab size by 93.2833%
Reducing model size by 64.4163%
Reducing training parameter count by 64.4163%


2024-04-17 13:42:41,959 ----------------------------------------------------------------------------------------------------
2024-04-17 13:42:41,961 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 768)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0): XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bi

2024-04-17 13:42:41,963 ----------------------------------------------------------------------------------------------------
2024-04-17 13:42:41,964 Corpus: "Corpus: 8323 train + 1915 dev + 1517 test sentences"
2024-04-17 13:42:41,965 ----------------------------------------------------------------------------------------------------
2024-04-17 13:42:41,965 Parameters:
2024-04-17 13:42:41,966  - learning_rate: "0.000020"
2024-04-17 13:42:41,966  - mini_batch_size: "4"
2024-04-17 13:42:41,967  - patience: "3"
2024-04-17 13:42:41,969  - anneal_factor: "0.5"
2024-04-17 13:42:41,970  - max_epochs: "8"
2024-04-17 13:42:41,971  - shuffle: "False"
2024-04-17 13:42:41,973  - train_with_dev: "False"
2024-04-17 13:42:41,974  - batch_growth_annealing: "False"
2024-04-17 13:42:41,975 ----------------------------------------------------------------------------------------------------
2024-04-17 13:42:41,976 Model training base path: "resources/taggers/ner-xlm-roberta-base"
2024-04-17 13:42:41,977 -
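
The `iter …/2081` counter and the slowly shrinking `lr` column in the log can be checked by hand. The sketch below assumes a linear warmup over the first 10% of steps followed by a linear decay to zero. The warmup fraction is an assumption, since the notebook does not state it, and the function `linear_schedule_with_warmup` is a hypothetical helper rather than the trainer's actual scheduler.

```python
import math


def linear_schedule_with_warmup(step: int, total_steps: int, peak_lr: float,
                                warmup_fraction: float = 0.1) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero.

    warmup_fraction=0.1 is an assumed value, not taken from the notebook.
    """
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Values taken from the notebook: corpus size and trainer parameters.
train_sentences = 8323
mini_batch_size = 4
max_epochs = 8
peak_lr = 2.0e-5

iters_per_epoch = math.ceil(train_sentences / mini_batch_size)  # 2081
total_steps = iters_per_epoch * max_epochs

for epoch in range(2, max_epochs + 1):
    step = (epoch - 1) * iters_per_epoch + 208  # same iteration as the first log line per epoch
    lr = linear_schedule_with_warmup(step, total_steps, peak_lr)
    print(f"epoch {epoch} - iter 208 - lr: {lr:.6f}")
```

Under these assumptions, 8323 sentences in mini-batches of 4 give 2081 iterations per epoch, and the rate at epoch 2, iteration 208 formats as 0.000019, which agrees with the logged value.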

100%|██████████| 479/479 [00:12<00:00, 39.65it/s]

2024-04-17 13:46:25,958 Evaluating as a multi-label problem: False





2024-04-17 13:46:26,056 DEV : loss 0.11206331849098206 - f1-score (micro avg)  0.8252
2024-04-17 13:46:26,118 ----------------------------------------------------------------------------------------------------
2024-04-17 13:46:46,831 epoch 2 - iter 208/2081 - loss 0.11012243 - time (sec): 20.71 - samples/sec: 1290.69 - lr: 0.000019
2024-04-17 13:47:07,607 epoch 2 - iter 416/2081 - loss 0.10364780 - time (sec): 41.49 - samples/sec: 1328.06 - lr: 0.000019
2024-04-17 13:47:28,325 epoch 2 - iter 624/2081 - loss 0.10068833 - time (sec): 62.21 - samples/sec: 1217.87 - lr: 0.000019
2024-04-17 13:47:49,171 epoch 2 - iter 832/2081 - loss 0.09825677 - time (sec): 83.05 - samples/sec: 1224.19 - lr: 0.000018
2024-04-17 13:48:10,443 epoch 2 - iter 1040/2081 - loss 0.09605382 - time (sec): 104.32 - samples/sec: 1241.32 - lr: 0.000018
2024-04-17 13:48:31,669 epoch 2 - iter 1248/2081 - loss 0.10284990 - time (sec): 125.55 - samples/sec: 1258.34 - lr: 0.000018
2024-04-17 13:48:52,237 epoch 2 - iter 14

100%|██████████| 479/479 [00:15<00:00, 31.20it/s]

2024-04-17 13:50:09,281 Evaluating as a multi-label problem: False





2024-04-17 13:50:09,320 DEV : loss 0.1204131469130516 - f1-score (micro avg)  0.8823
2024-04-17 13:50:09,351 ----------------------------------------------------------------------------------------------------
2024-04-17 13:50:30,213 epoch 3 - iter 208/2081 - loss 0.08337261 - time (sec): 20.86 - samples/sec: 1281.45 - lr: 0.000016
2024-04-17 13:50:51,347 epoch 3 - iter 416/2081 - loss 0.07894309 - time (sec): 41.99 - samples/sec: 1312.02 - lr: 0.000016
2024-04-17 13:51:12,524 epoch 3 - iter 624/2081 - loss 0.07657264 - time (sec): 63.17 - samples/sec: 1199.24 - lr: 0.000016
2024-04-17 13:51:33,683 epoch 3 - iter 832/2081 - loss 0.07614358 - time (sec): 84.33 - samples/sec: 1205.61 - lr: 0.000016
2024-04-17 13:51:54,490 epoch 3 - iter 1040/2081 - loss 0.07454950 - time (sec): 105.14 - samples/sec: 1231.71 - lr: 0.000015
2024-04-17 13:52:15,695 epoch 3 - iter 1248/2081 - loss 0.08286486 - time (sec): 126.34 - samples/sec: 1250.43 - lr: 0.000015
2024-04-17 13:52:36,520 epoch 3 - iter 145

100%|██████████| 479/479 [00:14<00:00, 32.90it/s]

2024-04-17 13:53:54,417 Evaluating as a multi-label problem: False





2024-04-17 13:53:54,450 DEV : loss 0.12456652522087097 - f1-score (micro avg)  0.8861
2024-04-17 13:53:54,477 ----------------------------------------------------------------------------------------------------
2024-04-17 13:54:15,251 epoch 4 - iter 208/2081 - loss 0.07728391 - time (sec): 20.77 - samples/sec: 1286.89 - lr: 0.000014
2024-04-17 13:54:36,178 epoch 4 - iter 416/2081 - loss 0.06538405 - time (sec): 41.70 - samples/sec: 1321.29 - lr: 0.000013
2024-04-17 13:54:56,688 epoch 4 - iter 624/2081 - loss 0.06475300 - time (sec): 62.21 - samples/sec: 1217.79 - lr: 0.000013
2024-04-17 13:55:17,433 epoch 4 - iter 832/2081 - loss 0.06386773 - time (sec): 82.95 - samples/sec: 1225.61 - lr: 0.000013
2024-04-17 13:55:38,362 epoch 4 - iter 1040/2081 - loss 0.06194995 - time (sec): 103.88 - samples/sec: 1246.58 - lr: 0.000013
2024-04-17 13:55:58,926 epoch 4 - iter 1248/2081 - loss 0.06731973 - time (sec): 124.45 - samples/sec: 1269.47 - lr: 0.000012
2024-04-17 13:56:19,917 epoch 4 - iter 14

100%|██████████| 479/479 [00:14<00:00, 32.51it/s]

2024-04-17 13:57:37,950 Evaluating as a multi-label problem: False
2024-04-17 13:57:38,013 DEV : loss 0.1320403665304184 - f1-score (micro avg)  0.8966





2024-04-17 13:57:38,051 ----------------------------------------------------------------------------------------------------
2024-04-17 13:57:58,745 epoch 5 - iter 208/2081 - loss 0.05968133 - time (sec): 20.69 - samples/sec: 1291.86 - lr: 0.000011
2024-04-17 13:58:19,746 epoch 5 - iter 416/2081 - loss 0.05171991 - time (sec): 41.69 - samples/sec: 1321.49 - lr: 0.000011
2024-04-17 13:58:40,577 epoch 5 - iter 624/2081 - loss 0.04967908 - time (sec): 62.53 - samples/sec: 1211.64 - lr: 0.000010
2024-04-17 13:59:01,404 epoch 5 - iter 832/2081 - loss 0.04954458 - time (sec): 83.35 - samples/sec: 1219.78 - lr: 0.000010
2024-04-17 13:59:22,220 epoch 5 - iter 1040/2081 - loss 0.04699391 - time (sec): 104.17 - samples/sec: 1243.18 - lr: 0.000010
2024-04-17 13:59:43,344 epoch 5 - iter 1248/2081 - loss 0.04909439 - time (sec): 125.29 - samples/sec: 1260.92 - lr: 0.000009
2024-04-17 14:00:04,293 epoch 5 - iter 1456/2081 - loss 0.04715321 - time (sec): 146.24 - samples/sec: 1275.06 - lr: 0.000009
2

100%|██████████| 479/479 [00:14<00:00, 33.16it/s]

2024-04-17 14:01:21,727 Evaluating as a multi-label problem: False





2024-04-17 14:01:21,766 DEV : loss 0.13759379088878632 - f1-score (micro avg)  0.8979
2024-04-17 14:01:21,796 ----------------------------------------------------------------------------------------------------
2024-04-17 14:01:42,829 epoch 6 - iter 208/2081 - loss 0.04441864 - time (sec): 21.03 - samples/sec: 1271.01 - lr: 0.000008
2024-04-17 14:02:03,733 epoch 6 - iter 416/2081 - loss 0.03891758 - time (sec): 41.94 - samples/sec: 1313.88 - lr: 0.000008
2024-04-17 14:02:24,550 epoch 6 - iter 624/2081 - loss 0.03655828 - time (sec): 62.75 - samples/sec: 1207.24 - lr: 0.000008
2024-04-17 14:02:45,448 epoch 6 - iter 832/2081 - loss 0.03705826 - time (sec): 83.65 - samples/sec: 1215.42 - lr: 0.000007
2024-04-17 14:03:06,616 epoch 6 - iter 1040/2081 - loss 0.03533833 - time (sec): 104.82 - samples/sec: 1235.46 - lr: 0.000007
2024-04-17 14:03:27,613 epoch 6 - iter 1248/2081 - loss 0.03708164 - time (sec): 125.82 - samples/sec: 1255.67 - lr: 0.000007
2024-04-17 14:03:49,028 epoch 6 - iter 14

100%|██████████| 479/479 [00:14<00:00, 32.65it/s]

2024-04-17 14:05:06,858 Evaluating as a multi-label problem: False





2024-04-17 14:05:06,940 DEV : loss 0.14882870018482208 - f1-score (micro avg)  0.8915
2024-04-17 14:05:06,982 ----------------------------------------------------------------------------------------------------
2024-04-17 14:05:27,900 epoch 7 - iter 208/2081 - loss 0.03495998 - time (sec): 20.92 - samples/sec: 1278.05 - lr: 0.000005
2024-04-17 14:05:48,741 epoch 7 - iter 416/2081 - loss 0.03061950 - time (sec): 41.76 - samples/sec: 1319.46 - lr: 0.000005
2024-04-17 14:06:09,686 epoch 7 - iter 624/2081 - loss 0.02930267 - time (sec): 62.70 - samples/sec: 1208.22 - lr: 0.000005
2024-04-17 14:06:30,837 epoch 7 - iter 832/2081 - loss 0.02913596 - time (sec): 83.85 - samples/sec: 1212.48 - lr: 0.000004
2024-04-17 14:06:51,969 epoch 7 - iter 1040/2081 - loss 0.02776395 - time (sec): 104.99 - samples/sec: 1233.49 - lr: 0.000004
2024-04-17 14:07:13,240 epoch 7 - iter 1248/2081 - loss 0.02945623 - time (sec): 126.26 - samples/sec: 1251.29 - lr: 0.000004
2024-04-17 14:07:34,471 epoch 7 - iter 14

100%|██████████| 479/479 [00:14<00:00, 32.52it/s]


2024-04-17 14:08:52,398 Evaluating as a multi-label problem: False
2024-04-17 14:08:52,477 DEV : loss 0.16359102725982666 - f1-score (micro avg)  0.8935
2024-04-17 14:08:52,517 ----------------------------------------------------------------------------------------------------
2024-04-17 14:09:13,426 epoch 8 - iter 208/2081 - loss 0.03400685 - time (sec): 20.91 - samples/sec: 1278.56 - lr: 0.000003
2024-04-17 14:09:34,127 epoch 8 - iter 416/2081 - loss 0.02750880 - time (sec): 41.61 - samples/sec: 1324.20 - lr: 0.000002
2024-04-17 14:09:54,675 epoch 8 - iter 624/2081 - loss 0.02727541 - time (sec): 62.16 - samples/sec: 1218.81 - lr: 0.000002
2024-04-17 14:10:15,552 epoch 8 - iter 832/2081 - loss 0.02537403 - time (sec): 83.03 - samples/sec: 1224.44 - lr: 0.000002
2024-04-17 14:10:37,179 epoch 8 - iter 1040/2081 - loss 0.02372918 - time (sec): 104.66 - samples/sec: 1237.32 - lr: 0.000001
2024-04-17 14:10:58,142 epoch 8 - iter 1248/2081 - loss 0.02623760 - time (sec): 125.62 - samples/se

100%|██████████| 479/479 [00:14<00:00, 32.16it/s]


2024-04-17 14:12:36,776 Evaluating as a multi-label problem: False
2024-04-17 14:12:36,843 DEV : loss 0.16008912026882172 - f1-score (micro avg)  0.885
2024-04-17 14:12:39,027 ----------------------------------------------------------------------------------------------------
2024-04-17 14:12:39,029 Testing using last state of model ...


100%|██████████| 380/380 [00:09<00:00, 38.52it/s]

2024-04-17 14:12:48,916 Evaluating as a multi-label problem: False





2024-04-17 14:12:48,957 0.8891	0.9036	0.8963	0.8606
2024-04-17 14:12:48,958 
Results:
- F-score (micro) 0.8963
- F-score (macro) 0.8886
- Accuracy 0.8606

By class:
              precision    recall  f1-score   support

         ORG     0.8640    0.9164    0.8894      1400
         LOC     0.8930    0.8625    0.8775      1084
         PER     0.9666    0.9837    0.9751       735
        MISC     0.8160    0.8088    0.8124       340

   micro avg     0.8891    0.9036    0.8963      3559
   macro avg     0.8849    0.8929    0.8886      3559
weighted avg     0.8894    0.9036    0.8961      3559

2024-04-17 14:12:48,958 ----------------------------------------------------------------------------------------------------
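
The three averages in the report above are computed differently, which is why they do not coincide. As a rough check, the sketch below recomputes them from the rounded per-class rows of the table. It works from 4-decimal values, so agreement is only expected to about that precision.

```python
# (precision, recall, f1, support) copied from the per-class table above
per_class = {
    "ORG": (0.8640, 0.9164, 0.8894, 1400),
    "LOC": (0.8930, 0.8625, 0.8775, 1084),
    "PER": (0.9666, 0.9837, 0.9751, 735),
    "MISC": (0.8160, 0.8088, 0.8124, 340),
}

total_support = sum(n for _, _, _, n in per_class.values())

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 weighted by the number of gold entities of that class.
weighted_f1 = sum(f1 * n for _, _, f1, n in per_class.values()) / total_support

# Micro F1: harmonic mean of the pooled (micro) precision and recall.
micro_precision, micro_recall = 0.8891, 0.9036
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(total_support)          # 3559
print(round(macro_f1, 4))     # 0.8886
print(round(weighted_f1, 4))  # 0.8961
print(round(micro_f1, 4))     # 0.8963
```

Macro F1 is lower than micro F1 here because MISC, the weakest and smallest class, counts as much as ORG in the unweighted mean.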


{'test_score': 0.8963210702341137,
 'dev_score_history': [0.8251685393258427,
  0.8822727272727271,
  0.8860615699193457,
  0.8966461327857631,
  0.8978519195612431,
  0.8915110551036775,
  0.8934707903780069,
  0.8850245967280632],
 'train_loss_history': [0.49164654559153087,
  0.09582007639815401,
  0.08102016101342534,
  0.0659440910564966,
  0.04863045349449287,
  0.03785637262144272,
  0.030741337178146148,
  0.029474600166816564],
 'dev_loss_history': [0.11206331849098206,
  0.1204131469130516,
  0.12456652522087097,
  0.1320403665304184,
  0.13759379088878632,
  0.14882870018482208,
  0.16359102725982666,
  0.16008912026882172]}
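
The returned histories make it easy to see which epoch did best on the dev set. A small sketch, using the values above rounded to 4 decimals (the variable names are illustrative):

```python
# Histories copied from the returned dictionary, rounded to 4 decimals.
dev_f1 = [0.8252, 0.8823, 0.8861, 0.8966, 0.8979, 0.8915, 0.8935, 0.8850]
dev_loss = [0.1121, 0.1204, 0.1246, 0.1320, 0.1376, 0.1488, 0.1636, 0.1601]
train_loss = [0.4916, 0.0958, 0.0810, 0.0659, 0.0486, 0.0379, 0.0307, 0.0295]

# Epochs are 1-indexed in the training log.
best_f1_epoch = max(range(len(dev_f1)), key=dev_f1.__getitem__) + 1
best_loss_epoch = min(range(len(dev_loss)), key=dev_loss.__getitem__) + 1

print(f"best dev F1:   epoch {best_f1_epoch} ({dev_f1[best_f1_epoch - 1]:.4f})")
print(f"best dev loss: epoch {best_loss_epoch} ({dev_loss[best_loss_epoch - 1]:.4f})")
print(f"last epoch dev F1: {dev_f1[-1]:.4f}")
```

Dev F1 peaks at epoch 5, while dev loss is lowest at epoch 1 and rises afterward as training loss keeps falling, a pattern often read as a sign of overfitting. The log reports that testing used the last state of the model, so the test score comes from epoch 8 rather than from the best dev epoch.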

In [6]:
# load the model you trained
model = SequenceTagger.load("resources/taggers/ner-xlm-roberta-base/final-model.pt")

# create example sentence
from flair.data import Sentence
sentence = Sentence("George Washington fue a Washington")

# predict tags and print
model.predict(sentence)

print(sentence.to_tagged_string())

2024-04-17 14:13:16,297 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-MISC, B-MISC, E-MISC, I-MISC
Sentence[5]: "George Washington fue a Washington" → ["George Washington"/PER, "Washington"/LOC]
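
The spans printed above are assembled from per-token BIOES tags. The sketch below shows one way to do that decoding in plain Python. The function `bioes_to_spans` is hypothetical and not flair's internal decoder. The tag sequence is an assumed one that is consistent with the printed result, not one read from the model.

```python
from typing import List, Optional, Tuple


def bioes_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIOES tags into (span text, label) pairs.

    Malformed sequences (e.g. an E- tag with no matching B-) are dropped.
    """
    spans: List[Tuple[str, str]] = []
    start: Optional[int] = None   # index where the open multi-token span began
    label: Optional[str] = None   # label of the open span
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
            continue
        prefix, _, current = tag.partition("-")
        if prefix == "S":
            spans.append((tokens[i], current))
            start, label = None, None
        elif prefix == "B":
            start, label = i, current
        elif prefix == "I":
            if start is None or current != label:
                start, label = None, None
        elif prefix == "E":
            if start is not None and current == label:
                spans.append((" ".join(tokens[start:i + 1]), current))
            start, label = None, None
    return spans


tokens = ["George", "Washington", "fue", "a", "Washington"]
tags = ["B-PER", "E-PER", "O", "O", "S-LOC"]  # assumed tags, consistent with the output above
decoded = bioes_to_spans(tokens, tags)
print(decoded)  # [('George Washington', 'PER'), ('Washington', 'LOC')]
```

The example sentence is a useful test because the same surface form, "Washington", must be tagged as part of a person in one position and as a location in another.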
