# <span style="font-family:Courier New; color:#CCCCCC">**Named Entity Recognition CRF**</span>

## <span style="font-family:Courier New; color:#336666">**Load Data and Imports**</span>

In [16]:
from preprocessing import convert_BIO
from NER_evaluation import *
from feature_getter import Feature_getter
import pycrfsuite
from collections import Counter
import pandas as pd

import nltk
nltk.download('conll2002')
from nltk.corpus import conll2002

#Dutch Data
ned_train = conll2002.iob_sents('ned.train')
ned_test = conll2002.iob_sents('ned.testb')

#Spanish Data
esp_train = conll2002.iob_sents('esp.train')
esp_test = conll2002.iob_sents('esp.testb')

[nltk_data] Downloading package conll2002 to
[nltk_data]     C:\Users\Jordi\AppData\Roaming\nltk_data...
[nltk_data]   Package conll2002 is already up-to-date!


## <span style="font-family:Courier New; color:#336666">**Preprocessing Data**</span>

In [17]:
#Dutch
ned_train_BIO = convert_BIO(ned_train)
ned_test_BIO = convert_BIO(ned_test)

X_ned_test_BIO = [[word[0] for word in sent] for sent in ned_test_BIO]
y_ned_test_BIO = [[word[1] for word in sent] for sent in ned_test_BIO]

In [18]:
#Spanish
esp_train_BIO = convert_BIO(esp_train)
esp_test_BIO = convert_BIO(esp_test)

X_esp_test_BIO = [[word[0] for word in sent] for sent in esp_test_BIO]
y_esp_test_BIO = [[word[1] for word in sent] for sent in esp_test_BIO]

## <span style="font-family:Courier New; color:#336666">**Train Classifier**</span>

In [19]:
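# Summary evaluation tables (one row per classifier)
results_esp = pd.DataFrame()
results_ned = pd.DataFrame()
def save_ent_results(nclf, results, results_agg_ent, df):
    # Note: the 'total acc' column stores entity-level precision, not accuracy
    df.loc[nclf,'total acc'] = results["precision"]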
    df.loc[nclf,'total recall'] = results["recall"]
    df.loc[nclf,'total F1'] = results["F1-score"]
    df.loc[nclf,'PER F1'] = results_agg_ent["PER"]["F1-score"]
    df.loc[nclf,'ORG F1'] = results_agg_ent["ORG"]["F1-score"]
    df.loc[nclf,'LOC F1'] = results_agg_ent["LOC"]["F1-score"]
    df.loc[nclf,'MISC F1'] = results_agg_ent["MISC"]["F1-score"]
    return df

### <span style="font-family:Courier New; color:#336633">**Dutch Classifier**</span>

<span style="font-family:Courier New">The Hyper&feature_opt-ned notebook suggests a model that uses our custom Feature Getter without next-token features, for which the best hyperparameters are: {'c1': 0.01, 'c2': 0.1, 'max_iterations': 200, 'possible_transitions': True, 'possible_states': True, 'min_freq': 0}. Note that the cell below sets max_iterations to 50 rather than 200. </span>
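
<span style="font-family:Courier New">The c1 and c2 entries are the L1 and L2 regularization coefficients of the CRF trainer. As a rough sketch (the notation is ours, and implementation-specific constants are ignored), training minimizes the regularized negative log-likelihood of the N training sentences:</span>

$$
\min_{w} \; -\sum_{i=1}^{N} \log p\left(y^{(i)} \mid x^{(i)}; w\right) + c_1 \lVert w \rVert_1 + c_2 \lVert w \rVert_2^2
$$

<span style="font-family:Courier New">Here w is the vector of feature weights, x are the token sequences and y the tag sequences. A larger c1 drives more weights to exactly zero, while a larger c2 shrinks all weights smoothly.</span>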

In [20]:
default_hyperparams = {'c1': 0.01, 'c2': 0.1, 'max_iterations': 50, 'feature.possible_transitions': True,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func = Feature_getter(language='ned', next_tok=False), training_opt = default_hyperparams)
model.train(ned_train_BIO, 'models/model.crf.tagger')

#### <span style="font-family:Courier New; color:#994C00">**Evaluation**</span>

In [21]:
pred_ned_BIO = model.tag_sents(X_ned_test_BIO)
y_pred_BIO = [[word[1] for word in sent] for sent in pred_ned_BIO]

print(bio_classification_report(y_ned_test_BIO, y_pred_BIO))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(ned_test_BIO, pred_ned_BIO)
results_ned = save_ent_results("Dutch_BIO", results, results_agg_ent, results_ned)
results_ned

              precision    recall  f1-score   support

       B-LOC       0.86      0.82      0.84       774
       I-LOC       0.59      0.53      0.56        49
      B-MISC       0.86      0.76      0.81      1187
      I-MISC       0.61      0.45      0.52       410
       B-ORG       0.80      0.70      0.75       882
       I-ORG       0.81      0.63      0.71       551
       B-PER       0.78      0.89      0.83      1098
       I-PER       0.86      0.96      0.91       807

   micro avg       0.81      0.77      0.79      5758
   macro avg       0.77      0.72      0.74      5758
weighted avg       0.81      0.77      0.79      5758
 samples avg       0.06      0.06      0.06      5758

Entity level evaluation


| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Dutch_BIO | 0.808 | 0.783 | 0.795 | 0.776 | 0.758 | 0.858 | 0.802 |


#### <span style="font-family:Courier New; color:#994C00">**Feature Importance**</span>

In [22]:
def print_state_features(state_features):
        for (attr, label), weight in state_features:
            string = "%0.3f %-6s %s" % (weight, label, attr)
            print(string, end = " "*(40 - len(string)))

def feature_importance(model):

    info = model._tagger.info()
    positive_features = Counter(info.state_features).most_common(10)
    negative_features = Counter(info.state_features).most_common()[-10:]

    print("Top positive:                       |     Top negative:")
    print("-----------------------------------------------------------------------------")

    for positive, negative in zip(positive_features, negative_features):
        print_state_features([positive])
        print_state_features([negative])
        print()
feature_importance(model)

Top positive:                       |     Top negative:
-----------------------------------------------------------------------------
5.001 O      PUNCTUATION                -1.972 O      SUF_our                   
4.259 O      SHAPE_xxxx                 -1.976 O      SHAPE_XXX-xxxx            
3.779 O      POS_PUNCT                  -2.003 B-MISC SHAPE_xxxx                
3.122 B-MISC SHAPE_XXX-xxxx             -2.019 O      WORD_groenen              
3.001 O      POS_ADV                    -2.088 O      WORD_Allen                
2.954 O      HAS_NUM                    -2.223 B-PER  HAS_NUM                   
2.891 O      SHAPE_xxxx-xxxx            -2.299 B-PER  SHAPE_xxxx                
2.814 O      SHAPE_xxx                  -2.458 I-MISC -1_SUF_se                 
2.755 O      WORD_.                     -2.836 B-ORG  SHAPE_xxxx                
2.738 B-MISC WORD_groenen               -2.985 B-LOC  SHAPE_xxxx                


### <span style="font-family:Courier New; color:#336633">**Spanish Classifier**</span>

<span style="font-family:Courier New">The Hyper&feature_opt-esp notebook suggests a model that uses our custom Feature Getter with all features, for which the best hyperparameters are: {'c1': 0.01, 'c2': 0.1, 'max_iterations': 200, 'possible_transitions': False, 'possible_states': True, 'min_freq': 0}. Note that the cell below sets c2 to 1 rather than 0.1. </span>

In [23]:
customed_hyperparams = {'c1': 0.01, 'c2': 1, 'max_iterations': 200, 'feature.possible_transitions': False,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func=Feature_getter(), training_opt = customed_hyperparams)
model.train(esp_train_BIO, 'models/model.crf.tagger')

#### <span style="font-family:Courier New; color:#994C00">**Evaluation**</span>

In [24]:
pred_esp_BIO = model.tag_sents(X_esp_test_BIO)
y_pred_BIO = [[word[1] for word in sent] for sent in pred_esp_BIO]

print(bio_classification_report(y_esp_test_BIO, y_pred_BIO))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(esp_test_BIO, pred_esp_BIO)
results_esp = save_ent_results("Spanish_BIO", results, results_agg_ent, results_esp)
results_esp

              precision    recall  f1-score   support

       B-LOC       0.80      0.79      0.80      1084
       I-LOC       0.68      0.66      0.67       325
      B-MISC       0.69      0.53      0.60       339
      I-MISC       0.67      0.54      0.59       557
       B-ORG       0.81      0.84      0.83      1400
       I-ORG       0.83      0.80      0.82      1104
       B-PER       0.85      0.89      0.87       735
       I-PER       0.89      0.95      0.92       634

   micro avg       0.81      0.79      0.80      6178
   macro avg       0.78      0.75      0.76      6178
weighted avg       0.80      0.79      0.79      6178
 samples avg       0.09      0.09      0.09      6178

Entity level evaluation


| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Spanish_BIO | 0.792 | 0.785 | 0.788 | 0.852 | 0.786 | 0.791 | 0.619 |


#### <span style="font-family:Courier New; color:#994C00">**Feature Importance**</span>

In [25]:
feature_importance(model)

Top positive:                       |     Top negative:
-----------------------------------------------------------------------------
3.477 O      WORD_.                     -1.290 O      -1_SHAPE_xxxx             
3.172 O      PUNCTUATION                -1.293 B-PER  SUF_os                    
2.848 I-MISC -1_WORD_.                  -1.354 I-PER  PUNCTUATION               
2.570 O      SHAPE_xxxx                 -1.358 B-ORG  -1_WORD_.                 
2.357 I-ORG  -1_WORD_.                  -1.386 B-PER  -1_WORD_.                 
2.197 O      HAS_NUM                    -1.577 I-MISC WORD_.                    
2.157 O      +1_CAPITALIZATION          -1.763 O      POS_PROPN                 
1.851 O      POS_ADV                    -1.867 O      -1_WORD_.                 
1.795 B-MISC CAPITALIZATION             -1.943 I-ORG  PUNCTUATION               
1.741 I-PER  -1_WORD_.                  -4.367 O      CAPITALIZATION            


<div class="alert alert-block alert-info">
<b>See:</b> Both models reach an entity-level F1 of about 0.8. The tables show that the Dutch model performs slightly better on the test set (0.795 vs. 0.788 F1). Regarding feature importance, the Dutch model relies heavily on the shape feature we included, whereas the Spanish model relies more on punctuation.
</div>
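
<span style="font-family:Courier New">The shape features in the tables above (SHAPE_xxxx, SHAPE_XXX-xxxx) suggest that each token is mapped to a pattern of upper-case, lower-case and other characters, with long runs collapsed. The Feature_getter code is not shown in this notebook, so the following is only a minimal sketch of such a feature under our own assumptions: the function name word_shape, the digit symbol 'd' and the run cap of 4 are all hypothetical.</span>

```python
def word_shape(token, max_run=4):
    """Map a token to a coarse shape string (hypothetical helper).

    Upper-case letters become 'X', lower-case letters 'x', digits 'd';
    any other character is kept as-is. Runs of the same symbol are
    truncated to at most `max_run` symbols.
    """
    shape = []
    last_symbol, run_length = None, 0
    for ch in token:
        if ch.isupper():
            symbol = "X"
        elif ch.islower():
            symbol = "x"
        elif ch.isdigit():
            symbol = "d"
        else:
            symbol = ch
        # Count how long the current run of identical symbols is
        run_length = run_length + 1 if symbol == last_symbol else 1
        last_symbol = symbol
        if run_length <= max_run:
            shape.append(symbol)
    return "".join(shape)


for token in ["groenen", "VVD-leider", "Amsterdam", "1999"]:
    print(token, "->", "SHAPE_" + word_shape(token))
```

<span style="font-family:Courier New">Under these assumptions every lower-case word of four or more letters collapses to xxxx, which would be consistent with that single feature carrying a large positive weight for the O label.</span>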

## <span style="font-family:Courier New; color:#336666">**Changing Tagger Format**</span>

<span style="font-family:Courier New">At this point, let's check whether changing the encoding of the entities (the tagging scheme) has a positive impact on performance. </span>
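
<span style="font-family:Courier New">The schemes compared in this section differ only in how the position of a token inside an entity is marked: IO uses a single I- prefix, BIO marks the first token with B-, BIOS adds S- for single-token entities, and BIOES additionally marks the last token with E-. The convert_BIO function from preprocessing is not shown here, so the sketch below is not its implementation. It is a hypothetical helper, recode_tags, that borrows the begin, single and end flag names and works on a plain list of BIO tags.</span>

```python
def recode_tags(tags, begin=True, single=False, end=False):
    """Re-encode a BIO tag sequence (hypothetical helper, not convert_BIO).

    begin=False                        -> IO
    begin=True                         -> BIO
    begin=True, single=True            -> BIOS
    begin=True, single=True, end=True  -> BIOES
    """
    recoded = []
    for i, tag in enumerate(tags):
        if tag == "O":
            recoded.append("O")
            continue
        prefix, etype = tag.split("-", 1)
        prev_tag = tags[i - 1] if i > 0 else "O"
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The token starts an entity if it is tagged B- or follows a
        # token that is outside or belongs to a different entity type.
        starts = (prefix == "B" or prev_tag == "O"
                  or prev_tag.split("-", 1)[1] != etype)
        # The token ends an entity if the next tag does not continue it.
        ends = next_tag != "I-" + etype
        if single and starts and ends:
            new_prefix = "S"
        elif begin and starts:
            new_prefix = "B"
        elif end and ends:
            new_prefix = "E"
        else:
            new_prefix = "I"
        recoded.append(new_prefix + "-" + etype)
    return recoded


bio = ["B-LOC", "I-LOC", "O", "B-ORG", "O", "B-PER", "I-PER", "I-PER"]
print("IO   ", recode_tags(bio, begin=False))
print("BIOS ", recode_tags(bio, begin=True, single=True))
print("BIOES", recode_tags(bio, begin=True, single=True, end=True))
```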

### <span style="font-family:Courier New; color:#336633">**Dutch Classifier**</span>

#### <span style="font-family:Courier New; color:#994C00">**IO**</span>

In [10]:
ned_train_IO = convert_BIO(ned_train, begin = False)
ned_test_IO = convert_BIO(ned_test, begin = False)

X_ned_test_IO = [[word[0] for word in sent] for sent in ned_test_IO]
y_ned_test_IO = [[word[1] for word in sent] for sent in ned_test_IO]

In [13]:
default_hyperparams = {'c1': 0.01, 'c2': 0.1, 'max_iterations': 50, 'feature.possible_transitions': True,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func = Feature_getter(language='ned', next_tok=False), training_opt = default_hyperparams)
model.train(ned_train_IO, 'models/model.crf.tagger')

In [14]:
pred_ned_IO = model.tag_sents(X_ned_test_IO)
y_pred_IO = [[word[1] for word in sent] for sent in pred_ned_IO]

print(bio_classification_report(y_ned_test_IO, y_pred_IO))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(ned_test_IO, pred_ned_IO)
results_ned = save_ent_results("Dutch_IO", results, results_agg_ent, results_ned)
results_ned

              precision    recall  f1-score   support

       I-LOC       0.80      0.81      0.80       823
      I-MISC       0.77      0.66      0.71      1597
       I-ORG       0.81      0.66      0.73      1433
       I-PER       0.82      0.92      0.87      1905

   micro avg       0.80      0.77      0.78      5758
   macro avg       0.80      0.76      0.78      5758
weighted avg       0.80      0.77      0.78      5758
 samples avg       0.06      0.06      0.06      5758

Entity level evaluation


| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Dutch_BIO | 0.81 | 0.786 | 0.798 | 0.774 | 0.762 | 0.866 | 0.807 |
| Dutch_IO | 0.784 | 0.75 | 0.766 | 0.764 | 0.727 | 0.83 | 0.752 |


#### <span style="font-family:Courier New; color:#994C00">**BIOS**</span>

In [15]:
ned_train_BIOS = convert_BIO(ned_train, begin = True, single = True)
ned_test_BIOS = convert_BIO(ned_test, begin = True, single = True)

X_ned_test_BIOS = [[word[0] for word in sent] for sent in ned_test_BIOS]
y_ned_test_BIOS = [[word[1] for word in sent] for sent in ned_test_BIOS]

In [16]:
default_hyperparams = {'c1': 0.01, 'c2': 0.1, 'max_iterations': 50, 'feature.possible_transitions': True,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func = Feature_getter(language='ned', next_tok=False), training_opt = default_hyperparams)
model.train(ned_train_BIOS, 'models/model.crf.tagger')

In [17]:
pred_ned_BIOS = model.tag_sents(X_ned_test_BIOS)
y_pred_BIOS = [[word[1] for word in sent] for sent in pred_ned_BIOS]

print(bio_classification_report(y_ned_test_BIOS, y_pred_BIOS))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(ned_test_BIOS, pred_ned_BIOS)
results_ned = save_ent_results("Dutch_BIOS", results, results_agg_ent, results_ned)
results_ned

              precision    recall  f1-score   support

       B-LOC       0.68      0.43      0.53        60
       I-LOC       0.67      0.53      0.59        49
       S-LOC       0.89      0.85      0.87       714
      B-MISC       0.75      0.52      0.61       372
      I-MISC       0.61      0.45      0.52       410
      S-MISC       0.86      0.81      0.84       815
       B-ORG       0.79      0.70      0.74       430
       I-ORG       0.80      0.63      0.71       551
       S-ORG       0.75      0.67      0.71       452
       B-PER       0.85      0.93      0.89       708
       I-PER       0.86      0.96      0.91       807
       S-PER       0.65      0.81      0.72       390

   micro avg       0.80      0.77      0.78      5758
   macro avg       0.76      0.69      0.72      5758
weighted avg       0.80      0.77      0.78      5758
 samples avg       0.06      0.06      0.06      5758

Entity level evaluation


| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Dutch_BIO | 0.81 | 0.786 | 0.798 | 0.774 | 0.762 | 0.866 | 0.807 |
| Dutch_IO | 0.784 | 0.75 | 0.766 | 0.764 | 0.727 | 0.83 | 0.752 |
| Dutch_BIOS | 0.807 | 0.781 | 0.794 | 0.772 | 0.747 | 0.876 | 0.8 |


#### <span style="font-family:Courier New; color:#994C00">**BIOES**</span>

In [18]:
ned_train_BIOES = convert_BIO(ned_train, begin = True, single = True, end = True)
ned_test_BIOES = convert_BIO(ned_test, begin = True, single = True, end = True)

X_ned_test_BIOES = [[word[0] for word in sent] for sent in ned_test_BIOES]
y_ned_test_BIOES = [[word[1] for word in sent] for sent in ned_test_BIOES]

In [19]:
default_hyperparams = {'c1': 0.01, 'c2': 0.1, 'max_iterations': 50, 'feature.possible_transitions': True,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func = Feature_getter(language='ned', next_tok=False), training_opt = default_hyperparams)
model.train(ned_train_BIOES, 'models/model.crf.tagger')

In [20]:
pred_ned_BIOES = model.tag_sents(X_ned_test_BIOES)
y_pred_BIOES = [[word[1] for word in sent] for sent in pred_ned_BIOES]

print(bio_classification_report(y_ned_test_BIOES, y_pred_BIOES))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(ned_test_BIOES, pred_ned_BIOES)
results_ned = save_ent_results("Dutch_BIOES", results, results_agg_ent, results_ned)
results_ned

              precision    recall  f1-score   support

       B-LOC       0.69      0.45      0.55        60
       E-LOC       0.69      0.57      0.62        42
       I-LOC       0.33      0.14      0.20         7
       S-LOC       0.88      0.86      0.87       714
      B-MISC       0.74      0.51      0.60       372
      E-MISC       0.68      0.47      0.55       260
      I-MISC       0.46      0.35      0.39       150
      S-MISC       0.87      0.81      0.84       815
       B-ORG       0.78      0.70      0.74       430
       E-ORG       0.80      0.73      0.76       399
       I-ORG       0.62      0.30      0.40       152
       S-ORG       0.74      0.68      0.71       452
       B-PER       0.86      0.94      0.90       708
       E-PER       0.86      0.96      0.91       690
       I-PER       0.87      0.89      0.88       117
       S-PER       0.66      0.81      0.72       390

   micro avg       0.80      0.76      0.78      5758
   macro avg       0.72   

| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Dutch_BIO | 0.81 | 0.786 | 0.798 | 0.774 | 0.762 | 0.866 | 0.807 |
| Dutch_IO | 0.784 | 0.75 | 0.766 | 0.764 | 0.727 | 0.83 | 0.752 |
| Dutch_BIOS | 0.807 | 0.781 | 0.794 | 0.772 | 0.747 | 0.876 | 0.8 |
| Dutch_BIOES | 0.809 | 0.784 | 0.796 | 0.782 | 0.742 | 0.874 | 0.802 |


<span style="font-family:Courier New">As the table shows, BIO is the best-performing encoding, with the highest overall F1-score (0.798). IO is the worst (0.766), most likely because it carries less information: it cannot mark where an entity begins, so adjacent entities of the same type merge into one. Adding the 'Single' label also produces good models, with BIOS and BIOES reaching 0.794 and 0.796 F1, close to BIO. The support counts of the S- tags help explain this: 2,371 of the 3,941 entities in the test set (about 60%) consist of a single token. These encodings could be explored further, but we will continue with the BIO model.</span>
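
<span style="font-family:Courier New">The share of single-token entities can also be checked directly on the BIO tags, without training a BIOS model. The sketch below is a minimal illustration; the helper names entity_lengths and single_token_share are hypothetical and are not part of the preprocessing or NER_evaluation modules.</span>

```python
from collections import Counter


def entity_lengths(tags):
    """Return a list of (entity_type, length_in_tokens) for one BIO sequence."""
    entities = []
    current_type, length = None, 0
    # A trailing "O" sentinel closes an entity that ends the sentence
    for tag in list(tags) + ["O"]:
        continues = tag.startswith("I-") and tag[2:] == current_type
        if continues:
            length += 1
            continue
        if current_type is not None:
            entities.append((current_type, length))
        if tag == "O":
            current_type, length = None, 0
        else:
            current_type, length = tag[2:], 1
    return entities


def single_token_share(tagged_sents):
    """Fraction of entities that span exactly one token."""
    lengths = Counter()
    for tags in tagged_sents:
        for _, length in entity_lengths(tags):
            lengths[length] += 1
    total = sum(lengths.values())
    return lengths[1] / total if total else 0.0


example = [["B-LOC", "I-LOC", "O", "B-ORG", "O", "B-PER", "I-PER", "I-PER"]]
print(entity_lengths(example[0]))
print(single_token_share(example))
```

<span style="font-family:Courier New">In this notebook the same function could be applied to a list of gold tag sequences such as y_ned_test_BIO.</span>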

### <span style="font-family:Courier New; color:#336633">**Spanish Classifier**</span>

#### <span style="font-family:Courier New; color:#994C00">**IO**</span>

In [27]:
esp_train_IO = convert_BIO(esp_train, begin = False)
esp_test_IO = convert_BIO(esp_test, begin = False)

X_esp_test_IO = [[word[0] for word in sent] for sent in esp_test_IO]
y_esp_test_IO = [[word[1] for word in sent] for sent in esp_test_IO]

In [28]:
customed_hyperparams = {'c1': 0.01, 'c2': 1, 'max_iterations': 200, 'feature.possible_transitions': False,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func=Feature_getter(), training_opt = customed_hyperparams)
model.train(esp_train_IO, 'models/model.crf.tagger')

In [29]:
pred_esp_IO = model.tag_sents(X_esp_test_IO)
y_pred_IO = [[word[1] for word in sent] for sent in pred_esp_IO]

print(bio_classification_report(y_esp_test_IO, y_pred_IO))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(esp_test_IO, pred_esp_IO)
results_esp = save_ent_results("Spanish_IO", results, results_agg_ent, results_esp)
results_esp

              precision    recall  f1-score   support

       I-LOC       0.79      0.76      0.77      1409
      I-MISC       0.67      0.55      0.61       896
       I-ORG       0.82      0.82      0.82      2504
       I-PER       0.85      0.92      0.88      1369

   micro avg       0.80      0.79      0.80      6178
   macro avg       0.78      0.76      0.77      6178
weighted avg       0.80      0.79      0.79      6178
 samples avg       0.09      0.09      0.09      6178

Entity level evaluation


| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Spanish_BIO | 0.792 | 0.785 | 0.788 | 0.852 | 0.786 | 0.791 | 0.619 |
| Spanish_IO | 0.778 | 0.769 | 0.773 | 0.822 | 0.778 | 0.782 | 0.587 |


#### <span style="font-family:Courier New; color:#994C00">**BIOS**</span>

In [30]:
esp_train_BIOS = convert_BIO(esp_train, begin = True, single = True)
esp_test_BIOS = convert_BIO(esp_test, begin = True, single = True)

X_esp_test_BIOS = [[word[0] for word in sent] for sent in esp_test_BIOS]
y_esp_test_BIOS = [[word[1] for word in sent] for sent in esp_test_BIOS]

In [31]:
customed_hyperparams = {'c1': 0.01, 'c2': 1, 'max_iterations': 200, 'feature.possible_transitions': False,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func=Feature_getter(), training_opt = customed_hyperparams)
model.train(esp_train_BIOS, 'models/model.crf.tagger')

In [32]:
pred_esp_BIOS = model.tag_sents(X_esp_test_BIOS)
y_pred_BIOS = [[word[1] for word in sent] for sent in pred_esp_BIOS]

print(bio_classification_report(y_esp_test_BIOS, y_pred_BIOS))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(esp_test_BIOS, pred_esp_BIOS)
results_esp = save_ent_results("Spanish_BIOS", results, results_agg_ent, results_esp)
results_esp

              precision    recall  f1-score   support

       B-LOC       0.72      0.64      0.68       196
       I-LOC       0.70      0.65      0.67       325
       S-LOC       0.82      0.82      0.82       888
      B-MISC       0.63      0.52      0.57       183
      I-MISC       0.66      0.54      0.59       557
      S-MISC       0.71      0.45      0.55       156
       B-ORG       0.84      0.78      0.81       467
       I-ORG       0.83      0.81      0.82      1104
       S-ORG       0.78      0.86      0.82       933
       B-PER       0.89      0.95      0.92       504
       I-PER       0.90      0.94      0.92       634
       S-PER       0.79      0.74      0.76       231

   micro avg       0.80      0.78      0.79      6178
   macro avg       0.77      0.72      0.74      6178
weighted avg       0.80      0.78      0.79      6178
 samples avg       0.09      0.09      0.09      6178

Entity level evaluation


| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Spanish_BIO | 0.792 | 0.785 | 0.788 | 0.852 | 0.786 | 0.791 | 0.619 |
| Spanish_IO | 0.778 | 0.769 | 0.773 | 0.822 | 0.778 | 0.782 | 0.587 |
| Spanish_BIOS | 0.789 | 0.781 | 0.785 | 0.86 | 0.778 | 0.793 | 0.595 |


#### <span style="font-family:Courier New; color:#994C00">**BIOES**</span>

In [33]:
esp_train_BIOES = convert_BIO(esp_train, begin = True, single = True, end = True)
esp_test_BIOES = convert_BIO(esp_test, begin = True, single = True, end = True)

X_esp_test_BIOES = [[word[0] for word in sent] for sent in esp_test_BIOES]
y_esp_test_BIOES = [[word[1] for word in sent] for sent in esp_test_BIOES]

In [34]:
customed_hyperparams = {'c1': 0.01, 'c2': 1, 'max_iterations': 200, 'feature.possible_transitions': False,
                                            'feature.possible_states': True, 'feature.minfreq': 0}
model = nltk.tag.CRFTagger(feature_func=Feature_getter(), training_opt = customed_hyperparams)
model.train(esp_train_BIOES, 'models/model.crf.tagger')

In [35]:
pred_esp_BIOES = model.tag_sents(X_esp_test_BIOES)
y_pred_BIOES = [[word[1] for word in sent] for sent in pred_esp_BIOES]

print(bio_classification_report(y_esp_test_BIOES, y_pred_BIOES))
print('='*80)
print('Entity level evaluation')
print('='*80)
results, results_agg_ent = compute_metrics(esp_test_BIOES, pred_esp_BIOES)
results_esp = save_ent_results("Spanish_BIOES", results, results_agg_ent, results_esp)
results_esp

              precision    recall  f1-score   support

       B-LOC       0.73      0.65      0.69       196
       E-LOC       0.75      0.69      0.72       177
       I-LOC       0.66      0.58      0.62       148
       S-LOC       0.82      0.82      0.82       888
      B-MISC       0.65      0.55      0.59       183
      E-MISC       0.58      0.50      0.53       183
      I-MISC       0.69      0.51      0.59       374
      S-MISC       0.71      0.44      0.55       156
       B-ORG       0.86      0.77      0.81       467
       E-ORG       0.78      0.72      0.75       458
       I-ORG       0.82      0.80      0.81       646
       S-ORG       0.77      0.86      0.82       933
       B-PER       0.88      0.96      0.92       504
       E-PER       0.87      0.96      0.91       498
       I-PER       0.91      0.85      0.87       136
       S-PER       0.78      0.74      0.76       231

   micro avg       0.80      0.77      0.78      6178
   macro avg       0.77   

| | total acc | total recall | total F1 | PER F1 | ORG F1 | LOC F1 | MISC F1 |
|---|---|---|---|---|---|---|---|
| Spanish_BIO | 0.792 | 0.785 | 0.788 | 0.852 | 0.786 | 0.791 | 0.619 |
| Spanish_IO | 0.778 | 0.769 | 0.773 | 0.822 | 0.778 | 0.782 | 0.587 |
| Spanish_BIOS | 0.789 | 0.781 | 0.785 | 0.86 | 0.778 | 0.793 | 0.595 |
| Spanish_BIOES | 0.789 | 0.785 | 0.787 | 0.852 | 0.781 | 0.797 | 0.607 |


<span style="font-family:Courier New">As with the Dutch model, BIO is the best-performing encoding, with the highest overall F1-score (0.788). IO is again the worst (0.773), most likely because it carries less information: it cannot mark where an entity begins, so adjacent entities of the same type merge into one. Adding the 'Single' label also produces good models, with BIOS and BIOES reaching 0.785 and 0.787 F1, close to BIO. The support counts of the S- tags help explain this: 2,208 of the 3,558 entities in the test set (about 62%) consist of a single token. These encodings could be explored further, but we will continue with the BIO model.</span>
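
<span style="font-family:Courier New">The entity-level scores in these tables come from compute_metrics, whose code is not shown in this notebook. As a rough illustration of how entity-level precision, recall and F1 differ from the token-level report, the sketch below counts a predicted entity as correct only when its type and both boundaries match the gold entity exactly. The helper names extract_spans and entity_prf are hypothetical, and the exact-match criterion is an assumption that may differ from what compute_metrics does.</span>

```python
def extract_spans(tags):
    """Return the set of (entity_type, start, end_exclusive) spans in a BIO sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # still inside the current entity
        if etype is not None:
            spans.add((etype, start, i))
        if tag == "O":
            start, etype = None, None
        else:
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold_sents, pred_sents):
    """Exact-match entity-level precision, recall and F1 over all sentences."""
    true_pos = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [["B-LOC", "I-LOC", "O", "B-ORG"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
print(entity_prf(gold, pred))
```

<span style="font-family:Courier New">Under this criterion a prediction that gets only part of a multi-token entity right earns no credit, which is one reason entity-level totals can differ from the token-level micro average.</span>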

## <span style="font-family:Courier New; color:#336666">**Adding Gazetteers**</span>

In [None]:
'''names =  []
for sent in ned_train_BIO:
    for token, label in sent:
        if label == 'B-PER':
            names.append(token)
r = Counter(names)
print(r.keys())'''

## <span style="font-family:Courier New; color:#336666">**Language Models Comparison**</span>

<span style="font-family:Courier New">In this section, now that we have found the best model for each language, we compare the two.</span>

In [36]:
pred_esp_BIO

[[('La', 'B-LOC'),
  ('Coruña', 'I-LOC'),
  (',', 'O'),
  ('23', 'O'),
  ('may', 'O'),
  ('(', 'O'),
  ('EFECOM', 'B-ORG'),
  (')', 'O'),
  ('.', 'O')],
 [('-', 'O')],
 [('Las', 'O'),
  ('reservas', 'O'),
  ('"', 'O'),
  ('on', 'O'),
  ('line', 'O'),
  ('"', 'O'),
  ('de', 'O'),
  ('billetes', 'O'),
  ('aéreos', 'O'),
  ('a', 'O'),
  ('través', 'O'),
  ('de', 'O'),
  ('Internet', 'B-MISC'),
  ('aumentaron', 'O'),
  ('en', 'O'),
  ('España', 'B-LOC'),
  ('un', 'O'),
  ('300', 'O'),
  ('por', 'O'),
  ('ciento', 'O'),
  ('en', 'O'),
  ('el', 'O'),
  ('primer', 'O'),
  ('trimestre', 'O'),
  ('de', 'O'),
  ('este', 'O'),
  ('año', 'O'),
  ('con', 'O'),
  ('respecto', 'O'),
  ('al', 'O'),
  ('mismo', 'O'),
  ('período', 'O'),
  ('de', 'O'),
  ('1999', 'O'),
  (',', 'O'),
  ('aseguró', 'O'),
  ('hoy', 'O'),
  ('Iñigo', 'B-PER'),
  ('García', 'I-PER'),
  ('Aranda', 'I-PER'),
  (',', 'O'),
  ('responsable', 'O'),
  ('de', 'O'),
  ('comunicación', 'O'),
  ('de', 'O'),
  ('Savia', 'B-LOC'),
  ('A