#### Fine-tuning a pretrained spaCy model

To improve on the accuracy of the models compared in the model evaluation part, the chosen models will be fine-tuned so that they adapt to our task and to the way we have annotated our entities.

##### Fine-tuning Example

This example gives a glimpse of how a pretrained spaCy model is fine-tuned and how this can lead to the desired results. Fine-tuning is done through the pipeline's *update()* method, which takes **Example** objects as input. Each **Example** pairs a sentence with its entity annotations, given here as one tag per token in the BILUO scheme (the `B-`, `I-`, `L-` and `U-` prefixes, with `O` for tokens outside any entity).
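
The tags in the training examples of this section carry `B-`, `I-`, `L-` and `U-` prefixes, and every token needs exactly one tag. The sketch below shows how such a tag sequence maps back to entity spans. It is a plain-Python illustration, not spaCy's own decoder: the function name `biluo_to_spans` and the hand-written token list are assumptions, and the check that `B-` and `L-` carry the same label is left out for brevity.

```python
from typing import List, Tuple


def biluo_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BILUO tags into (start_token, end_token_exclusive, label) spans."""
    if len(tokens) != len(tags):
        # A length mismatch means the tags are shifted against the tokens.
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    spans = []
    start = None  # index of the token that opened the current entity
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":  # single-token entity
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":  # first token of a multi-token entity
            start = i
        elif prefix == "L" and start is not None:  # last token closes the entity
            spans.append((start, i + 1, label))
            start = None
        # "I-" tokens simply continue the entity that is already open
    return spans


# Hand-written tokens (an assumption, not tokenizer output)
tokens = ["funded", "by", "the", "WHO", "and", "the", "United", "Kingdom"]
tags = ["O", "O", "O", "U-ORG", "O", "O", "B-GPE", "L-GPE"]
for start, end, label in biluo_to_spans(tokens, tags):
    print(label, tokens[start:end])
```

Running a check like this over hand-written tags before training makes it easier to spot a tag list that is one position off, for example when punctuation or a possessive `'s` is split into its own token.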

In [9]:
import spacy
import random
from spacy import displacy
from spacy.training import Example
from ner_test_utils import load_spacy_ner_model

##### Load a pretrained model & Enable the NER component
original_model = load_spacy_ner_model("en_core_web_md")
fine_tuned = load_spacy_ner_model("en_core_web_md")

print("Two models loaded (a default one and a copy of it to be fine-tuned)\n...")

##### Custom Train Set
train_set = [
     # One BILUO tag per token; "States," and "EU's" are each split into two tokens by the English tokenizer
     ("This Digital Health project was funded by the WHO in cooperation with the United Kingdom and the United States, spearheaded by Dr. Christoph John who graduated in Japan",
      ["O", "O", "O", "O", "O", "O", "O", "O", "U-ORG", "O", "O", "O", "O", "B-GPE", "L-GPE", "O", "O", "B-GPE", "L-GPE", "O", "O", "O", "B-PERSON", "I-PERSON", "L-PERSON", "O",
       "O", "O", "U-GPE"]),
     ("What is GDPR, the EU's new data protection law?", ["O", "O", "U-LAW", "O", "O", "U-ORG", "O", "O", "O", "O", "O", "O"])]

##### Train the NER model
optimizer = fine_tuned.resume_training()
epochs = 5
losses_dictionary = {}
dropout = 0.0

for i in range(epochs):
    random.shuffle(train_set)
    for text, annotations in train_set:
        doc = fine_tuned.make_doc(text)
        example = Example.from_dict(doc, {"entities": annotations})
        fine_tuned.update([example],  sgd = optimizer, losses = losses_dictionary, drop = dropout)
        
print("Fine-Tuning Finished")

Two models loaded (a default one and a copy of it to be fine-tuned)
...
Fine-Tuning Finished


Next, the original pretrained model and its fine-tuned copy each predict the entities of the same example sentence, to check whether the differences (if any) move the predictions in the direction we want.
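
Besides the visual comparison, the two sets of predictions can also be compared programmatically. The sketch below is a minimal, library-free illustration: the helper `diff_entities` is a hypothetical name, and the two entity lists are made-up values for the text "Paris follows GDPR", not actual model output.

```python
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label)


def diff_entities(before: Iterable[Entity], after: Iterable[Entity]) -> Dict[str, List[Entity]]:
    """Group entities by whether they are shared, dropped, or newly predicted."""
    before_set, after_set = set(before), set(after)
    return {
        "unchanged": sorted(before_set & after_set),
        "removed": sorted(before_set - after_set),
        "added": sorted(after_set - before_set),
    }


# Made-up predictions for the text "Paris follows GDPR"
original_ents = [(0, 5, "GPE")]
fine_tuned_ents = [(0, 5, "GPE"), (14, 18, "LAW")]

diff = diff_entities(original_ents, fine_tuned_ents)
for change, entities in diff.items():
    print(change, entities)
```

With spaCy documents, such tuples could be built from `doc.ents` as `(ent.start_char, ent.end_char, ent.label_)`.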

In [10]:
#### Test the model on a new sentence
example_sentence = "The United States agreed to strengthen its cooperation with digital health organizations in Paris and Geneva, while following GDPR"
original_doc = original_model(example_sentence)
fine_tuned_doc = fine_tuned(example_sentence)

print('Original Model: ')
displacy.render(original_doc, style = 'ent')
print('Fine-tuned:')
displacy.render(fine_tuned_doc, style = 'ent')

Original Model: 


Fine-tuned:


##### Model Fine-Tuning to our Custom Dataset

First, the dataset is loaded from Prodigy again, this time split into a train set and a test set. The train set is used to update (fine-tune) the pretrained model, and the test set is used to evaluate it as before.

In [11]:
from ner_test_utils import load_spacy_ner_model

spacy.prefer_gpu()
model_name = "en_core_web_trf"
nlp = load_spacy_ner_model(model_name)
    
# nlp = spacy.load(model_name, disable = ['tagger', 'parser', 'attribute_ruler', 'lemmatizer'])
print("Spacy NLP model named '{}' successfully loaded".format(model_name))

Spacy NLP model named 'en_core_web_trf' successfully loaded


The entire dataset is loaded from *Prodigy* and shuffled so that the order of the examples does not influence the score.

After shuffling, the dataset is split into train and test sets according to `test_fraction`, the share of the total dataset reserved for testing.

In [12]:
import random
from prodigy.components.db import connect


#### Load and shuffle dataset
prodigy_dataset_name = 'ner_1000_health'
seed = 44
random.seed(seed)
db = connect()
ner_dataset = db.get_dataset(prodigy_dataset_name)
random.shuffle(ner_dataset)
print('Custom Health NER Dataset (named {}) loaded and shuffled'.format(prodigy_dataset_name))

test_fraction = 0.3
test_size = int(test_fraction * len(ner_dataset))
train_size = (len(ner_dataset) - test_size)
print('Original Length: {}, Test Size: {}, Train Size: {}'.format(len(ner_dataset), test_size, train_size))

train_set = ner_dataset[:train_size]
test_set = ner_dataset[train_size:]

Custom Health NER Dataset (named ner_1000_health) loaded and shuffled
Original Length: 1050, Test Size: 315, Train Size: 735
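
The shuffle-and-split step can also be wrapped in a small helper that leaves the input list untouched and uses its own random generator instead of the global one. This is a sketch under assumptions: `shuffle_and_split` is a hypothetical name, and a list of integers stands in for the 1050 Prodigy records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_and_split(records: Sequence[T], test_fraction: float, seed: int) -> Tuple[List[T], List[T]]:
    """Return (train, test) after a seeded shuffle of a copy of `records`."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    test_size = int(test_fraction * len(shuffled))
    train_size = len(shuffled) - test_size
    return shuffled[:train_size], shuffled[train_size:]


records = list(range(1050))  # stand-in for the Prodigy records
train, test = shuffle_and_split(records, test_fraction=0.3, seed=44)
print(len(train), len(test))  # 735 315
```

Because the generator is seeded locally, calling the helper twice with the same seed returns the same split, which keeps the train and test sets stable across notebook runs.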


##### Training Loop

Uncommenting the cell below fine-tunes a *spaCy* model with a custom Python function. However, for performance and accuracy reasons, this approach is not recommended.

**Instead**, one should use the **spacy train** command on the command line, which takes the model together with the train and dev sets as input. It outputs the fine-tuned model and saves it to a directory. In my case, the resulting fine-tuned model is loaded in the cell that follows the one below.
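
Before the annotations can be handed to a training command, they have to be turned from Prodigy records into a training format. The sketch below covers that first step in plain Python. It assumes each record follows Prodigy's usual NER task layout (`text`, a `spans` list with `start`, `end` and `label`, and an `answer` field); the helper name `prodigy_to_offsets` and the sample records are made up for illustration.

```python
from typing import Dict, List, Tuple


def prodigy_to_offsets(records: List[dict]) -> List[Tuple[str, Dict[str, list]]]:
    """Keep accepted records and express their spans as character offsets."""
    examples = []
    for record in records:
        if record.get("answer") != "accept":
            continue  # skip rejected or ignored annotations
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        examples.append((record["text"], {"entities": entities}))
    return examples


# Made-up records in the assumed layout
sample_records = [
    {
        "text": "The WHO is based in Geneva.",
        "spans": [
            {"start": 4, "end": 7, "label": "ORG"},
            {"start": 20, "end": 26, "label": "GPE"},
        ],
        "answer": "accept",
    },
    {"text": "Skipped example.", "spans": [], "answer": "reject"},
]

converted = prodigy_to_offsets(sample_records)
print(converted)
```

Each resulting pair has a shape that `Example.from_dict` accepts, since its `entities` value may hold character offsets instead of BILUO tags. In spaCy v3 the pairs would then typically be serialized to `.spacy` files for the training command; Prodigy's `data-to-spacy` recipe is an alternative route to the same files.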

In [13]:
# from ner_test_utils import load_spacy_ner_model
# from ner_train_and_test_functions import fine_tune_model

# epochs = 1
# p_dropout = 0.0

# ##### Function to fine-tune a SpaCy Model to better adapt to our task
# model = load_spacy_ner_model(model_name)
    
# print('"{}" model successfully loaded and fine-tuning started | {} Epochs | Adam Optimizer | {} Dropout Probability \n'.format(model_name, epochs, p_dropout))
# fine_tuned_model = fine_tune_model(model)
# print('"{}" model successfully fine-tuned | {} Epochs | Adam Optimizer | {} Dropout Probability \n'.format(model_name, epochs, p_dropout))

Load our fine-tuned model and try it on an example sentence.

In [14]:
from spacy import displacy

fine_tuned_model = spacy.load("../models/model-best-lg")
example = "Digital health projects in LMICs are one of the main focuses of conferences held at the World Health Organization (WHO) in New York, USA."
prediction = fine_tuned_model(example)
displacy.render(prediction, style = 'ent')

##### Evaluation on Test Data

In [15]:
from ner_train_and_test_functions import evaluate_model
from ner_score_report import remove_iob

##### Enable or Disable entity prediction visualization using displacy
##### NOTE: Preemptively decide which small sub-sample of the dataset to visualize to avoid bloating
visualization_set = test_set[20:40]
visualization = True

##### Enable or Disable the pipelines to also perform POS tagging in parallel with NER
automatic_pos_tagging = False


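##### NOTE: the second argument is the full `ner_dataset` (train and test portions of the split), not `test_set`; pass `test_set` to score only the held-out examples
all_tags, all_examples = evaluate_model(fine_tuned_model, ner_dataset, visualization_set, visualization, automatic_pos_tagging)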

Model evaluation on the test set started | Visualization: True | POS Tagging: False
Testing finished...
Total runtime in seconds: 16.4270
Average Time to process one single sentence: 0.0156 seconds
Visualization: 
[displaCy renderings of the predicted and true entity spans for the 20 visualized sentences were not preserved in this export]


##### Final Scores
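
The report in this section distinguishes a strict score (entity span-level) from a lenient score (token-level). The sketch below illustrates the difference on a single sentence. It is not the implementation in `ner_score_report`, whose source is not shown here: the function names are hypothetical, the tags are assumed to be IOB2, and the scores are pooled over all labels rather than reported per label.

```python
from typing import List, Set, Tuple


def iob_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, label) spans from IOB2 tags."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and start is not None and tag[2:] == label
        if not inside:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


def prf(tp: int, n_pred: int, n_true: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strict_scores(true: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Span-level: a prediction counts only if boundaries and label both match."""
    t, p = iob_spans(true), iob_spans(pred)
    return prf(len(t & p), len(p), len(t))


def lenient_scores(true: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Token-level: compare labels per token after dropping the B-/I- prefix."""
    t = [tag.split("-", 1)[-1] for tag in true]
    p = [tag.split("-", 1)[-1] for tag in pred]
    tp = sum(1 for a, b in zip(t, p) if a == b and a != "O")
    return prf(tp, sum(b != "O" for b in p), sum(a != "O" for a in t))


# "World Health Organization in Geneva", with the last ORG token missed
true_tags = ["B-ORG", "I-ORG", "I-ORG", "O", "B-GPE"]
pred_tags = ["B-ORG", "I-ORG", "O", "O", "B-GPE"]
print("strict: ", strict_scores(true_tags, pred_tags))
print("lenient:", lenient_scores(true_tags, pred_tags))
```

In this example the truncated organization span counts as a full miss under strict scoring, while lenient scoring still credits the two correctly labeled tokens. This is why lenient numbers tend to be at least as high as strict ones.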

In [16]:
from ner_score_report import set_tags_to_fixed_labels, compute_and_print_strict_scores

##### Set any IOB tag that contains a different entity label than our wanted ones to 'O' (Others)
##### Both the strict and lenient score functions need lists of these fixed labels
our_labels = ['GPE', 'LAW', 'ORG', 'PERSON', 'PRODUCT']
all_tags['true'] = set_tags_to_fixed_labels(our_labels, all_tags['true'])
all_tags['predicted'] = set_tags_to_fixed_labels(our_labels, all_tags['predicted'])

##### STRICT & LENIENT scores
compute_and_print_strict_scores(all_tags, our_labels)

Performance with respect to only our fixed labels: 

Strict Score Report (Entity Span-Level):
              precision    recall  f1-score   support

         GPE       0.93      0.96      0.95       661
         LAW       0.90      0.79      0.84       110
         ORG       0.94      0.90      0.92       838
      PERSON       0.89      0.88      0.88       216
     PRODUCT       0.97      0.76      0.85       116

   micro avg       0.93      0.90      0.92      1941
   macro avg       0.92      0.86      0.89      1941
weighted avg       0.93      0.90      0.91      1941

Lenient Score Report (Token-Level):
              precision    recall  f1-score   support

         GPE       0.95      0.96      0.95       913
         LAW       0.96      0.85      0.90       556
         ORG       0.96      0.92      0.94      2912
      PERSON       0.94      0.89      0.91       460
     PRODUCT       0.96      0.75      0.84       208

   micro avg       0.96      0.91      0.93      5049
 