#### Fine-tuning a pretrained spaCy model

To improve the accuracy of the models discussed in the model evaluation part, the chosen models are fine-tuned to fit our task and the way we have annotated our entities.

##### Fine-tune Example

This example gives a glimpse of fine-tuning a pretrained spaCy model and how it can lead to the desired results. Fine-tuning is done through the *model.update()* function, which takes **Example** objects as input. Each one pairs a sentence with one entity tag per token. The tags used below follow the BILUO scheme (they include the L- and U- prefixes) rather than plain IOB2.
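The tags in the training set below carry `B-`, `I-`, `L-` and `U-` prefixes, which is the BILUO scheme (Begin, In, Last, Unit, Out). As a minimal sketch of how such tags relate to entity spans, the hypothetical helper `spans_to_biluo` converts token-level spans into one tag per token. It is a plain-Python illustration, not spaCy's implementation; for character offsets, spaCy itself offers `spacy.training.offsets_to_biluo_tags`.

```python
def spans_to_biluo(n_tokens, spans):
    """Return one BILUO tag per token.

    `spans` holds (start, end, label) triples over token indices, with `end`
    exclusive. Hypothetical helper for illustration only.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        if end - start == 1:
            tags[start] = f"U-{label}"        # single-token entity
        else:
            tags[start] = f"B-{label}"        # first token of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"        # inner tokens
            tags[end - 1] = f"L-{label}"      # last token of the entity
    return tags


# Hand-split tokens; a real pipeline would take them from its tokenizer.
tokens = ["funded", "by", "the", "WHO", "and", "Dr.", "Christoph", "John"]
tags = spans_to_biluo(len(tokens), [(3, 4, "ORG"), (5, 8, "PERSON")])
for token, tag in zip(tokens, tags):
    print(f"{token:<10} {tag}")
```

The number of tags has to equal the number of tokens the pipeline's tokenizer produces, so punctuation marks and clitics such as `'s` each need their own tag.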

In [1]:
import spacy
import random
from spacy import displacy
from spacy.training import Example

##### Load a pretrained model & Enable the NER component
original_model = spacy.load("en_core_web_md", enable = ['tok2vec', 'ner'])
fine_tuned = spacy.load("en_core_web_md", enable = ['tok2vec', 'ner'])


##### Custom Train Set
##### One tag per spaCy token: punctuation marks and the clitic 's are tokens of their own
train_set = [
     ("This Digital Health project was funded by the WHO in cooperation with the United Kingdom and the United States, spearheaded by Dr. Christoph John who graduated in Japan",
      ["O", "O", "O", "O", "O", "O", "O", "O", "U-ORG", "O", "O", "O", "O", "B-GPE", "L-GPE", "O", "O", "B-GPE", "L-GPE", "O", "O", "O", "B-PERSON", "I-PERSON", "L-PERSON", "O",
       "O", "O", "U-GPE"]),
     ("What is GDPR, the EU's new data protection law?", ["O", "O", "U-LAW", "O", "O", "U-ORG", "O", "O", "O", "O", "O", "O"])]

##### Train the NER model
optimizer = fine_tuned.resume_training()
epochs = 5
losses_dictionary = {}
dropout = 0.0

for i in range(epochs):
    random.shuffle(train_set)
    for text, annotations in train_set:
        doc = fine_tuned.make_doc(text)
        example = Example.from_dict(doc, {"entities": annotations})
        fine_tuned.update([example],  sgd = optimizer, losses = losses_dictionary, drop = dropout)

Next, the original pretrained model and its fine-tuned version each make a prediction on the same unseen sentence, to check whether the differences (if any) move the predictions toward our desired annotations.
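Besides the visual rendering, the two predictions can also be compared programmatically. The sketch below assumes the entities have already been extracted as `(text, label)` pairs, for example with `[(ent.text, ent.label_) for ent in doc.ents]`. The hard-coded pairs and the helper `diff_entities` are hypothetical and are not actual output of either model.

```python
def diff_entities(before, after):
    """Compare two collections of (text, label) pairs.

    Returns the pairs found only in `before`, only in `after`, and in both.
    """
    before, after = set(before), set(after)
    return {
        "only_before": sorted(before - after),
        "only_after": sorted(after - before),
        "shared": sorted(before & after),
    }


# Made-up pairs for illustration, NOT real predictions from either model.
original_ents = [("The United States", "GPE"), ("Paris", "GPE"), ("Geneva", "GPE")]
fine_tuned_ents = original_ents + [("GDPR", "LAW")]

result = diff_entities(original_ents, fine_tuned_ents)
for key, pairs in result.items():
    print(key, pairs)
```

An empty `only_before` list would mean the fine-tuned model kept every entity the original model found, while `only_after` lists what fine-tuning added.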

In [2]:
#### Test both models on a new sentence
example_sentence = "The United States agreed to strengthen its cooperation with digital health organizations in Paris and Geneva, while following GDPR"
doc_original = original_model(example_sentence)
doc_fine_tuned = fine_tuned(example_sentence)

print('Original Model: ')
displacy.render(doc_original, style = 'ent')
print('Fine-tuned:')
displacy.render(doc_fine_tuned, style = 'ent')

Original Model: [displaCy entity rendering not preserved]
Fine-tuned: [displaCy entity rendering not preserved]

##### Model Fine-Tuning to our Custom Dataset

First, the dataset is loaded from Prodigy again, this time split into a train set and a test set. The train set is used to update (fine-tune) the pretrained model, while the test set is used to evaluate it as before.

In [3]:
spacy.prefer_gpu()
model_name = "en_core_web_trf"

if model_name == "en_core_web_trf":
    nlp = spacy.load(model_name, enable = ['transformer', 'ner'])
else:
    nlp = spacy.load(model_name, enable = ['tok2vec', 'ner'])
    
# nlp = spacy.load(model_name, disable = ['tagger', 'parser', 'attribute_ruler', 'lemmatizer'])
print("Spacy NLP model named '{}' successfully loaded".format(model_name))

Spacy NLP model named 'en_core_web_trf' successfully loaded


In [4]:
import random
from prodigy.components.db import connect


#### Load and shuffle dataset
prodigy_dataset_name = 'ner_1000_health'
seed = 44
random.seed(seed)
db = connect()
ner_dataset = db.get_dataset(prodigy_dataset_name)
random.shuffle(ner_dataset)
print('Custom Health NER Dataset (named {}) loaded and shuffled'.format(prodigy_dataset_name))

test_fraction = 0.4
test_size = int(test_fraction * len(ner_dataset))
train_size = (len(ner_dataset) - test_size)
print('Original Length: {}, Test Size: {}, Train Size: {}'.format(len(ner_dataset), test_size, train_size))

train_set = ner_dataset[:train_size]
test_set = ner_dataset[train_size:]

Custom Health NER Dataset (named ner_1000_health) loaded and shuffled
Original Length: 1050, Test Size: 420, Train Size: 630
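The split above relies on the global `random` state. As an alternative sketch, the hypothetical helper `split_dataset` uses a local `random.Random(seed)`, so the shuffle is reproducible regardless of other calls to `random`, and it leaves the input list untouched. The integer list is only a stand-in for the Prodigy samples.

```python
import random


def split_dataset(samples, test_fraction, seed):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)               # copy, so the caller's list is unchanged
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    test_size = int(test_fraction * len(shuffled))
    train_size = len(shuffled) - test_size
    return shuffled[:train_size], shuffled[train_size:]


samples = list(range(1050))  # placeholder for the 1050 annotated samples
train, test = split_dataset(samples, test_fraction=0.4, seed=44)
print('Train Size: {}, Test Size: {}'.format(len(train), len(test)))
```

Because the two slices come from the same shuffled list, no sample can end up in both sets.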


In [5]:
# from spacy.tokens import DocBin
# from ner_test_utils import get_entities_from_jsonl

# nlp = spacy.blank("en")

# db = DocBin()
# for jsonl_sample in test_set:
#     doc = nlp(jsonl_sample['text'])
#     spans, entities = get_entities_from_jsonl(jsonl_sample)
#     ents = []
#     for start, end, label in spans:
#         span = doc.char_span(start, end, label=label)
#         ents.append(span)
#     doc.ents = ents
#     db.add(doc)
# db.to_disk("./test.spacy")

##### Training Loop

In [6]:
# epochs = 1
# p_dropout = 0.0

# ##### Function to fine-tune a SpaCy Model to better adapt to our task
# if model_name == "en_core_web_trf":
#     model = spacy.load(model_name, enable = ['transformer', 'ner'])
# else:
#     model = spacy.load(model_name, enable = ['tok2vec', 'ner'])
    
# print('"{}" model successfully loaded and fine-tuning started | {} Epochs | Adam Optimizer | {} Dropout Probability \n'.format(model_name, epochs, p_dropout))
# optimizer = model.resume_training()
# losses_dictionary = {}

# ###### The model is updated by taking a list of Example objects and therefore learn from them
# for i in range(epochs):
#     ###### Each iteration, the train data is shuffled to avoid generalizations based on order
#     random.shuffle(train_set)
#     # batches = [train_set[j:j + batch_size] for j in range(0, len(train_set), batch_size)]
#     # for batch in batches:
#     examples_in_batch = []
#     for jsonl_sample in train_set:
#         ###### Example objects require a Doc Object containing the sample's text and the IOB tag representation of it
#         ###### Hence, we want to create the Doc Object and IOB tags from the JSONL Prodigy Samples using the functions below
#         spans, _ = get_entities_from_jsonl(jsonl_sample)
#         doc = model.make_doc(jsonl_sample['text'])
#         entity_iob_tags = single_token_tags(doc, spans)
#         ###### Convert them to an Example object and add them to a list to be fed into the model after each batch
#         examples_in_batch.append(Example.from_dict(doc, {"entities": entity_iob_tags}))
#     model.update(examples_in_batch,  sgd = optimizer, losses = losses_dictionary, drop = p_dropout)
#     print('Epoch {} | Loss: {}'.format(i + 1, losses_dictionary['ner']))    


# # print('"{}" model successfully fine-tuned | {} Epochs | {} Batch Size | Adam Optimizer | {} Dropout Probability \n'.format(model_name, epochs, batch_size, p_dropout))    
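The commented-out loop hints at splitting the training data into batches before each call to `update`. A minimal sketch of such batching in plain Python follows; the helper name `iter_batches`, the batch size, and the placeholder data are assumptions. spaCy ships a comparable utility, `spacy.util.minibatch`.

```python
import random


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


train_samples = list(range(10))  # placeholder for the training samples
rng = random.Random(44)
for epoch in range(2):
    rng.shuffle(train_samples)   # new order every epoch
    batches = list(iter_batches(train_samples, batch_size=4))
    # In the real loop, each batch would be converted to Example objects
    # and passed to model.update(...).
    print('Epoch {} | Batch sizes: {}'.format(epoch + 1, [len(batch) for batch in batches]))
```

Updating once per batch instead of once per epoch gives the optimizer more steps over the same data, at the cost of noisier individual updates.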

In [7]:
model = spacy.load("./models/model-best-44")

In [8]:
from spacy import displacy

example = "Digital health projects in LMICs are one of the main focuses of conferences held at the World Health Organization (WHO) in New York, USA."
prediction = model(example)
displacy.render(prediction, style = 'ent')

##### Evaluation on Test Data

In [9]:
from ner_train_and_test_functions import evaluate_model

##### Enable or Disable entity prediction visualization using displacy
##### NOTE: Preemptively decide which small sub-sample of the dataset to visualize to avoid bloating
visualization_set = test_set[:50]
visualization = True

##### Enable or Disable the pipelines to also perform POS tagging in parallel with NER
automatic_pos_tagging = False


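##### NOTE: ner_dataset (all 1050 samples, train and test) is passed here, which matches the runtime reported below
##### (12.9954 s / 0.0124 s per sentence ≈ 1048 sentences). Pass test_set instead to score held-out samples only.
all_tags, all_examples = evaluate_model(model, ner_dataset, visualization_set, visualization, automatic_pos_tagging)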

Model evaluation on the test set started | Visualization: True | POS Tagging: False
Testing finished...
Total runtime in seconds: 12.9954
Average Time to process one single sentence: 0.0124 seconds
Visualization: 
Predicted Entity Spans / True Entity Spans: [50 pairs of displaCy renderings, one per visualized sample, not preserved]

##### Final Scores
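
The strict report counts a prediction as correct only when the whole entity span and its label match, whereas the lenient report compares labels token by token after the prefixes are removed. The sketch below illustrates the difference on a single made-up sentence, pooling all labels into one score. `extract_spans`, `token_labels` and `precision_recall` are hypothetical helpers, not the functions from `ner_score_report`, whose implementation is not shown here.

```python
def extract_spans(tags):
    """Collect (start, end, label) spans from IOB or BILUO tags (end exclusive)."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes a trailing span
        prefix, _, name = tag.partition("-")
        continues = prefix in ("I", "L") and name == label
        if start is not None and not continues:
            spans.append((start, i, label))        # close the open span
            start, label = None, None
        if prefix in ("B", "U") or (prefix in ("I", "L") and start is None):
            start, label = i, name                 # open a new span
    return set(spans)


def token_labels(tags):
    """Return (token index, label) pairs with the two-character prefix dropped."""
    return {(i, tag[2:]) for i, tag in enumerate(tags) if tag != "O"}


def precision_recall(true_items, predicted_items):
    """Precision and recall of predicted items against true items."""
    correct = len(true_items & predicted_items)
    precision = correct / len(predicted_items) if predicted_items else 0.0
    recall = correct / len(true_items) if true_items else 0.0
    return precision, recall


# Made-up tags: the prediction misses the first token of the ORG span.
true_tags = ["B-ORG", "I-ORG", "L-ORG", "O", "U-GPE"]
pred_tags = ["O", "B-ORG", "L-ORG", "O", "U-GPE"]

# Strict: a span counts only if start, end and label all match.
strict = precision_recall(extract_spans(true_tags), extract_spans(pred_tags))
# Lenient: every correctly labelled token counts on its own.
lenient = precision_recall(token_labels(true_tags), token_labels(pred_tags))

print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

A partially matched span hurts the strict scores but still earns credit in the lenient ones, which may explain why the lenient figures reported below are higher for most labels.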

In [10]:
from sklearn.metrics import classification_report
from ner_score_report import set_tags_to_fixed_labels, compute_and_print_strict_scores

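def remove_iob(tags_list):
    ##### Strip the two-character prefix (e.g. 'B-') from every tag and return new lists,
    ##### leaving the input unchanged so that re-running this cell does not strip the tags twice
    return [[tag if tag == 'O' else tag[2:] for tag in tags] for tags in tags_list]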

##### Set any IOB tag that contains a different entity label than our wanted ones to 'O' (Others)
##### Both the strict and lenient score functions need lists of these fixed labels
our_labels = ['GPE', 'LAW', 'ORG', 'PERSON', 'PRODUCT']
all_tags['true'] = set_tags_to_fixed_labels(our_labels, all_tags['true'])
all_tags['predicted'] = set_tags_to_fixed_labels(our_labels, all_tags['predicted'])

##### STRICT
print('For model name: {}'.format(model_name))
compute_and_print_strict_scores(all_tags)

##### LENIENT
true_tags_no_iob = remove_iob(all_tags['true'])
predicted_tags_no_iob = remove_iob(all_tags['predicted'])

true_tags_flat = [tag for seq in true_tags_no_iob for tag in seq]
predicted_tags_flat = [tag for seq in predicted_tags_no_iob for tag in seq]

lenient_report = classification_report(true_tags_flat, predicted_tags_flat, labels = our_labels)
print('\nLenient Score Report (Token-Level):')
print('\n'.join(lenient_report.splitlines()))

For model name: en_core_web_trf
Performance with respect to only our fixed labels: 

Strict Score Report (Entity Span-Level):
              precision    recall  f1-score   support

         GPE       0.96      0.96      0.96       661
         LAW       0.85      0.71      0.77       110
         ORG       0.91      0.88      0.90       838
      PERSON       0.89      0.91      0.90       216
     PRODUCT       0.75      0.70      0.72       116

   micro avg       0.91      0.89      0.90      1941
   macro avg       0.87      0.83      0.85      1941
weighted avg       0.91      0.89      0.90      1941

Lenient Score Report (Token-Level):
              precision    recall  f1-score   support

         GPE       0.98      0.96      0.97       913
         LAW       0.95      0.77      0.85       556
         ORG       0.95      0.92      0.93      2912
      PERSON       0.90      0.92      0.91       460
     PRODUCT       0.71      0.67      0.69       208

   micro avg       0.94