#### This notebook covers the *Named Entity Recognition (NER)* part of the project.

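The pre-trained model is loaded with only the NER component enabled, together with the `transformer` (or `tok2vec`) component it depends on, to speed up inference. The other components, such as those for POS tagging, lemmatization, and parsing, are disabled: they are still loaded, so they can be re-enabled later (as in the evaluation loop), but they are not run.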

In [1]:
import spacy
from spacy import displacy

spacy.prefer_gpu()
model_name = "en_core_web_trf"

if model_name == "en_core_web_trf":
    nlp = spacy.load(model_name, enable = ['transformer', 'ner'])
else:
    nlp = spacy.load(model_name, enable = ['tok2vec', 'ner'])
    
# nlp = spacy.load(model_name, disable = ['tagger', 'parser', 'attribute_ruler', 'lemmatizer'])
print("Spacy NLP model named '{}' successfully loaded".format(model_name))

Spacy NLP model named 'en_core_web_trf' successfully loaded


By default, the model's *Named Entity Recognition* component can detect the following labels.

In **our** case, we are only interested in the GPE, ORG, LAW, PERSON, and PRODUCT tags.

In [2]:
print("Pretrained NER model's supported labels: " + ', '.join(nlp.pipe_labels['ner']))

Pretrained NER model's supported labels: CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART


The pretrained model is loaded from the NLP library *spaCy*. It takes raw text as input and runs several tagging tasks, including the NER task we are interested in.

The *displaCy* visualizer renders the recognized entities inline in the text, highlighted by label.

In [3]:
example_text_1 = "klevio is a singer from Albania who usually goes to Greece and works in UBS. He lives in Lake Geneva and owns a Mercedes car."
doc1 = nlp(example_text_1)
print('Example I: ')
displacy.render(doc1, style = 'ent')

doc2 = nlp('The government in Senegal just passed a law on the 2nd of February regarding universal healthcare, named Universal Care Act, passed in parliament also in French')
print('Example II: ')
displacy.render(doc2, style = 'ent')

doc3 = nlp('World Health Organization in Geneva')
print('Example III: ')
displacy.render(doc3, style = 'ent')

Example I: 


Example II: 


Example III: 


In [4]:
from ner_train_functions import add_pos_tags_to_sentence

add_pos_tags_to_sentence(nlp, example_text_1) 

klevio is a singer from Albania who usually goes to Greece and works in UBS. He lives in Lake Geneva and owns a Mercedes car . 

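An example of how the entities found in the text are stored: `doc.ents` holds one span per entity, exposing its text, its label string (`label_`), and the label's integer ID (`label`).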

In [5]:
for ent in doc1.ents:
    print("Entity: {}, Label: {}, Label ID: {} ".format(ent.text, ent.label_, ent.label))

Entity: klevio, Label: PERSON, Label ID: 380 
Entity: Albania, Label: GPE, Label ID: 384 
Entity: Greece, Label: GPE, Label ID: 384 
Entity: UBS, Label: ORG, Label ID: 383 
Entity: Lake Geneva, Label: LOC, Label ID: 385 
Entity: Mercedes, Label: ORG, Label ID: 383 
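The printed entities correspond to the `(start_char, end_char, label)` tuples that the testing loop later builds from `prediction.ents`. As a minimal pure-Python sketch (no spaCy involved), the Example I output can be turned into such character-offset tuples and then filtered down to our five labels. `char_spans` and `keep_our_labels` are hypothetical helper names, and locating entities with `str.find` is an assumption that only holds when the first occurrence of each entity string is the intended one.

```python
OUR_LABELS = {"GPE", "LAW", "ORG", "PERSON", "PRODUCT"}

text = ("klevio is a singer from Albania who usually goes to Greece and works in UBS. "
        "He lives in Lake Geneva and owns a Mercedes car.")

# Entities and labels as printed for Example I above
printed_entities = [("klevio", "PERSON"), ("Albania", "GPE"), ("Greece", "GPE"),
                    ("UBS", "ORG"), ("Lake Geneva", "LOC"), ("Mercedes", "ORG")]


def char_spans(text, entities):
    """Hypothetical helper: map (entity_text, label) pairs to (start_char, end_char, label).

    Uses the first occurrence of each entity string, so repeated mentions would all map
    to the same offsets; spaCy's own Span.start_char/end_char do not have this limitation.
    """
    spans = []
    for ent_text, label in entities:
        start = text.find(ent_text)
        if start == -1:
            raise ValueError(f"{ent_text!r} not found in text")
        spans.append((start, start + len(ent_text), label))
    return spans


def keep_our_labels(spans, labels=OUR_LABELS):
    """Hypothetical helper: drop spans whose label is not one of ours (here: LOC)."""
    return [span for span in spans if span[2] in labels]


spans = keep_our_labels(char_spans(text, printed_entities))
for start, end, label in spans:
    print(start, end, label, text[start:end])
```

Here "Lake Geneva" is dropped because LOC is not among the labels we evaluate.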


Our custom dataset contains sentences related to global digital health, annotated with organizations, products, people, countries, and laws. It is loaded from Prodigy in a format that works directly with spaCy.

This dataset is used for our supervised entity recognition task.

In [6]:
import random
from prodigy.components.db import connect


#### Load and shuffle dataset
prodigy_dataset_name = 'ner_1000_health'
seed = 596
random.seed(seed)
db = connect()
ner_dataset = db.get_dataset(prodigy_dataset_name)
random.shuffle(ner_dataset)
print('Custom Health NER Dataset (named {}) loaded and shuffled'.format(prodigy_dataset_name))

Custom Health NER Dataset (named ner_1000_health) loaded and shuffled
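The same records could also be consumed without a live database connection. A minimal sketch, assuming the dataset was first exported to a JSONL file (for example with Prodigy's `db-out` command); `load_jsonl`, `shuffled`, and the two inline records are illustrative only. Using a dedicated `random.Random(seed)` instance, rather than seeding the global generator as above, keeps the shuffle reproducible even if other code draws random numbers in between.

```python
import io
import json
import random


def load_jsonl(lines):
    """Parse one JSON object per non-empty line (the JSONL layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def shuffled(records, seed=596):
    """Return a shuffled copy; the same seed always yields the same order."""
    rng = random.Random(seed)  # local generator, independent of the global random state
    records = list(records)
    rng.shuffle(records)
    return records


# Two illustrative records standing in for an exported file (not real dataset entries)
exported = io.StringIO(
    '{"text": "WHO is based in Geneva.", "spans": []}\n'
    '{"text": "Estonia passed a new law.", "spans": []}\n'
)
dataset = shuffled(load_jsonl(exported))
print(len(dataset))
```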


One sample from the loaded dataset. Each annotated sentence is stored as a *JSONL* record: the *'text'* field holds the input text, the *'tokens'* field its tokenization, and the *'spans'* field the annotated entity spans.
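The evaluation loop relies on `get_entities_from_jsonl` from `ner_train_functions`, whose code is not shown here. A minimal sketch of what such a helper plausibly does, assuming each entry in *'spans'* carries character offsets under *'start'*/*'end'* and a *'label'* (as in Prodigy's NER format). `spans_from_record` is a hypothetical name, and the GPE span below is illustrative, not this sample's actual annotation.

```python
def spans_from_record(record):
    """Hypothetical helper: return ([(start, end, label), ...], [entity_text, ...]) for one record."""
    text = record["text"]
    spans = [(span["start"], span["end"], span["label"]) for span in record.get("spans", [])]
    entity_texts = [text[start:end] for start, end, _ in spans]
    return spans, entity_texts


record = {
    "text": "Compared to many countries, the United Kingdom’s facility for COVID-19 "
            "RT-PCR testing has been very limited.",
    "spans": [{"start": 32, "end": 46, "label": "GPE"}],  # illustrative annotation
}
spans, entity_texts = spans_from_record(record)
print(spans, entity_texts)  # [(32, 46, 'GPE')] ['United Kingdom']
```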

In [7]:
test_sample = ner_dataset[0]
test_sample

{'text': 'Compared to many countries, the United Kingdom’s facility for COVID-19 RT-PCR testing has been very limited.',
 '_input_hash': -214486349,
 '_task_hash': -781034035,
 'tokens': [{'text': 'Compared', 'start': 0, 'end': 8, 'id': 0, 'ws': True},
  {'text': 'to', 'start': 9, 'end': 11, 'id': 1, 'ws': True},
  {'text': 'many', 'start': 12, 'end': 16, 'id': 2, 'ws': True},
  {'text': 'countries', 'start': 17, 'end': 26, 'id': 3, 'ws': False},
  {'text': ',', 'start': 26, 'end': 27, 'id': 4, 'ws': True},
  {'text': 'the', 'start': 28, 'end': 31, 'id': 5, 'ws': True},
  {'text': 'United', 'start': 32, 'end': 38, 'id': 6, 'ws': True},
  {'text': 'Kingdom', 'start': 39, 'end': 46, 'id': 7, 'ws': False},
  {'text': '’s', 'start': 46, 'end': 48, 'id': 8, 'ws': True},
  {'text': 'facility', 'start': 49, 'end': 57, 'id': 9, 'ws': True},
  {'text': 'for', 'start': 58, 'end': 61, 'id': 10, 'ws': True},
  {'text': 'COVID-19', 'start': 62, 'end': 70, 'id': 11, 'ws': True},
  {'text': 'RT', 'st

##### NER Evaluation

This part of the notebook evaluates the model on a test set. It outputs a classification report comparing the true and predicted entity spans after converting both to the **IOB2** tagging scheme.
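In IOB2, the first token of an entity gets `B-<label>`, each following token inside it gets `I-<label>`, and every other token gets `O`. The sketch below is a hypothetical stand-in for `single_token_tags` (its real implementation is not shown); it assumes tokens are given as `(start_char, end_char)` offsets and leaves tokens that only partially overlap an entity as `O`. The token offsets are taken from the sample's *'tokens'* field shown earlier; the GPE span is illustrative.

```python
def spans_to_iob2(token_offsets, entity_spans):
    """Convert (start_char, end_char, label) entity spans to one IOB2 tag per token."""
    tags = ["O"] * len(token_offsets)
    for ent_start, ent_end, label in entity_spans:
        first = True
        for i, (tok_start, tok_end) in enumerate(token_offsets):
            # Only tokens lying fully inside the entity span are tagged
            if tok_start >= ent_start and tok_end <= ent_end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags


# (start, end) offsets of the first 12 tokens of the sample sentence
tokens = [(0, 8), (9, 11), (12, 16), (17, 26), (26, 27), (28, 31),
          (32, 38), (39, 46), (46, 48), (49, 57), (58, 61), (62, 70)]
print(spans_to_iob2(tokens, [(32, 46, "GPE")]))
# ['O', 'O', 'O', 'O', 'O', 'O', 'B-GPE', 'I-GPE', 'O', 'O', 'O', 'O']
```

The possessive `’s` token (46, 48) stays `O` because it lies outside the entity's character range.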

In [8]:
for sample in ner_dataset[:5]:
    print(sample)

{'text': 'Compared to many countries, the United Kingdom’s facility for COVID-19 RT-PCR testing has been very limited.', '_input_hash': -214486349, '_task_hash': -781034035, 'tokens': [{'text': 'Compared', 'start': 0, 'end': 8, 'id': 0, 'ws': True}, {'text': 'to', 'start': 9, 'end': 11, 'id': 1, 'ws': True}, {'text': 'many', 'start': 12, 'end': 16, 'id': 2, 'ws': True}, {'text': 'countries', 'start': 17, 'end': 26, 'id': 3, 'ws': False}, {'text': ',', 'start': 26, 'end': 27, 'id': 4, 'ws': True}, {'text': 'the', 'start': 28, 'end': 31, 'id': 5, 'ws': True}, {'text': 'United', 'start': 32, 'end': 38, 'id': 6, 'ws': True}, {'text': 'Kingdom', 'start': 39, 'end': 46, 'id': 7, 'ws': False}, {'text': '’s', 'start': 46, 'end': 48, 'id': 8, 'ws': True}, {'text': 'facility', 'start': 49, 'end': 57, 'id': 9, 'ws': True}, {'text': 'for', 'start': 58, 'end': 61, 'id': 10, 'ws': True}, {'text': 'COVID-19', 'start': 62, 'end': 70, 'id': 11, 'ws': True}, {'text': 'RT', 'start': 71, 'end': 73, 'id': 

Testing Loop

In [9]:
import time
from spacy.training import Example
from ner_train_functions import get_entities_from_jsonl, single_token_tags, visualize_predictions


all_examples = []
all_tags = {"true" : [], "predicted": []}
test_set = ner_dataset

##### Enable or Disable entity prediction visualization using displacy
##### NOTE: Choose a small sub-sample of the dataset to visualize up front, to avoid bloating the notebook output
visualization_set = ner_dataset[5:15]
visualization = True

##### Enable or disable the extra components so that POS tagging runs alongside NER
automatic_pos_tagging = True
if automatic_pos_tagging:
    nlp.enable_pipe('tagger')
    nlp.enable_pipe('parser')
    nlp.enable_pipe('attribute_ruler')


start_time = time.time()
print('Model evaluation on the test set started | Visualization: {} | POS Tagging: {}'.format(visualization, automatic_pos_tagging))
for sample in test_set:
        true_entity_spans, true_entities = get_entities_from_jsonl(sample)
        sentence = sample['text']
        
        ##### Perform model prediction (Sentence input can be either a Spacy Doc object or a Text String)
        prediction = nlp(sentence)
        
        ##### Debug: uncomment to print the POS tags (only meaningful when automatic_pos_tagging is True)
        # for i, token in enumerate(prediction):
        #         print(token.text, token.pos_)
        
        ##### Converting each true and predicted entity span to the IOB2 tag schema (instead of START and END indexes of entities)
        predicted_ent_spans = [(ent.start_char, ent.end_char, ent.label_) for ent in prediction.ents]  
        predicted_ent_tags = single_token_tags(prediction, predicted_ent_spans)
        true_ent_tags = single_token_tags(prediction, true_entity_spans)
        
        ##### Create Examples
        example = Example.from_dict(nlp.make_doc(sentence), {"entities": true_entity_spans})
        example.predicted = prediction
        all_examples.append(example)
        
        ##### Uncomment for debug        
        # print('For Sentence: {}'.format(sentence))
        # print('True Entity Span: {}'.format(true_entity_spans))
        # print('Model Prediction: {}'.format(predicted_ent_spans))
        # print('Predicted Tags: {}'.format(predicted_ent_tags))
        # print('True Tags: {}\n'.format(true_ent_tags))

        # all_examples.append(Example.from_dict(prediction, {'entities': true_entity_spans}))
        all_tags['true'].append(true_ent_tags)
        all_tags['predicted'].append(predicted_ent_tags)        

if visualization:
##### Visualization Debug
        visualize_predictions(visualization_set, nlp)
        

end_time = time.time()
print('Testing finished...')
print("Total runtime in seconds: {:.4f}".format(end_time - start_time))
print('Average Time to process one single sentence: {:.4f} seconds'.format((end_time - start_time) / len(test_set)))

Model evaluation on the test set started | Visualization: True | POS Tagging: True
Visualization: 
[displaCy renderings of the predicted and true entity spans for the 10 sentences in `visualization_set` are not preserved in this export]

Testing finished...
Total runtime in seconds: 80.1717
Average Time to process one single sentence: 0.0764 seconds


##### Score Classification Report

TODO: Discuss the lenient approach

In [11]:
##### NOTE: Experimental. Removes the IOB prefixes (B-, I-) altogether to create a more lenient, token-level score per LABEL.
##### Example: While much progress has been made with tuberculosis control, the World Health Organization (WHO) estimates that 9 million people developed tuberculosis.
##### True: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'ORG', 'ORG', 'ORG', 'ORG', 'ORG', 'ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
##### Prediction: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'ORG', 'ORG', 'ORG', 'ORG', 'O', 'ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

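def remove_iob(tags_list):
    """Strip the IOB2 prefixes ('B-', 'I-') from every tag, leaving 'O' unchanged.

    Returns new lists, so the original IOB2 tags (still needed for the strict scores) are not modified in place.
    """
    return [[tag if tag == 'O' else tag[2:] for tag in tags] for tags in tags_list]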
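The difference between the two scoring modes can be illustrated on the WHO example from the comment above. In this pure-Python sketch, the IOB2 sequences are a reconstruction from the prefix-free tags (assuming the true tags form one `ORG` chunk and the predicted `ORG` tag after the gap starts a new chunk); `iob2_chunks`, `strict_scores`, and `lenient_scores` are hypothetical helpers, not the ones in `ner_score_report`. A stray `I-` tag without a preceding chunk is treated as starting one, in the spirit of the classic CoNLL `conlleval` convention.

```python
def iob2_chunks(tags):
    """Return the set of (start_token, end_token_exclusive, label) chunks in an IOB2 sequence."""
    chunks, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing 'O' closes a chunk at the end
        # A running chunk ends at 'O', at a new 'B-' tag, or at an 'I-' tag with another label
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            chunks.add((start, i, label))
            start, label = None, None
        # A chunk starts at 'B-', or at a stray 'I-' that does not continue a chunk
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    return chunks


def prf(n_correct, n_pred, n_true):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_true if n_true else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def strict_scores(true_seqs, pred_seqs, label):
    """Entity level: a predicted chunk counts only if its start, end and label all match."""
    true = {(n, c) for n, seq in enumerate(true_seqs) for c in iob2_chunks(seq) if c[2] == label}
    pred = {(n, c) for n, seq in enumerate(pred_seqs) for c in iob2_chunks(seq) if c[2] == label}
    return prf(len(true & pred), len(pred), len(true))


def lenient_scores(true_seqs, pred_seqs, label):
    """Token level: B-/I- prefixes are dropped and every token is scored on its own."""
    pairs = [(t[2:], p[2:]) for ts, ps in zip(true_seqs, pred_seqs) for t, p in zip(ts, ps)]
    n_true = sum(t == label for t, _ in pairs)
    n_pred = sum(p == label for _, p in pairs)
    n_correct = sum(t == p == label for t, p in pairs)
    return prf(n_correct, n_pred, n_true)


# IOB2 reconstruction of the WHO example (24 tags each, as in the comment above)
true = ["O"] * 11 + ["B-ORG"] + ["I-ORG"] * 5 + ["O"] * 7
pred = ["O"] * 10 + ["B-ORG"] + ["I-ORG"] * 3 + ["O", "B-ORG"] + ["O"] * 8

print("strict: ", strict_scores([true], [pred], "ORG"))   # (0.0, 0.0, 0.0)
print("lenient:", lenient_scores([true], [pred], "ORG"))  # precision 0.8, recall ~0.667
```

Strict scoring gives no credit because neither predicted chunk matches the true boundaries exactly, while lenient scoring credits the four tokens on which both sequences agree.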

Both strict and lenient scores

In [12]:
from sklearn.metrics import classification_report
from ner_score_report import set_tags_to_fixed_labels, compute_and_print_strict_scores

##### Map every IOB tag whose entity label is not one of our wanted labels to 'O' (Outside)
##### Both the strict and the lenient scoring need tag lists restricted to these fixed labels
our_labels = ['GPE', 'LAW', 'ORG', 'PERSON', 'PRODUCT']
all_tags['true'] = set_tags_to_fixed_labels(our_labels, all_tags['true'])
all_tags['predicted'] = set_tags_to_fixed_labels(our_labels, all_tags['predicted'])

##### STRICT
print('For model name: {}'.format(model_name))
compute_and_print_strict_scores(all_tags)

##### LENIENT
true_tags_no_iob = remove_iob(all_tags['true'])
predicted_tags_no_iob = remove_iob(all_tags['predicted'])

true_tags_flat = [tag for seq in true_tags_no_iob for tag in seq]
predicted_tags_flat = [tag for seq in predicted_tags_no_iob for tag in seq]

lenient_report = classification_report(true_tags_flat, predicted_tags_flat, labels = our_labels)
print('\nLenient Score Report (Token-Level):')
print('\n'.join(lenient_report.splitlines()))

For model name: en_core_web_trf
Performance with respect to only our fixed labels: 

Strict Score Report (Entity Span-Level):
              precision    recall  f1-score   support

         GPE       0.86      0.75      0.80       661
         LAW       0.13      0.13      0.13       110
         ORG       0.38      0.49      0.43       838
      PERSON       0.76      0.75      0.75       216
     PRODUCT       0.52      0.30      0.38       116

   micro avg       0.54      0.57      0.56      1941
   macro avg       0.53      0.48      0.50      1941
weighted avg       0.58      0.57      0.57      1941

Lenient Score Report (Token-Level):
              precision    recall  f1-score   support

         GPE       0.87      0.74      0.80       913
         LAW       0.78      0.76      0.77       556
         ORG       0.77      0.86      0.81      2912
      PERSON       0.92      0.79      0.85       460
     PRODUCT       0.64      0.31      0.42       208

   micro avg       0.80
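Both reports rely on `set_tags_to_fixed_labels` from `ner_score_report`, which is not shown. A minimal sketch of the behavior described in the code comment, assuming IOB2 tags: every tag whose label is outside our five labels (for example `B-LOC` or `I-DATE`) becomes `O`, while kept tags retain their `B-`/`I-` prefixes for strict scoring. `map_to_fixed_labels` is a hypothetical name.

```python
def map_to_fixed_labels(labels, tag_seqs):
    """Replace every IOB2 tag whose entity label is not in `labels` with 'O'."""
    keep = set(labels)
    return [[tag if tag != "O" and tag[2:] in keep else "O" for tag in seq] for seq in tag_seqs]


our_labels = ["GPE", "LAW", "ORG", "PERSON", "PRODUCT"]
print(map_to_fixed_labels(our_labels, [["B-PERSON", "O", "B-LOC", "I-LOC", "B-ORG"]]))
# [['B-PERSON', 'O', 'O', 'O', 'B-ORG']]
```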

A more lenient scoring function based on token-level entities. 