Evaluate CRF models for person names, organizations and locations using the Presidio Evaluator framework

Data = `test_February_28_2020`

In [9]:
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm releases
import logging
from presidio_evaluator import InputSample
from presidio_evaluator.data_generator import read_synth_dataset
import spacy
import pandas as pd
import pickle

pd.set_option('display.width', 10000)
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is deprecated since pandas 1.0


%reload_ext autoreload
%autoreload 2





Select data for evaluation:

In [10]:
synth_samples = read_synth_dataset("../../data/synth_dataset.json")
print(len(synth_samples))


DATASET = synth_samples

300


In [11]:
from collections import Counter

# Count how often each tag occurs across all samples
entity_counter = Counter(tag for sample in DATASET for tag in sample.tags)

In [12]:
entity_counter

Counter({'O': 4721,
         'B-PERSON': 102,
         'L-PERSON': 102,
         'U-PERSON': 72,
         'U-CREDIT_CARD': 49,
         'B-LOCATION': 47,
         'I-LOCATION': 181,
         'L-LOCATION': 47,
         'B-ORGANIZATION': 43,
         'L-ORGANIZATION': 43,
         'I-ORGANIZATION': 28,
         'U-LOCATION': 28,
         'U-EMAIL': 11,
         'U-BIRTHDAY': 4,
         'B-TITLE': 2,
         'L-TITLE': 2,
         'U-TITLE': 2,
         'B-PHONE_NUMBER': 9,
         'L-PHONE_NUMBER': 9,
         'I-PHONE_NUMBER': 18,
         'U-ORGANIZATION': 5,
         'U-IBAN': 3,
         'U-NATIONALITY': 1,
         'I-PERSON': 5})
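
The counter above is keyed by BILUO tag, so one entity type is spread over up to four keys. As a rough sketch (the helper `entity_type` and the variable names are illustrative, not part of `presidio_evaluator`), the counts can be folded into tokens per entity type and mentions per entity type, assuming every mention starts with exactly one `B-` or `U-` tag:

```python
from collections import Counter

# Tag counts copied from the Counter printed above.
tag_counts = Counter({
    'O': 4721,
    'B-PERSON': 102, 'L-PERSON': 102, 'U-PERSON': 72, 'I-PERSON': 5,
    'B-LOCATION': 47, 'I-LOCATION': 181, 'L-LOCATION': 47, 'U-LOCATION': 28,
    'B-ORGANIZATION': 43, 'L-ORGANIZATION': 43, 'I-ORGANIZATION': 28,
    'U-ORGANIZATION': 5,
    'U-CREDIT_CARD': 49, 'U-EMAIL': 11, 'U-BIRTHDAY': 4,
    'B-TITLE': 2, 'L-TITLE': 2, 'U-TITLE': 2,
    'B-PHONE_NUMBER': 9, 'L-PHONE_NUMBER': 9, 'I-PHONE_NUMBER': 18,
    'U-IBAN': 3, 'U-NATIONALITY': 1,
})


def entity_type(tag):
    """Strip the BILUO prefix (hypothetical helper); 'O' is returned unchanged."""
    return tag.split('-', 1)[1] if '-' in tag else tag


# Tokens per entity type: sum all BILUO variants of the same type.
tokens_per_entity = Counter()
for tag, count in tag_counts.items():
    tokens_per_entity[entity_type(tag)] += count

# Mentions per entity type: each mention begins with one B- or U- tag.
mentions_per_entity = Counter()
for tag, count in tag_counts.items():
    if tag.startswith(('B-', 'U-')):
        mentions_per_entity[entity_type(tag)] += count

print(tokens_per_entity.most_common(4))
print(mentions_per_entity.most_common(3))
```

On the live object, the same loops can be run over `entity_counter` directly instead of the copied dictionary.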

In [13]:
DATASET[1]

Full text: Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Frank Strauser â€“ go figure)
Spans: [Type: PERSON, value: Kotoya Negishi, start: 0, end: 14, Type: PERSON, value: Frank Strauser, start: 170, end: 184]
Tokens: [Kotoya, Negishi, listed, his, top, 20, songs, for, Entertainment, Weekly, and, had, the, balls, to, list, this, song, at, #, 15, ., (, What, did, he, put, at, #, 1, you, ask, ?, Answer:"Tube, Snake, Boogie, ", by, Frank, Strauser, â€, “, go, figure, )]
Tags: ['B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O']
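
The sample above pairs character spans with per-token BILUO tags. The following is a minimal sketch of how such tags can be derived from spans. It assumes a plain whitespace tokenizer (the notebook itself uses spaCy tokenization, which splits punctuation differently), and `spans_to_biluo` is a hypothetical helper, not a `presidio_evaluator` function. Tokens that only partly overlap a span are left as `O` in this sketch.

```python
import re


def spans_to_biluo(text, spans):
    """Tag whitespace tokens with BILUO labels.

    spans: list of (start, end, entity_type) with an exclusive end offset.
    Returns (tokens, tags).
    """
    # Each token keeps its character offsets so it can be matched to spans.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)

    for start, end, label in spans:
        # Indices of tokens that lie fully inside the span.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"      # Unit: single-token entity
        else:
            tags[inside[0]] = f"B-{label}"      # Beginning
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"          # Inside
            tags[inside[-1]] = f"L-{label}"     # Last

    return [tok for tok, _, _ in tokens], tags


text = "Kotoya Negishi listed his top 20 songs for Entertainment Weekly"
tokens, tags = spans_to_biluo(text, [(0, 14, "PERSON")])
print(list(zip(tokens, tags)))
```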

In [14]:
# Length (in tokens) of the longest sample
max(len(sample.tokens) for sample in DATASET)


79

Select models for evaluation:

In [15]:
crf_vanilla = "../../data/models/crf.pickle"
    
models = [crf_vanilla]

Run evaluation on all models:

In [16]:
from presidio_evaluator.crf_evaluator import CRFEvaluator

for model in models:
    print("-----------------------------------")
    print("Evaluating model {}".format(model))
    crf_evaluator = CRFEvaluator(model_pickle_path=model)
    evaluation_results = crf_evaluator.evaluate_all(DATASET)
    scores = crf_evaluator.calculate_score(evaluation_results)
    
    print("Confusion matrix:")
    print(scores.results)

    print("Precision and recall")
    scores.print()

-----------------------------------
Evaluating model ../../data/models/crf.pickle
Confusion matrix:
Counter({('O', 'O'): 4792, ('GPE', 'GPE'): 299, ('PERSON', 'PERSON'): 251, ('ORG', 'ORG'): 80, ('ORG', 'O'): 24, ('O', 'ORG'): 18, ('O', 'PERSON'): 14, ('PERSON', 'GPE'): 12, ('ORG', 'PERSON'): 10, ('PERSON', 'ORG'): 10, ('PERSON', 'O'): 8, ('O', 'GPE'): 6, ('ORG', 'GPE'): 5, ('GPE', 'O'): 3, ('GPE', 'PERSON'): 2})
Precision and recall
                        Entity                     Precision                        Recall
                           GPE                        98.36%                        92.86%
                        PERSON                        89.32%                        90.61%
                           ORG                        67.23%                        74.07%
                           PII                        95.03%                        94.63%
PII F measure: 0.9482636428065202
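
The percentages above can be reproduced from the printed confusion matrix with a few lines of plain Python. The sketch below is not the `presidio_evaluator` implementation; it only divides the diagonal count by the total for the first and for the second key position. One observation worth checking against the installed library version: the first-position totals for PERSON (281) and ORG (119) equal the annotated token counts in the tag counter earlier in this notebook, which would suggest the keys are (annotation, prediction). Under that reading, the column labelled Precision is the ratio usually called recall, and vice versa. This is an inference from the numbers, not something the notebook states.

```python
from collections import Counter

# Confusion matrix copied from the output above.
confusion = Counter({
    ('O', 'O'): 4792, ('GPE', 'GPE'): 299, ('PERSON', 'PERSON'): 251,
    ('ORG', 'ORG'): 80, ('ORG', 'O'): 24, ('O', 'ORG'): 18,
    ('O', 'PERSON'): 14, ('PERSON', 'GPE'): 12, ('ORG', 'PERSON'): 10,
    ('PERSON', 'ORG'): 10, ('PERSON', 'O'): 8, ('O', 'GPE'): 6,
    ('ORG', 'GPE'): 5, ('GPE', 'O'): 3, ('GPE', 'PERSON'): 2,
})


def ratios(entity):
    """Return (TP / first-position total, TP / second-position total)."""
    tp = confusion[(entity, entity)]
    first_total = sum(n for (first, _), n in confusion.items() if first == entity)
    second_total = sum(n for (_, second), n in confusion.items() if second == entity)
    return tp / first_total, tp / second_total


for entity in ('GPE', 'PERSON', 'ORG'):
    by_first, by_second = ratios(entity)
    print(f"{entity:>6}  {by_first:.2%}  {by_second:.2%}")

# "PII" pools every non-"O" label into a single class.
pii_tp = sum(n for (a, b), n in confusion.items() if a != 'O' and b != 'O')
pii_first = sum(n for (a, _), n in confusion.items() if a != 'O')
pii_second = sum(n for (_, b), n in confusion.items() if b != 'O')
pii_by_first = pii_tp / pii_first
pii_by_second = pii_tp / pii_second

# The harmonic mean is symmetric, so it does not depend on which ratio
# is called precision and which is called recall.
pii_f1 = 2 * pii_by_first * pii_by_second / (pii_by_first + pii_by_second)
print(f"   PII  {pii_by_first:.2%}  {pii_by_second:.2%}  F1={pii_f1:.4f}")
```

The two ratio columns match the Precision and Recall columns printed above, in that order, and the F1 value matches the reported PII F measure.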




#### Custom evaluation of the model

In [17]:
# Try out the model on a single sentence
def sent_to_features(model_path, sent):
    """
    Tokenize a sentence and return the tags predicted by a pickled CRF model.
    """

    with open(model_path, 'rb') as f:
        model = pickle.load(f)

    tokenizer = spacy.blank('en')
    tokens = tokenizer(sent)
    tags = ['O' for token in tokens]  # Placeholder: not used but required.
    metadata = {'Template#': 1, 'Gender': '1', 'Country': '2'}  # Placeholder: not used but required.
    input_sample = InputSample(
        full_text=sent,
        masked="",
        spans=None,
        tokens=tokens,
        tags=tags,
        metadata=metadata,
        create_tags_from_span=False,
    )

    return CRFEvaluator.crf_predict(input_sample, model)

In [18]:
SENTENCE = "Michael is American"

sent_to_features(model_path=crf_vanilla, sent=SENTENCE)

['O', 'O', 'O']
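
A bare list of tags is hard to read for longer sentences. As a small sketch, predicted tags can be grouped back into entity mentions. `group_entities` is a hypothetical helper, and it is an assumption that the model may return either plain labels (`PERSON`) or BILUO-prefixed ones (`U-PERSON`), so both are handled. Adjacent mentions of the same type are merged by this simple grouping.

```python
from itertools import groupby

BILUO_PREFIXES = ("B-", "I-", "L-", "U-")


def strip_prefix(tag):
    """Drop a BILUO prefix if present; plain labels pass through unchanged."""
    return tag[2:] if tag.startswith(BILUO_PREFIXES) else tag


def group_entities(tokens, tags):
    """Return (entity_type, text) for each run of consecutive same-type tokens."""
    pairs = [(token, strip_prefix(tag)) for token, tag in zip(tokens, tags)]
    entities = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        if label != "O":
            entities.append((label, " ".join(token for token, _ in run)))
    return entities


# Illustrative input only: these tags are hand-written, not model output.
example_tokens = ["Kotoya", "Negishi", "met", "Frank", "in", "Paris"]
example_tags = ["B-PERSON", "L-PERSON", "O", "U-PERSON", "O", "GPE"]
print(group_entities(example_tokens, example_tags))

# An all-"O" prediction, like the one above, yields no entities.
print(group_entities(["Michael", "is", "American"], ["O", "O", "O"]))
```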

#### False positives

1. Most common false positive tokens:

In [19]:
errors = scores.model_errors

from presidio_evaluator import ModelEvaluator
# [model_error for model_error in errors if model_error.error_type == 'FP']
ModelEvaluator.most_common_fp_tokens(errors)


Most common false positive tokens:
[('Entertainment', 6), ('Weekly', 6), ('Answer:"Tube', 6), ('Snake', 6), ('Boogie', 6), (',', 3), ('Devil', 2), ('IL270126100000000544211', 2), ('and', 1)]
Example sentence with each FP token:
Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Frank Strauser â€“ go figure)
Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Frank Strauser â€“ go figure)
Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Frank Strauser â€“ go figure)
Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Fr

2. Review false positives for entity 'PERSON':

In [20]:
fps_df = ModelEvaluator.get_fps_dataframe(errors,entity='PERSON')
fps_df[['full_text','token','prediction']]

index,full_text,token,prediction
0,"Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Frank Strauser â€“ go figure)",Entertainment,PERSON
1,"Kotoya Negishi listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Frank Strauser â€“ go figure)",Weekly,PERSON
2,"Minik Jeremiassen listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Marisa Bisliev â€“ go figure)",Entertainment,PERSON
3,"Minik Jeremiassen listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Marisa Bisliev â€“ go figure)",Weekly,PERSON
4,"Nusa Weress listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Klimek Kozłowski â€“ go figure)",Entertainment,PERSON
5,"Nusa Weress listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Klimek Kozłowski â€“ go figure)",Weekly,PERSON
6,"Emilie Johansen listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Polona Ranković â€“ go figure)",Entertainment,PERSON
7,"Emilie Johansen listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:""Tube Snake Boogie"" by Polona Ranković â€“ go figure)",Weekly,PERSON
8,My IBAN is IL270126100000000544211,IL270126100000000544211,PERSON
9,My IBAN is IL270126100000000544211,IL270126100000000544211,PERSON
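
To see which tokens dominate the PERSON false positives, the rows above can be tallied by hand. The sketch below copies the `token` and `prediction` columns from the ten rows shown; on the live DataFrame, `fps_df['token'].value_counts()` should give the same tally. The variable names are illustrative.

```python
from collections import Counter

# (token, prediction) pairs copied from the ten PERSON false positives above.
fp_rows = [
    ("Entertainment", "PERSON"), ("Weekly", "PERSON"),
    ("Entertainment", "PERSON"), ("Weekly", "PERSON"),
    ("Entertainment", "PERSON"), ("Weekly", "PERSON"),
    ("Entertainment", "PERSON"), ("Weekly", "PERSON"),
    ("IL270126100000000544211", "PERSON"),
    ("IL270126100000000544211", "PERSON"),
]

fp_token_counts = Counter(token for token, _ in fp_rows)
for token, count in fp_token_counts.most_common():
    print(f"{token}: {count}")

# Share of the listed false positives caused by the two most frequent tokens.
top_two = sum(count for _, count in fp_token_counts.most_common(2))
print(f"top two tokens: {top_two / len(fp_rows):.0%} of listed false positives")
```

Eight of the ten listed rows come from a single sentence template ("... for Entertainment Weekly ..."), so a handful of templates can account for most of the false positive count.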


#### False negative examples

In [21]:
ModelEvaluator.most_common_fn_tokens(errors,n=50, entity='PERSON')

[('Souza', 1), ('Søren', 1), ('Victoria', 1), ('Charlotte', 1), ('Park', 1), ('tryggvadóttir', 1), ('erick', 1), ('searlait', 1), ('Marisa', 1), ('Martin', 1), ('Leidy', 1), ('Muris', 1), ('Aston', 1), ('Lind', 1), ('Houžvičková', 1), ('george', 1), ('schutt', 1), ('daniela', 1), ('jager', 1), ('zahra', 1), ('mattsson', 1), ('Raisová', 1), ('Kristian', 1), ('Enrico', 1), ('Tuomo', 1), ('Katie', 1), ('Miles', 1), ('Lewis', 1), ('Abbott', 1), ('den', 1)]
Token: Souza, Annotation: PERSON, Full text: Unlike the Souza novel, it's not about necrophilia. What it is about, I suppose is anyone's guess. A brilliant piece of baroque pop.
Token: Søren, Annotation: PERSON, Full text: I'm so jealous! said Donát to Søren
Token: Victoria, Annotation: PERSON, Full text: I'm so jealous! said Bárður to Victoria
Token: Charlotte, Annotation: PERSON, Full text: Dun Rite Lawn Care is the brainchild of our 3 founders: Charlotte Park, Oline Mikaelsen and Brodie Walker.  The idea was born (on the beach) while 

More false negative analysis:

In [22]:
fns_df = ModelEvaluator.get_fns_dataframe(errors,entity='PERSON')

In [23]:
fns_df[['full_text','token','annotation','prediction']]

index,full_text,token,annotation,prediction
0,"Unlike the Souza novel, it's not about necrophilia. What it is about, I suppose is anyone's guess. A brilliant piece of baroque pop.",Souza,PERSON,O
1,I'm so jealous! said Donát to Søren,Søren,PERSON,GPE
2,I'm so jealous! said Bárður to Victoria,Victoria,PERSON,GPE
3,"Dun Rite Lawn Care is the brainchild of our 3 founders: Charlotte Park, Oline Mikaelsen and Brodie Walker. The idea was born (on the beach) while they were constructing a website to be the basis of another start-up idea.",Charlotte,PERSON,ORG
4,"Dun Rite Lawn Care is the brainchild of our 3 founders: Charlotte Park, Oline Mikaelsen and Brodie Walker. The idea was born (on the beach) while they were constructing a website to be the basis of another start-up idea.",Park,PERSON,ORG
5,"tryggvadóttir spent a year at rogers peet as the assistant to margrét tryggvadóttir, and the following year at big wheel in begonte, which later became movie gallery in 1965.",tryggvadóttir,PERSON,O
6,"erick shouted at searlait: ""what are you doing here?""",erick,PERSON,O
7,"erick shouted at searlait: ""what are you doing here?""",searlait,PERSON,O
8,"Marisa shouted at Martin: ""What are you doing here?""",Marisa,PERSON,O
9,"Marisa shouted at Martin: ""What are you doing here?""",Martin,PERSON,GPE
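
Two quick breakdowns help when reading the false negatives: what the model predicted instead of PERSON, and how many missed tokens are lowercase. The sketch below uses only the ten rows shown above (copied by hand); on the live DataFrame, `fns_df['prediction'].value_counts()` would give the first breakdown. The variable names are illustrative.

```python
from collections import Counter

# (token, prediction) pairs copied from the ten PERSON false negatives above.
fn_rows = [
    ("Souza", "O"), ("Søren", "GPE"), ("Victoria", "GPE"),
    ("Charlotte", "ORG"), ("Park", "ORG"), ("tryggvadóttir", "O"),
    ("erick", "O"), ("searlait", "O"), ("Marisa", "O"), ("Martin", "GPE"),
]

# What the model predicted for tokens annotated as PERSON.
predicted_instead = Counter(prediction for _, prediction in fn_rows)
print(predicted_instead.most_common())

# Missed names written entirely in lowercase.
lowercase_misses = [token for token, _ in fn_rows if token.islower()]
print(lowercase_misses)
```

In these ten rows, half of the misses are predicted as `O` and the rest are confused with `GPE` or `ORG`. The three lowercase names are all predicted as `O`, which hints that capitalization may be an influential feature. That is a reading of this small sample, not a measured result.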
