# Evaluate Presidio Analyzer using the Presidio Evaluator framework

In this notebook, we will go through the following steps:

1. Import the evaluation dataset into an InputSample format
2. Run the inference and metric at the token level using the Evaluator class
3. Analyze the performance at the token level
4. Run the inference and metric at the span level using the SpanEvaluator class
5. Analyze the performance at the span level

In [10]:
from pathlib import Path
from copy import deepcopy
from pprint import pprint
from collections import Counter

from presidio_evaluator import InputSample
from presidio_evaluator.evaluation import Evaluator, ModelError, SpanEvaluator
from presidio_evaluator.models import PresidioAnalyzerWrapper
from presidio_evaluator.experiment_tracking import get_experiment_tracker

import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

%reload_ext autoreload
%autoreload 2

stanza and spacy_stanza are not installed
Flair is not installed by default
Flair is not installed


## 1. Load the evaluation dataset into InputSample format

In [13]:
dataset_name = "synth_dataset_v2.json"
dataset = InputSample.read_dataset_json(Path(Path.cwd().parent, "data", dataset_name))
print(len(dataset))

tokenizing input:   0%|          | 0/1500 [00:00<?, ?it/s]

loading model en_core_web_sm


tokenizing input: 100%|██████████| 1500/1500 [00:10<00:00, 148.99it/s]

1500





In [14]:
entity_counter = Counter()
for sample in dataset:
    for tag in sample.tags:
        entity_counter[tag] += 1

In [15]:
print("Count per entity:")
pprint(entity_counter.most_common())

print("\nExample sentence:")
print(dataset[1])

print("\nMin and max number of tokens in dataset:")
print(
    f"Min: {min([len(sample.tokens) for sample in dataset])}, "
    f"Max: {max([len(sample.tokens) for sample in dataset])}"
)

print("\nMin and max sentence length in dataset:")
print(
    f"Min: {min([len(sample.full_text) for sample in dataset])}, "
    f"Max: {max([len(sample.full_text) for sample in dataset])}"
)

Count per entity:
[('O', 19626),
 ('STREET_ADDRESS', 3071),
 ('PERSON', 1369),
 ('GPE', 521),
 ('ORGANIZATION', 504),
 ('PHONE_NUMBER', 350),
 ('DATE_TIME', 219),
 ('TITLE', 142),
 ('CREDIT_CARD', 136),
 ('US_SSN', 80),
 ('AGE', 74),
 ('NRP', 55),
 ('ZIP_CODE', 50),
 ('EMAIL_ADDRESS', 49),
 ('DOMAIN_NAME', 37),
 ('IP_ADDRESS', 22),
 ('IBAN_CODE', 21),
 ('US_DRIVER_LICENSE', 9)]

Example sentence:
Full text: What are my options?
Spans: []
Tokens: What are my options?
Tags: ['O', 'O', 'O', 'O', 'O']


Min and max number of tokens in dataset:
Min: 3, Max: 78

Min and max sentence length in dataset:
Min: 9, Max: 407
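
The counts above show how unevenly the tags are distributed, which matters when reading token-level metrics later on. As a minimal sketch, the shares can be computed directly from the printed counts (the numbers are copied from the output above; the variable names `tag_counts` and `tag_share` are illustrative and not part of the notebook):

```python
from collections import Counter

# Counts copied from the "Count per entity" output above.
tag_counts = Counter({
    "O": 19626, "STREET_ADDRESS": 3071, "PERSON": 1369, "GPE": 521,
    "ORGANIZATION": 504, "PHONE_NUMBER": 350, "DATE_TIME": 219, "TITLE": 142,
    "CREDIT_CARD": 136, "US_SSN": 80, "AGE": 74, "NRP": 55, "ZIP_CODE": 50,
    "EMAIL_ADDRESS": 49, "DOMAIN_NAME": 37, "IP_ADDRESS": 22, "IBAN_CODE": 21,
    "US_DRIVER_LICENSE": 9,
})

total_tokens = sum(tag_counts.values())

# Fraction of all tokens carrying each tag.
tag_share = {tag: count / total_tokens for tag, count in tag_counts.items()}

# Print the five most frequent tags with their share of all tokens.
for tag, count in tag_counts.most_common(5):
    print(f"{tag:<16} {count:>6} {tag_share[tag]:7.1%}")
```

Roughly three quarters of the tokens are tagged `O`, so per-entity precision and recall are more informative here than overall token accuracy.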


## 2. Run the evaluation and report performance metrics at the token level
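
Before scoring, the cell below aligns the dataset's entity types with the types the model can return, using `PresidioAnalyzerWrapper.presidio_entities_map`. The following is only a sketch of that idea: the mapping `HYPOTHETICAL_MAPPING`, the helper `align_tags`, and the fallback to `O` for unmapped tags are assumptions made for illustration and may not match the library's actual map or behavior.

```python
# Hypothetical mapping from dataset tags to model entity types.
# The real PresidioAnalyzerWrapper.presidio_entities_map may differ.
HYPOTHETICAL_MAPPING = {
    "PERSON": "PERSON",
    "GPE": "LOCATION",
    "STREET_ADDRESS": "LOCATION",
    "O": "O",
}


def align_tags(tags, mapping, default="O"):
    """Translate each tag through the mapping.

    Tags without an entry fall back to `default` (an assumption of this
    sketch, not necessarily what Evaluator.align_entity_types does).
    """
    return [mapping.get(tag, default) for tag in tags]


dataset_tags = ["O", "PERSON", "O", "GPE", "TITLE"]
aligned_tags = align_tags(dataset_tags, HYPOTHETICAL_MAPPING)
print(aligned_tags)
```

Without this step, a correct `LOCATION` prediction on a token annotated as `GPE` would be counted as an error simply because the label names differ.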

In [16]:
print("Evaluating Presidio Analyzer")

experiment = get_experiment_tracker()
model_name = "Presidio Analyzer"
model = PresidioAnalyzerWrapper()

evaluator = Evaluator(model=model)
dataset = Evaluator.align_entity_types(
    deepcopy(dataset), entities_mapping=PresidioAnalyzerWrapper.presidio_entities_map
)

evaluation_results = evaluator.evaluate_all(dataset)
results = evaluator.calculate_score(evaluation_results)

# update params tracking
params = {"dataset_name": dataset_name, "model_name": model_name}
params.update(model.to_log())
experiment.log_parameters(params)
experiment.log_dataset_hash(dataset)
experiment.log_metrics(results.to_log())
entities, confmatrix = results.to_confusion_matrix()
experiment.log_confusion_matrix(matrix=confmatrix, labels=entities)

print("Confusion matrix:")
print(pd.DataFrame(confmatrix, columns=entities, index=entities))

print("Precision and recall")
print(results)

# end experiment
experiment.end()

Evaluating Presidio Analyzer
Entities supported by this Presidio Analyzer instance:
AU_ABN, DATE_TIME, US_DRIVER_LICENSE, CREDIT_CARD, PHONE_NUMBER, CRYPTO, US_SSN, MEDICAL_LICENSE, IBAN_CODE, NRP, US_ITIN, UK_NHS, AU_TFN, LOCATION, AU_MEDICARE, US_PASSPORT, SG_NRIC_FIN, AU_ACN, EMAIL_ADDRESS, PERSON, US_BANK_NUMBER, IP_ADDRESS, URL


Evaluating <class 'presidio_evaluator.models.presidio_analyzer_wrapper.PresidioAnalyzerWrapper'>: 100%|██████████| 1500/1500 [00:15<00:00, 95.84it/s] 


Confusion matrix:
                   CREDIT_CARD  DATE_TIME  EMAIL_ADDRESS  IBAN_CODE  \
CREDIT_CARD                105          0              0          0   
DATE_TIME                    0        199              0          0   
EMAIL_ADDRESS                0          0             49          0   
IBAN_CODE                    0          0              0         20   
IP_ADDRESS                   0          2              0          0   
LOCATION                     0          2              0          0   
NRP                          0          0              0          0   
O                            0        612              0          0   
PERSON                       0          0              0          0   
PHONE_NUMBER                 0          9              0          0   
US_DRIVER_LICENSE            0          0              0          0   
US_SSN                       0          0              0          0   

                   IP_ADDRESS  LOCATION  NRP      O  PERSO
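
To make the link between a confusion matrix and per-entity precision and recall concrete, here is a small sketch in plain Python. It assumes rows hold the annotated tag and columns the predicted tag, which the 612 `O` tokens predicted as `DATE_TIME` above suggest but which is not confirmed here. The counts in `toy_matrix` are invented for illustration and are not the notebook's results.

```python
toy_labels = ["DATE_TIME", "PERSON", "O"]

# Illustrative counts only. Rows = annotated tag, columns = predicted tag.
toy_matrix = [
    [8, 0, 2],   # annotated DATE_TIME
    [0, 6, 4],   # annotated PERSON
    [4, 1, 75],  # annotated O
]


def entity_precision_recall(matrix, labels, entity):
    """Return (precision, recall) for one entity from a confusion matrix."""
    idx = labels.index(entity)
    true_positives = matrix[idx][idx]
    predicted_total = sum(row[idx] for row in matrix)  # column sum
    annotated_total = sum(matrix[idx])                 # row sum
    precision = true_positives / predicted_total if predicted_total else float("nan")
    recall = true_positives / annotated_total if annotated_total else float("nan")
    return precision, recall


for entity in ("DATE_TIME", "PERSON"):
    precision, recall = entity_precision_recall(toy_matrix, toy_labels, entity)
    print(f"{entity:<10} precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, a large value in the `O` row of an entity's column lowers that entity's precision, while a large value in the entity's row under the `O` column lowers its recall.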

## 3. Analyze the performance of Presidio at the token level

In [None]:
sent = "I am taiwanese but I live in Cambodia."
# sent = input("Enter sentence: ")
model.predict(InputSample(full_text=sent))

In [9]:
errors = results.model_errors

In [10]:
errors

[<ModelError type: FN, Annotation = PERSON, prediction = O, Token = Szabina, Full text = Krisztián Szöllösy listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Szabina J Gelencsér ג€“ go figure), Metadata = None,
 <ModelError type: FN, Annotation = PERSON, prediction = O, Token = J, Full text = Krisztián Szöllösy listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Szabina J Gelencsér ג€“ go figure), Metadata = None,
 <ModelError type: FN, Annotation = PERSON, prediction = O, Token = Gelencsér, Full text = Krisztián Szöllösy listed his top 20 songs for Entertainment Weekly and had the balls to list this song at #15. (What did he put at #1 you ask? Answer:"Tube Snake Boogie" by Szabina J Gelencsér ג€“ go figure), Metadata = None,
 <ModelError type: FP, Annotation = O, prediction = LOCA
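
Each entry in the list above pairs a token's annotation with the model's prediction. The sketch below shows one way such token-level errors can be derived by comparing the two tag sequences. `TokenError` and `token_errors` are hypothetical stand-ins written for this example; they are not `presidio_evaluator`'s `ModelError`, and the label `WRONG_ENTITY` is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class TokenError:
    """Hypothetical stand-in for a token-level error record."""
    error_type: str
    annotation: str
    prediction: str
    token: str


def token_errors(tokens, annotations, predictions):
    """Compare annotated and predicted tags token by token."""
    errors = []
    for token, annotation, prediction in zip(tokens, annotations, predictions):
        if annotation == prediction:
            continue  # correct token, nothing to record
        if annotation == "O":
            error_type = "FP"  # model tagged a non-entity token
        elif prediction == "O":
            error_type = "FN"  # model missed an annotated entity token
        else:
            error_type = "WRONG_ENTITY"  # both are entities, but they differ
        errors.append(TokenError(error_type, annotation, prediction, token))
    return errors


toy_tokens = ["Szabina", "lives", "in", "Southern", "Tunisia"]
toy_annotations = ["PERSON", "O", "O", "O", "LOCATION"]
toy_predictions = ["O", "O", "O", "LOCATION", "LOCATION"]

for error in token_errors(toy_tokens, toy_annotations, toy_predictions):
    print(error)
```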

#### False positives

1. Most common false positive tokens:

In [7]:
ModelError.most_common_fp_tokens(errors)

Most common false positive tokens:
[('the', 62), ('year', 56), ('morning', 33), ('old', 32), ('years', 27), ('-', 27), ('\n', 26), ('a', 23), ('last', 23), ('months', 20)]
Example sentence with each FP token:
My birthday is on the weekend. I'll turn 23.
This 79 year old female complaining of stomach pain.
Mr. Leiva flew to LEPPEN on Tuesday morning.
This 79 year old female complaining of stomach pain.
He just turned 69 years old
Karin M. Beike
Leasing consultant
Fritz-Armstrong
Đoko and ul. Miła 53
Date: 1978-04-13 12:20:39
Name: Toshimi Arata
Phone: 0490 75 40 81
follow up with patricia desrosiers in a couple of months.
The letter arrived at 6400 12 Timms Drive Suite 215 NUNGATTA Australia last night.
follow up with patricia desrosiers in a couple of months.
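
The ranking above is essentially a frequency count over the tokens of all false positive errors. A minimal sketch of that aggregation, using plain dictionaries as hypothetical error records (`toy_errors` and `common_fp_tokens` are illustrative names, not the library's API):

```python
from collections import Counter

# Hypothetical error records; the real ModelError objects carry more fields.
toy_errors = [
    {"type": "FP", "token": "year", "prediction": "DATE_TIME"},
    {"type": "FP", "token": "the", "prediction": "DATE_TIME"},
    {"type": "FN", "token": "Szabina", "prediction": "O"},
    {"type": "FP", "token": "year", "prediction": "DATE_TIME"},
    {"type": "FP", "token": "Southern", "prediction": "LOCATION"},
]


def common_fp_tokens(errors, n=10, entity=None):
    """Count false positive tokens, optionally restricted to predicted entities."""
    counts = Counter(
        error["token"]
        for error in errors
        if error["type"] == "FP"
        and (entity is None or error["prediction"] in entity)
    )
    return counts.most_common(n)


print(common_fp_tokens(toy_errors, n=2))
print(common_fp_tokens(toy_errors, entity=["LOCATION"]))
```

Frequent function words and time expressions in such a ranking often point to a mismatch between the annotation scheme and the model's entity definitions rather than to random errors.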


In [8]:
fps_df = ModelError.get_fps_dataframe(errors, entity=["LOCATION"])
fps_df[["full_text", "token", "prediction"]]

| | full_text | token | prediction |
|---|---|---|---|
| 0 | The Exversion Orchestra was founded in 1977. Since then, it has grown from a volunteer community orchestra to a fully professional orchestra serving Southern Tunisia | Southern | LOCATION |
| 1 | card number 347415977307943 is lost, can you please send a new one to 14 Crown Street Kishiev Squares\n Suite 321\n LONDON\n United Kingdom 75419? I am in Sutri for a business trip | LONDON | LOCATION |
| 2 | card number 347415977307943 is lost, can you please send a new one to 14 Crown Street Kishiev Squares\n Suite 321\n LONDON\n United Kingdom 75419? I am in Sutri for a business trip | United | LOCATION |
| 3 | card number 347415977307943 is lost, can you please send a new one to 14 Crown Street Kishiev Squares\n Suite 321\n LONDON\n United Kingdom 75419? I am in Sutri for a business trip | Kingdom | LOCATION |
| 4 | The Davis, Reynolds and Williamson Orchestra was founded in 1977. Since then, it has grown from a volunteer community orchestra to a fully professional orchestra serving Southern Italy | Southern | LOCATION |
| 5 | Please update the billing address with 27534 Þorsteinsgata 63\nMOSS\n, nan\n 51971 for this card: 4119268469462942 | | LOCATION |
| 6 | Please update the billing address with 27534 Þorsteinsgata 63\nMOSS\n, nan\n 51971 for this card: 4119268469462942 | \n | LOCATION |
| 7 | Please update the billing address with 27534 Þorsteinsgata 63\nMOSS\n, nan\n 51971 for this card: 4119268469462942 | 51971 | LOCATION |
| 8 | I need to add my addresses, here they are: 370 3911 Fourth Avenue Suite 697 Apt. 397, Mora de Rubielos, Spain 16200, and the corner of Postbox 21 and Elisabeth Parks | Spain | LOCATION |
| 9 | The address of Compliance And Risks is 9554 62 Rue Gafsa Apt. 981\nToronto\n, ON\n 82269 | Toronto | LOCATION |


#### False negatives

1. Most common false negative tokens:

In [None]:
ModelError.most_common_fn_tokens(errors, n=50, entity=["PERSON"])

2. False negative examples:

In [None]:
fns_df = ModelError.get_fns_dataframe(errors, entity=["PHONE_NUMBER"])

In [None]:
fns_df[["full_text", "token", "annotation", "prediction"]]

In [None]:
print("All errors:\n")
for error in errors:
    print(error, "\n")

## 4. Run the inference and metric at the span level using the SpanEvaluator class

In [None]:
# Initialize a SpanEvaluator instance restricted to the entities of interest
evaluator_span = SpanEvaluator(
    model=model, entities_to_keep=["PERSON", "DATE_TIME", "LOCATION", "TITLE"]
)
evaluation_span = evaluator_span.evaluate_span(dataset)
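
Span-level evaluation scores whole entity spans rather than individual tokens. One common matching criterion is intersection over union (IoU) between an annotated span and a predicted span of the same entity type. The sketch below illustrates that criterion only: whether `SpanEvaluator` uses this exact rule or threshold is an assumption, and `span_iou`, `span_precision_recall`, and the spans themselves are invented for the example.

```python
def span_iou(a, b):
    """Character-level intersection over union of two (start, end, type) spans."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union else 0.0


def span_precision_recall(annotated, predicted, iou_threshold=0.5):
    """Greedily match predicted spans to annotated spans of the same type."""
    matched = set()  # indices of annotated spans that are already matched
    true_positives = 0
    for pred in predicted:
        for i, ann in enumerate(annotated):
            if i in matched or ann[2] != pred[2]:
                continue
            if span_iou(ann, pred) >= iou_threshold:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Illustrative spans: (start_char, end_char, entity_type)
toy_annotated = [(0, 18, "PERSON"), (27, 33, "LOCATION")]
toy_predicted = [(0, 12, "PERSON"), (27, 33, "LOCATION"), (37, 44, "DATE_TIME")]

print(span_precision_recall(toy_annotated, toy_predicted, iou_threshold=0.5))
print(span_precision_recall(toy_annotated, toy_predicted, iou_threshold=0.9))
```

With a lenient threshold the partially detected `PERSON` span counts as a match. With a strict threshold it becomes both a false positive and a false negative, which is why span-level scores can differ noticeably from token-level ones.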