Evaluate Azure Text Analytics for PII detection using the Presidio Evaluator framework

Prerequisites:
 - An Azure subscription.
 - A Language resource, created in the Azure portal. After it deploys, click Go to resource to find its key and endpoint.
 - The key and endpoint of that resource, which connect your application to the API. You'll paste them into the code under "Run evaluation" later in this notebook.
 - The free pricing tier (Free F0) is enough to try the service; you can upgrade to a paid tier for production.
 - The Analyze feature requires a Language resource on the standard (S) pricing tier.
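
As an alternative to pasting the key into the notebook, the credentials could be read from environment variables. The following is a minimal sketch under that assumption; the helper `load_azure_credentials` and the variable names `AZURE_LANGUAGE_KEY` and `AZURE_LANGUAGE_ENDPOINT` are hypothetical and not part of Presidio Evaluator or the Azure SDK.

```python
import os


def load_azure_credentials(env=os.environ):
    """Return (key, endpoint) read from environment variables.

    The variable names below are hypothetical; pick whatever fits your setup.
    """
    key = env.get("AZURE_LANGUAGE_KEY", "")
    endpoint = env.get("AZURE_LANGUAGE_ENDPOINT", "")

    # Fail early with a clear message instead of sending an empty key to the service.
    missing = [
        name
        for name, value in (
            ("AZURE_LANGUAGE_KEY", key),
            ("AZURE_LANGUAGE_ENDPOINT", endpoint),
        )
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # Normalize the endpoint so it always ends with exactly one slash.
    return key, endpoint.rstrip("/") + "/"
```

The returned pair can then be assigned to `key` and `endpoint` in the evaluation cell, which keeps the secret out of the saved notebook.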

In [1]:
from pathlib import Path
from copy import deepcopy
from pprint import pprint
from collections import Counter

from presidio_evaluator import InputSample
from presidio_evaluator.evaluation import Evaluator, ModelError
from presidio_evaluator.models import TextAnalyticsWrapper
from presidio_evaluator.experiment_tracking import get_experiment_tracker
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

%reload_ext autoreload
%autoreload 2

stanza and spacy_stanza are not installed
Flair is not installed by default
Flair is not installed


Select data for evaluation

In [17]:
dataset_name = "synth_dataset_v2.json"
dataset = InputSample.read_dataset_json(Path(Path.cwd().parent.parent, "data", dataset_name))
print(len(dataset))

tokenizing input: 100%|██████████| 1500/1500 [00:09<00:00, 153.03it/s]

1500





In [4]:
entity_counter = Counter()
for sample in dataset:
    for tag in sample.tags:
        entity_counter[tag] += 1

Dataset exploration

In [13]:
print("Count per entity:")
pprint(entity_counter.most_common())

print("\nExample sentence:")
print(dataset[1])

print("\nMin and max number of tokens in dataset:")
print(
    f"Min: {min([len(sample.tokens) for sample in dataset])}, "
    f"Max: {max([len(sample.tokens) for sample in dataset])}"
)

print("\nMin and max sentence length in dataset:")
print(
    f"Min: {min([len(sample.full_text) for sample in dataset])}, "
    f"Max: {max([len(sample.full_text) for sample in dataset])}"
)

Count per entity:
[('O', 19626),
 ('STREET_ADDRESS', 3071),
 ('PERSON', 1369),
 ('GPE', 521),
 ('ORGANIZATION', 504),
 ('PHONE_NUMBER', 350),
 ('DATE_TIME', 219),
 ('TITLE', 142),
 ('CREDIT_CARD', 136),
 ('US_SSN', 80),
 ('AGE', 74),
 ('NRP', 55),
 ('ZIP_CODE', 50),
 ('EMAIL_ADDRESS', 49),
 ('DOMAIN_NAME', 37),
 ('IP_ADDRESS', 22),
 ('IBAN_CODE', 21),
 ('US_DRIVER_LICENSE', 9)]

Example sentence:
Full text: What are my options?
Spans: []
Tokens: What are my options?
Tags: ['O', 'O', 'O', 'O', 'O']


Min and max number of tokens in dataset:
Min: 3, Max: 78

Min and max sentence length in dataset:
Min: 9, Max: 407

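
The counts above are per token, so they also show how imbalanced the dataset is. As a rough sketch, the share of entity tokens and each entity's share among them can be derived directly from the printed counts (the numbers are copied from the output above; the variable names are illustrative):

```python
from collections import Counter

# Token counts copied from the "Count per entity" output above.
entity_counter = Counter({
    "O": 19626, "STREET_ADDRESS": 3071, "PERSON": 1369, "GPE": 521,
    "ORGANIZATION": 504, "PHONE_NUMBER": 350, "DATE_TIME": 219, "TITLE": 142,
    "CREDIT_CARD": 136, "US_SSN": 80, "AGE": 74, "NRP": 55, "ZIP_CODE": 50,
    "EMAIL_ADDRESS": 49, "DOMAIN_NAME": 37, "IP_ADDRESS": 22, "IBAN_CODE": 21,
    "US_DRIVER_LICENSE": 9,
})

total_tokens = sum(entity_counter.values())
entity_tokens = total_tokens - entity_counter["O"]

print(
    f"Entity tokens: {entity_tokens} of {total_tokens} "
    f"({entity_tokens / total_tokens:.1%})"
)

# Share of each entity type among the entity (non-"O") tokens.
for entity, count in entity_counter.most_common():
    if entity != "O":
        print(f"{entity:<20}{count / entity_tokens:.1%}")
```

Rare types such as US_DRIVER_LICENSE have so few tokens that their per-entity scores should be read with caution.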

Run evaluation

In [8]:
model_name = "Text analytics Analyzer"
# Paste your Azure Text Analytics key and endpoint here
key = "XXXXXXXXXXXXXXXXXXXXXXXXXXX"
endpoint = "https://xxxxxxxxxxx.cognitiveservices.azure.com/"
model = TextAnalyticsWrapper(ta_key=key, ta_endpoint=endpoint)

In [18]:
print("Evaluating Azure Text Analytics.")

experiment = get_experiment_tracker()

# Mapping from dataset entities to Text Analytics entities.
# Dataset entities mapped to "O" are treated as non-entities in this evaluation.
# All supported PII entity categories in Text Analytics are listed here:
# https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/personally-identifiable-information/concepts/conversations-entity-categories
i2b2_entities_to_text_analytics = {
    "PERSON": "Person",
    "STREET_ADDRESS": "Address",
    "GPE": "O",
    "PHONE_NUMBER": "PhoneNumber",
    "ORGANIZATION": "Organization",
    "DATE_TIME": "DateTime",
    "TITLE": "O",
    "CREDIT_CARD": "CreditCardNumber",
    "US_SSN": "USSocialSecurityNumber",
    "AGE": "Age",
    "NRP": "O",
    "ZIP_CODE": "O",
    "EMAIL_ADDRESS": "Email",
    "DOMAIN_NAME": "URL",
    "IP_ADDRESS": "IPAddress",
    "IBAN_CODE": "InternationalBankingAccountNumber",
    "US_DRIVER_LICENSE": "USDriversLicenseNumber",
}
evaluator = Evaluator(model=model)
dataset_ = Evaluator.align_entity_types(
    deepcopy(dataset), entities_mapping=i2b2_entities_to_text_analytics
)

evaluation_results = evaluator.evaluate_all(dataset_)
results = evaluator.calculate_score(evaluation_results)

# update params tracking
params = {"dataset_name": dataset_name, "model_name": model_name}
params.update(model.to_log())
experiment.log_parameters(params)
experiment.log_dataset_hash(dataset)
experiment.log_metrics(results.to_log())
entities, confmatrix = results.to_confusion_matrix()
experiment.log_confusion_matrix(matrix=confmatrix, labels=entities)

# end experiment
experiment.end()

Evaluating Azure Text Analytics.


Evaluating <class 'presidio_evaluator.models.text_analytics_wrapper.TextAnalyticsWrapper'>: 100%|██████████| 1500/1500 [01:36<00:00, 15.61it/s]

saving experiment data to experiment_20221125-162355.json
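
The entity alignment step in the cell above renames the dataset's labels to the names the model returns, so that both sides can be compared. The sketch below illustrates the idea on a plain list of tags. It is not the implementation of `Evaluator.align_entity_types`; the helper `remap_tags` and its fallback to "O" for unmapped tags are assumptions made for illustration.

```python
def remap_tags(tags, mapping, default="O"):
    """Translate dataset tags into the model's label names.

    Tags missing from `mapping` fall back to `default`
    (an assumption of this sketch, not documented library behavior).
    """
    return [tag if tag == "O" else mapping.get(tag, default) for tag in tags]


# A small subset of the mapping used in the evaluation cell.
mapping = {"PERSON": "Person", "GPE": "O", "PHONE_NUMBER": "PhoneNumber"}

tags = ["O", "PERSON", "PERSON", "O", "GPE", "O", "PHONE_NUMBER"]
remapped = remap_tags(tags, mapping)
print(remapped)
# ['O', 'Person', 'Person', 'O', 'O', 'O', 'PhoneNumber']
```

Mapping a dataset entity such as GPE to "O" removes it from the comparison: if the model still flags those tokens, they count as false positives for whichever label it predicted.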





In [19]:
print("Confusion matrix:")
print(pd.DataFrame(confmatrix, columns=entities, index=entities))

Confusion matrix:
                                   Address  Age  CreditCardNumber  DateTime  \
Address                               1522    0                 0         9   
Age                                      0    0                 0         0   
CreditCardNumber                         0    0                70         0   
DateTime                                 0    0                 0       219   
Email                                    0    0                 0         0   
IPAddress                                0    0                 0         0   
InternationalBankingAccountNumber        0    0                 0         0   
O                                      110    0                 0       395   
Organization                             1    0                 0         0   
Person                                   0    0                 0         0   
PhoneNumber                              0    0                 0         3   
URL                               
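
In the matrix above, rows appear to be the annotated labels and columns the predicted labels: for Address, 1522 correct tokens out of 3071 annotated gives the 49.56% recall reported in the table below. Under that assumption, per-entity precision and recall can be sketched as follows. The helper `precision_recall` and the toy matrix are illustrative only and are not taken from the evaluation results.

```python
import math


def precision_recall(labels, matrix):
    """Per-label (precision, recall) from a confusion matrix.

    Assumes rows are annotated labels and columns are predicted labels.
    """
    scores = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        annotated = sum(matrix[i])                 # row sum
        # A label that is never predicted has undefined precision (NaN).
        precision = true_positives / predicted if predicted else math.nan
        recall = true_positives / annotated if annotated else math.nan
        scores[label] = (precision, recall)
    return scores


# Toy matrix for illustration only.
labels = ["Age", "DateTime", "O"]
matrix = [
    [0, 0, 4],   # annotated Age tokens, all predicted as "O"
    [0, 6, 0],   # annotated DateTime tokens, all found
    [0, 9, 81],  # annotated "O" tokens, 9 wrongly predicted as DateTime
]

scores = precision_recall(labels, matrix)
for label, (precision, recall) in scores.items():
    print(f"{label:<10} precision={precision:.2%} recall={recall:.2%}")
```

The toy Age row mirrors the pattern in the results: a label that is never predicted has zero recall and an undefined (nan) precision, while a label that is over-predicted, like DateTime here, keeps full recall but loses precision.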

In [20]:
print("Precision and recall")
print(results)

Precision and recall
                           Entity   Precision    Recall   Number of samples
                           Person      87.81%    97.88%                1369
                              Age        nan%     0.00%                  74
                            Email     100.00%    57.14%                  49
                              URL     100.00%   100.00%                  37
InternationalBankingAccountNumber     100.00%   100.00%                  21
                     Organization      67.65%    77.58%                 504
                          Address      93.20%    49.56%                3071
           USSocialSecurityNumber     100.00%   100.00%                  80
                 CreditCardNumber     100.00%    51.47%                 136
                        IPAddress      91.67%   100.00%                  22
                         DateTime      34.98%   100.00%