# Data anonymization example with MS Presidio

This notebook shows examples of using MS Presidio to detect and anonymize PII (personally identifiable information).

## Loading libraries and importing recognizers

In [7]:
!pip install presidio-analyzer
!pip install presidio-anonymizer



In [8]:
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider
from presidio_analyzer import PatternRecognizer, Pattern
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult

## Language configuration

In [6]:
# NLP engine configuration (spaCy, Italian model)
nlp_config = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "it", "model_name": "it_core_news_lg"}],
}
provider = NlpEngineProvider(nlp_configuration=nlp_config)
nlp_engine_with_italian = provider.create_engine()




✔ Download and installation successful
You can now load the package via spacy.load('it_core_news_lg')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
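The output above suggests the model was downloaded during engine creation and that a kernel restart may be needed afterwards. As a minimal sketch, assuming spaCy models are installed as regular Python packages, the following check can be run before creating the engine. The helper name `spacy_model_installed` is hypothetical and is not part of Presidio or spaCy.

```python
import importlib.util


def spacy_model_installed(model_name: str) -> bool:
    """Return True if a top-level package with this name can be found.

    Hypothetical helper: it only checks that the package is importable,
    not that the model loads correctly.
    """
    return importlib.util.find_spec(model_name) is not None


MODEL_NAME = "it_core_news_lg"  # model name taken from the configuration above

if spacy_model_installed(MODEL_NAME):
    print(f"{MODEL_NAME} is installed")
else:
    print(f"{MODEL_NAME} not found; try: python -m spacy download {MODEL_NAME}")
```

Running this check first makes it easier to tell a missing model apart from a kernel that simply needs a restart.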


## Defining a custom pattern for Italian license plates

In [9]:
# Custom recognizer for Italian vehicle plates
plate_pattern = Pattern(name="IT_VEHICLE_PLATE", regex=r"\b([A-Z]{2}\d{3}[A-Z]{2}|\d{2}[A-Z]{2}\d{2}|[A-Z]{2}\d{5}|\d{2}[A-Z]{3}\d{2})\b", score=0.8)

plate_recognizer = PatternRecognizer(patterns=[plate_pattern], supported_entity="IT_VEHICLE_PLATE", name="IT_VEHICLE_PLATE", supported_language="it")
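Before registering the recognizer, it can help to try the regular expression on its own. The sketch below uses only Python's standard `re` module with the same pattern. The helper `find_plates` and the sample strings are illustrative assumptions. Presidio compiles patterns with its own flags, so its behaviour on lowercase input may differ from this plain `re` check.

```python
import re

# Same regular expression as in the Pattern definition above
PLATE_REGEX = (
    r"\b([A-Z]{2}\d{3}[A-Z]{2}|\d{2}[A-Z]{2}\d{2}|[A-Z]{2}\d{5}|\d{2}[A-Z]{3}\d{2})\b"
)
plate_re = re.compile(PLATE_REGEX)


def find_plates(text):
    """Return (match, start, end) for every plate-like token in the text."""
    return [(m.group(0), m.start(), m.end()) for m in plate_re.finditer(text)]


# Illustrative inputs: a plate at the start of a sentence, a plate embedded
# in a longer token (blocked by the \b word boundaries), and plain text.
for sample in ["AB123CD Cristiano", "XAB123CD", "nessuna targa qui"]:
    print(repr(sample), "->", find_plates(sample))
```

The word boundaries matter here: without `\b`, a plate-shaped substring inside a longer alphanumeric token would also be reported.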


## Initializing Presidio

In [10]:
# Initialize Presidio with the custom recognizer
analyzer = AnalyzerEngine(
    supported_languages=["en", "it"],
    nlp_engine=nlp_engine_with_italian,
)
analyzer.registry.add_recognizer(plate_recognizer)

anonymizer = AnonymizerEngine()

# Entities to detect (including IT_VEHICLE_PLATE)
entities = [
    "IT_VEHICLE_PLATE",  # Include Italian vehicle plates
    "IT_FISCAL_CODE",
    "IT_DRIVER_LICENSE",
    "IT_VAT_CODE",
    "IT_PASSPORT",
    "IT_IDENTITY_CARD",
    "CREDIT_CARD",
    "DATE_TIME",
    "EMAIL_ADDRESS",
    "IBAN_CODE",
    "PERSON",
    "PHONE_NUMBER",
]
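The analyzer above declares both `"en"` and `"it"` as supported languages, while the NLP configuration loads only an Italian model. If English text also needs to be analyzed, the NLP engine would presumably need an English model as well. The sketch below shows what such a configuration might look like and checks it with a small helper. The name `languages_without_model` is hypothetical, and `en_core_web_lg` is an assumption about which English model would be installed.

```python
def languages_without_model(nlp_configuration, languages):
    """Return the languages that have no model entry in the NLP configuration."""
    configured = {model["lang_code"] for model in nlp_configuration["models"]}
    return sorted(set(languages) - configured)


# Configuration used in this notebook: Italian only
nlp_config_it = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "it", "model_name": "it_core_news_lg"}],
}

# Hypothetical two-language variant (assumes en_core_web_lg is installed)
nlp_config_multi = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "it", "model_name": "it_core_news_lg"},
        {"lang_code": "en", "model_name": "en_core_web_lg"},
    ],
}

supported_languages = ["en", "it"]
print("Italian-only config, missing:", languages_without_model(nlp_config_it, supported_languages))
print("Two-language config, missing:", languages_without_model(nlp_config_multi, supported_languages))
```

Since this notebook only analyzes Italian text, the Italian-only configuration is enough for the examples that follow.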



## Analyzer & Anonymizer

In [18]:
# Example text
text = "AB123CD Cristiano Nato nato il 15/01/1983 STCCST83A15L113V."

# Text analysis
results = analyzer.analyze(text=text, entities=entities, language="it")

for result in results:
    print(f"Entity: {result.entity_type}, Start: {result.start}, End: {result.end}, Score: {result.score}")
    print(f"Extracted text: '{text[result.start:result.end]}'")
    print("--------------------------------")
    print("--------------------------------")

# Text anonymization with the default operator (each entity is replaced by <ENTITY_TYPE>)
anonymized_result = anonymizer.anonymize(text, results)

# Print anonymized text
print(anonymized_result.text)
print("--------------------------------")
print("--------------------------------")

# Define anonymization operators: mask the trailing characters with "*"
operators = {
    # Plate: hide the last 2 characters
    "IT_VEHICLE_PLATE": OperatorConfig(
        "mask",
        {
            "masking_char": "*",
            "chars_to_mask": 2,
            "from_end": True,
        },
    ),
    # Fiscal code: hide the last 10 characters
    "IT_FISCAL_CODE": OperatorConfig(
        "mask",
        {
            "masking_char": "*",
            "chars_to_mask": 10,
            "from_end": True,
        },
    ),
}

# Text anonymization with the custom mask operators
anonymized_result = anonymizer.anonymize(text, results, operators=operators)

print("Text masked with * characters")

# Print anonymized text
print(anonymized_result.text)




Entity: IT_FISCAL_CODE, Start: 42, End: 58, Score: 1.0
Extracted text: 'STCCST83A15L113V'
--------------------------------
--------------------------------
Entity: IT_VEHICLE_PLATE, Start: 0, End: 7, Score: 0.8
Extracted text: 'AB123CD'
--------------------------------
--------------------------------
Entity: DATE_TIME, Start: 31, End: 41, Score: 0.6
Extracted text: '15/01/1983'
--------------------------------
--------------------------------
<IT_VEHICLE_PLATE> Cristiano Nato nato il <DATE_TIME> <IT_FISCAL_CODE>.
--------------------------------
--------------------------------
Text masked with * characters
AB123** Cristiano Nato nato il <DATE_TIME> STCCST**********.
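To make the two outputs above easier to follow, here is a minimal pure-Python sketch of what the replacement and the masking amount to. It is not Presidio's implementation: `replace_spans` and `mask_tail` are hypothetical helpers, and the sample string is one that is consistent with the offsets printed above.

```python
def replace_spans(text, spans):
    """Replace each (entity_type, start, end) span with <ENTITY_TYPE>."""
    out = text
    # Process spans right to left so earlier offsets remain valid
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        out = out[:start] + f"<{entity_type}>" + out[end:]
    return out


def mask_tail(value, chars_to_mask, masking_char="*"):
    """Mask the last chars_to_mask characters of value (like from_end=True)."""
    n = min(chars_to_mask, len(value))
    return value[: len(value) - n] + masking_char * n


# Sample string consistent with the Start/End offsets in the output above
text = "AB123CD Cristiano Nato nato il 15/01/1983 STCCST83A15L113V."
spans = [
    ("IT_FISCAL_CODE", 42, 58),
    ("IT_VEHICLE_PLATE", 0, 7),
    ("DATE_TIME", 31, 41),
]

print(replace_spans(text, spans))
print(mask_tail("AB123CD", 2))
print(mask_tail("STCCST83A15L113V", 10))
```

In the second output above, `DATE_TIME` still appears as `<DATE_TIME>` because the `operators` dictionary has no entry for it, so it falls back to the default replacement.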
