### Presidio PII Detection
##### Presidio is an open-source PII detection and anonymization library developed by Microsoft.
##### This notebook uses the Presidio Analyzer Engine to detect PII in the given text. It then uses the Presidio Anonymizer Engine to replace each detected entity with a type-specific placeholder (`<NAME>`, `<PHONE>`, `<EMAIL>`) and prints the anonymized result.

In [7]:
# Presidio's analyzer and anonymizer, plus spaCy for the NLP engine, must be installed first.
!pip install presidio-analyzer
!pip install presidio-anonymizer
!pip install spacy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [9]:
!python -m spacy download en_core_web_lg

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting en-core-web-lg==3.4.1
  Downloading https://github.com/explosion/spacy-models/releases/download

In [13]:
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider

# Define the text to analyze
text = "My name is John Doe, and my phone number is 123-456-7890. My email is john@example.com."

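# Set up the AnalyzerEngine. With no configuration, NlpEngineProvider builds
# a spaCy engine that uses the en_core_web_lg model downloaded above.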
nlp_engine_provider = NlpEngineProvider()
nlp_engine = nlp_engine_provider.create_engine()
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["en"])

# Analyze the text. Leaving out `entities` searches for every supported entity type.
analyzer_results = analyzer_engine.analyze(text=text, language="en")

# Set up the AnonymizerEngine
anonymizer_engine = AnonymizerEngine()

# Define the operator to use for each entity type. Entity types not listed here
# fall back to the anonymizer's default operator.
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "<NAME>"}),
    "PHONE_NUMBER": OperatorConfig("replace", {"new_value": "<PHONE>"}),
    "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "<EMAIL>"}),
}

# Anonymize the text
anonymized_result = anonymizer_engine.anonymize(text=text, analyzer_results=analyzer_results, operators=operators)

print(anonymized_result)




text: My name is <NAME>, and my phone number is <PHONE>. My email is <EMAIL>.
items:
[
    {'start': 63, 'end': 70, 'entity_type': 'EMAIL_ADDRESS', 'text': '<EMAIL>', 'operator': 'replace'},
    {'start': 42, 'end': 49, 'entity_type': 'PHONE_NUMBER', 'text': '<PHONE>', 'operator': 'replace'},
    {'start': 11, 'end': 17, 'entity_type': 'PERSON', 'text': '<NAME>', 'operator': 'replace'}
]
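
Note that the `start` and `end` values in the items above index into the anonymized text rather than the original: `<NAME>` occupies positions 11 to 17, which is shorter than the eight characters of `John Doe`. The sketch below illustrates the general idea of offset-based replacement in plain Python. It is not Presidio's implementation; `replace_spans` and `span_of` are hypothetical helpers, and applying replacements from the end of the string backwards is simply one common way to keep earlier offsets valid while the text changes length.

```python
def span_of(text, needle, placeholder):
    """Locate `needle` in `text` and return a (start, end, placeholder) tuple."""
    start = text.index(needle)
    return (start, start + len(needle), placeholder)


def replace_spans(text, spans):
    """Replace each (start, end, placeholder) span in `text`.

    Spans are applied from the end of the string backwards, so replacing a
    later span never shifts the offsets of an earlier one. Assumes the
    spans do not overlap.
    """
    for start, end, placeholder in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


text = "My name is John Doe, and my phone number is 123-456-7890. My email is john@example.com."

# In the notebook these offsets would come from the analyzer results;
# here they are looked up directly so the sketch runs without Presidio.
spans = [
    span_of(text, "John Doe", "<NAME>"),
    span_of(text, "123-456-7890", "<PHONE>"),
    span_of(text, "john@example.com", "<EMAIL>"),
]

redacted = replace_spans(text, spans)
print(redacted)
```

Replacing front-to-back would also work, but every later offset would then have to be adjusted by the length difference of each earlier replacement.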

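The cell above prints only the anonymized result. To inspect what the analyzer found before anything is replaced, the detections can be listed against the original text. Presidio's analyzer results expose `entity_type`, `start`, `end`, and `score`; the helper `describe_detections` below is a hypothetical name, and the `SimpleNamespace` objects are stand-ins with placeholder scores so that the sketch runs without Presidio installed.

```python
from types import SimpleNamespace


def describe_detections(text, results):
    """Return one formatted line per detection, ordered by position in `text`."""
    lines = []
    for r in sorted(results, key=lambda r: r.start):
        matched = text[r.start:r.end]
        lines.append(f"{r.entity_type:<14} {matched!r} score={r.score:.2f}")
    return lines


text = "My name is John Doe, and my phone number is 123-456-7890. My email is john@example.com."

# Stand-ins for analyzer results. The offsets match the sentence above;
# the scores are placeholders, not real analyzer output.
sample_results = [
    SimpleNamespace(entity_type="EMAIL_ADDRESS", start=70, end=86, score=1.0),
    SimpleNamespace(entity_type="PERSON", start=11, end=19, score=1.0),
    SimpleNamespace(entity_type="PHONE_NUMBER", start=44, end=56, score=1.0),
]

for line in describe_detections(text, sample_results):
    print(line)
```

In the notebook, passing `analyzer_results` in place of `sample_results` should list the real detections with their confidence scores.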
