# Overview

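This notebook gives an overview of the `anonipy` package: detecting the language of a text, extracting personal information from it, and replacing that information with generated values.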

In [1]:
# used to hide warnings
import warnings

warnings.filterwarnings("ignore")

Let us first define the text that we will use to showcase the package's functionality.

In [2]:
original_text = """\
Medical Record

Patient Name: John Doe
Date of Birth: 15-01-1985
Date of Examination: 20-05-2024
Social Security Number: 123-45-6789

Examination Procedure:
John Doe underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.

Medication Prescribed:

Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
Next Examination Date:
15-11-2024
"""

## Extract personal information from text

The `anonipy` package implements entity extraction components that can be used to extract personal information from text.

More can be found in the chapter [Extractors](/documentation/notebooks/01-extractors).

### Language detector

In [3]:
from anonipy.utils.language_detector import LanguageDetector
lang_detector = LanguageDetector()

In [4]:
# identify the language of the original text
language = lang_detector(original_text)
language

('en', 'English')

### Extract personal information

In [5]:
from anonipy.anonymize.extractors import EntityExtractor

In [6]:
# define the labels to be extracted and anonymized
labels = [
    {"label": "name", "type": "string"},
    {"label": "social security number", "type": "custom"},
    {"label": "date of birth", "type": "date"},
    {"label": "date", "type": "date"},
]

In [7]:
# language taken from the language detector
entity_extractor = EntityExtractor(labels, lang=language, score_th=0.5)

In [8]:
# extract the entities from the original text
doc, entities = entity_extractor(original_text)

In [9]:
# display the entities in the original text
entity_extractor.display(doc)
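
To make the idea of an extracted entity concrete, the following is a minimal sketch that finds labeled spans with regular expressions. It is not how `EntityExtractor` works internally: the patterns, the `find_spans` helper, and the dictionary keys are all hypothetical and chosen for illustration only.

```python
import re

# Hypothetical patterns for illustration; not part of anonipy.
PATTERNS = {
    "social security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date": re.compile(r"\b\d{2}-\d{2}-\d{4}\b"),
}


def find_spans(text):
    """Return labeled spans with character offsets, ordered by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append(
                {
                    "label": label,
                    "text": match.group(),
                    "start_index": match.start(),
                    "end_index": match.end(),
                }
            )
    return sorted(spans, key=lambda span: span["start_index"])


sample = "Date of Birth: 15-01-1985\nSocial Security Number: 123-45-6789\n"
spans = find_spans(sample)
for span in spans:
    print(span)
```

Each span records its label together with the start and end offsets, so the matched text can always be recovered as `sample[start_index:end_index]`.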

## Anonymize the original text

The `anonipy` package implements generators for different types of information. They can be used
to generate replacements for the entities found in the original text.

More on generators can be found in the chapter [Generators](/documentation/notebooks/02-generators),
while the chapter [Strategies](/documentation/notebooks/03-strategies) describes the strategies for anonymizing
the original text.

### Prepare generators for generating replacements

In [10]:
from anonipy.anonymize.generators import (
    LLMLabelGenerator,
    DateGenerator,
    NumberGenerator,
)

In [11]:
# initialize the generators
llm_generator = LLMLabelGenerator()
date_generator = DateGenerator()
number_generator = NumberGenerator()

In [12]:
# prepare the anonymization mapping
def anonymization_mapping(text, entity):
    if entity.type == "string":
        return llm_generator.generate(entity, temperature=0.7)
    if entity.label == "date":
        return date_generator.generate(entity, output_gen="middle_of_the_month")
    if entity.label == "date of birth":
        return date_generator.generate(entity, output_gen="middle_of_the_year")
    if entity.label == "social security number":
        return number_generator.generate(entity)
    return "[REDACTED]"
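
The two date options used above can be illustrated with the standard library alone. The sketch below is an assumption inferred from the output shown later in this notebook (day set to 15 for the middle of the month, 1 July for the middle of the year); it is not the `DateGenerator` implementation, and the function names and the fixed `%d-%m-%Y` format are hypothetical.

```python
from datetime import datetime

# Assumed date format, matching the dates in the example text.
DATE_FORMAT = "%d-%m-%Y"


def middle_of_the_month(date_text):
    """Keep month and year, move the day to the 15th."""
    parsed = datetime.strptime(date_text, DATE_FORMAT)
    return parsed.replace(day=15).strftime(DATE_FORMAT)


def middle_of_the_year(date_text):
    """Keep the year, move the date to 1 July."""
    parsed = datetime.strptime(date_text, DATE_FORMAT)
    return parsed.replace(month=7, day=1).strftime(DATE_FORMAT)


print(middle_of_the_month("20-05-2024"))
print(middle_of_the_year("15-01-1985"))
```

Coarsening a date this way hides the exact day while keeping the month or year useful for analysis. A date that already falls on the 15th is left unchanged by the first function.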

### Anonymize the original text

In [13]:
from anonipy.anonymize.strategies import PseudonymizationStrategy

In [14]:
# initialize the pseudonymization strategy
pseudo_strategy = PseudonymizationStrategy(mapping=anonymization_mapping)

In [15]:
# anonymize the original text
anonymized_text, replacements = pseudo_strategy.anonymize(original_text, entities)
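
A pseudonymized text is only readable if every occurrence of the same entity receives the same replacement. One way to get that property is to cache generated values per entity, as in the minimal sketch below. This is an illustration under stated assumptions, not the `PseudonymizationStrategy` implementation: `make_consistent_mapping` and the `placeholder` generator are hypothetical.

```python
import itertools


def make_consistent_mapping(generate):
    """Wrap a generator so each distinct (label, text) pair is generated once."""
    cache = {}

    def mapping(label, text):
        key = (label, text)
        if key not in cache:
            cache[key] = generate(label, text)
        return cache[key]

    return mapping


# Hypothetical stand-in for a real generator: numbered placeholders.
counter = itertools.count(1)


def placeholder(label, text):
    return f"[{label.upper()}_{next(counter)}]"


mapping = make_consistent_mapping(placeholder)
first = mapping("name", "John Doe")
second = mapping("name", "John Doe")
other = mapping("name", "Jane Roe")
print(first, second, other)
```

The second call for the same name reuses the cached value instead of generating a new one, while a different name gets its own replacement.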

The anonymized text is:

In [16]:
print(anonymized_text)

Medical Record

Patient Name: Ethan Thompson
Date of Birth: 01-07-1985
Date of Examination: 15-05-2024
Social Security Number: 867-38-6549

Examination Procedure:
Ethan Thompson underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.

Medication Prescribed:

Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
Next Examination Date:
15-11-2024



And the associated replacements are:

In [17]:
replacements

[{'original_text': 'John Doe',
  'start_index': 30,
  'end_index': 38,
  'anonymized_text': 'Ethan Thompson'},
 {'original_text': '15-01-1985',
  'start_index': 54,
  'end_index': 64,
  'anonymized_text': '01-07-1985'},
 {'original_text': '20-05-2024',
  'start_index': 86,
  'end_index': 96,
  'anonymized_text': '15-05-2024'},
 {'original_text': '123-45-6789',
  'start_index': 121,
  'end_index': 132,
  'anonymized_text': '867-38-6549'},
 {'original_text': 'John Doe',
  'start_index': 157,
  'end_index': 165,
  'anonymized_text': 'Ethan Thompson'},
 {'original_text': '15-11-2024',
  'start_index': 717,
  'end_index': 727,
  'anonymized_text': '15-11-2024'}]
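
The `start_index` and `end_index` values are character offsets into the original text, so the replacements can be re-applied by hand. The sketch below assumes only the dictionary keys shown in the output above; the `apply_replacements` helper is hypothetical and not part of `anonipy`. It works from right to left so that earlier offsets stay valid even when a replacement changes the text length.

```python
def apply_replacements(text, replacements):
    """Apply replacements from the last span to the first."""
    result = text
    ordered = sorted(replacements, key=lambda item: item["start_index"], reverse=True)
    for item in ordered:
        start, end = item["start_index"], item["end_index"]
        # Guard against offsets that do not match the given text.
        if text[start:end] != item["original_text"]:
            raise ValueError(f"offsets {start}:{end} do not match {item['original_text']!r}")
        result = result[:start] + item["anonymized_text"] + result[end:]
    return result


# First lines of the example text and the first two replacements from above.
sample = "Medical Record\n\nPatient Name: John Doe\nDate of Birth: 15-01-1985\n"
sample_replacements = [
    {"original_text": "John Doe", "start_index": 30, "end_index": 38,
     "anonymized_text": "Ethan Thompson"},
    {"original_text": "15-01-1985", "start_index": 54, "end_index": 64,
     "anonymized_text": "01-07-1985"},
]
rebuilt = apply_replacements(sample, sample_replacements)
print(rebuilt)
```

Processing the spans in reverse order matters here: "Ethan Thompson" is longer than "John Doe", so replacing it first would shift every later offset.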