anonipy

Data anonymization package, supporting different anonymization strategies

Documentation: https://eriknovak.github.io/anonipy

Source code: https://github.com/eriknovak/anonipy

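The anonipy package is a Python package for data anonymization. It is designed to be simple to use and highly customizable, and it supports different anonymization strategies. Parts of the pipeline, such as the LLM-based label generator used in the example below, are powered by large language models (LLMs).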

✅ Requirements

Before starting, make sure the following requirements are available:

Python: the Python programming language (v3.8, v3.9, v3.10, or v3.11).
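A minimal sketch for checking whether the running interpreter is one of the versions listed above. The helper name `is_supported_python` is hypothetical and not part of anonipy.

```python
import sys

# (major, minor) versions listed in the requirements above
SUPPORTED_VERSIONS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info is a listed version."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    if not is_supported_python():
        major, minor = sys.version_info[:2]
        print(f"Warning: Python {major}.{minor} is not one of the listed versions")
```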

💾 Install

pip install anonipy

⬆️ Upgrade

pip install anonipy --upgrade
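To confirm which version ended up installed after an install or upgrade, the standard library's `importlib.metadata` (available since Python 3.8) can read the installed distribution's metadata. A small sketch; the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "anonipy") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# a version string if anonipy is installed, otherwise None
print(installed_version())
```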

🔎 Example

The details of the example can be found in the Overview.

original_text = """\
Medical Record

Patient Name: John Doe
Date of Birth: 15-01-1985
Date of Examination: 20-05-2024
Social Security Number: 123-45-6789

Examination Procedure:
John Doe underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.

Medication Prescribed:

Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
Next Examination Date:
15-11-2024
"""

Use the language detector to detect the language of the text:

from anonipy.utils.language_detector import LanguageDetector

lang_detector = LanguageDetector()
language = lang_detector(original_text)

Prepare the entity extractor and extract the personal information from the original text:

from anonipy.anonymize.extractors import EntityExtractor

# define the labels to be extracted and anonymized
labels = [
    {"label": "name", "type": "string"},
    {"label": "social security number", "type": "custom"},
    {"label": "date of birth", "type": "date"},
    {"label": "date", "type": "date"},
]

# language taken from the language detector
entity_extractor = EntityExtractor(labels, lang=language, score_th=0.5)

# extract the entities from the original text
doc, entities = entity_extractor(original_text)

# display the entities in the original text
entity_extractor.display(doc)
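The `score_th=0.5` argument appears to set a minimum confidence score for extracted entities. The sketch below illustrates that general idea in plain Python: candidate spans scored below the threshold are dropped, and the remaining ones are marked inline in the text. The `Candidate` class, the inclusive `>=` comparison, the made-up scores, and the bracket markup are assumptions for illustration; this is not how anonipy's `EntityExtractor` or `display` are implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical stand-in for an extracted entity."""
    text: str
    label: str
    start: int  # start character offset (inclusive)
    end: int    # end character offset (exclusive)
    score: float


def filter_by_score(candidates: List[Candidate], score_th: float = 0.5) -> List[Candidate]:
    """Keep only the candidates whose score reaches the threshold."""
    return [c for c in candidates if c.score >= score_th]


def highlight(text: str, entities: List[Candidate]) -> str:
    """Wrap each entity span as [span](label); assumes spans do not overlap."""
    parts, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        parts.append(text[cursor:ent.start])
        parts.append(f"[{text[ent.start:ent.end]}]({ent.label})")
        cursor = ent.end
    parts.append(text[cursor:])
    return "".join(parts)


sample = "Patient Name: John Doe"
candidates = [
    Candidate("John Doe", "name", 14, 22, 0.93),  # made-up score
    Candidate("Patient", "name", 0, 7, 0.21),     # made-up score, below threshold
]
print(highlight(sample, filter_by_score(candidates)))
# prints: Patient Name: [John Doe](name)
```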

Use generators to create substitutes for the entities:

from anonipy.anonymize.generators import (
    LLMLabelGenerator,
    DateGenerator,
    NumberGenerator,
)

# initialize the generators
llm_generator = LLMLabelGenerator()
date_generator = DateGenerator()
number_generator = NumberGenerator()

# prepare the anonymization mapping
def anonymization_mapping(text, entity):
    if entity.type == "string":
        return llm_generator.generate(entity, temperature=0.7)
    if entity.label == "date":
        return date_generator.generate(entity, output_gen="middle_of_the_month")
    if entity.label == "date of birth":
        return date_generator.generate(entity, output_gen="middle_of_the_year")
    if entity.label == "social security number":
        return number_generator.generate(entity)
    return "[REDACTED]"
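The `output_gen` values suggest that dates are generalized (their precision is reduced) rather than replaced at random, while the social security number receives a new number. The sketch below illustrates both ideas with the standard library only. It assumes the `DD-MM-YYYY` format of the example record, and that "middle of the month" and "middle of the year" mean the 15th and 1 July; the function names are hypothetical, and the exact values returned by anonipy's `DateGenerator` and `NumberGenerator` may differ.

```python
import random
import re
from datetime import datetime
from typing import Optional

DATE_FORMAT = "%d-%m-%Y"  # day-month-year, as in the example record


def middle_of_the_month(value: str) -> str:
    """Keep year and month; move the day to the 15th (assumed meaning)."""
    date = datetime.strptime(value, DATE_FORMAT)
    return date.replace(day=15).strftime(DATE_FORMAT)


def middle_of_the_year(value: str) -> str:
    """Keep only the year; move the date to 1 July (assumed meaning)."""
    date = datetime.strptime(value, DATE_FORMAT)
    return date.replace(month=7, day=1).strftime(DATE_FORMAT)


def random_digits_like(value: str, rng: Optional[random.Random] = None) -> str:
    """Replace every digit with a random digit, keeping separators in place."""
    rng = rng if rng is not None else random.Random()
    return re.sub(r"\d", lambda _match: str(rng.randint(0, 9)), value)


print(middle_of_the_month("20-05-2024"))  # 15-05-2024
print(middle_of_the_year("15-01-1985"))   # 01-07-1985
print(random_digits_like("123-45-6789"))  # random digits in the ###-##-#### layout
```

Keeping the separators means the substitute still looks like a social security number to anyone reading the anonymized record.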

Anonymize the text using the anonymization mapping:

from anonipy.anonymize.strategies import PseudonymizationStrategy

# initialize the pseudonymization strategy
pseudo_strategy = PseudonymizationStrategy(mapping=anonymization_mapping)

# anonymize the original text
anonymized_text, replacements = pseudo_strategy.anonymize(original_text, entities)
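Conceptually, pseudonymization calls the mapping once per entity and splices each substitute into the text. The sketch below shows one common way to do that with character offsets: spans are replaced from the end of the text towards the start so earlier offsets stay valid, and a cache gives repeated mentions (such as "John Doe", which appears twice in the record) the same substitute. The `Span` class, `pseudonymize`, `fake_name`, and the replacement record format are hypothetical; this is not the source of anonipy's `PseudonymizationStrategy`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Span:
    """Hypothetical entity: a label plus character offsets [start, end)."""
    label: str
    start: int
    end: int


def pseudonymize(
    text: str, spans: List[Span], mapping: Callable[[str, Span], str]
) -> Tuple[str, List[dict]]:
    """Replace every span with mapping(text, span); identical originals share one substitute."""
    cache: Dict[str, str] = {}
    replacements = []
    result = text
    # Go from the last span to the first: replacing a later span never shifts
    # the offsets of earlier ones (assumes spans do not overlap).
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        original = text[span.start:span.end]
        if original not in cache:
            cache[original] = mapping(text, span)
        substitute = cache[original]
        result = result[:span.start] + substitute + result[span.end:]
        replacements.append(
            {"original": original, "label": span.label, "substitute": substitute}
        )
    replacements.reverse()  # report in reading order
    return result, replacements


# Usage with a stand-in mapping that records how often it is called
calls = []


def fake_name(text: str, span: Span) -> str:
    calls.append(span.start)
    return "Jane Roe"  # placeholder for an LLM-generated name


record = "Patient Name: John Doe\nJohn Doe underwent a routine physical examination."
demo_text, demo_replacements = pseudonymize(
    record, [Span("name", 14, 22), Span("name", 23, 31)], fake_name
)
print(demo_text)
print(len(calls))  # 1: both mentions of "John Doe" share one substitute
```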

📖 Acknowledgements

Anonipy is developed by the Department for Artificial Intelligence at the Jozef Stefan Institute, and other contributors.

The project has received funding from the European Union's Horizon Europe research and innovation programme under Grant Agreement No 101080288 (PREPARE).
