# GDPR Anonymization Example

This notebook uses generative AI models to anonymize sensitive data to meet GDPR standards. Run the example below, then compare the synthetic data with the real-world data for each of the example datasets.

**Example datasets provided:**
* Google Meet logs
* E-commerce dataset of bike orders
* Electronic health records



In [None]:
%%capture
# Install dependencies (the %%capture cell magic must be the first line of the cell)
!git clone https://github.com/gretelai/gdpr-helpers.git
!cd gdpr-helpers; pip install -Uqq .

import os
if not os.getcwd().endswith('gdpr-helpers'):
    os.chdir('gdpr-helpers')

# Anonymize a directory of CSV files
1. Detect and remove PII automatically using named entity recognition.
2. Train a synthetic model and generate a new anonymized, synthetic dataset that is resistant to privacy attacks.
3. Review the output reports and update the transform and synthetics configs as needed for your use case.
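
To make step 1 more concrete, here is a toy sketch of what "detect and remove PII" means at the level of a single text value. It is not how `gdpr_helpers` works internally: the library relies on named entity recognition, whereas this sketch swaps in two simple regular expressions. The `PII_PATTERNS` table and the `redact` helper are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical patterns for illustration only. A real NER model recognizes
# many more entity types (names, addresses, IDs) than these two regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each matched span with a placeholder naming the entity type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label.upper()}>", text)
    return text


print(redact("Contact jane.doe@example.com or +1 415 555 0100"))
```

Replacing values with typed placeholders rather than deleting them keeps the column layout intact, which matters when the transformed data is later used to train a synthetic model.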

In [None]:
import glob
from gdpr_helpers import Anonymizer

search_pattern = "data/*.csv"

am = Anonymizer(
    project_name="gdpr-workflow",
    run_mode="cloud",
    transforms_config="src/config/transforms_config.yaml",
    synthetics_config="synthetics/tabular-actgan",  # use "synthetics/amplify" for CPU
)

for dataset_path in glob.glob(search_pattern):
    am.anonymize(dataset_path=dataset_path)

In [None]:
# Download the anonymized datasets (google.colab is only available when running in Colab)
import shutil
from google.colab import files

output_filename = 'anonymized_data'
print(shutil.make_archive(output_filename, 'zip', "artifacts"))

files.download(f"{output_filename}.zip")
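
To compare the synthetic data with the real-world data, as suggested in the introduction, a quick per-column summary can help. The sketch below uses only the standard library; `column_summary` and `compare` are hypothetical helpers, and the file paths in the usage comment are placeholders, since the exact layout of the `artifacts` directory is an assumption here.

```python
import csv
import statistics


def column_summary(csv_path):
    """Return {column: (distinct_count, mean_or_None)} for one CSV file."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    summary = {}
    for column in (rows[0].keys() if rows else []):
        # Skip empty cells so they do not distort the counts or the mean.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        try:
            mean = statistics.fmean(float(v) for v in values) if values else None
        except ValueError:
            mean = None  # column holds non-numeric values
        summary[column] = (len(set(values)), mean)
    return summary


def compare(real_path, synthetic_path):
    """Print the real and synthetic summaries side by side, column by column."""
    real = column_summary(real_path)
    synthetic = column_summary(synthetic_path)
    for column, stats in real.items():
        print(column, "| real:", stats, "| synthetic:", synthetic.get(column, "missing"))


# Example usage (placeholder paths, adjust to your own files):
# compare("data/my_dataset.csv", "artifacts/my_dataset_synthetic.csv")
```

Columns reported as `missing` on the synthetic side may have been dropped or renamed by the transform step. Similar distinct counts and means are only a rough sanity check, not a substitute for the output reports.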