<a target="_parent" href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/safe-synthetics/free-text-transform-synthesize-dp.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# ✍ Using Safe Synthetics for free text data

This notebook uses Safe Synthetics configurations tailored to free text data. You can run it with the sample dataset or with your own dataset.

After you specify a dataset, this notebook holds out 5% of it to calculate quality & privacy metrics at the end. It then redacts true identifiers in your dataset, such as names and addresses, and synthesizes your data to obfuscate quasi-identifiers. While synthesizing, it applies differential privacy to provide mathematical guarantees of privacy. Finally, it generates a report so you can measure the quality & privacy of your synthetic data.
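To make the 5% holdout concrete, here is a minimal sketch of a random holdout split using only the Python standard library. The function name `split_holdout`, the `seed` parameter, and the toy records are assumptions for illustration. Safe Synthetics performs its own split internally, and this is not its implementation.

```python
import random


def split_holdout(rows, holdout_fraction=0.05, seed=0):
    """Split rows into (train, holdout), holding out roughly holdout_fraction."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_holdout = round(len(rows) * holdout_fraction)
    holdout_idx = set(indices[:n_holdout])
    train = [row for i, row in enumerate(rows) if i not in holdout_idx]
    holdout = [row for i, row in enumerate(rows) if i in holdout_idx]
    return train, holdout


# Toy stand-in for a 5000-row free text dataset (hypothetical data).
rows = [f"record {i}" for i in range(5000)]
train_rows, holdout_rows = split_holdout(rows)
print(len(train_rows), len(holdout_rows))  # 4750 250
```

The held-out rows never reach the model, which is what lets them serve as an unbiased reference when the report scores quality and privacy.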

## 💾 Install Gretel SDK

In [None]:
%%capture

%pip install -U git+https://github.com/gretelai/gretel-python-client@main

## 🌐 Configure your Gretel Session

In [None]:
from gretel_client.navigator_client import Gretel

gretel = Gretel(api_key="prompt")

## 🔬 Preview input data

In [None]:
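import pandas as pd

# Sample dataset; replace this URL with the path or URL of your own CSV file.
ds = "https://gretel-public-website.s3.us-west-2.amazonaws.com/datasets/ontonotes5-replaced-names.csv"
df = pd.read_csv(ds)
test_df = None

# Keep only the first 5000 rows.
df = df.head(5000)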

print(f"Number of rows: {len(df)}")
df.head()

## 🏃 Run Safe Synthetics

In [None]:
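synthetic_dataset = (
    gretel.safe_synthetic_dataset
    .from_data_source(df)
    # Redact true identifiers such as names and addresses.
    .transform("transform/ner_only")
    # Generate 1000 synthetic records with differential privacy applied.
    .synthesize("text_ft/differential_privacy", num_records=1000)
    .create()
)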
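To illustrate what redacting true identifiers means, here is a minimal sketch that swaps in simple regular expressions for the detection step. The patterns, the `redact` helper, and the sample sentence are hypothetical. The `transform/ner_only` configuration relies on named entity recognition rather than these regexes.

```python
import re

# Hypothetical patterns for illustration only; not what transform/ner_only uses.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace every pattern match with a placeholder naming the entity type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


sample = "Contact Jane at jane.doe@example.com or 555-123-4567."
redacted = redact(sample)
print(redacted)  # Contact Jane at <email> or <phone>.
```

Note that the name "Jane" survives: regular expressions only catch identifiers with a fixed shape, which is one reason a model-based NER step is better suited to names and addresses in free text.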

In [None]:
synthetic_dataset.wait_until_done()

## 🔬 Preview output data

In [None]:
synthetic_dataset.dataset.df.head()

## 📊 Evaluate quality & privacy results

In [None]:
synthetic_dataset.report.table

In [None]:
synthetic_dataset.report.display_in_notebook()