# 🔒 NeMo Safe Synthesizer: PII Replacement Only

> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.

<br>

In this notebook, we demonstrate how to use the NeMo Microservices Python SDK to replace PII in a tabular dataset. The notebook should take about 15 minutes to run.

After completing this notebook, you'll be able to:
- **Use the NeMo Microservices SDK** to interact with Safe Synthesizer
- **Run a job to perform PII replacement only** (no novel data generation)


## 💾 Install dependencies

Ensure you have a NeMo Microservices Platform deployment available. If you're using a managed or remote deployment, have the correct base URLs and tokens ready.

In [None]:
from nemo_microservices import NeMoMicroservices
from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder

import logging
logging.basicConfig(level=logging.WARNING)
logging.getLogger("httpx").setLevel(logging.WARNING)

## ⚙️ Initialize the NeMo Safe Synthesizer Client

- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.
- `http://localhost:8080` is the default `base_url` for a quickstart deployment.
- If using a managed or remote deployment, ensure you use the correct base URLs and tokens.

In [None]:
client = NeMoMicroservices(
    base_url="http://localhost:8080",
)

NeMo DataStore runs as one of the platform services. The job uses it for storage, so define its endpoint and token:

In [None]:
datastore_config = {
    "endpoint": "http://localhost:3000/v1/hf",
    "token": "",
}
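For a managed or remote deployment, one option is to read the URLs and token from environment variables instead of hard-coding them. The following is a minimal sketch: the variable names (`NEMO_BASE_URL`, `NEMO_DATASTORE_ENDPOINT`, `NEMO_DATASTORE_TOKEN`) and the helper `load_connection_settings` are hypothetical, and the defaults are the quickstart values used in this notebook.

```python
import os


def load_connection_settings(env=None):
    """Return (base_url, datastore_config) built from environment variables.

    The variable names are illustrative assumptions, not names defined by the SDK.
    Unset variables fall back to the quickstart defaults from this notebook.
    """
    env = os.environ if env is None else env
    base_url = env.get("NEMO_BASE_URL", "http://localhost:8080")
    datastore = {
        "endpoint": env.get("NEMO_DATASTORE_ENDPOINT", "http://localhost:3000/v1/hf"),
        "token": env.get("NEMO_DATASTORE_TOKEN", ""),
    }
    return base_url, datastore


resolved_base_url, resolved_datastore = load_connection_settings()
# Print only the URLs, never the token
print(resolved_base_url, resolved_datastore["endpoint"])
```

The returned values can be passed as `base_url` to the client and as the datastore configuration in place of the literals above.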

## 📥 Load input data

Safe Synthesizer processes your input dataset and returns the same rows with PII replaced. For this tutorial we load a small public sample dataset. Replace it with your own data if desired.

The Dolly dataset (`databricks-dolly-15k`) is an open source dataset of instruction-following records. Each record contains (1) a free-text prompt that could be sent to an LLM, (2) a context description to help the LLM determine the answer, (3) a response that could come from the LLM, and (4) the instruction category, such as classification, open QA, closed QA, information extraction, or brainstorming. The text in the first three fields sometimes contains personally identifiable information (PII), such as names, birth dates, and locations.

In [None]:
import pandas as pd

df = pd.read_json(
    "hf://datasets/databricks/databricks-dolly-15k/databricks-dolly-15k.jsonl",
    lines=True,
)
print(df.head())
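To use your own data instead, any tabular source that pandas can read into a DataFrame should fit the same pattern. The sketch below parses two made-up JSON Lines records from an in-memory buffer; in practice you would pass a file path (for example a hypothetical `my_records.jsonl`) in place of the buffer. The records and the name `own_df` are illustrative only.

```python
import io

import pandas as pd

# Stand-in for a local JSON Lines file: one JSON object per line.
sample_jsonl = io.StringIO(
    '{"instruction": "Who wrote the letter?", "context": "", '
    '"response": "Jane Doe wrote it.", "category": "closed_qa"}\n'
    '{"instruction": "List three colors.", "context": "", '
    '"response": "Red, green, blue.", "category": "brainstorming"}\n'
)

# lines=True tells pandas to parse each line as a separate record
own_df = pd.read_json(sample_jsonl, lines=True)

# Check the shape and column types before submitting the data to a job
print(own_df.shape)
print(own_df.dtypes)
```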

## 🏗️ Create a Safe Synthesizer job

The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.

The code below performs these steps:
- Initialize the builder with the NeMo Microservices client.
- Use the loaded DataFrame as the input data source.
- Configure the job to use the specified datastore for model storage.
- Enable automatic replacement of personally identifiable information (PII).
- Submit the job to the microservices platform.

In [None]:
job = (
    SafeSynthesizerBuilder(client)
    .from_data_source(df)
    .with_datastore(datastore_config)
    .with_replace_pii()
    .create_job()
)

print(f"job_id = {job.job_id}")
job.wait_for_completion()

print(f"Job finished with status {job.fetch_status()}")

In [None]:
# If your notebook shuts down, the job keeps running on the microservices platform.
# To get the same job object back, uncomment the following snippet and replace
# "<job id>" with the job id printed by the previous cell.

# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob
# job = SafeSynthesizerJob(job_id="<job id>", client=client)
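Re-attaching depends on still having the job id after the kernel restarts. One way to keep it is to write it to a small file right after the job is created. This is a sketch using only the standard library; the helpers `save_job_id` and `load_job_id`, the file name, and the placeholder id are hypothetical and not part of the SDK.

```python
import json
import tempfile
from pathlib import Path


def save_job_id(job_id: str, path: Path) -> None:
    """Write the job id to a small JSON file."""
    path.write_text(json.dumps({"job_id": job_id}))


def load_job_id(path: Path) -> str:
    """Read back a job id written by save_job_id."""
    return json.loads(path.read_text())["job_id"]


# Demo with a placeholder id in a temporary directory.
# In the notebook you would pass job.job_id and a path that survives restarts.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "safe_synthesizer_job.json"
    save_job_id("example-job-id", state_file)
    restored = load_job_id(state_file)

print(restored)
```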

## 👀 View output data

After the job completes, fetch the output with PII replaced.

In [None]:
# Fetch the job output data with PII replaced
output_df = job.fetch_data()
output_df
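A quick way to see where replacement happened is to compare the input and output frames column by column. The sketch below assumes the output keeps the same row order and column names as the input, which matches the description of this job but is worth verifying on your own run. It uses toy data, and `changed_fraction` is a hypothetical helper.

```python
import pandas as pd


def changed_fraction(before: pd.DataFrame, after: pd.DataFrame) -> pd.Series:
    """Fraction of rows, per shared column, whose value differs between the frames.

    Assumes both frames hold the same rows in the same order.
    """
    if len(before) != len(after):
        raise ValueError("Frames must have the same number of rows")
    shared = [col for col in before.columns if col in after.columns]
    # Compare as strings so that missing values on both sides count as equal
    left = before[shared].reset_index(drop=True).astype(str)
    right = after[shared].reset_index(drop=True).astype(str)
    return (left != right).mean()


# Toy stand-ins for df and output_df
toy_before = pd.DataFrame(
    {
        "response": ["Jane Doe wrote it.", "Red, green, blue."],
        "category": ["closed_qa", "brainstorming"],
    }
)
toy_after = toy_before.copy()
toy_after.loc[0, "response"] = "Ann Lee wrote it."

print(changed_fraction(toy_before, toy_after))
```

In the notebook, calling `changed_fraction(df, output_df)` would give the same per-column summary for the real data, under the row-alignment assumption stated above.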

## 📊 View PII report

A report summarizing the PII replacement is created automatically for every job.

You can download the full HTML report or display it inline below.

In [None]:
# Download the full evaluation report to your local machine
job.save_report("evaluation_report.html")

In [None]:
# Fetch and display the full evaluation report inline
job.display_report_in_notebook()