# 06: Private Evolution for Tabular Data

This notebook implements Private Evolution (PE) adapted for the DCA telemetry wide table, following Lin et al. (2024) and Swanberg et al. (2025). Instead of training a generative model with DP-SGD, PE uses black-box API access to a foundation model (GPT-5 nano) and a DP nearest-neighbor histogram to iteratively select synthetic candidates that best approximate the real data distribution.

## Outline

1. Load the wide training table (from notebook 05)
2. Configure PE parameters and privacy budget
3. Run PE (RANDOM_API -> DP histogram -> selection; VARIATION_API is only called when T > 1)
4. Decompose synthetic wide table into reporting tables
5. Run benchmark queries on the synthetic reporting tables
6. Compare with ground truth and DP-SGD results

In [1]:
import sys
import os
from pathlib import Path

import numpy as np
import pandas as pd
from IPython.display import display, Markdown
from dotenv import load_dotenv

load_dotenv(Path("../.env"))

sys.path.insert(0, str(Path("..").resolve()))

REPORTING = Path("../data/reporting")
QUERIES_DIR = Path("../docs/queries")
REAL_RESULTS = Path("../data/results/real")
PE_REPORTING = Path("../data/reporting/pe")
PE_RESULTS = Path("../data/results/pe")
MODEL = "gpt-5-nano"

---
## Step 1: Load the wide training table

In [2]:
wide = pd.read_parquet(REPORTING / "wide_training_table.parquet")

cat_cols = ["chassistype", "countryname_normalized", "modelvendor_normalized",
            "os", "cpuname", "cpucode", "cpu_family", "persona", "processornumber"]
numeric_cols = [c for c in wide.columns if c != "guid" and c not in cat_cols]

display(Markdown(
    f"Wide table: {len(wide):,} rows x {len(wide.columns)} columns\n\n"
    f"Categorical: {len(cat_cols)} columns, Numeric: {len(numeric_cols)} columns"
))

Wide table: 1,000,000 rows x 69 columns

Categorical: 9 columns, Numeric: 59 columns

---
## Step 2: Configure PE and privacy budget

Following Swanberg et al. (2025), who found that a single iteration works best for tabular PE, we use T=1 as the primary setting. We match the DP-SGD privacy budget: epsilon=4.0, delta=1e-5.

The noise multiplier sigma is calibrated via the analytic Gaussian mechanism (Balle and Wang, 2018) with adaptive composition (Dong et al., 2019): T iterations with noise sigma each compose to a single Gaussian mechanism with effective sensitivity sqrt(T).
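
As a rough illustration of this calibration, the sketch below solves the analytic Gaussian condition of Balle and Wang (2018) for sigma by bisection. It is not the code in `src.pe.privacy`: the function names, the bisection bounds, and the assumption that one record changes the histogram by L2 sensitivity 1 per iteration are illustrative choices. Under those assumptions, epsilon=4.0, delta=1e-5 and T=1 should give a sigma of roughly 1.08.

```python
import math


def _phi(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(sigma, epsilon, sensitivity=1.0):
    """Smallest delta for which N(0, sigma^2) noise is (epsilon, delta)-DP.

    Analytic Gaussian mechanism:
    delta = Phi(s/(2*sigma) - eps*sigma/s) - exp(eps) * Phi(-s/(2*sigma) - eps*sigma/s)
    """
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return _phi(a - b) - math.exp(epsilon) * _phi(-a - b)


def calibrate_sigma_sketch(epsilon, delta, T=1, lo=1e-3, hi=100.0, iters=100):
    """Hypothetical stand-in for calibrate_sigma: bisection on sigma.

    Assumes T Gaussian releases compose to one release with sensitivity sqrt(T).
    """
    sensitivity = math.sqrt(T)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, epsilon, sensitivity) > delta:
            lo = mid  # too little noise, delta still above target
        else:
            hi = mid  # enough noise, try a smaller sigma
    return hi


sigma_sketch = calibrate_sigma_sketch(4.0, 1e-5, T=1)
print(f"sigma for T=1: {sigma_sketch:.4f}")
print(f"sigma for T=4: {calibrate_sigma_sketch(4.0, 1e-5, T=4):.4f}")
```

Because the achieved delta depends only on the ratio of sigma to sensitivity, the required sigma in this sketch scales with sqrt(T): four iterations need twice the noise of one.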

In [3]:
from src.pe.privacy import calibrate_sigma, compute_epsilon

N_SYNTH = 50000
T = 1
L = 3
EPSILON = 4.0
DELTA = 1e-5
MODEL = "gpt-5-nano"

sigma = calibrate_sigma(EPSILON, DELTA, T)

display(Markdown(
    f"PE configuration:\n\n"
    f"- Model: `{MODEL}`\n"
    f"- N_synth: {N_SYNTH:,}\n"
    f"- T (iterations): {T}\n"
    f"- L (variations per candidate + 1): {L}\n"
    f"- Target epsilon: {EPSILON}, delta: {DELTA}\n"
    f"- Calibrated sigma: {sigma:.4f}\n"
    f"- Initial population: {N_SYNTH * L:,} (N_synth x L)\n"
    f"- Privacy guarantee: (epsilon={EPSILON}, delta={DELTA})-DP via analytic Gaussian mechanism"
))

PE configuration:

- Model: `gpt-5-nano`
- N_synth: 50,000
- T (iterations): 1
- L (variations per candidate + 1): 3
- Target epsilon: 4.0, delta: 1e-05
- Calibrated sigma: 1.0812
- Initial population: 150,000 (N_synth x L)
- Privacy guarantee: (epsilon=4.0, delta=1e-05)-DP via analytic Gaussian mechanism

---
## Step 3: Run Private Evolution

The PE loop:
1. RANDOM_API generates 150,000 initial candidates (N_synth x L = 50,000 x 3).
2. Each of the 1M real records votes for its nearest synthetic candidate under the workload-aware distance, producing a histogram of vote counts over the candidates.
3. Gaussian noise with standard deviation sigma is added to each histogram bin, which makes the released counts differentially private.
4. The 50,000 candidates with the highest noisy counts are selected.

With T=1, there is no VARIATION_API call: selection is the final step.
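
The voting and selection steps can be sketched on toy data as follows. This is a minimal sketch, not the `private_evolution` implementation: the function name is hypothetical, plain squared Euclidean distance stands in for the workload-aware distance (which is not shown in this notebook), and each real record is assumed to contribute exactly one vote.

```python
import numpy as np


def dp_nn_histogram_select(real, synth, n_select, sigma, rng, real_chunk=1000):
    """Vote, add Gaussian noise, and keep the top candidates (illustrative only)."""
    votes = np.zeros(len(synth), dtype=np.float64)
    # Process real records in chunks so the distance matrix stays small.
    for start in range(0, len(real), real_chunk):
        block = real[start:start + real_chunk]
        # Squared Euclidean distance between every real row and every candidate.
        d2 = ((block[:, None, :] - synth[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        votes += np.bincount(nearest, minlength=len(synth))
    # One vote per record, so Gaussian noise on each bin privatizes the histogram.
    noisy = votes + rng.normal(0.0, sigma, size=len(synth))
    # Rank candidates by noisy count and keep the n_select best.
    selected = np.argsort(-noisy)[:n_select]
    return votes, noisy, selected


rng = np.random.default_rng(0)
toy_real = rng.normal(size=(500, 4))   # stand-in for encoded real records
toy_synth = rng.normal(size=(30, 4))   # stand-in for encoded candidates
votes, noisy, selected = dp_nn_histogram_select(
    toy_real, toy_synth, n_select=10, sigma=1.0812, rng=rng, real_chunk=128
)
print(f"nonzero bins: {(votes > 0).sum()}/{len(votes)}")
print(f"selected candidates: {sorted(selected.tolist())}")
```

Only the noisy counts are used for ranking, so the selected indices depend on the real data solely through the privatized histogram.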

In [4]:
import importlib
import src.pe.api, src.pe.distance, src.pe.privacy, src.pe.histogram
importlib.reload(src.pe.api)
importlib.reload(src.pe.distance)
importlib.reload(src.pe.privacy)
importlib.reload(src.pe.histogram)
from src.pe.histogram import private_evolution
from src.pe.api import PEApi

api = PEApi(wide, model=MODEL, max_concurrent=50)

USE_BATCH = True
WORK_DIR = Path("../data/batch_jobs")
CHECKPOINT_DIR = Path("../data/pe_checkpoints")
WORK_DIR.mkdir(parents=True, exist_ok=True)
CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)

synth_wide, pe_history = await private_evolution(
    real_df=wide,
    api=api,
    n_synth=N_SYNTH,
    T=T,
    L=L,
    epsilon=EPSILON,
    delta=DELTA,
    real_chunk=5000,
    synth_chunk=10000,
    batch_size=10,
    variation_batch_size=5,
    use_batch=USE_BATCH,
    work_dir=WORK_DIR,
    checkpoint_dir=CHECKPOINT_DIR,
)

display(Markdown(
    f"PE complete:\n\n"
    f"- Synthetic records: {len(synth_wide):,}\n"
    f"- Total time: {pe_history['total_time']:.1f}s\n"
    f"- Actual epsilon: {pe_history['actual_epsilon']:.4f}\n"
    f"- Sigma: {pe_history['sigma']:.4f}\n"
    f"- Mode: {'Batch API (50% cheaper)' if USE_BATCH else 'Realtime API'}"
))

PE config: N_synth=50000, T=1, L=3, epsilon=4.0, delta=1e-05, sigma=1.0812, voting_records=1,000,000, mode=Batch API (50% cheaper)

Resuming from checkpoint: stage=population_generated, iteration=-1
Loaded population from checkpoint: 150000 records

--- Iteration 1/1 ---
Computing DP nearest-neighbor histogram (1000000 real x 150000 synth)...
  NN progress: 20/200 chunks (10%)
  NN progress: 40/200 chunks (20%)
  NN progress: 60/200 chunks (30%)
  NN progress: 80/200 chunks (40%)
  NN progress: 100/200 chunks (50%)
  NN progress: 120/200 chunks (60%)
  NN progress: 140/200 chunks (70%)
  NN progress: 160/200 chunks (80%)
  NN progress: 180/200 chunks (90%)
  NN progress: 200/200 chunks (100%)
Histogram computed in 3555.2s
Nonzero bins: 81499/150000
Selected top 50000 candidates (0.1s)

PE complete: 50000 synthetic records in 3555.4s
Actual epsilon: 4.0000


PE complete:

- Synthetic records: 50,000
- Total time: 3555.4s
- Actual epsilon: 4.0000
- Sigma: 1.0812
- Mode: Batch API (50% cheaper)

In [5]:
synth_wide.to_parquet(REPORTING / "pe_wide_table.parquet", index=False)

display(Markdown(f"Saved PE synthetic wide table: {len(synth_wide):,} rows x {len(synth_wide.columns)} columns"))
display(synth_wide.head())

Saved PE synthetic wide table: 50,000 rows x 69 columns

| | guid | chassistype | countryname_normalized | modelvendor_normalized | os | cpuname | cpucode | cpu_family | persona | processornumber | ... | psys_rap_nrs | psys_rap_avg | pkg_c0_nrs | pkg_c0_avg | avg_freq_nrs | avg_freq_avg | temp_nrs | temp_avg | pkg_power_nrs | pkg_power_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pe_0000000 | Notebook | Other | Gigabyte | Win10 | 7th Gen i5 | i5-7500U | Core i5 | Casual User | 14 nm | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | pe_0000001 | Desktop | Other | Unknown | Win10 | 3rd Gen i5 | i5-7200U | Core i5 | Office/Productivity | 14 nm | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | pe_0000002 | Notebook | Other | Gigabyte | Win10 | 4th Gen i5 | i5-6200U | Core i5 | Casual User | 22 nm | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | pe_0000003 | Intel NUC/STK | United States of America | Intel | Win10 | 6th Gen i5 | i5-6200U | Core i5 | Office/Productivity | 45 nm | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | pe_0000004 | Notebook | China | Lenovo | Win10 | 8th Gen i5 | i5-8250U | Core i5 | Web User | 14 nm | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


### Inspect sparsity patterns

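A key question is whether the LLM reproduces the sparsity of the real table, where many telemetry columns are nonzero for only a few percent of devices. The cell below compares the share of nonzero values per numeric column in the real and synthetic tables.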

In [6]:
sparsity_rows = []
for c in numeric_cols:
    real_nz = (wide[c] > 0).mean() * 100
    synth_nz = (synth_wide[c] > 0).mean() * 100 if c in synth_wide.columns else 0
    sparsity_rows.append({"column": c, "real_nonzero_pct": round(real_nz, 1), "synth_nonzero_pct": round(synth_nz, 1)})

sparsity_df = pd.DataFrame(sparsity_rows)
display(Markdown("Nonzero percentage comparison (real vs PE synthetic):"))
display(sparsity_df)

Nonzero percentage comparison (real vs PE synthetic):

| | column | real_nonzero_pct | synth_nonzero_pct |
|---|---|---|---|
| 0 | ram | 99.9 | 100.0 |
| 1 | net_nrs | 3.7 | 0.1 |
| 2 | net_received_bytes | 3.7 | 0.1 |
| 3 | net_sent_bytes | 3.7 | 0.1 |
| 4 | mem_nrs | 6.9 | 0.3 |
| 5 | mem_avg_pct_used | 6.9 | 0.3 |
| 6 | mem_sysinfo_ram | 6.9 | 0.2 |
| 7 | batt_num_power_ons | 2.0 | 0.0 |
| 8 | batt_duration_mins | 2.0 | 0.0 |
| 9 | web_chrome_duration | 5.3 | 12.3 |


---
## Step 4: Decompose into reporting tables

In [7]:
from src.eval.decompose import decompose_wide_table

counts = decompose_wide_table(synth_wide, PE_REPORTING)

rows = "\n".join(f"| {t} | {c:,} |" for t, c in counts.items())
display(Markdown(f"Decomposed into {len(counts)} synthetic reporting tables:\n\n| Table | Rows |\n|---|---|\n{rows}"))

Decomposed into 12 synthetic reporting tables:

| Table | Rows |
|---|---|
| sysinfo | 50,000 |
| network_consumption | 140 |
| memory_utilization | 133 |
| system_psys_rap_watts | 1,165 |
| system_pkg_C0 | 564 |
| system_pkg_avg_freq_mhz | 465 |
| system_pkg_temp_centigrade | 364 |
| system_hw_pkg_power | 292 |
| batt_dc_events | 20 |
| web_cat_usage | 6,927 |
| web_cat_pivot_duration | 1,249 |
| on_off_suspend_time_day | 238 |

---
## Step 5: Benchmark evaluation

Run the same 8 benchmark queries evaluated for DP-SGD.

In [8]:
from src.eval.benchmark import run_benchmark

eval_queries = [
    "avg_platform_power_c0_freq_temp_by_chassis",
    "Xeon_network_consumption",
    "pkg_power_by_country",
    "ram_utilization_histogram",
    "battery_power_on_geographic_summary",
    "persona_web_cat_usage_analysis",
    "popular_browsers_by_count_usage_percentage",
    "most_popular_browser_in_each_country_by_system_count",
]

pe_results = run_benchmark(eval_queries, QUERIES_DIR, PE_REPORTING, PE_RESULTS)

display(Markdown(f"{len(pe_results)}/{len(eval_queries)} queries executed on PE synthetic data."))
for name, df in pe_results.items():
    display(Markdown(f"### `{name}` ({len(df)} rows)"))
    display(df.head(10))

8/8 queries executed on PE synthetic data.

### `avg_platform_power_c0_freq_temp_by_chassis` (6 rows)

| | chassistype | number_of_systems | avg_psys_rap_watts | avg_pkg_c0 | avg_freq_mhz | avg_temp_centigrade |
|---|---|---|---|---|---|---|
| 0 | Desktop | 14 | 34.595528 | 6.15974 | 2.5867 | 58.347703 |
| 1 | Workstation | 1 | 1.0 | 2.0 | 2.0 | 2.0 |
| 2 | Tablet | 1 | 3.2 | 1.5 | 1.2 | 42.0 |
| 3 | Server/WS | 305 | 51.586523 | 105.510225 | 3.903 | 53.710107 |
| 4 | Notebook | 11 | 19.152586 | 23.119444 | 108.118391 | 60.380282 |
| 5 | 2 in 1 | 2 | 1.205882 | 59.619205 | 2.0 | 61.0 |


### `Xeon_network_consumption` (3 rows)

| | processor_class | os | number_of_systems | avg_bytes_received | avg_bytes_sent |
|---|---|---|---|---|---|
| 0 | Non-Server Class | Win10 | 9 | 2154892.0 | 1117243.0 |
| 1 | Non-Server Class | Win11 | 52 | 24656970.0 | 25603720.0 |
| 2 | Non-Server Class | Win Server | 9 | 612835.1 | 657225.1 |


### `pkg_power_by_country` (11 rows)

| | countryname_normalized | number_of_systems | avg_pkg_power_consumed |
|---|---|---|---|
| 0 | United States of America | 23 | 499.05123 |
| 1 | Korea, Republic of | 93 | 228.818714 |
| 2 | China | 43 | 212.646006 |
| 3 | Germany | 1 | 180.0 |
| 4 | Russian Federation | 45 | 121.863089 |
| 5 | Russia Federation | 1 | 95.0 |
| 6 | Japan | 34 | 68.693231 |
| 7 | Russia | 1 | 65.0 |
| 8 | Brazil | 10 | 51.67079 |
| 9 | United Kingdom of Great Britain and Northern I... | 36 | 19.787935 |


### `ram_utilization_histogram` (5 rows)

| | ram_gb | count(DISTINCT guid) | avg_percentage_used |
|---|---|---|---|
| 0 | 4.0 | 8 | 59.0 |
| 1 | 8.0 | 44 | 64.0 |
| 2 | 16.0 | 46 | 58.0 |
| 3 | 32.0 | 21 | 64.0 |
| 4 | 64.0 | 11 | 68.0 |


### `battery_power_on_geographic_summary` (0 rows)

| | country | number_of_systems | avg_number_of_dc_powerons | avg_duration |
|---|---|---|---|---|


### `persona_web_cat_usage_analysis` (8 rows)

| | persona | number_of_systems | days | content_creation_photo_edit_creation | content_creation_video_audio_edit_creation | content_creation_web_design_development | education | entertainment_music_audio_streaming | entertainment_other | entertainment_video_streaming | ... | productivity_project_management | productivity_spreadsheets | productivity_word_processing | recreation_travel | reference | search | shopping | social_social_network | social_communication | social_communication_live |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Casual Gamer | 8 | 8.0 | 12.5 | 0.0 | 12.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Casual User | 45 | 45.0 | 52.799 | 3.576 | 2.222 | 23.815 | 0.0 | 0.0 | 2.032 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.444 | 0.0 | 2.222 | 0.0 | 0.0 |
| 2 | Content Creator/IT | 424 | 424.0 | 58.635 | 30.758 | 6.351 | 0.775 | 0.266 | 0.0 | 0.0 | ... | 0.236 | 0.059 | 0.295 | 0.0 | 0.0 | 0.236 | 0.0 | 0.236 | 0.0 | 0.0 |
| 3 | Entertainment | 2 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Office/Productivity | 107 | 107.0 | 10.465 | 2.816 | 2.138 | 14.703 | 0.935 | 0.0 | 0.0 | ... | 0.234 | 6.272 | 6.638 | 0.0 | 0.0 | 2.181 | 0.935 | 1.869 | 0.0 | 0.0 |
| 5 | Gamer | 65 | 65.0 | 1.538 | 1.538 | 1.538 | 3.077 | 11.154 | 0.0 | 0.769 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.872 | 0.0 | 4.615 | 0.0 | 0.0 |
| 6 | Entertainment | 567 | 567.0 | 0.497 | 0.0 | 0.0 | 0.0 | 97.412 | 0.066 | 1.648 | ... | 0.0 | 0.0 | 0.0 | 0.176 | 0.0 | 0.0 | 0.0 | 0.176 | 0.0 | 0.0 |
| 7 | Web User | 31 | 31.0 | 21.447 | 1.133 | 6.452 | 12.903 | 19.355 | 0.0 | 3.226 | ... | 0.0 | 0.0 | 3.226 | 0.0 | 0.0 | 12.903 | 0.0 | 6.452 | 0.0 | 0.0 |


### `popular_browsers_by_count_usage_percentage` (3 rows)

| | browser | percent_systems | percent_instances | percent_duration |
|---|---|---|---|---|
| 0 | edge | 10.18 | 9.28 | 6.04 |
| 1 | firefox | 2.04 | 1.86 | 0.83 |
| 2 | chrome | 97.45 | 88.86 | 93.12 |


### `most_popular_browser_in_each_country_by_system_count` (19 rows)

| | country | browser |
|---|---|---|
| 0 | Bangladesh | chrome |
| 1 | Brazil | chrome |
| 2 | Canada | chrome |
| 3 | China | chrome |
| 4 | France | chrome |
| 5 | Germany | chrome |
| 6 | India | chrome |
| 7 | Italy | chrome |
| 8 | Japan | chrome |
| 9 | Japanese | chrome |


---
## Step 6: Comparison with ground truth and DP-SGD

In [9]:
DPSGD_RESULTS = Path("../data/results/synthetic")

comparison_rows = []
for name in eval_queries:
    real_path = REAL_RESULTS / f"{name}.csv"
    if not real_path.exists():
        continue
    real_df = pd.read_csv(real_path)

    # Load each synthetic result once per query rather than once per column.
    synth_dfs = {}
    for label, results_dir in [("dpsgd", DPSGD_RESULTS), ("pe", PE_RESULTS)]:
        path = results_dir / f"{name}.csv"
        if path.exists():
            synth_dfs[label] = pd.read_csv(path)

    for col in real_df.select_dtypes(include=[np.number]).columns:
        real_mean = real_df[col].mean()
        # Relative error is undefined when the real mean is (near) zero.
        if abs(real_mean) < 1e-10:
            continue

        row = {"query": name.replace("_", " "), "column": col, "real_mean": real_mean}
        for label, synth_df in synth_dfs.items():
            if col in synth_df.columns:
                synth_mean = synth_df[col].mean()
                row[f"{label}_mean"] = synth_mean
                row[f"{label}_rel_error"] = abs(real_mean - synth_mean) / abs(real_mean)

        comparison_rows.append(row)

comp_df = pd.DataFrame(comparison_rows)
display(Markdown("Column-level mean comparison (real vs DP-SGD vs PE):"))
display(comp_df)

Column-level mean comparison (real vs DP-SGD vs PE):

| | query | column | real_mean | dpsgd_mean | dpsgd_rel_error | pe_mean | pe_rel_error |
|---|---|---|---|---|---|---|---|
| 0 | avg platform power c0 freq temp by chassis | number_of_systems | 26.0 | 23352.285714 | 897.164835 | 55.66667 | 1.141026 |
| 1 | avg platform power c0 freq temp by chassis | avg_psys_rap_watts | 4.291388 | 0.001956 | 0.999544 | 18.45675 | 3.300882 |
| 2 | avg platform power c0 freq temp by chassis | avg_pkg_c0 | 42.63076 | 0.022237 | 0.999478 | 32.98477 | 0.226268 |
| 3 | avg platform power c0 freq temp by chassis | avg_freq_mhz | 2692.871 | 0.007033 | 0.999997 | 19.96802 | 0.992585 |
| 4 | avg platform power c0 freq temp by chassis | avg_temp_centigrade | 44.7122 | 0.002905 | 0.999935 | 46.23968 | 0.034163 |
| 5 | Xeon network consumption | number_of_systems | 4653.0 | 50179.785714 | 9.784394 | 23.33333 | 0.994985 |
| 6 | Xeon network consumption | avg_bytes_received | 7.359645e+16 | 1.130403 | 1.0 | 9141566.0 | 1.0 |
| 7 | Xeon network consumption | avg_bytes_sent | 7.359475e+16 | 0.698113 | 1.0 | 9126064.0 | 1.0 |
| 8 | pkg power by country | number_of_systems | 16.32653 | 10665.78 | 652.279025 | 26.54545 | 0.625909 |
| 9 | pkg power by country | avg_pkg_power_consumed | 21.36177 | 0.002582 | 0.999879 | 141.1224 | 5.606307 |
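
The relative error columns follow `abs(real_mean - synth_mean) / abs(real_mean)`, so a value of 1.0 means the synthetic mean is off by the full size of the real mean, and values above 1.0 are possible when the synthetic mean overshoots. As a quick check, the short sketch below reproduces the first row of the table from the printed means; the helper name `rel_error` is introduced here for illustration only.

```python
def rel_error(real_mean, synth_mean):
    """Relative error of a synthetic mean against the real mean."""
    return abs(real_mean - synth_mean) / abs(real_mean)


# Row 0: number_of_systems for the chassis query, using the rounded means shown above.
pe_err = rel_error(26.0, 55.66667)
dpsgd_err = rel_error(26.0, 23352.285714)
print(f"PE relative error: {pe_err:.6f}")
print(f"DP-SGD relative error: {dpsgd_err:.6f}")
```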


In [10]:
browser_query = "most_popular_browser_in_each_country_by_system_count"
real_browsers = pd.read_csv(REAL_RESULTS / f"{browser_query}.csv")


def report_browser_accuracy(label, results_dir):
    """Compare the top browser per country against the real results."""
    path = results_dir / f"{browser_query}.csv"
    if not path.exists():
        return
    synth_browsers = pd.read_csv(path)
    # Inner join: only countries present in both result sets are scored.
    merged = real_browsers.merge(synth_browsers, on="country", suffixes=("_real", "_synth"), how="inner")
    total = len(merged)
    if total == 0:
        display(Markdown(f"Browser ranking accuracy ({label}): no countries in common"))
        return
    matches = (merged["browser_real"] == merged["browser_synth"]).sum()
    display(Markdown(
        f"Browser ranking accuracy ({label}): {matches}/{total} countries correct "
        f"({100*matches/total:.0f}%)"
    ))


# Each method is reported independently of whether the other's results exist.
report_browser_accuracy("PE", PE_RESULTS)
report_browser_accuracy("DP-SGD", DPSGD_RESULTS)

Browser ranking accuracy (PE): 13/15 countries correct (87%)

Browser ranking accuracy (DP-SGD): 42/50 countries correct (84%)
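
These accuracies are computed over an inner join on `country`, so a country that is missing from a synthetic result, or spelled differently (such as `Japanese` in the PE output), drops out of the denominator. One possible way to also report coverage is sketched below on toy data; the function name and the toy frames are hypothetical, and each country is assumed to appear at most once per result.

```python
import pandas as pd


def browser_accuracy_with_coverage(real, synth):
    """Count matches, covered countries, and all real countries (illustrative)."""
    # Left join keeps every real country; `_merge` marks which ones the synthetic result covers.
    merged = real.merge(
        synth, on="country", how="left", suffixes=("_real", "_synth"), indicator=True
    )
    covered = merged["_merge"] == "both"
    matches = covered & (merged["browser_real"] == merged["browser_synth"])
    return {
        "matches": int(matches.sum()),
        "covered": int(covered.sum()),
        "real_countries": len(merged),
    }


toy_real = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "browser": ["chrome", "edge", "chrome", "firefox"],
})
toy_synth = pd.DataFrame({
    "country": ["A", "B", "E"],
    "browser": ["chrome", "chrome", "chrome"],
})
stats = browser_accuracy_with_coverage(toy_real, toy_synth)
print(stats)
```

Reporting `matches / covered` alongside `covered / real_countries` separates ranking quality from how many real countries the synthetic data reproduces at all.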

---
## Summary

In [11]:
summary_lines = [
    "| | DP-SGD (VAE) | Private Evolution |",
    "|---|---|---|",
    "| Model | DP-VAE (505K params) | GPT-5 nano (API) |",
    f"| Privacy | (3.996, 1e-5)-DP | ({pe_history['actual_epsilon']:.3f}, 1e-5)-DP |",
    f"| Synthetic records | 1,000,000 | {len(synth_wide):,} |",
    # Report PE time in minutes so both columns use the same unit.
    f"| Training/generation time | 360 min (CPU) | {pe_history['total_time'] / 60:.0f} min |",
    f"| Iterations | 20 epochs | {T} PE iteration(s) |",
]

display(Markdown("\n".join(summary_lines)))

| | DP-SGD (VAE) | Private Evolution |
|---|---|---|
| Model | DP-VAE (505K params) | GPT-5 nano (API) |
| Privacy | (3.996, 1e-5)-DP | (4.000, 1e-5)-DP |
| Synthetic records | 1,000,000 | 50,000 |
| Training/generation time | 360 min (CPU) | 59 min |
| Iterations | 20 epochs | 1 PE iteration(s) |

The PE time covers the DP histogram and selection only: this run resumed from a checkpoint with the 150,000-candidate population already generated, so RANDOM_API generation time is not included.