# Differential Privacy (DP) Aggregate Seeded Synthesizer

> Example based on [Synthetic Data Showcase - _pac-synth_](https://github.com/microsoft/synthetic-data-showcase/blob/main/packages/lib-pacsynth/samples/dp_aggregate_seeded_short_example.ipynb).

```python
from snsynth.aggregate_seeded import (
    AggregateSeededSynthesizer,
    AccuracyMode,
    FabricationMode,
    AggregateSeededDataset,
)

from utils import gen_data_frame
```

## Generating an example data frame with random data

```python
number_of_records_to_generate = 6000

sensitive_df = gen_data_frame(number_of_records_to_generate)
```

## Generating the synthetic data

```python
# Longest attribute combination that is aggregated and reported.
reporting_length = 4

synth = AggregateSeededSynthesizer(
    reporting_length=reporting_length,
    epsilon=4.0,  # privacy budget
    accuracy_mode=AccuracyMode.prioritize_long_combinations(),
    fabrication_mode=FabricationMode.uncontrolled(),
    use_synthetic_counts=True,
)

synth.fit(sensitive_df)

# Draw as many synthetic records as the DP-protected record count.
synthetic_df = synth.sample(synth.get_dp_number_of_records())
```
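The synthetic frame is sized with `get_dp_number_of_records()` rather than `len(sensitive_df)`, so its record count can differ slightly from the original. As a rough intuition for why a protected count is noisy, here is a minimal sketch of the textbook Laplace mechanism for a counting query. It is a generic illustration, not a description of how `AggregateSeededSynthesizer` computes its value; `noisy_count` and the budget passed to it are hypothetical.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Textbook Laplace mechanism for a counting query (sensitivity 1).

    Hypothetical helper for illustration only; the synthesizer's own
    mechanism and budget split may differ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    # Rounding and clamping are post-processing of the noisy value.
    return max(0, round(true_count + noise))


rng = random.Random(0)
print(noisy_count(6000, epsilon=0.1, rng=rng))
```

A smaller `epsilon` gives a wider noise distribution, so the reported count drifts further from the true one on average.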

## Generating/exporting aggregate data

The following cell shows how to compute aggregates directly from the sensitive and synthetic data, and how to retrieve the DP aggregates from the fitted synthesizer.

```python
sensitive_aggregates = synth.get_sensitive_aggregates(";")

dp_aggregates = synth.get_dp_aggregates(";")

synthetic_aggregates = AggregateSeededDataset.from_data_frame(
    synthetic_df
).get_aggregates(reporting_length, ";")
```
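To make the notion of an aggregate concrete, the sketch below counts every attribute combination up to a given length in a small set of records, using only the standard library. It is an illustration under assumptions: `count_combinations` is a hypothetical helper, the `column:value` key format joined by the delimiter is assumed, and so is the choice to skip empty values. The library's actual output may be shaped differently.

```python
from collections import Counter
from itertools import combinations


def count_combinations(records, reporting_length, delimiter=";"):
    """Count each attribute combination of size 1..reporting_length.

    Hypothetical helper: the "column:value" key format and the skipping
    of empty values are assumptions made for this sketch.
    """
    counts = Counter()
    for record in records:
        # One "column:value" item per non-empty attribute, in a stable order.
        items = sorted(f"{col}:{val}" for col, val in record.items() if val != "")
        for size in range(1, reporting_length + 1):
            for combo in combinations(items, size):
                counts[delimiter.join(combo)] += 1
    return counts


records = [
    {"H1": "1", "H2": "0"},
    {"H1": "1", "H2": "1"},
    {"H1": "1", "H2": ""},
]
for key, count in sorted(count_combinations(records, reporting_length=2).items()):
    print(key, count)
```

With a pandas frame, `df.to_dict("records")` yields records in the shape this function expects. The number of combinations grows quickly with `reporting_length`, which is one reason the value is kept small.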

## Evaluating

```python
sensitive_df.replace('', '0').astype('int').describe()
```

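|       | H1 | H2 | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 |
| mean | 1.006167 | 2.6005 | 4.5455 | 0.507833 | 0.492833 | 0.512 | 0.492167 | 0.496833 | 0.499667 | 0.502167 |
| std | 0.812756 | 2.111076 | 3.32432 | 0.49998 | 0.49999 | 0.499898 | 0.49998 | 0.500032 | 0.500042 | 0.500037 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 2.0 | 4.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 75% | 2.0 | 4.0 | 7.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| max | 2.0 | 6.0 | 10.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |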


```python
synthetic_df.replace('', '0').astype('int').describe()
```

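|       | H1 | H2 | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 6007.0 | 6007.0 | 6007.0 | 6007.0 | 6007.0 | 6007.0 | 6007.0 | 6007.0 | 6007.0 | 6007.0 |
| mean | 0.976028 | 2.484435 | 4.328783 | 0.468953 | 0.458132 | 0.481272 | 0.460962 | 0.470618 | 0.473448 | 0.476611 |
| std | 0.824466 | 2.123881 | 3.384596 | 0.499077 | 0.498285 | 0.499691 | 0.498515 | 0.499177 | 0.499336 | 0.499494 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 2.0 | 4.0 | 7.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| max | 2.0 | 6.0 | 10.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |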
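Summary statistics compare the columns one at a time. A complementary check is to compare aggregate counts between two sources, for example sensitive versus synthetic. The sketch below assumes each set of aggregates can be treated as a plain mapping from a combination key to a count; `compare_aggregates` and its metric names are hypothetical, not part of `snsynth`.

```python
def compare_aggregates(reference, candidate):
    """Compare two {combination: count} mappings.

    Hypothetical helper: assumes both inputs behave like dicts keyed by
    the same kind of combination string.
    """
    keys = set(reference) | set(candidate)
    # A combination missing from one side counts as 0 there.
    abs_errors = [abs(reference.get(k, 0) - candidate.get(k, 0)) for k in keys]
    return {
        "mean_abs_count_error": sum(abs_errors) / len(abs_errors) if abs_errors else 0.0,
        # Combinations present only in the candidate (fabricated).
        "fabricated": sum(1 for k in candidate if k not in reference),
        # Combinations present only in the reference (suppressed).
        "suppressed": sum(1 for k in reference if k not in candidate),
    }


reference = {"H1:1": 10, "H1:1;H2:0": 5}
candidate = {"H1:1": 8, "H1:1;H2:1": 1}
print(compare_aggregates(reference, candidate))
```

The `fabricated` figure is the one to watch when the synthesizer is configured with `FabricationMode.uncontrolled()`, as in this notebook.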
