# Differentially Private (DP) Aggregate Seeded Synthesizer


> Example based on [Synthetic Data Showcase - _pac-synth_](https://github.com/microsoft/synthetic-data-showcase/blob/main/packages/lib-pacsynth/samples/dp_aggregate_seeded_short_example.ipynb).


In [1]:
from snsynth.aggregate_seeded import (
    AggregateSeededSynthesizer,
    AccuracyMode,
    FabricationMode,
    AggregateSeededDataset,
)
from snsynth.transform.table import NoTransformer

from utils import gen_data_frame


## Generating an example data frame with random data


In [2]:
number_of_records_to_generate = 6000

sensitive_df = gen_data_frame(number_of_records_to_generate)
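
`gen_data_frame` comes from a local `utils` module that is not shown in this example. As a rough stand-in, the sketch below generates rows whose value ranges match the `min`/`max` rows of the summary statistics in the Evaluating section (H1 in 0 to 2, H2 in 0 to 6, H3 in 0 to 10, H4 to H10 binary). The name `gen_rows`, the uniform sampling, and the encoding of zeros as empty strings are all assumptions, not the original helper, so the resulting means will not match the ones reported in this example.

```python
import random


def gen_rows(n, seed=0):
    """Hypothetical stand-in for utils.gen_data_frame: returns a list of row dicts."""
    rng = random.Random(seed)
    # Assumed inclusive upper bound per column, read off the summary tables.
    upper = {"H1": 2, "H2": 6, "H3": 10}
    upper.update({f"H{i}": 1 for i in range(4, 11)})

    rows = []
    for _ in range(n):
        row = {}
        for col, hi in upper.items():
            value = rng.randint(0, hi)  # uniform sampling is an assumption
            # Assumption: zeros are stored as "", everything else as a string.
            row[col] = "" if value == 0 else str(value)
        rows.append(row)
    return rows


rows = gen_rows(6000)
```

Passing `rows` to `pandas.DataFrame(rows)` would give a data frame of string columns with the same column names as the one used here.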


## Generating the synthetic data


In [3]:
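# Maximum length of the attribute combinations that are aggregated and reported
reporting_length = 4

synth = AggregateSeededSynthesizer(
    reporting_length=reporting_length,
    epsilon=4.0,  # privacy budget
    accuracy_mode=AccuracyMode.prioritize_long_combinations(),
    fabrication_mode=FabricationMode.uncontrolled(),
    use_synthetic_counts=True,
)

# The data is passed through as-is, without any column transformation
synth.fit(sensitive_df, transformer=NoTransformer())

# Sample as many rows as the DP-protected record count
synthetic_df = synth.sample(synth.get_dp_number_of_records())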


## Generating/exporting aggregate data

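The cell below computes aggregates (counts of attribute combinations up to `reporting_length` long) directly from the sensitive and the synthetic data, and retrieves the DP aggregates from the fitted synthesizer. The `";"` argument is the delimiter used to join the attributes of a combination.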


In [4]:
sensitive_aggregates = synth.get_sensitive_aggregates(";")

dp_aggregates = synth.get_dp_aggregates(";")

synthetic_aggregates = AggregateSeededDataset.from_data_frame(
    synthetic_df
).get_aggregates(reporting_length, ";")
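
To get a feel for how the DP aggregates relate to the sensitive ones, the two can be compared key by key. The sketch below assumes each aggregates object behaves like a dictionary mapping a delimited combination string to a count. The helper name `compare_aggregates`, the `column:value` key format, and the toy values are illustrative assumptions, not output of the synthesizer.

```python
def compare_aggregates(sensitive, protected):
    """Compare two {combination: count} mappings (assumed format)."""
    shared = sensitive.keys() & protected.keys()
    # Combinations that appear only in the protected aggregates.
    fabricated = protected.keys() - sensitive.keys()
    # Combinations that appear only in the sensitive aggregates.
    suppressed = sensitive.keys() - protected.keys()
    mean_abs_error = (
        sum(abs(sensitive[k] - protected[k]) for k in shared) / len(shared)
        if shared
        else 0.0
    )
    return {
        "shared": len(shared),
        "fabricated": len(fabricated),
        "suppressed": len(suppressed),
        "mean_abs_error": mean_abs_error,
    }


# Toy values for illustration only.
toy_sensitive = {"H1:1": 120, "H1:1;H4:1": 60, "H2:3": 15}
toy_dp = {"H1:1": 118, "H1:1;H4:1": 63, "H3:2;H5:1": 4}

report = compare_aggregates(toy_sensitive, toy_dp)
```

If the assumption about the format holds, calling `compare_aggregates(sensitive_aggregates, dp_aggregates)` would summarize how many combinations were added or dropped by the DP mechanism and how far the shared counts drifted.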


## Evaluating


In [5]:
sensitive_df.replace("", "0").astype("int").describe()


| | H1 | H2 | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 | 6000.0 |
| mean | 1.002167 | 2.629333 | 4.549833 | 0.500333 | 0.504 | 0.498 | 0.499333 | 0.4925 | 0.503 | 0.492 |
| std | 0.818906 | 2.095075 | 3.331421 | 0.500042 | 0.500026 | 0.500038 | 0.500041 | 0.499985 | 0.500033 | 0.499978 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 3.0 | 4.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 75% | 2.0 | 4.0 | 7.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| max | 2.0 | 6.0 | 10.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


In [6]:
synthetic_df.replace("", "0").astype("int").describe()


| | H1 | H2 | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 5998.0 | 5998.0 | 5998.0 | 5998.0 | 5998.0 | 5998.0 | 5998.0 | 5998.0 | 5998.0 | 5998.0 |
| mean | 0.945315 | 2.498833 | 4.345782 | 0.46899 | 0.476659 | 0.461487 | 0.473158 | 0.46949 | 0.471991 | 0.457819 |
| std | 0.826651 | 2.133274 | 3.372906 | 0.499079 | 0.499497 | 0.498556 | 0.499321 | 0.49911 | 0.499256 | 0.498259 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 2.0 | 4.0 | 7.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| max | 2.0 | 6.0 | 10.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
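
The two summaries can also be compared numerically. As a small sketch, the snippet below takes the `mean` rows reported above for the sensitive and synthetic data and computes the absolute gap per column. The variable names are illustrative, and the numbers are copied from the two summaries rather than recomputed.

```python
columns = ["H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8", "H9", "H10"]

# "mean" rows copied from the describe() outputs above.
sensitive_mean = [1.002167, 2.629333, 4.549833, 0.500333, 0.504,
                  0.498, 0.499333, 0.4925, 0.503, 0.492]
synthetic_mean = [0.945315, 2.498833, 4.345782, 0.46899, 0.476659,
                  0.461487, 0.473158, 0.46949, 0.471991, 0.457819]

# Absolute difference between sensitive and synthetic means, per column.
mean_gap = {
    col: round(abs(s - t), 6)
    for col, s, t in zip(columns, sensitive_mean, synthetic_mean)
}

# Column whose mean moved the most.
largest_gap_column = max(mean_gap, key=mean_gap.get)

for col, gap in mean_gap.items():
    print(f"{col}: {gap}")
```

On these figures the largest gap is in `H3`, and the synthetic mean is lower than the sensitive mean in every column.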
