# `smartnoise-synth` Demo

https://docs.smartnoise.org/synth/index.html

https://github.com/opendp/smartnoise-sdk

`ydata-synthetic` is a GAN-oriented synthetic data (SD) library. It provides several GANs for synthesising tabular and sequential data. It does not currently support GANs with differential privacy (DP). As of 2024-01-11, the library includes the following models:

* GAN
* CGAN (Conditional GAN)
* WGAN (Wasserstein GAN)
* WGAN-GP (Wasserstein GAN with Gradient Penalty)
* DRAGAN (Deep Regret Analytic GAN)
* Cramer GAN (Cramer Distance Solution to Biased Wasserstein Gradients)
* CWGAN-GP (Conditional Wasserstein GAN with Gradient Penalty)
* CTGAN (Conditional Tabular GAN)
* TimeGAN (specifically for time-series data)
* DoppelGANger (specifically for time-series data)

It also supports one probabilistic model, the Gaussian mixture model (GMM), which represents the data as a mixture of several Gaussian distributions. Compared to GANs, GMMs are fast and easy to train. However, they may struggle to capture the complexity of real-world data distributions.

Note that importing `pandas`, `numpy`, `matplotlib`, and `seaborn` after installing `ydata-synthetic` may raise errors because of inconsistent dependencies.

In [1]:
from snsynth import Synthesizer

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
%cd /Users/alex/PETsARD

/Users/alex/PETsARD


In [5]:
df = pd.read_csv('[Adt Income] adult.csv')

In [8]:
df.dtypes

age                 int64
workclass          object
fnlwgt              int64
education          object
educational-num     int64
marital-status     object
occupation         object
relationship       object
race               object
gender             object
capital-gain        int64
capital-loss        int64
hours-per-week      int64
native-country     object
income             object
dtype: object

In [9]:
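# Create a DP-CTGAN synthesizer with a total privacy budget of epsilon = 1.0.
synth = Synthesizer.create('dpctgan', epsilon=1.0, verbose=True)
# Reserve 0.5 of the budget for preprocessing; the rest is left for training.
synth_data = synth.fit_sample(df, preprocessor_eps=0.5)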

Spent 0.5 epsilon on preprocessor, leaving 0.5 for training




Epoch 1, Loss G: 0.6885, Loss D: 1.3886
epsilon is 0.17837682139144928, alpha is 63.0
Epoch 2, Loss G: 0.6976, Loss D: 1.3875
epsilon is 0.20520467877255033, alpha is 63.0
Epoch 3, Loss G: 0.7029, Loss D: 1.3909
epsilon is 0.23203253615365138, alpha is 63.0
Epoch 4, Loss G: 0.6979, Loss D: 1.3901
epsilon is 0.25886039353475243, alpha is 63.0
Epoch 5, Loss G: 0.7027, Loss D: 1.3886
epsilon is 0.2856882509158534, alpha is 63.0
Epoch 6, Loss G: 0.7172, Loss D: 1.3828
epsilon is 0.3125161082969545, alpha is 63.0
Epoch 7, Loss G: 0.7070, Loss D: 1.3927
epsilon is 0.3393439656780555, alpha is 63.0
Epoch 8, Loss G: 0.7105, Loss D: 1.3865
epsilon is 0.3661718230591566, alpha is 63.0
Epoch 9, Loss G: 0.7135, Loss D: 1.3891
epsilon is 0.3924151985062195, alpha is 60.0
Epoch 10, Loss G: 0.7208, Loss D: 1.3804
epsilon is 0.4171742677984172, alpha is 57.0
Epoch 11, Loss G: 0.7243, Loss D: 1.3809
epsilon is 0.4406523915074723, alpha is 54.0
Epoch 12, Loss G: 0.7169, Loss D: 1.3876
epsilon is 0.46306

In [10]:
synth_data

Unnamed: 0,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income
0,41,Private,22805.262695,7th-8th,11,Divorced,Craft-repair,Other-relative,White,Male,1537.597656,0.230347,43,United-States,>50K
1,41,Private,22005.888672,7th-8th,10,Divorced,Craft-repair,Other-relative,White,Male,1448.738281,0.182373,50,United-States,>50K
2,41,Private,23601.314453,Bachelors,10,Never-married,Craft-repair,Other-relative,White,Male,1542.714844,0.155884,60,Jamaica,>50K
3,41,Private,22922.495605,HS-grad,10,Never-married,Craft-repair,Unmarried,White,Male,1756.492188,0.343872,43,Thailand,>50K
4,24,Private,18057.068359,Bachelors,9,Divorced,Craft-repair,Unmarried,White,Male,691.214844,0.065186,50,United-States,>50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48837,41,Private,18247.483398,11th,10,Married-civ-spouse,Craft-repair,Husband,White,Male,753.480469,0.071899,50,United-States,>50K
48838,41,Private,22713.254395,7th-8th,11,Never-married,Farming-fishing,Other-relative,White,Male,1800.132812,0.519653,50,Honduras,<=50K
48839,41,Private,22363.893555,7th-8th,10,Married-civ-spouse,Craft-repair,Own-child,Black,Male,1019.437500,0.063843,50,United-States,>50K
48840,52,Federal-gov,19580.788574,7th-8th,10,Never-married,Adm-clerical,Unmarried,White,Male,1093.492188,0.148560,50,United-States,>50K


Preprocessing also consumes part of the $\epsilon$ budget.
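
The budget split reported in the training output ("Spent 0.5 epsilon on preprocessor, leaving 0.5 for training") can be reproduced with simple arithmetic. The following is a minimal sketch, not the internal code of `smartnoise-synth`. The helper name `split_budget` is hypothetical, and it assumes the preprocessor's share is simply subtracted from the total `epsilon`, as that log line suggests.

```python
def split_budget(total_eps: float, preprocessor_eps: float) -> float:
    """Return the epsilon left for training (hypothetical helper, not part of snsynth)."""
    # Assumption: the preprocessor's share is subtracted from the total budget.
    if not 0 <= preprocessor_eps <= total_eps:
        raise ValueError("preprocessor_eps must be between 0 and total_eps")
    return total_eps - preprocessor_eps


# Values taken from the demo above: epsilon=1.0, preprocessor_eps=0.5.
training_eps = split_budget(total_eps=1.0, preprocessor_eps=0.5)
print(f"Spent 0.5 epsilon on preprocessor, leaving {training_eps} for training")
```

With the values used in this demo, only half of the total budget remains for training the GAN itself, so a larger `preprocessor_eps` leaves less budget for training.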