Skip to content

opendp/smartnoise-sdk

main
Switch branches/tags
Code

License: MIT

SmartNoise SDK: Tools for Differential Privacy on Tabular Data

The SmartNoise SDK includes 2 packages:

To get started, see the examples below. Click into each project for more detailed examples.

SQL

Python

Install

pip install smartnoise-sql

Query

import snsql
from snsql import Privacy
import pandas as pd

csv_path = 'PUMS.csv'
meta_path = 'PUMS.yaml'

data = pd.read_csv(csv_path)
privacy = Privacy(epsilon=1.0, delta=0.01)
reader = snsql.from_connection(data, privacy=privacy, metadata=meta_path)

result = reader.execute('SELECT sex, AVG(age) AS age FROM PUMS.PUMS GROUP BY sex')

print(result)

See the SQL project

Synthesizers

Python

Install

pip install smartnoise-synth

MWEM

import pandas as pd
import numpy as np

pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
pums = pums.drop(['income'], axis=1)
nf = pums.to_numpy().astype(int)

synth = snsynth.MWEMSynthesizer(epsilon=1.0, split_factor=nf.shape[1]) 
synth.fit(nf)

sample = synth.sample(10) # get 10 synthetic rows
print(sample)

PATE-CTGAN

import pandas as pd
import numpy as np
from snsynth.pytorch.nn import PATECTGAN
from snsynth.pytorch import PytorchDPSynthesizer

pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
pums = pums.drop(['income'], axis=1)

synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)
synth.fit(pums, categorical_columns=pums.columns.values.tolist())

sample = synth.sample(10) # synthesize 10 rows
print(sample)

See the Synthesizers project

Communication

Releases and Contributing

Please let us know if you encounter a bug by creating an issue.

We appreciate all contributions. Please review the contributors guide. We welcome pull requests with bug-fixes without prior discussion.

If you plan to contribute new features, utility functions or extensions to this system, please first open an issue and discuss the feature with us.