# Constructing an $f$-DP Estimator for the Gaussian Mechanism from a kNN Classifier

## Description

This notebook demonstrates how to construct an $f$-differential privacy ($f$-DP) estimator for the Gaussian mechanism. Given the pair of Gaussian distributions $P = N(0, 1)$ and $Q = N(1, 1)$ and a parameter $\eta > 0$, it builds an estimator that outputs an estimate of the point $(\alpha(\eta), \beta(\eta))$ on the trade-off curve between $P$ and $Q$.

The program implements the BayBox algorithm given in Algorithm 1.

### Step 1: Import Packages

In [1]:
import numpy as np
from scipy.special import erf
import os
import sys
import time
import matplotlib.pyplot as plt

# Navigate to the parent directory of the project structure
project_dir = os.path.abspath(os.path.join(os.getcwd(), '../..'))
src_dir = os.path.join(project_dir, 'src')

# Add the src directory to sys.path
sys.path.append(src_dir)

from analysis.tradeoff_Gaussian import Gaussian_compute_tradeoff_curve
from analysis.tradeoff_Laplace import Laplace_compute_tradeoff_curve

# Import only the names used below instead of a wildcard import
from mech.GaussianDist import GaussianDistSampler, generate_params
from classifier.kNN import train_kNN_model

### Step 2: Instantiate Gaussian Distribution Sampler

In [2]:
dim = 1
kwargs = generate_params(
    mean0=np.zeros(dim), cov0=np.identity(dim),  # P = N(0, I)
    mean1=np.ones(dim), cov1=np.identity(dim),   # Q = N(1, I)
)
sampler = GaussianDistSampler(kwargs)

## Step 3: Estimate $(\alpha(\eta), \beta(\eta))$ for $\eta \geq 1$

### Step 3.1 Show the theoretical result

In [3]:
eta = 1.2
Gaussian_compute_tradeoff_curve(eta)

(np.float64(0.37536443231311667), np.float64(0.24751782253992788))
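For intuition, the theoretical point can be reproduced with a short closed-form sketch. The internals of `Gaussian_compute_tradeoff_curve` are not shown in this notebook, so the formula below is an assumption: it takes the likelihood-ratio test between $N(0, 1)$ and $N(\mu, 1)$ that rejects $P$ when $x > \mu/2 - \ln(\eta)/\mu$. The helper names `std_normal_cdf` and `gaussian_tradeoff_point` are hypothetical. With $\mu = 1$ this agrees with the values this notebook reports for $\eta = 1.2$ and $\eta = 0.4$ to several decimal places.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_tradeoff_point(eta, mu=1.0):
    """Sketch (assumed form, not the repository's code).

    Returns (alpha, beta) for the test between P = N(0, 1) and Q = N(mu, 1)
    that rejects P when x exceeds the threshold t = mu/2 - ln(eta)/mu.
    """
    t = mu / 2.0 - math.log(eta) / mu
    alpha = 1.0 - std_normal_cdf(t)   # type I error: P-sample falls above t
    beta = std_normal_cdf(t - mu)     # type II error: Q-sample falls below t
    return alpha, beta


for eta_value in (1.2, 0.4):
    print(eta_value, gaussian_tradeoff_point(eta_value))
```

A larger $\eta$ lowers the threshold, which raises $\alpha$ and lowers $\beta$; sweeping $\eta$ over $(0, \infty)$ traces out the whole trade-off curve.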

### Step 3.2 Train a kNN classifier for the Bayesian problem $P[[P]_\eta, Q]$

In [4]:
start_time = time.time()
num_train_samples = 1000000
train_samples = sampler.gen_samples(eta=eta, num_samples=num_train_samples)
model = train_kNN_model(train_samples, dim)

print(f"Generated model in {time.time() - start_time:.2f}s with {num_train_samples} samples")

Generated model in 0.41s with 1000000 samples
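The body of `train_kNN_model` is not shown here, so the following is only a minimal pure-Python sketch of what a k-nearest-neighbour classifier does in one dimension. The function name `knn_predict`, the choice `k=3`, and the toy data are all assumptions made for illustration; the project's helper may use a different library, a different `k`, and a far more efficient neighbour search.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Indices of the training points, ordered by distance to x
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: label 0 clustered near 0, label 1 clustered near 1
toy_x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_x, toy_y, 0.05))
print(knn_predict(toy_x, toy_y, 1.05))
```

This brute-force version sorts all training points for every query, which is only sensible for tiny examples; with a million training samples a tree-based or otherwise indexed search is the practical choice.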


### Step 3.3 Estimate the risk of the kNN classifier

In [5]:
start_time = time.time()
num_test_samples = 100000
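# Test samples are drawn with eta=1 (not the training eta); the first
# num_test_samples rows are scored for alpha and the remaining rows for beta.
samples = sampler.gen_samples(eta=1, num_samples=num_test_samples)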
print(f"Generated {num_test_samples} testing samples in {time.time() - start_time:.2f}s")

Generated 100000 testing samples in 0.00s


### Step 3.4 Convert the risk to an estimate of $(\alpha(\eta), (\beta(\eta))$

In [6]:
start_time = time.time()
alpha = 1 - model.score(samples['X'][:num_test_samples], samples['y'][:num_test_samples])
beta = 1 - model.score(samples['X'][num_test_samples:], samples['y'][num_test_samples:])
print(f"(alpha, beta) w.r.t {eta} is ({alpha}, {beta}) [Computation time is {time.time() - start_time:.2f}]")

(alpha, beta) w.r.t 1.2 is (0.37859, 0.24680000000000002) [Computation time is 19.19]
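The conversion above takes one minus the accuracy on the first `num_test_samples` rows as $\alpha$ and one minus the accuracy on the remaining rows as $\beta$. The sketch below spells out that arithmetic on a toy example. It assumes, as the slicing in the cell suggests, that the first block of rows comes from $P$ and the second from $Q$; the helper `error_rates` and the toy labels are hypothetical.

```python
def error_rates(y_true, y_pred, n_first):
    """Return (alpha, beta): the misclassification rate on the first n_first rows
    and on the remaining rows, respectively."""
    errors = [int(t != p) for t, p in zip(y_true, y_pred)]
    alpha = sum(errors[:n_first]) / n_first
    beta = sum(errors[n_first:]) / (len(errors) - n_first)
    return alpha, beta


# Toy example: 4 rows assumed to come from P (label 0), then 4 from Q (label 1)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1]

alpha_hat, beta_hat = error_rates(y_true, y_pred, n_first=4)
print(alpha_hat, beta_hat)  # 0.25 0.5
```

Scoring the two blocks separately matters: a single overall accuracy would mix the two error types and could not be placed on the trade-off curve.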


## Step 4: Estimate $(\alpha(\eta), \beta(\eta))$ for $\eta < 1$

In [7]:
eta = 0.4
Gaussian_compute_tradeoff_curve(eta)

(np.float64(0.07834520013706386), np.float64(0.6614013631638339))

In [8]:
start_time = time.time()
num_train_samples = 100000
train_samples = sampler.gen_samples(eta=eta, num_samples=num_train_samples)
model = train_kNN_model(train_samples, dim)

print(f"Generated model in {time.time() - start_time:.2f}s with {num_train_samples} samples")

Generated model in 0.02s with 100000 samples


In [9]:
start_time = time.time()
num_test_samples = 10000
samples = sampler.gen_samples(eta=1, num_samples=num_test_samples)
print(f"Generated {num_test_samples} testing samples in {time.time() - start_time:.2f}s")

Generated 10000 testing samples in 0.00s


In [10]:
start_time = time.time()
alpha = 1 - model.score(samples['X'][:num_test_samples], samples['y'][:num_test_samples])
beta = 1 - model.score(samples['X'][num_test_samples:], samples['y'][num_test_samples:])
print(f"(alpha, beta) w.r.t {eta} is ({alpha}, {beta}) [Computation time is {time.time() - start_time:.2f}]")

(alpha, beta) w.r.t 0.4 is (0.0806, 0.6563) [Computation time is 1.83]
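As a rough plausibility check, the gaps between the estimates and the theoretical values printed in this notebook can be compared with a Hoeffding radius $\sqrt{\ln(2/\delta)/(2n)}$ for $n$ test samples per class. This is a sketch under stated assumptions: the bound only controls the test-set sampling error of a fixed classifier, not the distance between the kNN classifier and the optimal test, and the choice $\delta = 0.05$ and the helper name `hoeffding_radius` are illustrative. The numbers are copied from the outputs above.

```python
import math


def hoeffding_radius(n, delta=0.05):
    """Two-sided Hoeffding deviation bound for the mean of n values in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# eta -> (theoretical alpha, theoretical beta), from the notebook outputs
theory = {
    1.2: (0.37536443231311667, 0.24751782253992788),
    0.4: (0.07834520013706386, 0.6614013631638339),
}
# eta -> (estimated alpha, estimated beta, test samples per class)
estimates = {
    1.2: (0.37859, 0.2468, 100000),
    0.4: (0.0806, 0.6563, 10000),
}

results = {}
for eta_value, (alpha_est, beta_est, n) in estimates.items():
    alpha_th, beta_th = theory[eta_value]
    radius = hoeffding_radius(n)
    gaps = (abs(alpha_est - alpha_th), abs(beta_est - beta_th))
    results[eta_value] = (gaps, radius)
    print(f"eta={eta_value}: gaps={gaps[0]:.5f}, {gaps[1]:.5f}; radius={radius:.5f}")
```

In both runs each gap is smaller than the corresponding radius, which is consistent with the estimator tracking the theoretical curve at these sample sizes, though it is not a proof of accuracy.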
