# Differential Privacy

Material courtesy of Joseph Near, University of Vermont

## The Laplace Mechanism

Differential privacy is typically used to answer specific queries. Let's consider a query on the census data, *without* differential privacy.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
adult = pd.read_csv("adult_with_pii.csv")

"How many individuals in the dataset are 40 years old or older?"

In [2]:

adult[adult['Age'] >= 40].shape[0]

14237

Now let's answer the same query with differential privacy, using the Laplace mechanism. Recall that we need to specify the *sensitivity* of the query and our choice of $\epsilon$.
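As a reminder, the Laplace mechanism can be written compactly as follows. The symbols $F$, $f$, $x$, and $s$ are notation chosen here for illustration: $f$ is the query, $x$ the dataset, and $s$ the sensitivity of $f$.

$$
F(x) = f(x) + \mathrm{Lap}\left(\frac{s}{\epsilon}\right)
$$

Here $\mathrm{Lap}(b)$ denotes a random sample from the Laplace distribution centered at 0 with scale $b$. A smaller $\epsilon$ or a larger sensitivity gives a larger scale, and therefore more noise.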

In [3]:
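# Adding or removing one person changes a count by at most 1
sensitivity = 1
epsilon = 0.1

# True count plus Laplace noise with scale sensitivity/epsilon
adult[adult['Age'] >= 40].shape[0] + np.random.laplace(loc=0, scale=sensitivity/epsilon)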

14228.567571176209

You can see the effect of the noise by running the preceding cell multiple times. Each time, the output changes, but most of the time the answer is close enough to the true answer (14,237) to be useful.
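To put a rough number on "close enough", here is a minimal sketch that simulates many runs of the noisy query instead of re-running the cell by hand. It does not load the dataset: it hard-codes the true count reported above. The seeded generator, the 10,000 repetitions, the 30-person threshold, and the variable names are assumptions made for illustration.

```python
import numpy as np

true_count = 14237   # non-private answer from the query above
sensitivity = 1
epsilon = 0.1
scale = sensitivity / epsilon   # Laplace scale, 10 here

# Seeded generator so the sketch is reproducible (an illustrative choice)
rng = np.random.default_rng(seed=0)

# Simulate 10,000 independent differentially private answers
noisy_answers = true_count + rng.laplace(loc=0, scale=scale, size=10_000)
abs_errors = np.abs(noisy_answers - true_count)

mean_abs_error = abs_errors.mean()          # expected to be close to the scale
frac_within_30 = (abs_errors <= 30).mean()  # share of answers within 30 of the truth

print(f"mean absolute error: {mean_abs_error:.2f}")
print(f"fraction within 30 of the true count: {frac_within_30:.3f}")
```

For Laplace noise the expected absolute error equals the scale, so with these settings a typical answer is off by about 10 people out of more than 14,000, and roughly 95% of answers land within 30 of the true count.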