In [1]:
import diffprivlib
import numpy as np
import pandas as pd
import random
import seaborn as sns
import matplotlib.pyplot as plt

# IBM differential privacy library

The library [1] is released under the MIT license. Its repository contains notebooks (https://github.com/IBM/differential-privacy-library/blob/main/notebooks/README.md) that demonstrate several features, including:
- differentially private versions of scikit-learn algorithms
- a `BudgetAccountant` class that tracks the privacy budget spent across multiple operations

### References

[1] https://github.com/IBM/differential-privacy-library


### Simple examples

Consider a simple dataset with a single `Gender` column holding 30 'M' and 50 'F' entries in shuffled order.

In [2]:
n_male = 30
n_female = 50
data = ['M']*n_male + ['F']*n_female
random.shuffle(data)
df = pd.DataFrame({'Gender': data})
print(df)

   Gender
0       M
1       M
2       M
3       F
4       F
..    ...
75      F
76      M
77      F
78      M
79      M

[80 rows x 1 columns]


An ordinary count query against the data returns the exact number of 'F' entries.

In [3]:
f_array = np.array(df['Gender'] == 'F').astype(int)
np.sum(f_array)

50

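The same count can be computed with differential privacy using the Laplace mechanism. Here `epsilon` is the $\epsilon$ of the mathematical definition of differential privacy, and `bounds` gives the range of each input value, from which the sensitivity is derived. Because random noise is added, the result differs from the true count of 50 and changes on every run.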
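For reference, the standard definitions behind this call can be written out as follows. The symbols $\mathcal{M}$, $D$, $D'$, $S$, $f$ and $\Delta f$ are notation introduced here for illustration, not names from the library. A randomised mechanism $\mathcal{M}$ is $\epsilon$-differentially private if, for all datasets $D$ and $D'$ differing in one record and all sets of outputs $S$,

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S].
$$

The Laplace mechanism achieves this for a numeric query $f$ by adding noise scaled to the query's sensitivity $\Delta f$:

$$
\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\epsilon}\right), \qquad \Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert.
$$

For a sum of values bounded in $[0, 1]$, $\Delta f = 1$, so $\epsilon = 0.1$ corresponds to a noise scale of $1 / 0.1 = 10$. The library may apply further post-processing to the noisy value, which these formulas do not capture.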

In [4]:
diffprivlib.tools.sum(f_array, epsilon=0.1, bounds=(0,1))

55.61868447476087
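To make the role of `epsilon` and the sensitivity concrete, here is a minimal sketch of a Laplace-noised count using only the standard library. It is not diffprivlib's implementation, which may differ in details such as how the output is bounded; `laplace_count` is a hypothetical helper written for this illustration.

```python
import random


def laplace_count(values, epsilon, sensitivity=1.0, rng=None):
    """Hypothetical helper: exact sum plus Laplace(0, sensitivity / epsilon) noise."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(values) + noise


# Same shape as the notebook's data: 30 zeros ('M') and 50 ones ('F').
values = [0] * 30 + [1] * 50

rng = random.Random(0)
noisy_counts = [laplace_count(values, epsilon=0.1, rng=rng) for _ in range(5)]
print(noisy_counts)
```

With `epsilon=0.1` and a sensitivity of 1, the noise scale is 10, so individual answers can easily be off by ten or more, while their average over many runs approaches the true count of 50. A larger `epsilon` gives less noise and weaker privacy.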

### Privacy budget accounting

The library provides a `BudgetAccountant` class that tracks the cumulative privacy spend, and therefore how many more queries can be made before the privacy budget is exhausted.

In [5]:
def report(acc):
    # Print the accountant's current state.
    print("Total spent: %r" % (acc.total(),))
    print("Remaining budget (for 1 query): %r" % (acc.remaining(),))
    print("Number of queries recorded: %d" % len(acc))

# Total budget of epsilon=0.25; each query below spends epsilon=0.1.
acc = diffprivlib.BudgetAccountant(epsilon=0.25, delta=0)
report(acc)
for _ in range(2):
    print("SUM: "+str(diffprivlib.tools.sum(f_array, epsilon=0.1, bounds=(0,1), accountant=acc)))
    report(acc)

Total spent: (epsilon=0, delta=0.0)
Remaining budget (for 1 query): (epsilon=0.25, delta=0.0)
Number of queries recorded: 0
SUM: 48.49475230557343
Total spent: (epsilon=0.1, delta=0.0)
Remaining budget (for 1 query): (epsilon=0.15000000000000002, delta=0.0)
Number of queries recorded: 1
SUM: 61.32975485873175
Total spent: (epsilon=0.2, delta=0.0)
Remaining budget (for 1 query): (epsilon=0.04999999999999999, delta=0.0)
Number of queries recorded: 2
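The bookkeeping shown in this output can be sketched in a few lines of plain Python, assuming basic sequential composition (the epsilons of successive queries simply add up). `SimpleAccountant` is a hypothetical class written for this illustration. It ignores `delta` and does not reproduce the other features of the library's `BudgetAccountant`.

```python
class SimpleAccountant:
    """Hypothetical epsilon-only accountant using basic sequential composition."""

    def __init__(self, epsilon):
        self.epsilon = epsilon  # total budget
        self.spends = []        # epsilon spent by each recorded query

    def total(self):
        return sum(self.spends)

    def remaining(self):
        return self.epsilon - self.total()

    def spend(self, epsilon):
        # Refuse any spend that would push the total past the budget.
        if self.total() + epsilon > self.epsilon:
            raise ValueError("spend of %r would exceed the remaining budget" % epsilon)
        self.spends.append(epsilon)


acc_sketch = SimpleAccountant(epsilon=0.25)
outcomes = []
for _ in range(3):
    try:
        acc_sketch.spend(0.1)
        outcomes.append("ok")
    except ValueError:
        outcomes.append("refused")

print(outcomes)  # ['ok', 'ok', 'refused']
```

With a budget of 0.25 and 0.1 per query, two queries fit and the third is refused, which mirrors the totals of 0.1 and 0.2 recorded above.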


A third query with `epsilon=0.1` would exceed the remaining budget of about 0.05, so it raises a `BudgetError`.

In [6]:
print("SUM: "+str(diffprivlib.tools.sum(f_array, epsilon=0.1, bounds=(0,1), accountant=acc)))

BudgetError: Privacy spend of (0.1,0) not permissible; will exceed remaining privacy budget. Use BudgetAccountant.remaining() to check remaining budget.