# Local Differential Privacy In Practice

In [None]:
import numpy as np

Our population has 10,000 individuals, and each individual has a 30% likelihood (`drunk_driving = 0.3`) of having driven drunk.

- If `population[i] == 1`, the individual has driven drunk.
- If `population[i] == 0`, the individual has always driven sober.



In [30]:
drunk_driving = 0.3
population_size = 10_000

In [31]:
# Draw 10 independent populations to see how much the count varies around 30%
for i in range(10):
    a = np.random.choice(2, population_size, p=[1 - drunk_driving, drunk_driving])
    print("--" * 10 + " %d " % i + "--" * 10)
    print("In population %d: %d individuals have ever drunk while driving" % (i, a.sum()))
    print("In population %d: %d individuals have never drunk while driving" % (i, (1 - a).sum()))

-------------------- 0 --------------------
In population 0: 3113 individuals have ever drunk while driving
In population 0: 6887 individuals have never drunk while driving
-------------------- 1 --------------------
In population 1: 2934 individuals have ever drunk while driving
In population 1: 7066 individuals have never drunk while driving
-------------------- 2 --------------------
In population 2: 2987 individuals have ever drunk while driving
In population 2: 7013 individuals have never drunk while driving
-------------------- 3 --------------------
In population 3: 3014 individuals have ever drunk while driving
In population 3: 6986 individuals have never drunk while driving
-------------------- 4 --------------------
In population 4: 3038 individuals have ever drunk while driving
In population 4: 6962 individuals have never drunk while driving
-------------------- 5 --------------------
In population 5: 3021 individuals have ever drunk while driving
In population 5: 6979 individuals have never drunk while driving
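Each population is the sum of 10,000 independent 0/1 draws, so the number of drunk drivers follows a binomial distribution. The following minimal sketch computes the expected count and its standard deviation, which indicates that the counts above (between 2,934 and 3,113) are ordinary sampling noise. The helper name `binomial_count_spread` is hypothetical and is not part of the notebook.

```python
import math


def binomial_count_spread(n, p):
    """Mean and standard deviation of the number of 1s in n independent Bernoulli(p) draws."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return mean, std


expected, spread = binomial_count_spread(10_000, 0.3)
print("Expected count: %.1f, standard deviation: %.1f" % (expected, spread))
# Under a normal approximation, about 99.7% of populations fall within three standard deviations
print("Three-sigma range: %.0f to %.0f" % (expected - 3 * spread, expected + 3 * spread))
```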

In [32]:
population = np.random.choice(2, population_size, p=[1 - drunk_driving,drunk_driving])


The random coin-flipping procedure works as follows:

- If `coin_1 == 0`: the individual answers honestly.
- If `coin_1 == 1`: the individual answers according to coin 2:
  - If `coin_2 == 0`: the individual answers "no" (always drove sober).
  - If `coin_2 == 1`: the individual answers "yes" (drove drunk at some point in their life).
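The procedure above can be sketched for a single individual as follows. This is an illustrative sketch under a stated assumption. The function name `randomized_answer` is hypothetical, and unlike the notebook's `random_coin_flipping`, `p_heads` here is the actual probability that each coin lands 1.

```python
import random


def randomized_answer(truth, p_heads=0.5, rng=random):
    """Return the reported answer (0 or 1) for one individual whose true answer is `truth`."""
    coin_1 = 1 if rng.random() < p_heads else 0
    if coin_1 == 0:
        # First coin is 0: answer honestly
        return truth
    # First coin is 1: the second coin decides the answer
    coin_2 = 1 if rng.random() < p_heads else 0
    return coin_2


# Empirical "yes" rates for someone who drove drunk and for someone who never did
rng = random.Random(0)
trials = 100_000
yes_rate_if_drunk = sum(randomized_answer(1, rng=rng) for _ in range(trials)) / trials
yes_rate_if_sober = sum(randomized_answer(0, rng=rng) for _ in range(trials)) / trials
print(yes_rate_if_drunk, yes_rate_if_sober)
```

With fair coins, the expected rates are 0.5 + 0.5 · 0.5 = 0.75 and 0.5 · 0.5 = 0.25. A single "yes" therefore never proves that an individual drove drunk.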

In [44]:
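def random_coin_flipping(p=0.5):
    # Each "coin" is 1 where a standard normal draw exceeds the threshold p.
    # Note: p acts as a threshold, not as the probability of heads. With p = 0.5,
    # each coin lands 1 with probability P(Z > 0.5) ≈ 0.31, not 0.5.
    coin1 = np.random.normal(0, 1, population_size) > p
    coin2 = np.random.normal(0, 1, population_size) > p
    return coin1, coin2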
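`random_coin_flipping` compares a standard normal draw against `p`, so `p` acts as a threshold rather than as a probability. The check below computes the resulting heads probability from the normal CDF, evaluated with `math.erf`. It also sketches a hypothetical alternative, `fair_coin_flipping`, in which `p` really is the probability that each coin lands 1. Both helper names are assumptions for illustration.

```python
import math

import numpy as np


def heads_probability_from_threshold(threshold):
    """P(Z > threshold) for a standard normal Z, i.e. 1 - Phi(threshold)."""
    return 0.5 * (1 - math.erf(threshold / math.sqrt(2)))


def fair_coin_flipping(population_size, p=0.5, rng=None):
    """Hypothetical alternative: two coins that each land 1 with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    coin1 = rng.random(population_size) < p
    coin2 = rng.random(population_size) < p
    return coin1, coin2


print("Heads probability with threshold 0.5: %.4f" % heads_probability_from_threshold(0.5))

coin1, coin2 = fair_coin_flipping(100_000, rng=np.random.default_rng(0))
print("Observed heads rates: %.3f, %.3f" % (coin1.mean(), coin2.mean()))
```

Switching to fair coins changes the privatized statistics that follow. The raw mean then has to be corrected for the coins before it estimates the true rate.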

In [67]:
coin_1, coin_2 = random_coin_flipping()
# Keep the true answer where coin_1 == 0; otherwise report coin_2
private_population = population * (1 - coin_1) + coin_1 * coin_2

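If we now compute summary statistics over the privatized population, they stay close to those of the original population, even though any single answer may come from a coin instead of the truth. This is the strength of differential privacy: aggregate measurements remain useful, while each individual keeps plausible deniability.

The close match here, however, depends on the coins used. As written, each coin lands 1 with probability P(Z > 0.5) ≈ 0.3085, which happens to be close to the 30% base rate. As a result, the expected privatized mean, (1 - 0.3085) · 0.3 + 0.3085² ≈ 0.3026, barely moves. With fair coins, the raw privatized mean would instead be 0.5 · 0.3 + 0.25 = 0.4, and the true rate would have to be recovered by inverting that relation.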
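The privacy guarantee of this coin scheme can be quantified. Under the usual definition of ε-local differential privacy, a mechanism is ε-private when, for every reported answer, the probabilities of that answer under the two possible true values differ by at most a factor of e^ε. The following minimal sketch assumes that coin 1 lands 1 with probability `q1` and coin 2 with probability `q2`. The function name is hypothetical.

```python
import math


def randomized_response_epsilon(q1, q2):
    """Local DP parameter of the two-coin scheme (requires 0 < q1 and 0 < q2 < 1)."""
    yes_if_drunk = (1 - q1) + q1 * q2   # honest "yes", or coin 2 says "yes"
    yes_if_sober = q1 * q2              # only coin 2 can produce a "yes"
    no_if_drunk = q1 * (1 - q2)         # only coin 2 can produce a "no"
    no_if_sober = (1 - q1) + q1 * (1 - q2)
    return max(
        abs(math.log(yes_if_drunk / yes_if_sober)),
        abs(math.log(no_if_sober / no_if_drunk)),
    )


print("Fair coins: epsilon = %.3f" % randomized_response_epsilon(0.5, 0.5))
q = 0.3085  # approximate heads probability of random_coin_flipping with p = 0.5
print("Notebook coins: epsilon = %.3f" % randomized_response_epsilon(q, q))
```

A smaller ε means stronger privacy. Fair coins give ε = ln 3 ≈ 1.1. The notebook's coins return the honest answer about 69% of the time, which gives a weaker guarantee of roughly ε ≈ 2.1.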

In [72]:
original_mean = population.mean()
private_mean = private_population.mean()
print("The average of the population is: %f" % original_mean)
print("The average of the privatized population is: %f" % private_mean)
print("The average changed by only: %f" % (private_mean - original_mean))

The average of the population is: 0.304600
The average of the privatized population is: 0.303400
The average changed by only: -0.001200
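In general, the privatized mean is not an unbiased estimate of the true rate. If coin 1 lands 1 with probability `q1` and coin 2 with probability `q2`, the expected fraction of "yes" answers is (1 - q1) · π + q1 · q2, where π is the true rate. Solving for π gives an estimator. The following is a minimal sketch with hypothetical helper names, assuming the analyst knows the coin probabilities.

```python
def expected_private_mean(true_rate, q1, q2):
    """Expected fraction of 'yes' answers after the two-coin randomization."""
    return (1 - q1) * true_rate + q1 * q2


def debiased_rate(private_mean, q1, q2):
    """Invert expected_private_mean to estimate the true rate (requires q1 < 1)."""
    return (private_mean - q1 * q2) / (1 - q1)


# Fair coins: a true rate of 0.3 shows up as 0.4 in the raw answers
raw = expected_private_mean(0.3, 0.5, 0.5)
print("Raw privatized mean with fair coins: %.2f" % raw)
print("Recovered rate: %.2f" % debiased_rate(raw, 0.5, 0.5))

# The notebook's coins (each lands 1 with probability of about 0.3085)
q = 0.3085
print("Raw privatized mean with the notebook's coins: %.4f" % expected_private_mean(0.3, q, q))
```

On a finite sample, the estimate from `debiased_rate` is noisy and can fall slightly outside [0, 1]. It is therefore often clipped to that range.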


In [73]:
original_std = population.std()
private_std = private_population.std()
print("The standard deviation of the population is: %f" % original_std)
print("The standard deviation of the privatized population is: %f" % private_std)
print("The standard deviation changed by only: %f" % (private_std - original_std))

The standard deviation of the population is: 0.460238
The standard deviation of the privatized population is: 0.459726
The standard deviation changed by only: -0.000511
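For 0/1 data, the standard deviation carries no information beyond the mean. With the default `ddof=0` of `np.std`, a binary array with mean m has standard deviation sqrt(m(1 - m)). The check below confirms that the printed values pair up this way: a mean of 0.3046 goes with a standard deviation of 0.460238, and 0.3034 goes with 0.459726. The helper `binary_std` is hypothetical.

```python
import numpy as np


def binary_std(mean):
    """Population standard deviation (ddof=0) of a 0/1 array with the given mean."""
    return np.sqrt(mean * (1 - mean))


# Means taken from the printed outputs above
for mean in (0.3046, 0.3034):
    print("mean %.4f -> std %.6f" % (mean, binary_std(mean)))

# Direct check on a small array: three 1s out of ten
x = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(np.isclose(x.std(), binary_std(x.mean())))
```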


In [116]:
def test_on_population(population_size, drunk_driving, coin_flipping_p=0.5):
    # Simulate the true answers: 1 = drove drunk, 0 = always sober
    population = np.random.choice(2, population_size, p=[1 - drunk_driving, drunk_driving])

    # Same coins as random_coin_flipping: coin_flipping_p is a threshold on a
    # standard normal draw, so each coin is 1 with probability P(Z > coin_flipping_p)
    coin_1 = np.random.normal(0, 1, population_size) > coin_flipping_p
    coin_2 = np.random.normal(0, 1, population_size) > coin_flipping_p
    # Keep the true answer where coin_1 == 0; otherwise report coin_2
    private_population = population * (1 - coin_1) + coin_1 * coin_2

    print("For a population of %d individuals, where %d %% drunk drove:" % (population_size, int(drunk_driving * 100)))
    original_mean = population.mean()
    private_mean = private_population.mean()
    print("\tThe average of the population is: %f" % original_mean)
    print("\tThe average of the privatized population is: %f" % private_mean)
    print("\tThe average changed by only: %f" % abs(private_mean - original_mean))
    original_std = population.std()
    private_std = private_population.std()
    print("\tThe standard deviation of the population is: %f" % original_std)
    print("\tThe standard deviation of the privatized population is: %f" % private_std)
    print("\tThe standard deviation changed by only: %f" % abs(private_std - original_std))

The larger the population, the smaller the gap between the original and privatized measurements. The gap does not disappear completely, however. It settles at about 0.0026 because each coin lands 1 with probability P(Z > 0.5) ≈ 0.31 rather than exactly the 30% base rate, which pushes the expected privatized mean up to about 0.3026.

In [117]:
test_on_population(100, drunk_driving)

For a population of 100 individuals, where 30 % drunk drove:
	The average of the population is: 0.290000
	The average of the privatized population is: 0.330000
	The average changed by only: 0.040000
	The standard deviation of the population is: 0.453762
	The standard deviation of the privatized population is: 0.470213
	The standard deviation changed by only: 0.016451


In [118]:
test_on_population(1_000, drunk_driving)

For a population of 1000 individuals, where 30 % drunk drove:
	The average of the population is: 0.335000
	The average of the privatized population is: 0.304000
	The average changed by only: 0.031000
	The standard deviation of the population is: 0.471990
	The standard deviation of the privatized population is: 0.459983
	The standard deviation changed by only: 0.012008


In [119]:
test_on_population(100_000, drunk_driving)

For a population of 100000 individuals, where 30 % drunk drove:
	The average of the population is: 0.299910
	The average of the privatized population is: 0.304710
	The average changed by only: 0.004800
	The standard deviation of the population is: 0.458218
	The standard deviation of the privatized population is: 0.460284
	The standard deviation changed by only: 0.002066


In [120]:
test_on_population(1_000_000, drunk_driving)

For a population of 1000000 individuals, where 30 % drunk drove:
	The average of the population is: 0.299886
	The average of the privatized population is: 0.302921
	The average changed by only: 0.003035
	The standard deviation of the population is: 0.458208
	The standard deviation of the privatized population is: 0.459521
	The standard deviation changed by only: 0.001314


In [121]:
test_on_population(47_000_000, drunk_driving)

For a population of 47000000 individuals, where 30 % drunk drove:
	The average of the population is: 0.300108
	The average of the privatized population is: 0.302695
	The average changed by only: 0.002587
	The standard deviation of the population is: 0.458305
	The standard deviation of the privatized population is: 0.459424
	The standard deviation changed by only: 0.001120
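Two effects shrink at different rates in these runs. For each individual, the privatized answer minus the true answer equals `coin_1 * (coin_2 - truth)`. The mean of that difference is a bias that does not depend on n, while its random part has a standard error that falls like 1/sqrt(n). The following minimal sketch compares the two. It assumes that each coin lands 1 with probability 0.3085 (the value implied by `coin_flipping_p = 0.5`) and a true rate of exactly 0.3.

```python
import math

true_rate = 0.3
q = 0.3085  # approximate P(Z > 0.5) for a standard normal Z

# Per-individual difference d = coin_1 * (coin_2 - truth)
mean_gap = q * (q - true_rate)  # E[d]: the bias of the privatized mean
second_moment = q * (q * (1 - true_rate) + (1 - q) * true_rate)  # E[d^2] = P(d != 0)
gap_variance = second_moment - mean_gap ** 2

print("Expected gap (bias): %.5f" % mean_gap)
for n in (100, 1_000, 100_000, 1_000_000, 47_000_000):
    # Standard error of the average difference over n independent individuals
    standard_error = math.sqrt(gap_variance / n)
    print("n = %10d: standard error of the gap %.5f" % (n, standard_error))
```

At 47 million individuals, the standard error is around 5e-5. That is far smaller than the observed gap of 0.002587, so the remaining difference is essentially the bias from the coins. Making the population larger cannot remove this bias. Correcting the estimate for the known coin probabilities can.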
