## **3. Local and Global Differential Privacy**

### **Implementing Local Differential Privacy**

Let's picture the following situation. We want to survey a group of people about a negative behaviour (say, whether they have ever stolen something). Of course, some might refuse to answer.

By following the steps below, we can protect the privacy of our subjects. Each person should:
*   flip a coin twice
*   if the first flip is heads, tell the truth
*   if it is tails, answer according to the second flip (yes for heads, no for tails)

Now, when a person answers, we have no way to tell whether their answer is true or comes from the second coin, so their privacy is protected.
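
The protocol above can be sketched for a single respondent. This is a minimal illustration using only the standard library; the function name `randomized_response` and the all-"yes" population are assumptions made for the example, not part of the original notebook.

```python
import random

def randomized_response(truth, rng):
    """Return the answer one respondent gives under the two-coin protocol."""
    first_flip_heads = rng.random() < 0.5
    if first_flip_heads:
        return truth                 # first coin is heads: tell the truth
    return rng.random() < 0.5        # otherwise the second coin decides (heads = yes)

rng = random.Random(0)  # seeded only so the sketch is repeatable
# Hypothetical population in which everyone would truthfully answer "yes"
answers = [randomized_response(True, rng) for _ in range(20000)]
yes_fraction = sum(answers) / len(answers)
print(yes_fraction)  # close to 0.75: half truthful yeses plus a quarter random yeses
```

Even when every true answer is "yes", roughly a quarter of the recorded answers are "no", so no single recorded answer reveals what that person actually did.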

On top of that, we know that whatever the true probability of stealing is, we are averaging it with the 50% probability from the coins. This means that if the survey reports 60%, the real probability is 70%, because averaging 70% and 50% gives 60%.
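
The arithmetic in this paragraph can be written as a one-line correction. The helper name `deskew` is hypothetical; it only restates the averaging argument for the fair-coin case.

```python
def deskew(observed_rate):
    """Recover the true rate from the rate reported under the two-coin protocol."""
    # observed = 0.5 * true + 0.5 * 0.5, so true = 2 * observed - 0.5
    return 2 * observed_rate - 0.5

estimated_true_rate = deskew(0.60)
print(round(estimated_true_rate, 2))  # 0.7
```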

However, protecting our users has a cost: the greater the privacy protection, the less accurate the results.

In [0]:
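import torch

def query(db):
  # True mean of the original database, kept only for comparison
  true_result = torch.mean(db.float())

  # Two fair coin flips per entry (1 = heads, 0 = tails)
  first_coin_flip = (torch.rand(len(db)) > 0.5).float()
  second_coin_flip = (torch.rand(len(db)) > 0.5).float()

  # Heads on the first flip keeps the true value; tails replaces it with the second flip
  augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip

  # De-skew: E[noisy mean] = 0.5 * true mean + 0.25, so true mean = 2 * noisy mean - 0.5
  db_result = torch.mean(augmented_database.float()) * 2 - 0.5

  return db_result, true_result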

Let's see how this noise affects our database depending on the number of entries it has.

In [0]:
db, pdbs = create_db_and_parallels(10)
priv_res , true_res = query(db)
print("With noise: " + str(priv_res))
print("Without noise: " + str(true_res))

With noise: tensor(0.7000)
Without noise: tensor(0.5000)


In [0]:
db, pdbs = create_db_and_parallels(100)
priv_res , true_res = query(db)
print("With noise: " + str(priv_res))
print("Without noise: " + str(true_res))

With noise: tensor(0.4600)
Without noise: tensor(0.4700)


In [0]:
db, pdbs = create_db_and_parallels(100000)
priv_res , true_res = query(db)
print("With noise: " + str(priv_res))
print("Without noise: " + str(true_res))

With noise: tensor(0.5013)
Without noise: tensor(0.4991)


When we add noise, we corrupt our dataset. However, the more data points we have, the more this noise tends to average out.
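
A single run per database size can be misleading, so the following sketch averages the gap between the private and the true mean over several trials. It is a standard-library re-implementation for illustration only: `private_mean`, `average_error`, the trial count, and the randomly generated 0/1 databases are all assumptions, not the notebook's own `query` or `create_db_and_parallels`.

```python
import random

def private_mean(db, rng):
    """De-skewed mean of a 0/1 list after the two-coin randomization."""
    noisy = []
    for value in db:
        if rng.random() < 0.5:          # first coin heads: keep the true value
            noisy.append(value)
        else:                           # first coin tails: use the second coin
            noisy.append(1 if rng.random() < 0.5 else 0)
    return 2 * sum(noisy) / len(noisy) - 0.5

def average_error(size, trials, rng):
    """Mean absolute gap between the private and true means over several trials."""
    total = 0.0
    for _ in range(trials):
        db = [1 if rng.random() < 0.5 else 0 for _ in range(size)]
        total += abs(private_mean(db, rng) - sum(db) / size)
    return total / trials

rng = random.Random(0)  # seeded only so the sketch is repeatable
errors = {size: average_error(size, 50, rng) for size in (10, 100, 10000)}
for size, error in errors.items():
    print(size, round(error, 4))
```

The printed error should shrink as the database grows, roughly in proportion to one over the square root of the number of entries.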

### **Varying the amount of noise**

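Now we'll modify our query function so that the bias of the first coin can be varied. Here, *percent* is the probability that an entry is replaced by the second coin flip, i.e. the amount of noise.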

In [0]:
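def query(db, percent):
  # percent: probability that an entry is replaced by the second coin flip
  true_result = torch.mean(db.float())

  # First coin is 1 (keep the true value) with probability 1 - percent
  first_coin_flip = (torch.rand(len(db)) > percent).float()
  second_coin_flip = (torch.rand(len(db)) > 0.5).float()

  augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip

  # Skewed mean: E[sk_result] = (1 - percent) * true mean + percent * 0.5
  sk_result = augmented_database.float().mean()

  # De-skew; algebraically equal to (sk_result - 0.5 * percent) / (1 - percent)
  db_result = ((sk_result/percent)-0.5)*percent/(1-percent)

  return db_result, true_result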
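
To check the de-skewing step in `query`, we can compute the expected skewed mean for a known true mean and confirm that the correction recovers it. The helper names below are hypothetical and the true mean of 0.3 is an arbitrary example value; the sketch assumes `percent` is strictly between 0 and 1.

```python
def expected_skewed_mean(true_mean, percent):
    """Expected mean after randomization: truth with prob. 1 - percent, fair coin otherwise."""
    return (1 - percent) * true_mean + percent * 0.5

def deskew(skewed_mean, percent):
    """Invert expected_skewed_mean; same algebra as the db_result line in query."""
    return (skewed_mean - 0.5 * percent) / (1 - percent)

for percent in (0.2, 0.5, 0.7):
    skewed = expected_skewed_mean(0.3, percent)
    print(percent, round(skewed, 3), round(deskew(skewed, percent), 3))
```

With `percent = 0.5` the correction reduces to `2 * skewed - 0.5`, the fair-coin formula used in the first version of `query`.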

In [0]:
db, pdbs = create_db_and_parallels(100)
priv_res , true_res = query(db, 0.2)
print("With noise: " + str(priv_res))
print("Without noise: " + str(true_res))

With noise: tensor(0.5875)
Without noise: tensor(0.5500)


In [0]:
db, pdbs = create_db_and_parallels(100)
priv_res , true_res = query(db,0.7)
print("With noise: " + str(priv_res))
print("Without noise: " + str(true_res))

With noise: tensor(0.5000)
Without noise: tensor(0.5300)
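
Two single runs do not show clearly how much accuracy is lost as the noise grows. As a rough guide, the spread of the de-skewed estimate can be approximated analytically. The sketch below assumes each entry is an independent draw that equals 1 with probability `true_mean`, so it measures the spread around that underlying rate; the function name `estimate_std` is hypothetical.

```python
import math

def estimate_std(true_mean, percent, size):
    """Approximate standard deviation of the de-skewed estimate for `size` entries."""
    # Each randomized answer is 1 with probability q
    q = (1 - percent) * true_mean + percent * 0.5
    # Std of the skewed mean, then scaled by the de-skewing factor 1 / (1 - percent)
    return math.sqrt(q * (1 - q) / size) / (1 - percent)

for percent in (0.2, 0.5, 0.7):
    print(percent, round(estimate_std(0.5, percent, 100), 4))
```

Under these assumptions the spread grows with `percent`, which is the privacy and accuracy trade-off described earlier.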


### **Global Differential Privacy**

The amount of noise we need to add to the output of a query depends on:
*   the type of noise (Gaussian/Laplacian)
*   the sensitivity of the query/function
*   the desired epsilon
*   the desired delta

For Laplacian noise, delta is always 0, and we choose the amount of noise through the scale parameter b:


b = sensitivity(query)/epsilon

In [0]:
epsilon = 0.5 

The smaller we make this number (*epsilon*), the less information we're allowing to leak. This means we'll add more noise to our results.
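
To make the effect of *epsilon* concrete, the sketch below evaluates b = sensitivity / epsilon for a query with sensitivity 1 and a few example values of epsilon. The helper name `laplace_scale` and the chosen epsilon values are assumptions for illustration.

```python
import math

def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace noise: b = sensitivity / epsilon."""
    return sensitivity / epsilon

for epsilon in (1.0, 0.5, 0.1):
    b = laplace_scale(1, epsilon)
    # A Laplace distribution with scale b has standard deviation b * sqrt(2)
    print(epsilon, b, round(b * math.sqrt(2), 3))
```

Halving epsilon doubles the scale of the noise, and therefore its standard deviation.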

In [0]:
import numpy as np

In [0]:
db, pdbs = create_db_and_parallels(100)

First, we'll apply it to a sum query:

In [0]:
def sum_query(db):
  return db.sum()

In [0]:
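def laplacian_mechanism(db, query, sensitivity):

  # Scale of the Laplace noise (b in the formula above); uses the global epsilon set earlier
  beta = sensitivity/epsilon
  noise = torch.tensor(np.random.laplace(0, beta, 1))

  # Noisy query result
  return query(db)+noise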

In [0]:
laplacian_mechanism(db, sum_query, 1)

tensor([42.0475], dtype=torch.float64)

In [0]:
sum(db)  # real query

tensor(46, dtype=torch.uint8)
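
One noisy answer says little about how far off the mechanism is on average. The following standard-library sketch repeats the noisy sum many times for the true sum of 46 shown above, with sensitivity 1 and epsilon 0.5. It does not call `laplacian_mechanism`; the helper `laplace_noise` is hypothetical and draws Laplace noise as the difference of two exponential samples.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

rng = random.Random(0)          # seeded only so the sketch is repeatable
scale = 1 / 0.5                 # sensitivity 1, epsilon 0.5, as in the sum query
true_sum = 46                   # the real sum shown above
noisy_sums = [true_sum + laplace_noise(scale, rng) for _ in range(20000)]
mean_abs_error = sum(abs(s - true_sum) for s in noisy_sums) / len(noisy_sums)
print(round(mean_abs_error, 2))  # close to the scale, 2.0
```

The average absolute error of Laplace noise equals its scale b, so a typical noisy sum lands about 2 away from the true value at this epsilon.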

Now we'll do the same for a mean query:

In [0]:
def mean_query(db):
  return torch.mean(db.float())

In [0]:
laplacian_mechanism(db, mean_query, 1/100)

tensor([0.4518], dtype=torch.float64)

In [0]:
mean_query(db)

tensor(0.4600)
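
The sensitivities passed to `laplacian_mechanism` (1 for the sum, 1/100 for the mean) can be sanity-checked on a concrete database. The sketch below assumes sensitivity means the largest change in the query result when any single entry is removed. It works on plain Python lists rather than tensors, redefines `sum_query` and `mean_query` for that purpose, and uses a hypothetical database with the same size and sum as the one above.

```python
def empirical_sensitivity(query, db):
    """Largest change in query(db) when any single entry is removed."""
    full_result = query(db)
    return max(abs(full_result - query(db[:i] + db[i + 1:])) for i in range(len(db)))

def sum_query(db):
    return sum(db)

def mean_query(db):
    return sum(db) / len(db)

# Hypothetical 0/1 database with 100 entries, 46 of them equal to 1
db = [1] * 46 + [0] * 54

print(empirical_sensitivity(sum_query, db))
print(empirical_sensitivity(mean_query, db))
```

For this database, removing one entry changes the sum by exactly 1 and the mean by less than 1/100, so the values used above act as upper bounds.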