
In [0]:
import torch

### Private dataset

Create a private database. Each record represents a different person and holds a value of 1 or 0.

In [0]:
num_records = 5000 #number of records in the database

For every person in the database, we build a parallel database: a copy of the database with that person's record removed.

Create a function that returns one parallel database by removing the record at a given index

In [0]:
def get_parallelDB(db,removable_index):
    #concatenate everything before and everything after the removed index
    return torch.cat((db[0:removable_index],db[removable_index+1:]))
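
As a quick illustration of what removing one record does, the sketch below mirrors `get_parallelDB` with plain Python lists instead of torch tensors, so it runs without any dependencies. The helper name `remove_record` and the sample values are assumptions made for this example only.

```python
def remove_record(db, removable_index):
    # Same slicing idea as get_parallelDB, but on a plain list (illustrative helper)
    return db[:removable_index] + db[removable_index + 1:]

toy_db = [1, 0, 1, 1, 0]               # hypothetical five-person database
parallel = remove_record(toy_db, 2)    # drop the person at index 2

print(parallel)       # [1, 0, 1, 0]
print(len(parallel))  # 4, one record fewer than the original
```

The original list is left untouched, which matches how `torch.cat` returns a new tensor rather than modifying `db`.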

Create a function that builds the parallel databases for every record

In [0]:
def get_parallelDBs(db):

    parallelDBs = []
    for i in range(len(db)):
        parallelDB = get_parallelDB(db,i)
        parallelDBs.append(parallelDB)

    return parallelDBs

Create a function that creates both a database and its parallel databases

In [0]:
def create_db_and_parallels(num_records):
  #each record is 1 or 0 with equal probability
  db = torch.rand(num_records)>0.5
  parallelDBs = get_parallelDBs(db)

  return db,parallelDBs

## Evaluating differential privacy


Getting sensitivity: the maximum change in a query's result when a single record is removed from the database

Create the database and its parallel databases

In [0]:
db, pdbs = create_db_and_parallels(5000)

Create the query functions

In [0]:
#query mean of database
def query_mean(db):
    return db.float().mean()

In [0]:
#query whether the sum of the records exceeds the threshold
def query_threshold(db,threshold=5):
  return (db.sum()>threshold).float()


In [0]:
#query sum of database records
def query_sum(db):
  return db.float().sum()

In [0]:
full_db_result = query_sum(db)

Create a sensitivity function that empirically estimates the sensitivity of a query function on a randomly generated database with a given number of records

In [0]:
def sensitivity(query,n_entries):
    #build a random database and all of its parallel databases
    db, pdbs = create_db_and_parallels(n_entries)

    full_db_result = query(db)

    #track the largest change caused by removing any single record
    max_distance = 0
    for pdb in pdbs:
        pdb_result = query(pdb)

        db_distance = torch.abs(pdb_result - full_db_result)

        if db_distance > max_distance:
            max_distance = db_distance
    return max_distance

In [22]:
sensitivity(query_mean,5000)

tensor(0.0001)
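
The same leave-one-out procedure can be checked by hand on a tiny database. The sketch below is a dependency-free version that uses Python lists; `empirical_sensitivity`, `list_sum`, `list_mean` and the sample database are names and values assumed for illustration, not part of the notebook's code.

```python
def list_sum(db):
    return float(sum(db))

def list_mean(db):
    return sum(db) / len(db)

def empirical_sensitivity(query, db):
    # Largest change in the query result when any single record is removed
    full_result = query(db)
    max_distance = 0.0
    for i in range(len(db)):
        parallel_db = db[:i] + db[i + 1:]
        max_distance = max(max_distance, abs(query(parallel_db) - full_result))
    return max_distance

toy_db = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical database: 4 ones among 8 records

print(empirical_sensitivity(list_sum, toy_db))   # 1.0
print(empirical_sensitivity(list_mean, toy_db))  # 1/14, about 0.0714
```

For a database of 0/1 values the sum changes by at most 1 no matter how many records there are, while the change in the mean shrinks as the database grows, which is why the mean over 5000 records has a sensitivity of only about 0.0001.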

### Performing a differencing attack

To do so:
1. Query the full database
2. Query the database without John Doe

The difference between the two results reveals information about John Doe's record.
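
The two steps above can be sketched on a small example. This version uses plain Python lists so it is self-contained; the database contents, `target_index`, and the helper `list_sum` are hypothetical values chosen for illustration.

```python
def list_sum(db):
    return sum(db)

db = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical database of ten people
target_index = 3                       # assumed position of "John Doe"

# Step 1: query the full database. Step 2: query it without the target.
db_without_target = db[:target_index] + db[target_index + 1:]

# For a sum query, the difference is exactly the target's private value
leaked_value = list_sum(db) - list_sum(db_without_target)

print(leaked_value)  # 0
```

With a sum query the attack recovers the record exactly; with other queries, such as a mean or a threshold, the difference may leak less, but it still depends on the removed record.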

In [0]:
#John Doe is the person at index 10; without_john_doe[10] is the database with his record removed
db,without_john_doe = create_db_and_parallels(100)

Let's query using sum

In [59]:
query_sum(db)-query_sum(without_john_doe[10])

tensor(0.)

Let's query using mean

In [63]:
#query_mean already divides by the number of records, so compare the two means directly
query_mean(db)-query_mean(without_john_doe[10])

tensor(-0.0047)

Let's query using a threshold

In [64]:
#49 is roughly half of the 100 records in this database
query_threshold(db,threshold=49) - query_threshold(without_john_doe[10],threshold=49)

tensor(0.)

### Local differential privacy

Local differential privacy adds noise to each individual record before the query runs. Create a randomized response function that returns both the true result and the differentially private result.
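
A minimal sketch of randomized response, written with Python's standard `random` module instead of torch so it runs on its own. It assumes each person keeps their true answer with probability `1 - noise_parameter` and otherwise reports a fair coin flip; the function name `randomized_response_mean`, the seed, and the database size are choices made for this example.

```python
import random

def randomized_response_mean(db, noise_parameter=0.2, rng=random):
    # Each person keeps their true answer with probability 1 - noise_parameter,
    # otherwise they report the outcome of a fair coin flip.
    reported = []
    for value in db:
        if rng.random() > noise_parameter:
            reported.append(value)
        else:
            reported.append(1 if rng.random() > 0.5 else 0)
    skewed_mean = sum(reported) / len(reported)
    # E[skewed_mean] = (1 - noise_parameter) * true_mean + noise_parameter * 0.5,
    # so solve for true_mean to de-skew the noisy result.
    return (skewed_mean - noise_parameter * 0.5) / (1 - noise_parameter)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
db = [1 if rng.random() > 0.5 else 0 for _ in range(10000)]

true_mean = sum(db) / len(db)
private_mean = randomized_response_mean(db, noise_parameter=0.2, rng=rng)

print(true_mean, private_mean)  # the two values should be close, not identical
```

The de-skewing step matters: without subtracting the `noise_parameter * 0.5` contribution of the random answers, the estimate drifts away from the true mean as the noise grows.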

In [0]:
def random_response_query(db,noise_parameter=0.2):

  true_result = torch.mean(db.float())

  #first coin flip: keep the true value with probability 1-noise_parameter
  first_coin_flip = (torch.rand(len(db))>noise_parameter).float()

  #second coin flip: the random answer used when the true value is not kept
  second_coin_flip = (torch.rand(len(db))>0.5).float()

  augmented_database = db.float()*first_coin_flip+(1-first_coin_flip)*second_coin_flip

  sk_result = augmented_database.float().mean()

  #de-skew: the expected sk_result is (1-noise_parameter)*true_mean + noise_parameter*0.5
  private_result = ((sk_result/noise_parameter)-0.5)*noise_parameter/(1-noise_parameter)

  return true_result,private_result

In [33]:
for n in [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]:
    db, pdbs = create_db_and_parallels(1000)
    print(random_response_query(db,n))

(tensor(0.4710), tensor(0.4689))
(tensor(0.4650), tensor(0.4550))
(tensor(0.5030), tensor(0.4714))
(tensor(0.5050), tensor(0.5183))
(tensor(0.5020), tensor(0.5420))
(tensor(0.4710), tensor(0.4750))
(tensor(0.4880), tensor(0.5000))
(tensor(0.4800), tensor(0.4900))


### Global differential privacy

Global differential privacy adds noise to the result of a query, after the query has run on the true data.
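
One common way to add that noise is the Laplace mechanism, sketched here with only the standard library. It assumes a sum query over 0/1 records, whose sensitivity is 1, and draws Laplace noise as the difference of two exponential samples; `laplace_noise`, `private_sum`, the seed, and the database are illustrative names and values rather than part of the notebook.

```python
import random

def laplace_noise(scale, rng=random):
    # The difference of two independent exponential variables with mean `scale`
    # follows a zero-mean Laplace distribution with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_sum(db, epsilon, sensitivity=1.0, rng=random):
    # Laplace mechanism: true result plus noise with scale sensitivity / epsilon
    return sum(db) + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
db = [1 if rng.random() > 0.5 else 0 for _ in range(5000)]

# Prints the true sum followed by one noisy answer
print(sum(db), private_sum(db, epsilon=0.2, rng=rng))
```

A smaller `epsilon` gives a larger noise scale and therefore more privacy at the cost of accuracy.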

In [0]:
import numpy as np

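In [0]:

def global_diff_query(query,n_entries,epsilon=0.2):

  #scale of the Laplace noise: sensitivity divided by epsilon
  b  = float(sensitivity(query,n_entries))/epsilon

  #draw zero-mean Laplace noise with scale b (the first argument of np.random.laplace is the location)
  return np.random.laplace(0,b)

In [0]:
db,_ = create_db_and_parallels(5000)

In [51]:
def query_mean_M(db):

    return db.float().mean()+global_diff_query(query_mean,len(db))

query_mean_M(db)

The returned value differs on every run because the noise is random. With a sensitivity of about 0.0001 and epsilon=0.2, the noise scale b is about 0.0005, so the result stays close to the true mean of the database.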

In [52]:

def query_sum_M(db):
  
  return db.float().sum()+global_diff_query(query_sum,len(db))

query_sum_M(db)

tensor(2506.2495)