<a href="https://colab.research.google.com/github/naraquev/Private-AI/blob/master/Week6/Differential_Privacy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Differential Privacy Notebook Example.**

This notebook implements the first Project (Week 6) of the Private AI Facebook Scholarship run by Udacity. 

The idea is to demonstrate the PATE analysis that comes from this paper: https://arxiv.org/pdf/1610.05755.pdf

What are we doing? Suppose you are an organization with a dataset of unlabeled data. You want to train a supervised model on this data, but the lack of labels is a problem. Some number X of other organizations (the teachers) have data that could help you label your dataset, but their data is private.

We can train a model inside every organization, so we end up with X models that can predict labels for the data our organization holds. Even though each model is trained inside its own organization, its predictions can still leak information about the private training data, so the raw predictions are not differentially private. We therefore aggregate the X predictions with a noisy mechanism so that the resulting labels are protected with (ε, δ)-differential privacy.
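For reference, this is the standard definition from the differential privacy literature. The symbols $\mathcal{M}$, $d$, $d'$ and $S$ are notation introduced here and do not appear elsewhere in this notebook. A randomized mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-differential privacy if, for any two datasets $d$ and $d'$ that differ in a single record and for any set of outputs $S$:

```latex
\Pr[\mathcal{M}(d) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(d') \in S] + \delta
```

Smaller values of $\varepsilon$ and $\delta$ mean that the output distribution changes less when one record changes, which is a stronger privacy guarantee.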

In the end, we compare the data-independent and data-dependent epsilon spent in the analysis, and look at how different hyperparameters can increase or reduce the gap between them.

Greatly inspired by: https://github.com/dimun/pate_torch/blob/master/PATE.ipynb

In [0]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch
from torch.utils.data.sampler import SubsetRandomSampler
import numpy as np
from IPython.display import clear_output

# Here we define two classes:

**The teacher** represents one organization with its own data and its own model. In theory, each organization could use a different model architecture.

**The model** is the class for the network that every teacher trains. It is built in PyTorch.

In [0]:
class teacher():
  def __init__(self, dataL, device, batch_size = 32, epochs=5, print_every=120):
    self.dataLoader = dataL
    self.model = t_model()
    self.criterion = nn.NLLLoss()
    self.optimizer = optim.Adam(self.model.parameters(), lr=0.003)
    self.epochs = epochs
    self.dbug = print_every
    self.device =  device
                                           
                                           
                                           
  def train(self):
    self.model.to(self.device)
    # print('Here I go!')
    steps = 0
    running_loss = 0
    accuracy = 0
    for e in range(self.epochs):
        # Model in training mode, dropout is on
        self.model.train()
        accuracy=0
        running_loss = 0
        for images, labels in self.dataLoader:
            
            images, labels = images.to(self.device), labels.to(self.device)
            steps += 1
            
            self.optimizer.zero_grad()
            
            output = self.model.forward(images)
            loss = self.criterion(output, labels)
            loss.backward()
            self.optimizer.step()            
            running_loss += loss.item()          
            ps = torch.exp(output)
            top_p, top_class = ps.topk(1, dim=1)
            equals = top_class == labels.view(*top_class.shape)
            accuracy += torch.mean(equals.type(torch.FloatTensor))
        #if(e == self.epochs -1 or e==0):
         #   print("Epoch: {}/{}.. ".format(e+1, self.epochs),
          #    "Training Loss: {:.3f}.. ".format(running_loss/len(self.dataLoader)),              
          #    "Train Accuracy: {:.3f}".format(accuracy/len(self.dataLoader)))
    return accuracy/len(self.dataLoader)
    
  def eval(self, dataLoader):
    outputs = torch.zeros(0, dtype=torch.long).to(self.device)
    self.model.to(self.device)
    # Model in test mode, dropout is off
    self.model.eval()
    # No gradients are needed for inference
    with torch.no_grad():
        for images, labels in dataLoader:
            images = images.to(self.device)
            output = self.model.forward(images)
            # The argmax of the log-probabilities is the predicted class
            ps = torch.argmax(output, dim=1)
            outputs = torch.cat((outputs, ps))
    return outputs
  
  
  
class t_model(nn.Module):
    def __init__(self):
        super(t_model, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
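
The value 320 in `nn.Linear(320, 50)` and `x.view(-1, 320)` follows from the layer sizes above. The sketch below reproduces that arithmetic in plain Python, assuming 28x28 MNIST inputs and the default stride and padding of the layers; `conv_out` is a helper introduced here for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5)             # conv1, kernel_size=5 -> 24
size = conv_out(size, kernel=2, stride=2)   # max_pool2d(..., 2)   -> 12
size = conv_out(size, kernel=5)             # conv2, kernel_size=5 -> 8
size = conv_out(size, kernel=2, stride=2)   # max_pool2d(..., 2)   -> 4
flat_features = 20 * size * size            # conv2 has 20 output channels
print(flat_features)  # 320
```

If the kernel sizes or the input resolution change, the same calculation gives the new input size for `fc1`.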

Then we import the MNIST dataset and define the number of teachers that we want to train.

In [0]:
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, ), (0.5,))])
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
num_teachers = 100
train_len = len(mnist_trainset)
test_len = len(mnist_testset)

The function get_samples() splits the training data so that every teacher gets its own disjoint subset.

The function create_teachers() creates the teachers, each with its own data and model.

In [0]:
def get_samples(num_teachers):
  tam = len(mnist_trainset)
  split= int(tam/num_teachers)
  split_ini = split
  indices = list(range(tam))
  init=0
  samples = []
  for i in range(num_teachers):     
    t_idx = indices[init:split]
    t_sampler = SubsetRandomSampler(t_idx)
    samples.append(t_sampler)
    init = split
    split = split+split_ini
  return samples
def create_teachers(samples):
  teachers = []
  for sample in samples:
    loader = torch.utils.data.DataLoader(mnist_trainset, batch_size=32, sampler=sample)
    t = teacher(loader, device)
    teachers.append(t)    
  return teachers
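
As a quick sanity check of the splitting idea, the sketch below partitions the 60,000 MNIST training indices into 100 contiguous slices and confirms that no index is shared between teachers. `partition_indices` is a hypothetical stand-in that mirrors the index arithmetic of get_samples() without depending on PyTorch; like get_samples(), it drops any remainder when the dataset size is not divisible by the number of teachers.

```python
def partition_indices(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous, equally sized slices."""
    size = n_items // n_parts
    return [list(range(i * size, (i + 1) * size)) for i in range(n_parts)]

parts = partition_indices(60000, 100)
all_indices = [idx for part in parts for idx in part]

print(len(parts), len(parts[0]))  # 100 600
# Every index appears at most once, so the teacher subsets are disjoint
assert len(all_indices) == len(set(all_indices))
```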

In [0]:
samples = get_samples(num_teachers)
teachers = create_teachers(samples)

In [0]:
def train_teachers(teachers):
  accuracy = []
  for key, teacher in enumerate(teachers):
    accuracy.append(teacher.train())
    clear_output()
    print('Teacher ', key)
  return accuracy


In [0]:
ac = train_teachers(teachers)
print('The accuracy mean of all teachers is: ', np.mean(ac))

eval_data() takes all the teachers and applies their trained models to our public data in order to obtain the labels.

mechanism() adds Laplace noise to the per-class vote counts of the teachers' predictions and returns the label with the highest noisy count, in order to ensure (ε, δ)-differential privacy.

In [0]:
epsilon = 0.2
def eval_data(teachers, mnist_testset):
    preds = torch.zeros((num_teachers, test_len), dtype=torch.long)
    loader = torch.utils.data.DataLoader(mnist_testset, batch_size=32)
    for key, teacher in enumerate(teachers):
      clear_output()
      print('Teacher:', key)      
      result = teacher.eval(loader)
      preds[key] = result
    return preds.numpy()
def mechanism(preds, epsilon=0.2):
  # Scale of the Laplace noise
  beta = 1 / epsilon
  labels = np.array([]).astype(int)
  for image_preds in np.transpose(preds):
    # Vote histogram for this image, cast to float so the noise is not truncated to integers
    label_counts = np.bincount(image_preds, minlength=10).astype(float)
    # Add independent Laplace noise to each of the 10 class counts
    label_counts += np.random.laplace(0, beta, 10)
    new_label = np.argmax(label_counts)
    labels = np.append(labels, new_label)
  return labels

In [0]:
preds = eval_data(teachers, mnist_testset)
labels = mechanism(preds, epsilon)
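
To make the noisy aggregation concrete, here is a minimal sketch on a single toy example with 100 hypothetical teacher votes. The helper `noisy_argmax` and the vote values are assumptions for illustration. It follows the same idea as mechanism() (vote counts plus Laplace noise with scale 1/epsilon, then argmax) but uses a seeded NumPy generator so the run is reproducible.

```python
import numpy as np

def noisy_argmax(votes, epsilon, rng, num_classes=10):
    """Return the raw vote counts and the label with the highest noisy count."""
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=num_classes)
    return counts, int(np.argmax(noisy))

rng = np.random.default_rng(0)
# Toy example: 90 teachers vote for class 7 and 10 teachers vote for class 1
votes = np.array([7] * 90 + [1] * 10)
counts, label = noisy_argmax(votes, epsilon=0.2, rng=rng)

print(counts.astype(int))  # [ 0 10  0  0  0  0  0 90  0  0]
print(label)
```

With epsilon = 0.2 the noise scale is 5 votes, so a margin of 80 votes is very unlikely to be overturned, while a margin of a few votes often would be.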


In [0]:
#!pip install syft
from syft.frameworks.torch.differential_privacy import pate

Then we perform the PATE analysis. The analysis shows that when the teachers strongly agree on the label of an observation, less information about the private training data can leak through that label than when the teachers disagree.

This means that with more teachers, better model architectures, more epochs or more data (in general, anything that improves the accuracy of the individual models and therefore their agreement), the data-dependent epsilon becomes much smaller than the data-independent one.
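
The effect of agreement can be illustrated with a small simulation. It estimates how often the Laplace noise changes the winning label for a strong-consensus vote histogram and for a weak-consensus one. The histograms, the helper `flip_rate` and the number of trials are assumptions chosen for illustration. The flip rate is only a rough proxy for the quantity that the PATE analysis bounds, not the analysis itself.

```python
import numpy as np

def flip_rate(counts, epsilon, trials, rng):
    """Fraction of trials in which the noisy argmax differs from the true argmax."""
    counts = np.asarray(counts, dtype=float)
    noise = rng.laplace(0.0, 1.0 / epsilon, size=(trials, counts.size))
    winners = np.argmax(counts + noise, axis=1)
    return float(np.mean(winners != np.argmax(counts)))

rng = np.random.default_rng(0)
strong = [95, 5, 0, 0, 0, 0, 0, 0, 0, 0]      # teachers almost unanimous
weak = [30, 28, 12, 10, 5, 5, 4, 3, 2, 1]     # teachers split between classes

strong_rate = flip_rate(strong, epsilon=0.2, trials=10000, rng=rng)
weak_rate = flip_rate(weak, epsilon=0.2, trials=10000, rng=rng)
print("strong consensus flip rate:", strong_rate)
print("weak consensus flip rate:", weak_rate)
```

When the top two counts are far apart, the noisy label almost never changes, so each answer reveals little about any single teacher's data. When they are close, one teacher's vote can decide the outcome.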

In [28]:
data_dep_eps, data_ind_eps = pate.perform_analysis(teacher_preds=preds, indices=labels, noise_eps=epsilon, delta=1e-5)
print("Data Independent Epsilon:", data_ind_eps)
print("Data Dependent Epsilon:", data_dep_eps)

Data Independent Epsilon: 1611.5129254649705
Data Dependent Epsilon: 34.607559293061705
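
The data-independent figure can be reproduced by hand from the bounds in the PATE paper linked at the top. That paper bounds the moments of one noisy-max query by α(l) ≤ 2γ²l(l+1), where γ is the noise parameter, and converts the accumulated moments to ε = min over l of (α(l) + ln(1/δ)) / l. The sketch below applies this to 10,000 queries (the size of the MNIST test set) with γ = 0.2 and δ = 1e-5. The function name and the range of moment orders (1 to 8) are assumptions, and this is not a claim about how `pate.perform_analysis` is implemented internally.

```python
import math

def data_independent_epsilon(num_queries, noise_eps, delta, max_moment=8):
    """Data-independent epsilon from the moments bound 2 * gamma^2 * l * (l + 1)."""
    best = float("inf")
    for l in range(1, max_moment + 1):
        # Moments add up over queries (composition)
        alpha = num_queries * 2.0 * noise_eps ** 2 * l * (l + 1)
        # Convert the l-th moment into an epsilon for the given delta
        best = min(best, (alpha + math.log(1.0 / delta)) / l)
    return best

eps = data_independent_epsilon(num_queries=10000, noise_eps=0.2, delta=1e-5)
print(round(eps, 2))  # 1611.51
```

The result agrees with the data-independent value printed above. The minimum is reached at l = 1, where the bound is 10000 × 0.16 + ln(10⁵) ≈ 1600 + 11.51.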
