<a href="https://colab.research.google.com/github/drc10723/udacity_secure_private_AI/blob/master/Differential_Private_Deep_Learing_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training a Deep Learning Model with Differential Privacy

If you have a private dataset and want to train a deep learning model on it, you often cannot, because true labels are not available. For example, a hospital may hold a large amount of unlabelled data for a particular disease (such as lung cancer). Other hospitals may have labelled data, but they cannot share it. In most cases a single hospital does not have enough labelled examples to train a good model on its own.


## Problem Assumptions

*   N hospitals (teachers) each have some labelled data with the same kind of labels
*   One hospital (student) has some unlabelled data

## Problem Solution

*   Ask each of the N hospitals (teachers) to train a model on its own dataset
*   Use the N teacher models to predict on your local dataset, generating N labels for each data point
*   Aggregate the N labels using a differentially private (DP) query
*   Train a model on your own dataset with the new aggregated labels



Let's start with the imports.

In [0]:
!pip install -q syft

In [0]:
import torch
import torchvision
import numpy as np
from torch import nn, optim
from torchvision import datasets, transforms

In [3]:
# use cuda if available
DEVICE = torch.device("cuda" if torch.cuda.is_available()
                      else "cpu")
print(f"Using {DEVICE} backend")

# number of teacher models.  
# our student model accuracy will depend on this parameter
num_teachers = 100 #@param {type:"integer"}

Using cuda backend


## Teacher Models Training

We will use MNIST as dummy data to train the teacher and student models.

*   The MNIST training data is divided into N subsets (N = number of teachers), and each subset trains one teacher model.
*   The MNIST test data is used as the private student data and is assumed to be unlabelled.
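
The split used in this notebook works because 60000 examples divide evenly among 100 teachers. `torch.utils.data.random_split` expects integer lengths that sum to the dataset size, so other values of `num_teachers` may need the remainder spread across partitions. A minimal sketch of one way to do that follows; `partition_lengths` is a hypothetical helper, not part of the original notebook.

```python
def partition_lengths(total_size, num_parts):
    """Split total_size items into num_parts chunk sizes that sum to total_size."""
    base, remainder = divmod(total_size, num_parts)
    # the first `remainder` chunks receive one extra example
    return [base + 1 if i < remainder else base for i in range(num_parts)]

print(partition_lengths(60000, 100)[:3])  # [600, 600, 600]
print(partition_lengths(60000, 7))        # [8572, 8572, 8572, 8571, 8571, 8571, 8571]
```

The resulting list could be passed as the `lengths` argument in place of `[int(total_size/num_teachers)]*num_teachers`.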



In [0]:
# convert to tensor and normalize 
train_transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize([.5],[.5])])
# load training data
mnsit_dataset = datasets.MNIST('./mnsit', train=True, transform=train_transform, download=True, )

In [0]:
# divide the MNIST training data into num_teachers partitions
total_size = len(mnsit_dataset)
# length of each teacher dataset
lengths = [int(total_size/num_teachers)]*num_teachers
# list of all teacher datasets
teacher_datasets = torch.utils.data.random_split(mnsit_dataset, lengths)

In [0]:
# We will create a basic model, which will be used for both teacher and student training
# The teachers and the student do not need to share the same model structure
class Network(nn.Module):
  def __init__(self):
    super(Network,self).__init__()
    # sequential layer : input size (batch_size, 28*28)
    self.layer = nn.Sequential(nn.Linear(28*28, 256),
                               # out size (batch_size, 256)
                               nn.BatchNorm1d(256),
                               # out size (batch_size, 256)
                               nn.ReLU(),
                               # out size (batch_size, 256)
                               nn.Dropout(0.5),
                               # out size (batch_size, 256)
                               nn.Linear(256, 64),
                               # out size (batch_size, 64)
                               nn.BatchNorm1d(64),
                               # out size (batch_size, 64)
                               nn.ReLU(),
                               # out size (batch_size, 64)
                               nn.Dropout(0.5),
                               # out size (batch_size, 64)
                               nn.Linear(64, 10),
                               # out size (batch_size, 10)
                               # we will use logsoftmax instead of softmax
                               # softmax has exponential overflow issues
                               nn.LogSoftmax(dim=1)
                               # out size (batch_size, 10)
                              )

  def forward(self,x):
    # x size : (batch_size, 1, 28, 28)
    x = x.view(x.shape[0], -1)
    # x size : (batch_size, 784)
    x = self.layer(x)
    # x size : (batch_size, 10)
    return x
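
The comment in `Network` about overflow can be illustrated without PyTorch. The sketch below uses the standard log-sum-exp trick on plain Python floats. It is an illustration of the idea only and is not claimed to be how `nn.LogSoftmax` is implemented internally; the logits are made-up values.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for a list of floats."""
    m = max(logits)
    # subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

logits = [1000.0, 999.0, 0.0]  # hypothetical, deliberately large logits
try:
    naive = [math.exp(x) for x in logits]  # exp(1000) does not fit in a float
except OverflowError as err:
    print("naive softmax failed:", err)

log_probs = log_softmax(logits)
print(log_probs)
```

Because the outputs are log-probabilities, they pair with `nn.NLLLoss`, which is what `train_model` uses.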

In [0]:
def train_model(dataset, checkpoint_file, num_epochs=10, do_validation=False):
  """ 
  Train a model for given dataset for given number of epochs and
  save last epoch model checkpoint
  
  Parameters: 
    dataset (torch.dataset): training data
    checkpoint_file (str): filename for saving model
    num_epochs (int): number of training epoch
    do_validation (bool): perform validation by dividing dataset in 90:10 ratio
          
  Returns: None
  
  """
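  # if validation is requested, split the dataset into train and test sets (90:10)
  if do_validation:
    dataset_size = len(dataset)
    # derive the test size from the remainder so both lengths always sum to dataset_size
    train_size = int(0.9*dataset_size)
    train_set, test_set = torch.utils.data.random_split(dataset, [train_size, dataset_size - train_size])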
    # create train and test dataloader
    trainloader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
    testloader = torch.utils.data.DataLoader(test_set, batch_size= 32, shuffle=True)
  else:
    # create train dataloader using full dataset
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

  # create model and send to gpu
  model = Network().to(DEVICE)
  # we have used logsoftmax, so now NLLLoss
  criterion = nn.NLLLoss()
  # adam optimizer for training
  optimizer = optim.Adam(model.parameters(), lr=0.005)

  # train for num_epochs
  for epoch in range(num_epochs):
    # training accuracy and loss for logging
    train_accuracy = 0
    train_loss = 0
    # training dataloader
    for images, labels in trainloader:
      # zero accumulated grads
      optimizer.zero_grad()
      # send images, labels to gpu
      images, labels = images.to(DEVICE), labels.to(DEVICE)
      # run forward propagation
      output = model.forward(images)
      # calculate loss
      loss = criterion(output, labels)
      train_loss += loss.item()
      # calculate accuracy 
      top_out, top_class = output.topk(1, dim=1)
      success = (top_class==labels.view(*top_class.shape))
      train_accuracy += success.sum().item()
      # do backward propagation
      loss.backward()
      optimizer.step()
      
    if do_validation:
      # set model to evaluation
      model.eval()
      test_accuracy = 0
      test_loss = 0
      # do forward pass and calculate loss and accuracy 
      with torch.no_grad():
        for images, labels in testloader:
          images, labels = images.to(DEVICE), labels.to(DEVICE)
          output = model.forward(images)
          loss = criterion(output, labels)
          test_loss += loss.item()
          top_out, top_class = output.topk(1, dim=1) 
          success = (top_class==labels.view(*top_class.shape))
          test_accuracy += success.sum().item()
      # log train and test metrics
      print("Epoch: {}".format(epoch+1),
            "Train Loss: {:.3f}".format(train_loss/len(trainloader)),
            "Train Accuracy: {:.3f}".format(train_accuracy/len(train_set)),
            "Test Loss: {:.3f}".format(test_loss/len(testloader)),
            "Test Accuracy: {:.3f}".format(test_accuracy/len(test_set))
           )
      # set model to train
      model.train()
    else:
      # log only training metrics if no validation
      print("Epoch: {}".format(epoch+1),
            "Train Loss: {:.3f}".format(train_loss/len(trainloader)),
            "Train Accuracy: {:.3f}".format(train_accuracy/len(dataset))
           )
    # save a checkpoint after every epoch; the last epoch overwrites earlier ones
    torch.save(model.state_dict(), checkpoint_file)

In [0]:
# train all teachers models on MNIST partition datasets
for teacher in range(num_teachers):
  print("############################### Teacher {} Model Training #############################".format(teacher+1))
  train_model(teacher_datasets[teacher], f"checkpoint_teacher_{teacher+1}.pth")

## Teacher Models Predictions

Now that the N teacher models are trained, we can share them for student training.


We treat the MNIST test dataset as the student dataset.

In [0]:
# student dataset transforms 
test_transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize([.5],[.5])])
# load private student dataset
private_dataset = datasets.MNIST('./mnsit', train=False, transform=test_transform, download=True)

# the MNIST test dataset has 10000 examples
private_data_size = len(private_dataset)

# create dataloader for private train dataset
private_dataloader = torch.utils.data.DataLoader(private_dataset, batch_size=32)


In [0]:
def predict_model(model_checkpoint, dataloader):
  """ 
  Load a trained model and make predictions
  
  Parameters: 
    model_checkpoint (str): filename for trained model checkpoint
    dataloader (DataLoader): dataloader instance
          
  Returns: 
    preds_list (torch.Tensor): predictions for whole dataset
  
  """
  # create model 
  model = Network()
  # load model from checkpoint (map_location lets a GPU checkpoint load on a CPU-only machine)
  state_dict = torch.load(model_checkpoint, map_location=DEVICE)
  model.load_state_dict(state_dict)
  # send model to gpu
  model = model.to(DEVICE)
  # list for batch predictions
  preds_list = []
  # set model to eval mode
  model.eval()
  # no gradients calculation needed
  with torch.no_grad():
    # iterate over dataset
    for images, labels in dataloader:
      images = images.to(DEVICE)
      # calculate predictions (log-probabilities for each class)
      preds = model.forward(images)
      # calculate top_class
      top_preds, top_classes = preds.topk(k=1, dim=1)
      # append batch top_classes tensor
      preds_list.append(top_classes.view(-1))
  # concat all batch predictions
  preds_list = torch.cat(preds_list).cpu()
  # return predictions
  return preds_list 

In [11]:
# list of all teacher model predictions
teacher_preds = []
# predict for each teacher model
for teacher in range(num_teachers):
  teacher_preds.append(predict_model(f'checkpoint_teacher_{teacher+1}.pth', private_dataloader))
# stack all teacher predictions
teacher_preds = torch.stack(teacher_preds)
print(teacher_preds.shape)

torch.Size([100, 10000])


## Aggregating Teacher Predictions

We have N predictions for each data point in our private dataset. We can aggregate them with a max query over the per-label vote counts (bin counts).

Can we train a model on those aggregated labels directly? Yes, we can, but to provide differential privacy and stay within a privacy budget, we convert the aggregate query into a DP query by adding Laplacian noise to the vote counts before taking the max.

In [0]:
# epsilon budget for one aggregate dp query
epsilon = 0.1 #@param {type:"number"}
# number of labels
num_classes = 10

We have assumed that the student data is unlabelled. For analysis purposes only, we will use the real labels.

In [0]:
# real targets, which would not be available for a private dataset in a real scenario
real_targets = private_dataset.targets

### Teacher Argmax Aggregation

Aggregate the N teacher predictions using a max query over the bin counts of the labels.
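
As a small worked example of the max query, the sketch below tallies the votes for a single data point in plain Python. The seven votes are made up, and `plurality_label` is a hypothetical helper. Ties go to the smallest label here, which is a choice of this sketch.

```python
from collections import Counter

def plurality_label(votes, num_classes=10):
    """Return (label, counts) where label has the most votes."""
    tally = Counter(votes)
    counts = [tally.get(label, 0) for label in range(num_classes)]
    # max() returns the first maximum, so ties go to the smallest label here
    label = max(range(num_classes), key=lambda c: counts[c])
    return label, counts

votes = [7, 7, 1, 7, 9, 1, 7]  # seven hypothetical teacher predictions
label, counts = plurality_label(votes)
print(counts)  # [0, 2, 0, 0, 0, 0, 0, 4, 0, 1]
print(label)   # 7
```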

In [14]:
# teacher aggregation result
teachers_argmax = list()
for image_i in range(private_data_size):
  # calculate bin count
  label_counts = torch.bincount(teacher_preds[:, image_i], minlength=num_classes)
  # take maximum bin count label
  argmax_label = torch.argmax(label_counts)
  teachers_argmax.append(argmax_label)
# convert list to tensor
teachers_argmax = torch.tensor(teachers_argmax)
# correct predictions
argmax_correct = torch.sum(real_targets == teachers_argmax)
print("Teachers argmax labels accuracy", argmax_correct.item()/private_data_size)

Teachers argmax labels accuracy 0.9215


### Teacher Noisy Aggregation ( DP query)

We use Laplacian noise with scale beta equal to **(sensitivity / epsilon)**.

We take the sensitivity of the argmax query to be one.
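
To see how the noise scale affects the aggregated label, here is a minimal dependency-free sketch of the noisy argmax. It samples Laplace noise as the difference of two exponential draws and measures how often the noisy label matches the noise-free one. The vote counts and the helper names (`laplace_noise`, `noisy_argmax`, `agreement_rate`) are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_argmax(counts, epsilon, rng, sensitivity=1.0):
    """Add Laplace(sensitivity / epsilon) noise to each count and return the argmax."""
    beta = sensitivity / epsilon
    noisy = [c + laplace_noise(beta, rng) for c in counts]
    return max(range(len(noisy)), key=lambda i: noisy[i])

def agreement_rate(counts, epsilon, trials=2000, seed=0):
    """Fraction of trials in which the noisy label equals the noise-free label."""
    rng = random.Random(seed)
    clean = max(range(len(counts)), key=lambda i: counts[i])
    hits = sum(noisy_argmax(counts, epsilon, rng) == clean for _ in range(trials))
    return hits / trials

counts = [60, 30, 10]  # hypothetical votes from 100 teachers over 3 classes
for eps in (0.01, 0.1, 10.0):
    print(eps, agreement_rate(counts, eps))
```

A smaller epsilon means a larger beta, so the noise more often overturns the majority vote.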

In [15]:
# dp query results
noisy_labels = list()
for image_i in range(private_data_size):
  # calculate bin count (cast to float so fractional noise is not truncated or rejected)
  label_counts = torch.bincount(teacher_preds[:, image_i], minlength=num_classes).float()
  # calculate beta (scale) for the Laplacian noise
  beta = 1 / epsilon

  # add independent Laplacian noise to each label count
  for i in range(len(label_counts)):
      label_counts[i] += float(np.random.laplace(0, beta))
  # calculate dp label
  noisy_label = torch.argmax(label_counts)
  noisy_labels.append(noisy_label)

noisy_labels = torch.tensor(noisy_labels)
# accuracy for noisy or dp query results
noisy_accuracy = torch.sum(real_targets == noisy_labels)

print("Noisy label accuracy", noisy_accuracy.item()/private_data_size)

Noisy label accuracy 0.9155


## PATE Analysis

**How much of our epsilon budget have we used?** We will perform PATE analysis to find out.

In [16]:
from syft.frameworks.torch.differential_privacy import pate

W0717 15:48:11.699311 139903046981504 secure_random.py:26] Falling back to insecure randomness since the required custom op could not be found for the installed version of TensorFlow. Fix this by compiling custom ops. Missing file was '/usr/local/lib/python3.6/dist-packages/tf_encrypted/operations/secure_random/secure_random_module_tf_1.14.0.so'
W0717 15:48:11.724745 139903046981504 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tf_encrypted/session.py:26: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.



In [0]:
# memory usage gets quite high when all predictions are used in PATE analysis,
# so we use a subset of predictions (a subset of the MNIST test dataset)
# this also helps us understand the importance of private data size
num_student_train = 2000 #@param {type:"integer"}
teacher_preds1 = teacher_preds[:, :num_student_train].to(DEVICE)
noisy_labels1 = noisy_labels[:num_student_train].to(DEVICE)
teachers_argmax1 = teachers_argmax[:num_student_train].to(DEVICE)
real_targets1 = real_targets[:num_student_train].to(DEVICE)

### Noisy Labels PATE Analysis

In [18]:
# Data dependent and data independent epsilon for noisy labels
data_dep_eps, data_ind_eps = pate.perform_analysis_torch(preds=teacher_preds1, indices=noisy_labels1,
                                                   noise_eps=epsilon, delta=1e-5, moments=10)
print(f"Data dependant epsilon {data_dep_eps.item()} data independent epsilon {data_ind_eps.item()}")

  torch.tensor(counts, dtype=torch.float) - torch.tensor(counts[winner], dtype=torch.float)


Data dependant epsilon 17.959531784057617 data independent epsilon 91.51292419433594
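
The data independent value printed above can be reproduced by hand under one assumption: that the analysis follows the moments accountant from the PATE paper, where each Laplacian query with noise parameter `noise_eps` contributes at most `2 * noise_eps**2 * l * (l + 1)` to the l-th log moment, and epsilon is the minimum of `(total + log(1/delta)) / l` over l = 1..moments. The sketch below is our own reconstruction of that bound, not the library's code.

```python
import math

def data_independent_epsilon(num_queries, noise_eps, delta, moments):
    """Moments-accountant bound that ignores how the teachers actually voted."""
    best = float("inf")
    for l in range(1, moments + 1):
        # assumed per-query bound on the l-th log moment: 2 * eps^2 * l * (l + 1)
        total_log_moment = num_queries * 2.0 * noise_eps ** 2 * l * (l + 1)
        best = min(best, (total_log_moment + math.log(1.0 / delta)) / l)
    return best

print(data_independent_epsilon(2000, 0.1, 1e-5, 10))  # about 91.513
```

Under this bound the total grows with the number of labelled queries, so labelling fewer student examples spends less of the budget.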


### Teacher Argmax PATE Analysis

In [19]:
# Data dependent and data independent epsilon for argmax labels
data_dep_eps, data_ind_eps = pate.perform_analysis_torch(preds=teacher_preds1, indices=teachers_argmax1,
                                                   noise_eps=epsilon, delta=1e-5, moments=10)
print(f"Data dependant epsilon {data_dep_eps.item()} data independent epsilon {data_ind_eps.item()}")

  torch.tensor(counts, dtype=torch.float) - torch.tensor(counts[winner], dtype=torch.float)


Data dependant epsilon 17.83055305480957 data independent epsilon 91.51292419433594


### Real Labels PATE Analysis

In [20]:
# Data dependent and data independent epsilon for real labels
data_dep_eps, data_ind_eps = pate.perform_analysis_torch(preds=teacher_preds1, indices=real_targets1,
                                                   noise_eps=epsilon, delta=1e-5, moments=10)
print(f"Data dependant epsilon {data_dep_eps.item()} data independent epsilon {data_ind_eps.item()}")

  torch.tensor(counts, dtype=torch.float) - torch.tensor(counts[winner], dtype=torch.float)


Data dependant epsilon 19.144325256347656 data independent epsilon 91.51292419433594


## Student Model Training

Differential privacy guarantees that no amount of post-processing can increase the epsilon value for a given dataset. This means that after training a deep learning model on the DP labels, the epsilon value will be less than or equal to the values from the PATE analysis.

In [0]:
# save real labels
private_real_labels = private_dataset.targets
# replace real labels with noisy labels in private dataset
private_dataset.targets = noisy_labels

# create training and testing subset
train_private_set = torch.utils.data.Subset(private_dataset, range(0, num_student_train))
test_private_set = torch.utils.data.Subset(private_dataset, range(num_student_train, len(private_dataset)))

In [0]:
# train student model with noisy labels (train_model returns None and saves a checkpoint)
train_model(train_private_set, 'checkpoint_student.pth', num_epochs=20)

In [23]:
# create test loader
private_testloader = torch.utils.data.DataLoader(test_private_set, batch_size=32)
# get test predictions 
test_preds = predict_model(f'checkpoint_student.pth', private_testloader)
# count correct test predictions against the real labels
correct = torch.sum(private_real_labels[num_student_train:] == test_preds)
# accuracy
print(f"student model test accuracy {correct.item()/(len(private_dataset)-num_student_train)}")

student model test accuracy 0.898375


## Conclusion 

As you can see, we are able to train a model with reasonably good accuracy.

Try different values of epsilon and different numbers of teachers. You should be able to observe the following:

1.   The more teachers there are, the lower the data dependent epsilon and the higher the accuracy.
2.   Adding noise reduces the privacy budget used considerably (see the difference between the data dependent and data independent epsilon).
3.   The smaller the value of epsilon, the stronger the differential privacy (lower data dependent and data independent epsilon).
4.   Given enough examples, a deep learning model can average out the noise added by the DP query without weakening differential privacy.
5.   The more unlabelled student data there is, the higher the accuracy.
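
The effect of the number of teachers can be explored without retraining anything. The sketch below is a rough simulation under strong assumptions that do not hold exactly for real teacher models: each teacher is independently correct with probability `teacher_acc`, and wrong votes are spread uniformly over the other classes. `simulate_accuracy` and all its defaults are hypothetical.

```python
import random

def simulate_accuracy(num_teachers, epsilon, teacher_acc=0.6,
                      num_classes=10, trials=500, seed=0):
    """Estimate how often the noisy plurality vote recovers the true label."""
    rng = random.Random(seed)
    beta = 1.0 / epsilon
    correct = 0
    for _ in range(trials):
        true_label = rng.randrange(num_classes)
        counts = [0.0] * num_classes
        for _ in range(num_teachers):
            if rng.random() < teacher_acc:
                counts[true_label] += 1
            else:
                # a wrong teacher votes uniformly among the other classes
                wrong = rng.randrange(num_classes - 1)
                counts[wrong + (wrong >= true_label)] += 1
        # Laplace(0, beta) noise as the difference of two exponential draws
        noisy = [c + rng.expovariate(1 / beta) - rng.expovariate(1 / beta)
                 for c in counts]
        correct += max(range(num_classes), key=lambda i: noisy[i]) == true_label
    return correct / trials

for n in (10, 100):
    print(n, simulate_accuracy(n, epsilon=0.1))
```

With the same epsilon, a larger ensemble produces a wider vote margin, so the same amount of noise flips fewer labels.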

