<a href="https://colab.research.google.com/github/inspire-lab/SecurePrivateAI/blob/master/7_differantial_privacy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Privacy-Preserving Machine Learning

First things first. Let's run the package installations. They take quite a while.

Change the Runtime of this Notebook to GPU first. Otherwise, it will be pretty slow.

To do so, go to Runtime -> Change Runtime Type and set the hardware accelerator to GPU.

In [None]:
!pip install syft==0.2.9 keras==2.2.3 tensorflow_privacy==0.2.2

## Differential Privacy

Below we will train a model to perform image classification on the MNIST data. We will train it using a differentially private SGD optimizer.

How does the privacy budget `epsilon` change when you tweak the parameters of the optimizer? How does it influence accuracy?

## Parameters for the optimizer

All the optimizers share some privacy-specific parameters that need to
be tuned in addition to any existing hyperparameter. There are currently four:

* `learning_rate` (float): The learning rate of the SGD training algorithm. The
  higher the learning rate, the more each update matters. If the updates are noisy
  (such as when the additive noise is large compared to the clipping
  threshold), the learning rate must be kept low for the training procedure to converge.
* `num_microbatches` (int): The input data for each step (i.e., batch) of your
  original training algorithm is split into this many micro-batches. Generally,
  increasing this will improve your utility but slow down your training in terms
  of wall-clock time. The total number of examples consumed in one global step
  remains the same. This number should evenly divide your input batch size.
* `l2_norm_clip` (float): The cumulative gradient across all network parameters
  from each micro-batch will be clipped so that its L2 norm is at most this
  value. You should set this to something close to some percentile of what
  you expect the gradient from each micro-batch to be. In previous experiments,
  we've found numbers from 0.5 to 1.0 to work reasonably well.
* `noise_multiplier` (float): This governs the amount of noise added during
  training. Generally, more noise results in better privacy and lower utility.
  This generally has to be at least 0.3 to obtain rigorous privacy guarantees,
  but smaller values may still be acceptable for practical purposes.
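To make the roles of `l2_norm_clip`, `noise_multiplier`, and `num_microbatches` concrete, here is a minimal NumPy sketch of a single differentially private gradient step. It is an illustration, not the TensorFlow Privacy implementation: the function name `dp_sgd_step` and the toy gradients are hypothetical, and the sketch assumes the Gaussian noise has standard deviation `noise_multiplier * l2_norm_clip` and is added to the sum of clipped micro-batch gradients before averaging.

```python
import numpy as np

def dp_sgd_step(per_microbatch_grads, l2_norm_clip, noise_multiplier, rng):
    """Return one noisy, averaged gradient from per-micro-batch gradients.

    per_microbatch_grads: array of shape (num_microbatches, num_params).
    """
    grads = np.asarray(per_microbatch_grads, dtype=np.float64)
    # Clip each micro-batch gradient so its L2 norm is at most l2_norm_clip.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Sum the clipped gradients and add Gaussian noise (assumed std:
    # noise_multiplier * l2_norm_clip).
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=grads.shape[1])
    noisy_sum = clipped.sum(axis=0) + noise
    # Average over the number of micro-batches.
    return noisy_sum / grads.shape[0]

rng = np.random.default_rng(0)
toy_grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # L2 norms 5.0 and 0.5
clean_update = dp_sgd_step(toy_grads, l2_norm_clip=1.0, noise_multiplier=0.0, rng=rng)
noisy_update = dp_sgd_step(toy_grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
print(clean_update, noisy_update)
```

With `noise_multiplier=0.0` only the clipping is visible: the first gradient is scaled down to norm 1.0 and the second is left untouched. Raising `noise_multiplier` perturbs the update more, which is why the learning rate usually has to be lowered alongside it.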

In [0]:
from tensorflow import keras
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer
from tensorflow.keras.datasets import mnist
import numpy as np

(mnist_x_train, mnist_y_train), (mnist_x_test, mnist_y_test) = mnist.load_data()

mnist_x_train = mnist_x_train.astype( np.float32 ) / 255
mnist_x_test = mnist_x_test.astype( np.float32 ) / 255

mnist_x_train = mnist_x_train.reshape( -1, 28, 28, 1)
mnist_x_test = mnist_x_test.reshape( -1, 28, 28, 1)


mnist_y_train = keras.utils.to_categorical( mnist_y_train )
mnist_y_test = keras.utils.to_categorical( mnist_y_test )

EPOCHS = 10
BATCH_SIZE = 250


In [0]:
# training parameters
l2_norm_clip = 
noise_multiplier = 
num_microbatches = 
learning_rate = 


model = tf.keras.models.Sequential()
model.add( tf.keras.layers.Conv2D( 32, kernel_size=(3, 3), activation='relu', input_shape=mnist_x_train.shape[ 1: ]  ) )
model.add( tf.keras.layers.MaxPooling2D( pool_size=(2, 2) ) )
model.add( tf.keras.layers.Conv2D( 64, kernel_size=(3, 3), activation='relu' ) )
model.add( tf.keras.layers.Flatten() )
model.add( tf.keras.layers.Dense( 128, activation='relu' ) )
model.add( tf.keras.layers.Dense( 10, activation='softmax' ) )


optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=l2_norm_clip,
    noise_multiplier=noise_multiplier,
    num_microbatches=num_microbatches,
    learning_rate=learning_rate)

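# the model ends in a softmax layer, so the loss receives probabilities, not logits;
# the reduction is NONE because the DP optimizer needs the per-example losses
loss = tf.keras.losses.CategoricalCrossentropy( from_logits=False, reduction=tf.losses.Reduction.NONE )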

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

model.fit(mnist_x_train, mnist_y_train,
          epochs=EPOCHS,
          batch_size=BATCH_SIZE)

print( 'test acc:', model.evaluate( mnist_x_test, mnist_y_test, batch_size=250 )[ 1 ] )

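# use the same batch size and number of epochs as in training, otherwise the reported epsilon is wrong
eps = compute_dp_sgd_privacy.compute_dp_sgd_privacy(n=60000, batch_size=BATCH_SIZE, noise_multiplier=noise_multiplier, epochs=EPOCHS, delta=1e-5)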
print( 'epsilon: ', eps )


# PATE

First we need to split up the data and train the teachers. For simplicity, we will work with 3 teachers and a small amount of data.


We are going to split the data into 4 partitions of 500 instances each: 3 partitions for the teachers and one for the student.
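The partitioning described above can be sketched on synthetic data as follows. This is only one possible approach: the helper `split_partitions` and the `toy_x` / `toy_y` arrays are hypothetical stand-ins for the shuffled MNIST arrays, and the sketch assumes the partitions are taken as consecutive, disjoint slices.

```python
import numpy as np

def split_partitions(x, y, n_parts, part_size):
    """Split the first n_parts * part_size rows into equal, disjoint parts."""
    assert len(x) >= n_parts * part_size, "not enough data"
    xs = [x[i * part_size:(i + 1) * part_size] for i in range(n_parts)]
    ys = [y[i * part_size:(i + 1) * part_size] for i in range(n_parts)]
    return xs, ys

rng = np.random.default_rng(0)
toy_x = rng.random((2500, 28, 28, 1))      # stand-in for shuffled images
toy_y = rng.integers(0, 10, size=2500)     # stand-in for class ids

n_teachers, n_instances = 3, 500
parts_x, parts_y = split_partitions(toy_x, toy_y, n_teachers + 1, n_instances)

# the first n_teachers partitions go to the teachers, the last one to the student
teacher_x, teacher_y = parts_x[:n_teachers], parts_y[:n_teachers]
student_x, student_y = parts_x[-1], parts_y[-1]
print(len(teacher_x), student_x.shape)
```

Because the data is shuffled before slicing, each partition is a random sample, and no instance is shared between a teacher and the student.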



## Review PATE

![alt text](http://www.cleverhans.io/assets/pate-aggregation.png)

![alt text](http://www.cleverhans.io/assets/pate-full.png)

In [0]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

# number of instances each party has access to 
n_instances = 500 
# number of teachers
n_teachers = 3

# load data and transform it
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x_train.astype( float ) / 255.
x_test = x_test.astype( float ) / 255.

x_train = x_train.reshape( -1, 28, 28, 1)
x_test = x_test.reshape( -1, 28, 28, 1)

y_train = keras.utils.to_categorical( y_train )
y_test = keras.utils.to_categorical( y_test )

# shuffle data
idx = np.arange( len( x_train ) )
np.random.shuffle( idx )
x_train = x_train[ idx ]
y_train = y_train[ idx ]

# gather the teacher data
teacher_data_x = 
teacher_data_y = 

# gather the student data
student_data_x = 
student_data_y = 


In [0]:
# train the teacher models
def get_model():
  model = keras.models.Sequential()
  model.add( keras.layers.Conv2D( 32, 3, 2, activation='relu', input_shape=x_train.shape[ 1: ] ) )
  model.add( keras.layers.MaxPooling2D( ) )
  model.add( keras.layers.Conv2D( 16, 3, 2, activation='relu' ) )
  model.add( keras.layers.Flatten() )
  model.add( keras.layers.Dense(32, activation='relu') )
  model.add( keras.layers.Dense(10, activation='softmax') )

  model.compile( optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'] )

  return model

# list of teacher models
teacher_models = 

# train teacher models



## Train the student model

To train the student model, we need to label the student's training data using the teacher models. We'll use majority voting with added noise to determine each label.
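As a rough illustration of noisy majority voting, the sketch below counts the teachers' votes per class, adds Laplace noise to the counts, and takes the class with the highest noisy count. The function name `noisy_majority_vote`, the toy predictions, and the choice of Laplace scale `1 / noise_eps` are assumptions made for this example.

```python
import numpy as np

def noisy_majority_vote(teacher_preds, n_classes, noise_eps, rng):
    """Aggregate teacher predictions with Laplace-noised vote counts.

    teacher_preds: int array of shape (n_teachers, n_samples) with class ids.
    Returns the clean vote counts and the noisy winning label per sample.
    """
    teacher_preds = np.asarray(teacher_preds)
    n_samples = teacher_preds.shape[1]
    counts = np.zeros((n_samples, n_classes))
    for preds in teacher_preds:
        # each teacher casts exactly one vote per sample
        counts[np.arange(n_samples), preds] += 1
    # assumed noise scale: 1 / noise_eps (smaller eps means more noise)
    noisy = counts + rng.laplace(0.0, 1.0 / noise_eps, size=counts.shape)
    return counts, noisy.argmax(axis=1)

rng = np.random.default_rng(0)
toy_preds = np.array([[1, 2, 0],
                      [1, 2, 3],
                      [4, 2, 0]])  # 3 teachers, 3 samples
vote_counts, almost_clean = noisy_majority_vote(toy_preds, 10, noise_eps=1e6, rng=rng)
_, noisy_labels = noisy_majority_vote(toy_preds, 10, noise_eps=0.2, rng=rng)
print(almost_clean, noisy_labels)
```

With a very large `noise_eps` the noise is negligible and the result matches the plain majority vote. With a small value such as 0.2 and only three teachers, the noise can easily flip the winner, which is the price paid for the privacy guarantee.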

In [0]:
# label the data
labels = 
# perform the voting
votes = 

# transform the noisy votes into the student labels
student_data_y = 

# train model
student_model = get_model()
print( 'training student model' )
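# train on the student's data with the noisy teacher labels, then evaluate on the held-out test set
student_model.fit( student_data_x, student_data_y, epochs=16, verbose=0 )
print( 'test accuracy:', student_model.evaluate( x_test, y_test, verbose=0 )[ 1 ] )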


In [None]:
# privacy analysis
from syft.frameworks.torch.dp import pate


teacher_preds = np.argmax( np.array( labels ), axis=2 )
print( teacher_preds.shape )

data_dep_eps, data_indep_eps = pate.perform_analysis( teacher_preds=teacher_preds,
                                                      indices=np.argmax( votes, axis=1 ),
                                                      noise_eps=0.2,
                                                      delta=1/1500
                                                     )

print(data_dep_eps, data_indep_eps)

What's more, you can try this yourself with PyTorch:

https://opacus.ai/tutorials/building_image_classifier

[Opacus](https://github.com/pytorch/opacus) is a library that enables training PyTorch models with differential privacy. It requires only minimal code changes on the client, has little impact on training performance, and lets the client track the privacy budget spent at any given moment.

To open the tutorial, just click the badge:

<a href="https://colab.research.google.com/github/pytorch/opacus/blob/main/tutorials/building_image_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The notebook comes from
https://github.com/pytorch/opacus/blob/main/tutorials/building_image_classifier.ipynb

Another demo, this time on PATE analysis:

https://github.com/erinqhu/differential-privacy-PATE/blob/master/PATE_analysis.ipynb

<a href="https://colab.research.google.com/github/erinqhu/differential-privacy-PATE/blob/master/PATE_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>