
# Defense with adversarial training

In this section we will use adversarial training to harden our CNN against adversarial examples. 

In adversarial training the dataset gets "augmented" with adversarial examples that are correctly labeled. This way the network learns that such perturbations are possible and can adapt to them. 

We will be using the IBM Adversarial Robustness Toolbox in this exercise. It offers a very easy-to-use implementation of adversarial training and a number of other defenses. 
https://github.com/IBM/adversarial-robustness-toolbox


We start out by importing most of the modules and functions we will need. 

In [1]:
%tensorflow_version 1.x
!pip install adversarial-robustness-toolbox
!git clone https://github.com/tensorflow/cleverhans.git
!pip install cleverhans/

Collecting adversarial-robustness-toolbox
  Downloading https://files.pythonhosted.org/packages/30/80/443c8bec5502c6315c9d089d7c8b8050ea337a7da72a957c15e86f013bf8/adversarial_robustness_toolbox-1.1.1-py3-none-any.whl (436kB)

In [2]:
# most of our imports
import warnings
import numpy as np
import os
with warnings.catch_warnings():
    # keras is still using some deprecated code, so silence the warnings
    warnings.simplefilter( 'ignore' )
    import keras
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from cleverhans.utils_keras import KerasModelWrapper
from cleverhans.attacks import BasicIterativeMethod, FastGradientMethod, CarliniWagnerL2
%matplotlib inline 
import matplotlib.pyplot as plt
import tensorflow as tf
from art.classifiers import KerasClassifier


# helper code 
def exract_ones_and_zeroes( data, labels ):
    data_zeroes = data[ np.argwhere( labels == 0 ).reshape( -1 ) ][ :200 ]
    data_ones = data[ np.argwhere( labels == 1 ).reshape( -1 ) ][ :200 ]
    x = np.vstack( (data_zeroes, data_ones) )

    x = x / 255.
    print( x.shape )

    labels_zeroes = np.zeros( data_zeroes.shape[ 0 ] )
    labels_ones = np.ones( data_ones.shape[ 0 ] )
    y = np.append( labels_zeroes, labels_ones )

    return x, y

def exract_two_classes( data, labels, classes=(0,1), no_instance=200 ):
    data_zeroes = data[ np.argwhere( labels ==  classes[0] ).reshape( -1 ) ][ :no_instance ]
    data_ones = data[ np.argwhere( labels == classes[1] ).reshape( -1 ) ][ :no_instance ]
    x = np.vstack( (data_zeroes, data_ones) )
    
    # normalize the data
    x = x / 255.

    labels_zeroes = np.zeros( data_zeroes.shape[ 0 ] )
    labels_ones = np.ones( data_ones.shape[ 0 ] )
    y = np.append( labels_zeroes, labels_ones )

    return x, y

def convert_to_keras_image_format( x_train, x_test ):
    if keras.backend.image_data_format( ) == 'channels_first':
        x_train = x_train.reshape( x_train.shape[ 0 ], 1, x_train.shape[ 1 ], x_train.shape[ 2 ] )
        # use x_test's own dimensions: x_train has already been reshaped above
        x_test = x_test.reshape( x_test.shape[ 0 ], 1, x_test.shape[ 1 ], x_test.shape[ 2 ] )
    else:
        x_train = x_train.reshape( x_train.shape[ 0 ], x_train.shape[ 1 ], x_train.shape[ 2 ], 1 )
        x_test = x_test.reshape( x_test.shape[ 0 ], x_test.shape[ 1 ], x_test.shape[ 2 ], 1 )

    return x_train, x_test


def mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=2 ):
    # define the classifier
    clf = keras.Sequential( )
    clf.add( Conv2D( 32, kernel_size=(3, 3), activation='relu', input_shape=x_train.shape[ 1: ] ) )
    clf.add( Conv2D( 64, (3, 3), activation='relu' ) )
    clf.add( MaxPooling2D( pool_size=(2, 2) ) )
    clf.add( Dropout( 0.25 ) )
    clf.add( Flatten( ) )
    clf.add( Dense( 128, activation='relu' ) )
    clf.add( Dropout( 0.5 ) )
    clf.add( Dense( y_train.shape[ 1 ], activation='softmax' ) )

    clf.compile( loss=keras.losses.categorical_crossentropy,
                 optimizer='adam',
                 metrics=[ 'accuracy' ] )

    clf.fit( x_train, y_train,
             epochs=epochs,
             verbose=1 )
    clf.summary( )
    score = clf.evaluate( x_test, y_test )
    print( 'Test loss:', score[ 0 ] )
    print( 'Test accuracy:', score[ 1 ] )

    return clf


def show_image( img ):
    plt.imshow( img.reshape( 28, 28 ), cmap="gray_r" )
    plt.axis( 'off' )
    plt.show( )

Using TensorFlow backend.





We start out by loading the data, preparing it and training our CNN.

In [3]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# extract ones and zeroes
x_train, y_train = exract_ones_and_zeroes( x_train, y_train )
x_test, y_test = exract_ones_and_zeroes( x_test, y_test )

# we need to bring the labels into a format that our cnn likes (one-hot)
y_train = keras.utils.to_categorical( y_train, 2 )
y_test = keras.utils.to_categorical( y_test, 2 )

# convert it to a format keras can work with
x_train, x_test = convert_to_keras_image_format(x_train, x_test)

# we need to do some setup so everything gets executed in the same tensorflow session
session = tf.Session( )
keras.backend.set_session( session )

# get and train our cnn
clf = mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=5)


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
(400, 28, 28)
(400, 28, 28)





Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Epoch 1/5




Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 64)        0         
______________

We want to know how robust our model is against an attack. To do this we calculate the `empirical robustness`, which is equivalent to computing the minimal perturbation that the attacker must introduce for a successful attack. We are following the approach of Moosavi-Dezfooli et al. 2016 (paper link: https://arxiv.org/abs/1511.04599).

The empirical robustness method supports two attacks at the moment:
the `Fast Gradient Sign Method` and `HopSkipJump`.

You can use them by passing either `fgsm` or `hsj` as the attack name.
The default attack parameters are the following:
```
    "fgsm": {"eps_step": 0.1, "eps_max": 1., "clip_min": 0., "clip_max": 1.},
    "hsj": {'max_iter': 50, 'max_eval': 10000, 'init_eval': 100, 'init_size': 100}
```
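To make the metric more concrete, here is a minimal NumPy sketch of the idea: average the relative size of the perturbation over the samples whose prediction was flipped by the attack. This is an illustration under our own assumptions, not the `art` implementation; the function name `empirical_robustness_sketch` and the toy data are hypothetical.

```python
import numpy as np

def empirical_robustness_sketch(x, x_adv, pred_clean, pred_adv, norm=2):
    """Mean relative perturbation size over the samples whose prediction flipped."""
    flipped = pred_clean != pred_adv
    if not np.any(flipped):
        return 0.0  # no attack succeeded
    flat_x = x.reshape(len(x), -1)
    flat_delta = (x_adv - x).reshape(len(x), -1)
    pert = np.linalg.norm(flat_delta[flipped], ord=norm, axis=1)
    base = np.linalg.norm(flat_x[flipped], ord=norm, axis=1)
    return float(np.mean(pert / base))

# toy data: three 2x2 "images"; only the first two attacks change the prediction
x = np.ones((3, 2, 2))
x_adv = x.copy()
x_adv[0] += 0.5
x_adv[1] += 0.1
x_adv[2] += 0.9
pred_clean = np.array([0, 1, 0])
pred_adv = np.array([1, 0, 0])

score = empirical_robustness_sketch(x, x_adv, pred_clean, pred_adv)
print(score)
```

A larger value means the attacker needed a larger perturbation to succeed, so a more robust model should score higher. The third sample is ignored because its prediction did not change.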

In [4]:
from art.metrics import empirical_robustness

# wrap the model and calculate the empirical robustness
wrapper = KerasClassifier( model=clf, clip_values=(0., 1.) )
print( 'robustness of the undefended model', 
      empirical_robustness( wrapper, x_test, 'fgsm'))

robustness of the undefended model 0.19556192131066377


Try different attack parameters and compare the results. 

Tip:

For `hsj` use only a few examples otherwise it will take forever.

In [5]:
### your code goes here
x_small = x_test[ :10 ]
print( 'robustness for hsj', 
      empirical_robustness( wrapper, x_small, 'hsj'))

KeyboardInterrupt: ignored

In [0]:
print( 'robustness for fgsm2', 
  empirical_robustness(wrapper, x_test, "fgsm",{"eps_step": 0.3, "eps_max": 1., "clip_min": 0., "clip_max": 1.}))

Let's create an adversarial example and see how it looks.
We also want to know how the model performs on adversarial examples, so we will create adversarial examples out of the test set and see how the model does on them.

Below you can see the keyword arguments for the attack:

```
norm=np.inf, eps=.3, eps_step=0.1, targeted=False, num_random_init=0, batch_size=1, minimal=False
        """
        :param norm: The norm of the adversarial perturbation. Possible values: np.inf, 1 or 2.
        :param eps: Attack step size (input variation)
        :param eps_step: Step size of input variation for minimal perturbation computation
        :param targeted: Indicates whether the attack is targeted (True) or untargeted (False)
        :param num_random_init: Number of random initialisations within the epsilon ball. For random_init=0 starting at
            the original input.
        :param batch_size: Size of the batch on which adversarial samples are generated.
        :param minimal: Indicates if computing the minimal perturbation (True). If True, also define `eps_step` for
                        the step size and eps for the maximum perturbation.
   
```
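To build some intuition for what `eps` controls, the following NumPy sketch applies a single untargeted FGSM step under the L-infinity norm to a tiny logistic model. The model, its weights and the helper names `fgsm_step` and `predict` are made up for illustration; the `art` attack works on the wrapped Keras model instead.

```python
import numpy as np

def fgsm_step(x, grad, eps=0.3, clip_min=0.0, clip_max=1.0):
    """One untargeted FGSM step: move every feature by eps in the direction
    that increases the loss, then clip back into the valid input range."""
    return np.clip(x + eps * np.sign(grad), clip_min, clip_max)

# hypothetical logistic model p(y=1|x) = sigmoid(w.x + b)
w = np.array([2.0, -1.0, 0.5])
b = -0.5

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([0.9, 0.2, 0.6])
y = 1.0

# gradient of the cross-entropy loss with respect to the input is (p - y) * w
grad = (predict(x) - y) * w
x_adv = fgsm_step(x, grad, eps=0.3)

print('confidence on clean input:      ', predict(x))
print('confidence on adversarial input:', predict(x_adv))
```

Every feature moves by at most `eps`, and the confidence in the true class drops. Whether the predicted class actually flips depends on `eps` and on how far the input is from the decision boundary, which is why it is worth trying several values.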

Find good parameters for the attack

In [0]:
np.zeros(x_test.shape).shape

In [0]:
# create an adversarial example with fgsm and plot it
from art.attacks import FastGradientMethod
fgsm = FastGradientMethod( wrapper, eps=0.3 )
x_adv = fgsm.generate( x_test[ 0 ].reshape( (1,28,28,1) ) )
# prediction for the adversarial example
print(clf.predict(x_adv))
# show the adversarial example
show_image( x_adv )

# create adversarial examples for the whole test set
x_test_adv = fgsm.generate( x_test )
clf.evaluate(x_test_adv,y_test)

## Adversarial Training

Let's create a new untrained model with the same architecture that we have been using so far. 

We will train the model using an adversarial training framework. The idea is very simple:

1.   Train the model for 1 epoch
2.   Create adversarial examples using FGSM 
3.   Enhance the training data by mixing it with the adversarial examples. (Only mix in the adversarial examples created in this iteration)
4.   Go to 1

We will be using the FGSM attack from `art` this time.
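Step 3 of the list above can be sketched in isolation as follows. This is only an illustration: `mix_in_adversarial` and `toy_attack` are hypothetical names, and the "attack" here is a stand-in that brightens the input so the example runs without a trained model.

```python
import numpy as np

def mix_in_adversarial(x, y, craft, ratio=0.5, rng=None):
    """Return the clean data plus adversarial copies of a random subset.

    `craft` is any callable that maps a batch of inputs to adversarial inputs.
    The adversarial copies keep the correct labels of their clean originals.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_adv = int(ratio * len(x))
    idx = rng.permutation(len(x))[:n_adv]
    x_adv = craft(x[idx])
    return np.concatenate([x, x_adv]), np.concatenate([y, y[idx]])

# stand-in "attack" used only for this demo
def toy_attack(batch):
    return np.clip(batch + 0.3, 0.0, 1.0)

x = np.linspace(0.0, 1.0, 8 * 4).reshape(8, 4)
y = np.eye(2)[np.arange(8) % 2]   # one-hot labels
x_mix, y_mix = mix_in_adversarial(x, y, toy_attack, ratio=0.5,
                                  rng=np.random.default_rng(0))
print(x_mix.shape, y_mix.shape)
```

Note that the function always starts from the clean data, so adversarial examples from earlier iterations are dropped, as required by step 3.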




In [0]:
# create a new untrained model and wrap it
new_model = mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=0 )
defended_model = KerasClassifier(clip_values=(0,1), model=new_model )
# define the attack we are using
fgsm = FastGradientMethod( defended_model, eps=.3 )

# parameters
epochs = 5 # number of iterations that we will perform training for
ratio = .5  # ratio of the training set that will get turned into adversarial
            # examples each iteration


# some helpers
idx = np.arange( x_train.shape[ 0 ], dtype=int )

# create variables to hold the training data.
# for now it is just the normal training data. we'll mix in the 
# adversarial examples later
x_train_enhanced = x_train
y_train_enhanced = y_train


for i in range( epochs ):
  # train model for one epoch
  new_model.fit(x_train_enhanced,y_train_enhanced)

  # shuffle   
  np.random.shuffle(idx)

  # pick the subset of the train data to turn into adversarial examples
  n_adv = int( ratio * x_train.shape[ 0 ] )
  x_sub = x_train[idx[:n_adv]]
  y_sub = y_train[idx[:n_adv]]


  # create adversarial examples
  x_sub_adv = fgsm.generate( x_sub )

  # add the adversarial examples to the training data
  x_train_enhanced = np.append(x_train,x_sub_adv,axis=0)
  y_train_enhanced = np.append(y_train,y_sub,axis=0)

# training is done. let's evaluate the performance on the test set 
# and adversarial examples
acc = new_model.evaluate( x_test, y_test )[ 1 ]
print( 'acc on the test data: ', acc )

# and now on adversarial examples crafted against the defended model
x_test_adv = fgsm.generate( x_test )
acc = new_model.evaluate( x_test_adv, y_test )[ 1 ]
print( 'accuracy on adversarial examples: ', acc )


To use the adversarial training that comes with `art` we need to pass our wrapped model to an `AdversarialTrainer` instance. The `AdversarialTrainer` also needs an instance of the attack that will be used to create the adversarial examples.


In [0]:
from art.defences import AdversarialTrainer

# get a new untrained model and wrap it
new_model = mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=0 )
defended_model = KerasClassifier(clip_values=(0,1), model=new_model )
# define the attack we are using
fgsm = FastGradientMethod( defended_model )

Create the `AdversarialTrainer` instance. 
Train the model and evaluate it on the test data.

In [0]:
# define the adversarial trainer and train the new network
adversarial_trainer = AdversarialTrainer( defended_model, fgsm )
adversarial_trainer.fit( x_train, y_train, batch_size=100, nb_epochs=5 )

# evaluate how good our model is
acc = new_model.evaluate( x_test, y_test )[ 1 ]
print( 'acc on the test data: ', acc )

# and now on adversarial examples crafted against the defended model
x_test_adv = fgsm.generate( x_test )
acc = new_model.evaluate( x_test_adv, y_test )[ 1 ]
print( 'accuracy on adversarial examples: ', acc )


Calculate the `empirical robustness` for our now hopefully more robust model.

In [0]:
# calculate the empirical robustness
print( 'robustness of the defended model', 
      empirical_robustness( defended_model, x_test, 'fgsm', {}) )
x_adv = fgsm.generate(x_test[0].reshape((1,28,28,1) ))
# use the defended model (not the original clf) for the prediction
print( 'class prediction for the adversarial sample:',
       defended_model.predict( x_adv ) 
     )
plt.imshow( x_adv.reshape( 28, 28 ), cmap="gray_r" )
plt.axis( 'off' )
plt.show( )

# Defensive Distillation

The idea behind defensive distillation is to transfer robustness from one network to another. To do this we train two networks. The first network, which we will call `one`, is trained normally. We want to transfer some of its *experience* to our second network, called `two`. Both `one` and `two` have the same architecture. We achieve this by training `two` on the outputs of `one` instead of the original hard labels. An important change is that we use a so-called *temperature* parameter `T` in the softmax function.
The process is as follows:


1.   Train `one` at temperature `T`
2.   Create new labels for the training data using `one`
3.   Train `two` at temperature `T` using the new labels
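As a starting point, a softmax with temperature can be sketched in plain NumPy: the logits are divided by `T` before the usual softmax. The function name and the example logits are our own; in the exercise you would express the same computation with TensorFlow or Keras operations.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Softmax over the last axis after dividing the logits by T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[8.0, 2.0]])
hard = softmax_with_temperature(logits, T=1.0)
soft = softmax_with_temperature(logits, T=10.0)
print('T=1 :', hard)
print('T=10:', soft)
```

A higher temperature yields softer probabilities: the network is still most confident in the first class, but the second class receives noticeably more mass. These soft outputs are what `two` is trained on.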


Hints:


*   `tf.math.exp`
*   `keras.backend.in_train_phase`
*   Kullback-Leibler divergence




In [0]:
import tensorflow as tf
# softmax with temperature
T = 10


# define the classifier one


# test the FGSM attack


# create new labels


# define the classifier two


# test the FGSM attack



# Black box attacks

Assume we do not have access to the internal workings of our target model. This means we cannot easily calculate gradients.
Fortunately or unfortunately, depending on how you look at it, adversarial examples created on one model can also be used against a different model, provided their learned decision boundaries are similar enough. 

We do not know what the target model looks like, but in most cases we know the domain that it works in (MNIST in our case), so we can make an educated guess. We then train our own model with the architecture that we guessed and create adversarial examples using this model. If our model and the target model are similar enough, the adversarial examples can be transferred.
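The transfer effect can be illustrated with two tiny logistic models whose weights are similar but not identical. Everything in this sketch (weights, data, helper names) is made up for illustration: we craft FGSM examples using only the source model's gradients and then evaluate them on the target model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy(w, b, x, y):
    return float(np.mean((sigmoid(x @ w + b) > 0.5) == (y == 1)))

def fgsm(w, b, x, y, eps):
    # gradient of the cross-entropy loss w.r.t. the input of a logistic model
    grad = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# two hypothetical models with similar decision boundaries
w_source, b_source = np.array([1.0, 1.0, -1.0, -1.0]), 0.0
w_target, b_target = np.array([1.2, 0.8, -0.9, -1.1]), 0.0

# class 1 is bright in the first two features, class 0 in the last two
x = np.array([[0.7, 0.7, 0.3, 0.3],
              [0.3, 0.3, 0.7, 0.7]])
y = np.array([1.0, 0.0])

# the attacker only uses the source model to craft the examples
x_adv = fgsm(w_source, b_source, x, y, eps=0.3)

clean_acc = accuracy(w_target, b_target, x, y)
adv_acc = accuracy(w_target, b_target, x_adv, y)
print('target accuracy on clean data:      ', clean_acc)
print('target accuracy on transferred data:', adv_acc)
```

In this toy setup both inputs are pushed across the target's decision boundary even though the target's gradients were never used. With real networks the transfer is usually only partial.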


In the code below we will be training two different models and see if the adversarial examples transfer from one to the other.

In [0]:
import keras
import keras.backend as k
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Reshape
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# extract ones and zeroes
x_train, y_train = exract_ones_and_zeroes( x_train, y_train )
x_test, y_test = exract_ones_and_zeroes( x_test, y_test )

# we need to bring the labels into a format that our cnn likes (one-hot)
y_train = keras.utils.to_categorical( y_train, 2 )
y_test = keras.utils.to_categorical( y_test, 2 )

# convert it to a format keras can work with
x_train, x_test = convert_to_keras_image_format(x_train, x_test)

# Create simple CNN
model_0 = mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=5 )
print( model_0.evaluate( x_test, y_test )[ 1 ] )
# create a simple DNN and train it


# compare how the models do on the test set



# compare how the models perform on adversarial examples



# let's see how the models do when we give them the adversarial examples 
# created against the other model



We do not always have access to the same training data though. We can collect our own data and use the victim model to label it. 

Using `model_0` from the cell above as the victim model in a black box setting, train your own substitute model on the training data provided in the cell below. Pick an architecture that you think will work well or that you are interested in trying. The paper describing the attack can be found here: https://arxiv.org/abs/1602.02697

Hint: `cleverhans` provides a few helpful functions for performing the data augmentation.

Also try the transferability of attacks other than FGSM. Hint: Don't use too much data for more complex attacks or it will take a long time. Start with a smaller subset first to get a feeling for how long it takes to generate adversarial examples.
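The data augmentation described in the paper moves each known sample a small step `lambda` along the sign of the substitute's Jacobian row for the label that the black box assigned. Below is a minimal NumPy sketch for a linear softmax substitute, where the Jacobian is available in closed form. The weights, the data and the function name `jacobian_augmentation` are hypothetical; for a Keras substitute you would obtain the gradients from the framework or from the `cleverhans` helpers.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def jacobian_augmentation(x, oracle_labels, W, b, lmbda=0.1):
    """Return the old samples plus one new sample per old sample.

    x: (n, d) inputs, oracle_labels: (n,) integer labels from the black box,
    W: (k, d) and b: (k,) define the substitute F(x) = softmax(W x + b).
    """
    probs = softmax(x @ W.T + b)                          # (n, k)
    mean_w = probs @ W                                    # sum_j F_j(x) * W_j
    p_label = probs[np.arange(len(x)), oracle_labels][:, None]
    jac_rows = p_label * (W[oracle_labels] - mean_w)      # dF_label / dx
    x_new = np.clip(x + lmbda * np.sign(jac_rows), 0.0, 1.0)
    return np.concatenate([x, x_new])

W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
x = np.array([[0.6, 0.4], [0.2, 0.9]])
labels = np.array([0, 1])   # labels as returned by the black box

x_aug = jacobian_augmentation(x, labels, W, b, lmbda=0.1)
print(x_aug)
```

After each augmentation round the new samples are sent to the black box for labeling, and the substitute is retrained on the enlarged set.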


 


In [0]:
# set up black box. should already be trained. if not run the cell above first.
black_box = model_0

# load data that is different from the data that the black box has been trained on.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# extract ones and zeroes
x_train, y_train = exract_two_classes( x_train, y_train, no_instance=400 )
x_test, y_test = exract_two_classes( x_test, y_test, no_instance=400 )

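# pick a few instances from the training data: rows 0-399 are zeroes and
# rows 400-799 are ones; the black box saw the first 200 of each class,
# so we take two unseen zeroes and two unseen ones
x_train = x_train[ [200, 201, 600, 601] ]
y_train = y_train[ [200, 201, 600, 601] ]
# we need to bring the labels into a format that our cnn likes (one-hot)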
y_train = keras.utils.to_categorical( y_train, 2 )
y_test = keras.utils.to_categorical( y_test, 2 )
print( x_train.shape )
# convert it to a format keras can work with
x_train, x_test = convert_to_keras_image_format(x_train, x_test)
print( x_train.shape )

# use the black box classifier to create labels for the training data

# define substitute model

# create computational graph for data augmentation

# train your own substitute  model

  # train for a few epochs

  # perform data augmentation

    # get labels for new data


# create adversarial examples on the substitute model
# (`sub` is the substitute model you defined and trained above)
sub_wrapper = KerasClassifier(clip_values=(0,1), model=sub )
# define the attack we are using; it needs the wrapped model
fgsm = FastGradientMethod( sub_wrapper )
x_adv = fgsm.generate( x_test )

# evaluate performance on adversarial examples for the substitute model and the black box
print( sub.evaluate( x_adv, y_test ) )
print( black_box.evaluate( x_adv, y_test ) )
