<a href="https://colab.research.google.com/github/ShowLongYoung/SecurePrivateAILab/blob/solution/3_defend_cnn_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Defense with adversarial training

In this section we will use adversarial training to harden our CNN against adversarial examples.

In adversarial training the dataset gets "augmented" with adversarial examples that are correctly labeled. This way the network learns that such perturbations are possible and can adapt to them.
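
A minimal NumPy sketch of the augmentation idea, on toy data. The random sign perturbation is only a stand-in for a real attack (a real attack picks the direction from the model's gradients); all names and values here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 4 samples with 3 features each, values in [0, 1], binary labels.
x = rng.uniform(0.0, 1.0, size=(4, 3))
y = np.array([0, 1, 0, 1])

# Stand-in for an attack: a small bounded perturbation of every feature.
eps = 0.1
x_perturbed = np.clip(x + eps * np.sign(rng.standard_normal(x.shape)), 0.0, 1.0)

# Augment: perturbed inputs keep the labels of the clean inputs they came from.
x_aug = np.vstack((x, x_perturbed))
y_aug = np.concatenate((y, y))

print(x_aug.shape, y_aug.shape)  # (8, 3) (8,)
```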

We will be using the IBM Adversarial Robustness Toolbox in this exercise. It offers a very easy-to-use implementation of adversarial training and a number of other defenses.
https://github.com/IBM/adversarial-robustness-toolbox


We start out by importing most of the modules and functions we will need.

In [None]:
!pip install tensorflow-gpu==1.15.2 keras==2.2.3
!pip install adversarial-robustness-toolbox

In [None]:
# most of our imports
import warnings
import numpy as np
import os
import keras
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
%matplotlib inline
import matplotlib.pyplot as plt
# import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from art.estimators.classification import KerasClassifier



In [None]:
# extract the samples labeled one or zero
def extract_ones_and_zeroes( data, labels ):
    data_zeroes = ...
    data_ones = ...
    x = np.vstack( ... )

    x = x / 255.
    print( x.shape )

    labels_zeroes = ...
    labels_ones = ...
    y = np.append( ... )

    return x, y
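
The selection step can be illustrated on toy arrays with boolean masks. This is a sketch of the general technique, not the notebook's intended solution; the array names and shapes are made up for the example.

```python
import numpy as np

# Toy stand-in for MNIST: 6 "images" of 2x2 pixels with digit labels.
images = np.arange(6 * 2 * 2, dtype=np.uint8).reshape(6, 2, 2)
digits = np.array([0, 7, 1, 0, 3, 1])

# A boolean mask keeps only the samples whose label matches.
zeros = images[digits == 0]
ones = images[digits == 1]

# Stack both classes and rescale pixel values from [0, 255] to [0, 1].
x = np.vstack((zeros, ones)) / 255.0
y = np.append(np.zeros(len(zeros), dtype=int), np.ones(len(ones), dtype=int))

print(x.shape, y)  # (4, 2, 2) [0 0 1 1]
```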

In [None]:
# convert the image format to the layout the keras backend expects
def convert_to_keras_image_format( x_train, x_test ):
    if keras.backend.image_data_format( ) == 'channels_first':
        x_train = x_train.reshape( ... )
        x_test = x_test.reshape( ... )
    else:
        x_train = x_train.reshape( ... )
        x_test = x_test.reshape( ... )

    return x_train, x_test
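
The two layouts differ only in where the channel axis sits. The sketch below shows this on a toy batch; `add_channel_axis` is a hypothetical helper that takes the format as a plain string instead of querying the Keras backend.

```python
import numpy as np

# Toy batch: 5 grayscale images of 28x28 pixels, without a channel axis.
batch = np.zeros((5, 28, 28))

def add_channel_axis(x, data_format):
    """Reshape (N, H, W) to the 4-D layout named by data_format."""
    n, h, w = x.shape
    if data_format == 'channels_first':
        return x.reshape(n, 1, h, w)
    return x.reshape(n, h, w, 1)

print(add_channel_axis(batch, 'channels_first').shape)  # (5, 1, 28, 28)
print(add_channel_axis(batch, 'channels_last').shape)   # (5, 28, 28, 1)
```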

In [None]:
# create cnn model
def mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=2 ):
    # define the classifier
    clf = keras.Sequential( )
    clf.add( Conv2D( 32, kernel_size=(3, 3), activation='relu', input_shape=x_train.shape[ 1: ] ) )
    clf.add( Conv2D( 64, (3, 3), activation='relu' ) )
    clf.add( MaxPooling2D( pool_size=(2, 2) ) )
    clf.add( Dropout( 0.25 ) )
    clf.add( Flatten( ) )
    clf.add( Dense( 128, activation='relu' ) )
    clf.add( Dropout( 0.5 ) )
    clf.add( Dense( y_train.shape[ 1 ], activation='softmax' ) )

    clf.compile( loss=keras.losses.categorical_crossentropy,
                 optimizer='adam',
                 metrics=[ 'accuracy' ] )

    clf.fit( ... )
    clf.summary( )
    score = ...
    print( 'Test loss:', ... )
    print( 'Test accuracy:', ... )

    return clf

We start out by loading the data, preparing it and training our CNN.

In [None]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# extract ones and zeroes data and labels
x_train, y_train = ...
x_test, y_test = ...

# convert labels to one-hot codes
y_train = ...
y_test = ...

# convert it to a format keras can work with
x_train, x_test = convert_to_keras_image_format(...)

# we need some setup so everything gets executed in the same tensorflow session
session = tf.Session( )
keras.backend.set_session( session )

# get and train our cnn
clf = mnist_cnn_model( ... )
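
The model's last layer has one output per class (`y_train.shape[ 1 ]`), which is why the labels need to be one-hot encoded first. Keras ships a helper for this (`keras.utils.to_categorical`). The NumPy version below only illustrates what the encoding looks like, on made-up labels.

```python
import numpy as np

labels = np.array([0, 1, 1, 0])
num_classes = 2

# Row i of the identity matrix is the one-hot code for class i.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)  # (4, 2)
```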

We want to know how robust our model is against an attack. To find out, we calculate the `empirical robustness`, which amounts to computing the minimal perturbation that the attacker must introduce for a successful attack. We follow the approach of Moosavi-Dezfooli et al. 2016 (paper link: https://arxiv.org/abs/1511.04599).
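
For reference, the paper defines the robustness of a classifier $f$ on a dataset $\mathcal{D}$ as the average relative size of the minimal perturbation, where $\hat{r}(x)$ denotes the minimal perturbation found for input $x$:

$$
\hat{\rho}_{\mathrm{adv}}(f) = \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \frac{\lVert \hat{r}(x) \rVert_2}{\lVert x \rVert_2}
$$

A larger value means the attacker needs a larger perturbation relative to the input. This is the definition from the paper; how `art` approximates $\hat{r}(x)$ depends on the attack that is chosen.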

The empirical robustness method currently supports two attacks:
the `Fast Gradient Sign Method` and `HopSkipJump`.

You can select them by passing either `fgsm` or `hsj` as the attack name.
The default attack parameters are the following:
```
    "fgsm": {"eps_step": 0.1, "eps_max": 1., "clip_min": 0., "clip_max": 1.},
    "hsj": {"max_iter": 50, "max_eval": 10000, "init_eval": 100, "init_size": 100}
```
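
To make the role of `eps_step` and `eps_max` concrete, here is a rough NumPy sketch of a minimal-perturbation search for a hypothetical linear classifier. It steps through the grid `eps_step, 2 * eps_step, ..., eps_max`, stops at the first step size that flips the prediction, and averages the relative perturbation size over the samples that could be flipped. The model, the helper names and the use of the predicted label in place of the true label are all assumptions; this is not how `art` implements the metric.

```python
import numpy as np

# Hypothetical linear classifier: predicts class 1 when x @ w + b > 0.
w = np.array([1.0, -2.0, 0.5])
b = -0.1

def predict(x):
    return (x @ w + b > 0).astype(int)

def minimal_fgsm_perturbation(x, eps_step=0.1, eps_max=1.0, clip_min=0.0, clip_max=1.0):
    """Return the first perturbed input on the eps grid that flips the prediction, or None."""
    label = predict(x)
    # For a linear model the loss gradient w.r.t. x points along -w for class 1
    # and along +w for class 0, so the sign of w is all FGSM needs here.
    direction = np.sign(w) * (1 - 2 * label)
    for k in range(1, int(round(eps_max / eps_step)) + 1):
        x_adv = np.clip(x + k * eps_step * direction, clip_min, clip_max)
        if predict(x_adv) != label:
            return x_adv
    return None

def empirical_robustness_sketch(xs, **kwargs):
    """Average ||x_adv - x|| / ||x|| over the samples whose prediction could be flipped."""
    ratios = []
    for x in xs:
        x_adv = minimal_fgsm_perturbation(x, **kwargs)
        if x_adv is not None:
            ratios.append(np.linalg.norm(x_adv - x) / np.linalg.norm(x))
    return float(np.mean(ratios)) if ratios else 0.0

xs = np.array([[0.9, 0.1, 0.5], [0.2, 0.6, 0.3]])
robustness = empirical_robustness_sketch(xs)
print(robustness)
```

A smaller `eps_step` gives a finer grid and therefore a tighter estimate of the minimal perturbation, at the cost of more model evaluations.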

In [None]:
from art.metrics import empirical_robustness

# wrap the model and calculate the empirical robustness
wrapper = KerasClassifier(...)
x_small = x_test[ :10 ]
print( 'robustness of the undefended model',
      empirical_robustness(...))
print( 'robustness of the undefended model',
      empirical_robustness(...))

Let's create an adversarial example and see how it looks.
We also want to know how the model performs on adversarial examples, so we create adversarial examples from the whole test set and check the model's accuracy on them.

Below you can see the keyword arguments for the attack:

```
norm=np.inf, eps=.3, eps_step=0.1, targeted=False, num_random_init=0, batch_size=1, minimal=False

:param norm: The norm of the adversarial perturbation. Possible values: np.inf, 1 or 2.
:param eps: Attack step size (input variation)
:param eps_step: Step size of input variation for minimal perturbation computation
:param targeted: Indicates whether the attack is targeted (True) or untargeted (False)
:param num_random_init: Number of random initialisations within the epsilon ball. For random_init=0 starting at
    the original input.
:param batch_size: Size of the batch on which adversarial samples are generated.
:param minimal: Indicates if computing the minimal perturbation (True). If True, also define `eps_step` for
    the step size and eps for the maximum perturbation.
```
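
To show what `eps` and `norm=np.inf` mean in practice, the following NumPy sketch runs a single untargeted FGSM step against a small logistic-regression model. The model, weights and inputs are made up, and `fgsm_linf` is a hypothetical helper, not the `art` implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_linf(x, y, w, b, eps=0.3, clip_min=0.0, clip_max=1.0):
    """Untargeted L-infinity FGSM for the model p(y=1|x) = sigmoid(x @ w + b)."""
    # Gradient of the binary cross-entropy loss with respect to the input x.
    grad = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]
    # Move every feature by eps in the direction that increases the loss,
    # then clip back to the valid pixel range.
    return np.clip(x + eps * np.sign(grad), clip_min, clip_max)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([[0.6, 0.4], [0.2, 0.7]])
y = np.array([1.0, 0.0])

x_adv = fgsm_linf(x, y, w, b, eps=0.3)
print('clean predictions:      ', (x @ w + b > 0).astype(int))
print('adversarial predictions:', (x_adv @ w + b > 0).astype(int))
```

With the infinity norm, no single feature moves by more than `eps`, which is why the perturbation stays visually small on images even though every pixel may change.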

In [None]:
# create an adversarial example with fgsm
from art.attacks.evasion import FastGradientMethod
fgsm = FastGradientMethod(...)
x_adv = fgsm.generate(...)
print( 'class prediction for the adversarial sample:',
       clf.predict(...) )

# create adversarial examples for the whole test set
x_test_adv = ...
print( 'accuracy on adversarial examples:' )
print( wrapper._model.evaluate( ... )[ 1 ] )

class prediction for the adversarial sample: [[0.8604998  0.13950023]]
accuracy on adversarial examples:
0.9475


## Adversarial Training

Let's create a new untrained model with the same architecture that we have been using so far.

We will train this model with adversarial training. The idea is simple:

1.   Train the model for 1 epoch
2.   Create adversarial examples using FGSM
3.   Enhance the training data by mixing it with the adversarial examples. (Only mix in the adversarial examples created in this iteration)
4.   Go to 1
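
The four steps can be sketched end to end with a toy logistic-regression model in plain NumPy. Everything here (the model, the toy data, the learning rate, the number of iterations and the helper names) is an assumption chosen to keep the example small; it stands in for the CNN and the `art` attack used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_epoch(w, b, x, y, lr=1.0):
    """One full-batch gradient step on the binary cross-entropy loss."""
    err = sigmoid(x @ w + b) - y
    return w - lr * x.T @ err / len(x), b - lr * err.mean()

def fgsm(w, b, x, y, eps=0.1):
    """L-infinity FGSM for the logistic model, clipped to [0, 1]."""
    grad = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy data: the label is 1 when the first feature exceeds the second.
x_clean = rng.uniform(0.0, 1.0, size=(200, 2))
y_clean = (x_clean[:, 0] > x_clean[:, 1]).astype(float)

w, b = np.zeros(2), 0.0
x_mix, y_mix = x_clean, y_clean
for _ in range(100):
    # 1. train for one epoch on the current (possibly augmented) data
    w, b = train_one_epoch(w, b, x_mix, y_mix)
    # 2. craft adversarial examples against the current model
    x_adv = fgsm(w, b, x_clean, y_clean)
    # 3. mix the clean data with this iteration's adversarial examples only
    x_mix = np.vstack((x_clean, x_adv))
    y_mix = np.concatenate((y_clean, y_clean))
    # 4. repeat

acc = ((sigmoid(x_clean @ w + b) > 0.5) == y_clean).mean()
print('clean accuracy:', acc)
```

Rebuilding `x_mix` from the clean data in every iteration is what keeps stale adversarial examples, crafted against an earlier version of the model, out of the training set.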

We will be using the FGSM attack from `art` this time.




In [None]:
# create a new untrained model and wrap it
new_model = mnist_cnn_model( x_train, y_train, x_test, y_test, epochs=0 )
defended_model = KerasClassifier(...)
# define the attack we are using
fgsm = ...

# parameters
epochs = 5 # number of iterations that we will perform training for
ratio = .5  # fraction of the training set that will get turned into
            # adversarial examples each iteration


# some helpers
idx = np.arange( x_train.shape[ 0 ], dtype=int )  # np.int was removed in newer NumPy releases

# create variables to hold the training data.
# for now it is just the normal training data. we'll mix
# the adversarial examples in later
x_train_enhanced = x_train
y_train_enhanced = y_train


for i in range( epochs ):
  # train model for one epoch
  defended_model.fit( ... )

  # shuffle
  np.random.shuffle( idx )
  # pick the subset of the train data to turn into adversarial examples
  # (the first `ratio` share of the shuffled indices)
  x_train_ = x_train[ idx[ : int( idx.shape[ 0 ] * ratio ) ]  ]
  y_train_ = y_train[ idx[ : int( idx.shape[ 0 ] * ratio ) ]  ]

  # create adversarial examples
  x_adv = ...
  # add the adversarial examples to the training data
  x_train_enhanced = np.vstack( ... )
  y_train_enhanced = np.vstack( ... )

# training is done. let's evaluate the performance on the test set
# and adversarial examples
acc = defended_model._model.evaluate( ... )[ 1 ]
print( 'acc on the test data: ', acc )

# and now on adversarial examples
x_test_adv = ...
acc =  ...
print( 'accuracy on adversarial examples: ', acc )
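
A note on picking the random subset: after shuffling an index array, the slice up to the cut point holds a `ratio` share of the samples, and the slice from the cut point onwards holds the remaining `1 - ratio` share. The two only have the same size when `ratio` is 0.5. The toy example below uses made-up data and a different ratio to make the difference visible.

```python
import numpy as np

rng = np.random.default_rng(0)
x_toy = np.arange(10).reshape(10, 1)
ratio = 0.3

idx = np.arange(x_toy.shape[0])
rng.shuffle(idx)

cut = int(idx.shape[0] * ratio)
head = x_toy[idx[:cut]]   # first `ratio` share of the shuffled indices
tail = x_toy[idx[cut:]]   # remaining `1 - ratio` share

print(len(head), len(tail))  # 3 7
```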


To use the adversarial training that comes with `art` we need to pass our wrapped model to an `AdversarialTrainer` instance. The `AdversarialTrainer` also needs an instance of the attack that will be used to create the adversarial examples.
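
As a rough mental model of what such a trainer does with the attack instance, the sketch below swaps a share of each mini-batch for adversarial counterparts before the batch is used for training. It assumes a per-batch replacement scheme with a `ratio` parameter; `mix_adversarial_batch` and `toy_attack` are hypothetical helpers, and this is not the `art` implementation.

```python
import numpy as np

def mix_adversarial_batch(x_batch, y_batch, attack, ratio, rng):
    """Replace a `ratio` share of the batch with adversarial counterparts; labels stay unchanged."""
    n_adv = int(np.ceil(ratio * len(x_batch)))
    chosen = rng.choice(len(x_batch), size=n_adv, replace=False)
    x_mixed = x_batch.copy()
    x_mixed[chosen] = attack(x_batch[chosen], y_batch[chosen])
    return x_mixed, y_batch

# Hypothetical stand-in attack: shift every pixel up by 0.1 and clip to [0, 1].
def toy_attack(x, y):
    return np.clip(x + 0.1, 0.0, 1.0)

rng = np.random.default_rng(0)
x_batch = np.full((8, 4), 0.5)
y_batch = np.arange(8) % 2

x_mixed, y_mixed = mix_adversarial_batch(x_batch, y_batch, toy_attack, ratio=0.5, rng=rng)
print(int((x_mixed != x_batch).any(axis=1).sum()))  # 4
```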


In [None]:
from art.defences.trainer import AdversarialTrainer

# get a new untrained model and wrap it
new_model = mnist_cnn_model( ... )
defended_model = KerasClassifier(...)
# define the attack we are using
fgsm = ...

# define the adversarial trainer and train the new network
adversarial_trainer = AdversarialTrainer(...)
adversarial_trainer.fit(...)

# evaluate how good our model is
defended_model._model.evaluate(...)

# and now on adversarial examples
x_test_adv = ...
acc =  ...
print( 'loss and accuracy on adversarial examples: ', acc )

# calculate the empirical robustness
print( 'robustness of the defended model',
      empirical_robustness(...) )

x_adv = ...
print( 'class prediction for the adversarial sample:',
       clf.predict( ... )
     )