# Robustness to Adversarial Examples

In this notebook we test whether *patch augmentation* creates models that are more robust to adversarial examples.

To do this, we use the *CleverHans* library to create adversarial examples with the Fast Gradient Sign Method (FGSM) attack.

Notes:
- More attack methods will be added to this notebook later.
- Portions of this code are from COMPSCI 109B lab material, Harvard University. See <https://harvard-iacs.github.io/2019-CS109B/lecture/lab21/AdversarialNN/> for a tutorial.

## Fast Gradient Sign Method

The Fast Gradient Sign Method (FGSM) is a white-box attack, meaning that the attacker needs access to the model in order to craft adversarial examples. FGSM uses the gradients of the trained network to generate them: for an input image $x$ with true label $y$, it takes the gradient of the loss $J$ with respect to $x$ (with the model parameters $\theta$ held fixed) and steps by $\epsilon$ in the direction of its sign, giving a new image $\tilde{x}$ that increases the loss:

$$
\tilde{x} = x + \epsilon \cdot \text{sign}( \nabla_x J(\theta, x, y) )
$$

See <https://www.tensorflow.org/tutorials/generative/adversarial_fgsm> for further details and a good tutorial on how to create such adversarial examples using FGSM.
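
To make the update rule concrete, the following is a minimal NumPy sketch of a single FGSM step on a toy logistic-regression classifier, not on the ResNet models used later in this notebook. The weights `w`, bias `b`, input `x`, and the helper names `loss_and_grad` and `fgsm_step` are all made up for illustration. For this simple model the gradient of the loss with respect to the input has the closed form $(p - y)\,w$, where $p$ is the predicted probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(np.dot(w, x) + b)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_x = (p - y) * w  # dJ/dx for logistic regression
    return loss, grad_x

def fgsm_step(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(grad_x J)."""
    _, grad_x = loss_and_grad(w, b, x, y)
    return x + eps * np.sign(grad_x)

# Hypothetical toy model and input (illustrative values only).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 0.7])
y = 1.0

x_adv = fgsm_step(w, b, x, y, eps=0.05)
loss_clean, _ = loss_and_grad(w, b, x, y)
loss_adv, _ = loss_and_grad(w, b, x_adv, y)
print("max perturbation:", np.max(np.abs(x_adv - x)))
print("loss increased:", loss_adv > loss_clean)
```

Every component of the input moves by exactly $\epsilon$, so the perturbation is bounded in the $L_\infty$ norm regardless of how large the gradient is.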

In this notebook, however, we use the CleverHans Python package rather than implementing FGSM ourselves. CleverHans can interface with models built in Keras.

We begin with the imports. The models evaluated here were trained on the CIFAR-10 dataset.

In [1]:
import numpy as np
from keras.datasets import cifar10
from keras.models import load_model
import matplotlib.pyplot as plt
import tensorflow as tf
import keras

from cleverhans.utils_keras import KerasModelWrapper
from cleverhans.attacks import FastGradientMethod

session = tf.Session()
keras.backend.set_session(session)

Using TensorFlow backend.


Now we use Keras to fetch the CIFAR-10 image data and preprocess it in the same way as when the models were trained:

In [2]:
# We need the non one-hot-encoding labels later, so store these as y_train_nc and y_test_nc
(x_train, y_train_nc), (x_test, y_test_nc) = cifar10.load_data()
y_train = keras.utils.to_categorical(y_train_nc, 10)
y_test = keras.utils.to_categorical(y_test_nc, 10)

x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

x_train_mean = np.mean(x_train, axis=0)
x_train -= x_train_mean
x_test -= x_train_mean

Here we load the best-performing patch augmentation model and the best-performing model trained without augmentation (`model` is the network trained without any augmentation; `patch_model` was trained using patch augmentation):

In [3]:
model = load_model('./cifar10_no_aug_ResNet20v2_model.114.h5')
patch_model = load_model('./cifar10_p05_a05_ResNet20v2_model.105.h5')

Both networks are ResNet20v2 models. The patch augmentation parameters were `probability=0.5` and `patch_area=0.5`.

Here we wrap the patch augmentation network with the CleverHans Keras model wrapper and set the $\epsilon$ value (`fgsm_rate`):

In [4]:
wrap = KerasModelWrapper(patch_model)

x = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))
y = tf.placeholder(tf.float32, shape=(None, 10))

fgsm = FastGradientMethod(wrap, sess=session)

# Here we set epsilon
fgsm_rate = 0.001

fgsm_params = {'eps': fgsm_rate,'clip_min': -1.,'clip_max': 1.}
adv_x = fgsm.generate(x, **fgsm_params)
adv_x = tf.stop_gradient(adv_x)
adv_prob = patch_model(adv_x)

Here we use `test_cases` to set the number of adversarial samples to create. We then compare how the two networks perform on their adversarial examples to see how well each one handles the attack.

In our experiments, using all 10,000 test samples at once caused 'Resource exhausted' errors, so we use only the first 1,000.
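
One common way around such memory limits is to process the data in fixed-size batches and concatenate the results. The sketch below is not part of the original experiments: it assumes a hypothetical `run_batch` callable that maps an array of images to an array of outputs (in this notebook it could wrap a `session.run` call), and it uses a stand-in function so that the example runs on its own.

```python
import numpy as np

def run_in_batches(run_batch, data, batch_size=100):
    """Apply run_batch to consecutive slices of data and stack the results."""
    outputs = []
    for start in range(0, len(data), batch_size):
        outputs.append(run_batch(data[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)

# Stand-in for a model call: returns fake 10-class scores per image.
def fake_model(batch):
    return np.tile(batch.mean(axis=(1, 2, 3))[:, None], (1, 10))

images = np.random.RandomState(0).rand(250, 32, 32, 3).astype("float32")
scores = run_in_batches(fake_model, images, batch_size=100)
print(scores.shape)  # (250, 10)
```

The last batch is simply shorter when the number of samples is not a multiple of `batch_size`.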

In [5]:
test_cases = 1000
fetches = [adv_prob]
fetches.append(adv_x)
outputs = session.run(fetches=fetches, feed_dict={x:x_test[:test_cases]}) 
adv_prob = outputs[0]
adv_examples = outputs[1]

Check how many of the adversarial examples were correctly predicted:

In [6]:
correct = 0
for i in range(test_cases):
    if np.argmax(adv_prob[i]) == np.argmax(y_test[i]):
        correct += 1
print("%s/%s correctly predicted." % (correct, test_cases))

725/1000 correctly predicted.
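
The counting loop above can also be written as a single vectorised NumPy comparison. The sketch below uses small made-up arrays in place of `adv_prob` and `y_test`; the helper name `count_correct` is hypothetical.

```python
import numpy as np

def count_correct(probs, one_hot_labels):
    """Number of rows whose highest-scoring class matches the one-hot label."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return int(np.sum(predicted == actual))

# Made-up predictions for four samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

correct = count_correct(probs, labels)
print("%s/%s correctly predicted." % (correct, len(labels)))  # 2/4 correctly predicted.
```

In this notebook the equivalent call would be `count_correct(adv_prob, y_test[:test_cases])`.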


Compare this to the non-augmented network:

In [7]:
wrap = KerasModelWrapper(model)
x = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))
y = tf.placeholder(tf.float32, shape=(None, 10))
fgsm = FastGradientMethod(wrap, sess=session)
fgsm_params = {'eps': fgsm_rate,'clip_min': -1.,'clip_max': 1.}
adv_x = fgsm.generate(x, **fgsm_params)
adv_x = tf.stop_gradient(adv_x)
adv_prob = model(adv_x)
fetches = [adv_prob]
fetches.append(adv_x)
outputs = session.run(fetches=fetches, feed_dict={x:x_test[:test_cases]}) 
adv_prob = outputs[0]
adv_examples = outputs[1]
correct = 0
for i in range(test_cases):
    if np.argmax(adv_prob[i]) == np.argmax(y_test[i]):
        correct += 1
print("%s/%s correctly predicted." % (correct, test_cases))

643/1000 correctly predicted.


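At $\epsilon = 0.001$ the network trained with patch augmentation correctly classifies 725 of the 1,000 adversarial examples, compared with 643 for the non-augmented network, so on this sample it is more resilient to this type of adversarial attack.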
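
As a rough check on how much weight 1,000 samples can carry, the sketch below attaches a normal-approximation 95% interval to each accuracy. This is an illustrative addition rather than part of the original analysis, and the helper name `accuracy_interval` is made up. It also treats the two results as independent, which is only an approximation because both networks were evaluated on the same test images.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Accuracy with a normal-approximation 95% interval (z = 1.96)."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

for name, n_correct in (("Patch augmentation", 725), ("No augmentation", 643)):
    p, low, high = accuracy_interval(n_correct, 1000)
    print("%s: %.3f (%.3f to %.3f)" % (name, p, low, high))
```

For these counts the two intervals do not overlap, which is consistent with the gap being more than sampling noise, although a paired comparison on per-image results would be a stronger check.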

### Adjusting epsilon

If we increase $\epsilon$ we can create adversarial images that are more difficult for the network to classify. The higher the value of $\epsilon$, the larger the perturbation added to each image, until it becomes perceptible even to the human eye.
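
To get a feel for these magnitudes, it helps to convert $\epsilon$ back to 8-bit pixel intensities. Because the images were divided by 255 before the mean was subtracted, a perturbation of $\epsilon$ in model input units corresponds to $255\epsilon$ intensity levels. The short calculation below does this for the two values used in this notebook; the helper name `eps_to_intensity_levels` is made up for illustration.

```python
def eps_to_intensity_levels(eps, scale=255.0):
    """Convert a perturbation size on [0, 1]-scaled pixels to 8-bit intensity levels."""
    return eps * scale

for eps in (0.001, 0.03):
    levels = eps_to_intensity_levels(eps)
    print("eps = %s -> about %.3f of 255 intensity levels" % (eps, levels))
```

So $\epsilon = 0.001$ shifts each channel by roughly a quarter of one intensity level, while $\epsilon = 0.03$ shifts it by between seven and eight levels.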

When we raise $\epsilon$ to a higher value, say $0.03$, both networks perform much worse, but the network trained using patch augmentation is still the better of the two:

In [8]:
fgsm_rate = 0.03

# Patch Augmentation Network:
wrap = KerasModelWrapper(patch_model)
x = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))
y = tf.placeholder(tf.float32, shape=(None, 10))
fgsm = FastGradientMethod(wrap, sess=session)
fgsm_params = {'eps': fgsm_rate,'clip_min': -1.,'clip_max': 1.}
adv_x = fgsm.generate(x, **fgsm_params)
adv_x = tf.stop_gradient(adv_x)
adv_prob = patch_model(adv_x)
fetches = [adv_prob]
fetches.append(adv_x)
outputs = session.run(fetches=fetches, feed_dict={x:x_test[:test_cases]}) 
adv_prob = outputs[0]
adv_examples = outputs[1]
correct = 0
for i in range(test_cases):
    if np.argmax(adv_prob[i]) == np.argmax(y_test[i]):
        correct += 1
print("Patch augmentation: %s/%s correctly predicted." % (correct, test_cases))

# No Augmentation Network:
wrap = KerasModelWrapper(model)
x = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))
y = tf.placeholder(tf.float32, shape=(None, 10))
fgsm = FastGradientMethod(wrap, sess=session)
fgsm_params = {'eps': fgsm_rate,'clip_min': -1.,'clip_max': 1.}
adv_x = fgsm.generate(x, **fgsm_params)
adv_x = tf.stop_gradient(adv_x)
adv_prob = model(adv_x)
fetches = [adv_prob]
fetches.append(adv_x)
outputs = session.run(fetches=fetches, feed_dict={x:x_test[:test_cases]}) 
adv_prob = outputs[0]
adv_examples = outputs[1]
correct = 0
for i in range(test_cases):
    if np.argmax(adv_prob[i]) == np.argmax(y_test[i]):
        correct += 1
print("No augmentation %s/%s correctly predicted." % (correct, test_cases))

Patch augmentation: 201/1000 correctly predicted.
No augmentation 138/1000 correctly predicted.
