# Creating Adversarial Examples

In this notebook, you'll learn about the field of adversarial learning, where you exploit the way deep learning systems work to:

- make the model predict the wrong label
- steer the prediction in a direction of your choosing

Although you can create many types of adversarial examples, we'll use [foolbox](https://github.com/bethgelab/foolbox) to generate adversarial images and test them with a [ResNet50](https://en.wikipedia.org/wiki/Residual_neural_network) model trained on [ImageNet](https://en.wikipedia.org/wiki/ImageNet) data -- so that we can then "see" the adversarial attack!

In [None]:
import foolbox
import keras
import numpy as np
import eagerpy as ep
import os
import matplotlib.pyplot as plt
from keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions
)
from keras.preprocessing.image import load_img, img_to_array

from foolbox.criteria import Misclassification

In [None]:
# you will use this later to look up ImageNet class labels
import json

with open('data/imagenet_class_index.json') as f:
    class_idx = json.load(f)


In [None]:
# Note: if you have an internet connection, you can use this line
# (it downloads the pretrained ImageNet weights)

kmodel = ResNet50(weights='imagenet')

### Testing the ResNet Model

First, let's take a look at how the model works normally with some examples in this repository.
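
For intuition, decoding the predictions boils down to picking the highest-scoring classes and mapping them to readable labels. The plain-Python sketch below illustrates that idea only; `top_k`, `toy_labels`, and `toy_probs` are hypothetical names and values, not part of Keras.

```python
def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and scores, for illustration only
toy_labels = ["tabby", "tiger_cat", "shark", "goldfish"]
toy_probs = [0.62, 0.30, 0.05, 0.03]

top3 = top_k(toy_probs, toy_labels, k=3)
for label, prob in top3:
    print(f"{label}: {prob:.2f}")
```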

In [None]:
%matplotlib inline

img_path = 'data/img/cat.jpg'
img = load_img(img_path, target_size=(224, 224))

x = img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = kmodel.predict(x)

for pred in decode_predictions(preds, top=3)[0]:
    print(pred)

plt.imshow(img)
plt.axis('off')

### Your Turn

- Try a few of the other photos in the photo directory!

In [None]:
# %load solutions/0301_shark.py


### Preparing the Attack

Foolbox has a variety of attacks to choose from. First, you need to wrap the model in Foolbox so that it can access the model's underlying weights, biases, and outputs and build attacks appropriately.
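
The `preprocessing` dictionary in the next cell asks Foolbox to reverse the channel axis and subtract a per-channel mean before the image reaches the model. The sketch below mimics that on a single RGB pixel in plain Python. `preprocess_pixel` is a hypothetical helper, and the order shown (flip first, then subtract) is an assumption chosen to match the BGR-ordered mean values.

```python
def preprocess_pixel(rgb, mean=(104.0, 116.0, 123.0)):
    """Reverse the channel order (RGB -> BGR), then subtract a per-channel mean."""
    bgr = rgb[::-1]
    return [value - m for value, m in zip(bgr, mean)]

pixel = [200.0, 150.0, 100.0]  # made-up R, G, B values in the [0, 255] range
processed = preprocess_pixel(pixel)
print(processed)
```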

In [None]:
# initialize foolbox model

# flip_axis=-1 reverses the color channels,
# because Keras ResNet50 expects BGR instead of RGB
preprocessing = dict(flip_axis=-1, mean=[104.0, 116.0, 123.0])

fmodel = foolbox.models.TensorFlowModel(kmodel, 
                                   bounds=(0, 255), 
                                   preprocessing=preprocessing)

Now you can grab a set of images and labels to use for the attack.
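
The clean accuracy reported in the next cell is the fraction of images whose predicted class matches the label. As a rough plain-Python sketch of that computation (the `accuracy` helper and the toy values are hypothetical, not Foolbox's implementation):

```python
def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class index equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up model outputs for four inputs and three classes
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 3.0, 0.5], [1.0, 0.9, 0.8]]
toy_labels = [0, 2, 1, 2]

acc = accuracy(toy_logits, toy_labels)
print(f"clean accuracy:  {acc * 100:.1f} %")
```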

In [None]:
images, labels = ep.astensors(*foolbox.samples(fmodel, dataset="imagenet", batchsize=16))
clean_acc = foolbox.accuracy(fmodel, images, labels)
print(f"clean accuracy:  {clean_acc * 100:.1f} %")

## Fast Gradient Sign Method Attack

First, you'll use a very popular attack called the Fast Gradient Sign Method (FGSM). It uses the gradient of the model's loss with respect to the input image to "climb" rather than "descend" the loss surface: it adds a small noise perturbation in the direction of the gradient's sign, so that the model misclassifies the image.


This attack was introduced by [Goodfellow et al., 2014](https://arxiv.org/abs/1412.6572).
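
To make the idea concrete, here is a minimal sketch of a single FGSM step on a toy logistic-regression model, written in plain Python so that it runs without the notebook's dependencies. The model, weights, and input are made up for illustration; this is not how Foolbox implements the attack internally.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, epsilon, bounds=(0.0, 1.0)):
    """One FGSM step: move each input feature by epsilon in the direction
    that increases the cross-entropy loss, then clip to the valid range."""
    p = predict_proba(w, b, x)
    # For logistic regression, d(loss)/dx_i = (p - y) * w_i
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    lo, hi = bounds
    return [min(hi, max(lo, xi + epsilon * s)) for xi, s in zip(x, sign)]

# Toy model and input (all values are made up for illustration)
w, b = [4.0, -3.0], 0.0
x, y = [0.6, 0.4], 1  # true label is class 1

x_adv = fgsm(w, b, x, y, epsilon=0.2)
print(predict_proba(w, b, x))      # above 0.5: classified as class 1
print(predict_proba(w, b, x_adv))  # below 0.5: now classified as class 0
```

No feature moves by more than `epsilon`, yet the prediction flips because every feature is nudged in the direction that hurts the model most.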

In [None]:
# these epsilons control how much noise is added: the bigger the epsilon, the bigger the perturbation
# NOTE: the epsilons are expected to be floats between 0 and 1; they set the limit at which the noise is "clipped"

epsilons=[0.1, 0.3, 0.5]
attack = foolbox.attacks.FGSM()

In [None]:
raw_advs, adversarial_images, success = attack(fmodel, images, labels, epsilons=epsilons)

In [None]:
success
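
Because the attack was run with a list of epsilons, `success` holds one row of booleans per epsilon and one entry per image. One way to summarize it is the share of images that were not fooled at each epsilon. The sketch below uses made-up outcomes in plain Python lists (`toy_success` is hypothetical) rather than the real tensor.

```python
epsilons = [0.1, 0.3, 0.5]

# Hypothetical outcomes: one row per epsilon, one entry per image
toy_success = [
    [False, False, True, False],
    [False, True, True, False],
    [True, True, True, False],
]

# Share of images that the attack did NOT fool, per epsilon
robust_accuracy = [1 - sum(row) / len(row) for row in toy_success]
for eps, acc in zip(epsilons, robust_accuracy):
    print(f"epsilon {eps}: robust accuracy {acc * 100:.1f} %")
```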

In [None]:
%matplotlib inline

plt.figure()

plt.subplot(1, 3, 1)
plt.title('Original')
# divide by 255 to convert [0, 255] to [0, 1]
example_image = images[1].numpy() / 255
plt.imshow(example_image)
plt.axis('off')

plt.subplot(1, 3, 2)
plt.title('Adversarial')
# you can edit the indexing here to view different attack successes:
# the first index selects the epsilon, the second selects the image
adversarial_image = adversarial_images[2][1].numpy() / 255
plt.imshow(adversarial_image)
plt.axis('off')

plt.subplot(1, 3, 3)
plt.title('Difference')
difference = adversarial_image - example_image
plt.imshow(difference / abs(difference).max() * 0.2 + 0.5)
plt.axis('off')

plt.show()
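
The "Difference" panel rescales the perturbation so that it becomes visible: dividing by the largest absolute value maps it to [-1, 1], and `* 0.2 + 0.5` then centers it around mid-gray in the range [0.3, 0.7]. A small plain-Python sketch of the same arithmetic follows; `rescale_difference` is a hypothetical helper, and its zero-difference guard is an addition that the plotting cell does not have.

```python
def rescale_difference(diff, half_range=0.2, center=0.5):
    """Map a signed difference into [center - half_range, center + half_range]."""
    largest = max(abs(d) for d in diff)
    if largest == 0:
        return [center for _ in diff]  # identical images: uniform gray
    return [d / largest * half_range + center for d in diff]

toy_diff = [-0.002, 0.0, 0.001, 0.002]  # made-up per-pixel differences
scaled = rescale_difference(toy_diff)
print(scaled)
```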

## Your Turn

- Did it work? What was the predicted label?

In [None]:
adv_x = np.expand_dims(adversarial_images[2][1].numpy().copy(), axis=0) # you can edit the indexing here to view different attack successes
adv_x = preprocess_input(adv_x)

img_x = np.expand_dims(images[1].numpy().copy(), axis=0)
img_x = preprocess_input(img_x)

In [None]:
# %load solutions/02_adversarial_vs_original_predictions.py


## Your Turn

- Now try to create an adversarial example using a different attack method. See all attacks here: https://foolbox.readthedocs.io/en/latest/modules/attacks.html
- Some of the attacks need you to define a criterion (example below)
- You may also try a targeted class attack (see https://foolbox.readthedocs.io/en/latest/modules/criteria.html#foolbox.criteria.TargetClass; the ImageNet class indices and labels are listed at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a)
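
A criterion is simply the rule that decides whether an attack counts as successful. The sketch below contrasts the untargeted and targeted rules in plain Python. The function names and scores are hypothetical and only mirror the idea behind the Foolbox criteria (the targeted one appears as `TargetedMisclassification` in Foolbox 3).

```python
def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

def is_misclassified(scores, true_label):
    """Untargeted success: the predicted class differs from the true label."""
    return argmax(scores) != true_label

def hits_target(scores, target_class):
    """Targeted success: the predicted class equals the chosen target."""
    return argmax(scores) == target_class

toy_scores = [0.1, 0.7, 0.2]  # made-up model outputs; class 1 wins
print(is_misclassified(toy_scores, true_label=0))  # True
print(hits_target(toy_scores, target_class=2))     # False
```

A targeted attack is the stricter of the two: the example above already fools the model, but it does not yet land on the chosen target class.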


In [None]:
import random

index = random.randrange(17, 1000)
index

In [None]:
example_images, labels = ep.astensors(*foolbox.samples(fmodel, dataset="imagenet", batchsize=16, index=index))

for label in labels.numpy():
    print(class_idx[str(label)])

In [None]:
# pick the one you want and add its position in the batch to the index
example_image, label = ep.astensors(*foolbox.samples(fmodel, dataset="imagenet", batchsize=1, index=index+4))

In [None]:
criterion = Misclassification(label)

Steps:

- Build your attack and generate the adversarial images (like you did above)
- Test the predictions on any successes
- Choose one or two to visually inspect

Reuse the code above, or write your own! There are also some short solution snippets if you want to load them and play around.

In [None]:
# %load solutions/0303_carliniwagner_attack.py


In [None]:
# %load solutions/0304_test_predictions.py


In [None]:
# %load solutions/0305_plot_examples.py

