In [0]:
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

In [0]:
import os
import sys
import math
import time
import itertools

import tensorflow as tf
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tensorflow import keras
from sklearn.preprocessing import OneHotEncoder

%matplotlib inline

# Adversarial attacks

![](https://openai.com/content/images/2017/02/adversarial_img_1.png)

# Fast Gradient Sign Method (FGSM)

FGSM is a single-step attack: the whole perturbation is added to the image in one step, in the direction of the sign of the loss gradient.

## Untargeted attack

![](https://cv-tricks.com/wp-content/uploads/2018/05/fgsm.png)
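
For reference, the untargeted FGSM update is usually written as below. The notation is an assumption and may differ slightly from the figure: $X$ is the input image, $y_{true}$ its label, $J$ the classification loss and $\epsilon$ the perturbation size.

$$X^{adv} = X + \epsilon \cdot \operatorname{sign}\left(\nabla_X J(X, y_{true})\right)$$

Adding the signed gradient increases the loss for the correct class, which pushes the prediction away from it.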

## Targeted attack

![](https://cv-tricks.com/wp-content/uploads/2018/05/target_fgsm.png)
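
Under the same assumed notation, with $y_{target}$ the class the attacker wants the network to predict, the targeted variant can be written as:

$$X^{adv} = X - \epsilon \cdot \operatorname{sign}\left(\nabla_X J(X, y_{target})\right)$$

The sign flips relative to the untargeted case because the goal is now to decrease the loss with respect to the target class.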

# Iterative methods

Instead of applying the perturbation in a single step, it is applied multiple times with a small step size. In this method, the pixel values of the intermediate results are clipped after each step to ensure that they stay in an $\epsilon$-neighbourhood of the original image, i.e. within the range $[X_{i,j} - \epsilon, X_{i,j} + \epsilon]$, $X_{i,j}$ being the pixel value of the original image.

![](https://cv-tricks.com/wp-content/uploads/2018/05/iterative.png)
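
The clipping step can be sketched in NumPy as follows. This is a minimal illustration, assuming the neighbourhood is taken around the original image and that pixels live in $[0, 1]$; the helper name `clip_to_eps_ball` is hypothetical and not part of this notebook.

```python
import numpy as np

def clip_to_eps_ball(x_adv, x_orig, eps):
    """Project x_adv into the eps-neighbourhood of x_orig, then into [0, 1]."""
    x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)
    return np.clip(x_adv, 0.0, 1.0)

x_orig = np.array([0.00, 0.50, 0.99])   # original pixel values
x_step = np.array([-0.10, 0.58, 1.20])  # values after some attack step
clipped = clip_to_eps_ball(x_step, x_orig, eps=0.05)
print(clipped)
```

Each pixel ends up at most `eps` away from its original value and never leaves the valid pixel range.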

# Load CIFAR10 dataset

In [0]:
NUM_CLASSES = 10
cifar = keras.datasets.cifar10

(X_train, y_train), (X_test, y_test) = cifar.load_data()

X_train = X_train / 255
X_test = X_test / 255

X_train = X_train.astype(np.float32)
X_test = X_test.astype(np.float32)

y_train = keras.utils.to_categorical(y_train, num_classes=NUM_CLASSES)
y_test = keras.utils.to_categorical(y_test, num_classes=NUM_CLASSES)

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
print(X_train.dtype, y_train.dtype, X_test.dtype, y_test.dtype)

In [0]:
cifar_classes = np.array([
    'airplane',
    'automobile',
    'bird',
    'cat',
    'deer',
    'dog',
    'frog',
    'horse',
    'ship',
    'truck'
])

# Define and train ConvNet for CIFAR classification

You can experiment with various architectures.

In [0]:
x_in = keras.layers.Input(shape=(32,32,3))
x = keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation='relu')(x_in)
x = keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation='relu')(x)
x = keras.layers.MaxPool2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation='relu')(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation='relu')(x)
x = keras.layers.MaxPool2D(pool_size=2)(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(128, activation='relu')(x)
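# The final layer applies softmax, so the model outputs class probabilities
# (the variable keeps the name `logits` used in the rest of the notebook).
logits = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)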

In [0]:
model = keras.models.Model(inputs=[x_in], outputs=[logits])

In [0]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam', 
              metrics=["accuracy"])

In [0]:
model.fit(X_train, y_train, epochs=10, batch_size=128,
          validation_data=(X_test, y_test))

# Select image to attack, plot it and see the network classification

In [0]:
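def get_classification_report(image, k=3):
    """Print the k most probable classes predicted by `model` for a single image."""
    # The model returns one probability per class; flatten the (1, 10) batch to (10,).
    pred_probas = model(image).numpy().flatten()
    # Class indices sorted from most to least probable.
    pred_labels = np.argsort(pred_probas)[::-1]
    pred_classes = cifar_classes[pred_labels]
    for i in range(k):
        print("Predicted class: %s, with proba: %.3f" % (pred_classes[i], pred_probas[pred_labels[i]]))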

In [0]:
img_ind = 1
test_image = X_test[img_ind:(img_ind+1)].astype(np.float32)
test_image = tf.convert_to_tensor(test_image)
print("True image class: %s" % cifar_classes[np.argmax(y_test[img_ind])])
plt.imshow(test_image[0])

In [0]:
get_classification_report(test_image, k=3)

# 1. Untargeted

![](https://cv-tricks.com/wp-content/uploads/2018/05/fgsm.png)

Define the cross-entropy loss using [keras.losses.CategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy).

In [0]:
cce = keras.losses.CategoricalCrossentropy()
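
As a rough illustration of what this loss measures for probability inputs, here is a NumPy sketch of categorical cross-entropy. It is an assumption-laden approximation, not the Keras implementation: the function name `categorical_cross_entropy` and the clipping constant `tiny` are chosen here for illustration only.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, tiny=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    # Clip probabilities to avoid log(0).
    y_pred = np.clip(y_pred, tiny, 1.0 - tiny)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0.0, 1.0, 0.0]])   # one-hot label
y_pred = np.array([[0.2, 0.7, 0.1]])   # predicted class probabilities
loss = categorical_cross_entropy(y_true, y_pred)
print(loss)  # equals -log(0.7) for this example
```

With a one-hot label, the loss reduces to the negative log-probability assigned to the labelled class.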

Define the epsilon. You can experiment with different epsilon sizes.

In [0]:
eps = 0.02

**Define the attack**

Take the image and compute the network's output for it (called `logits` in the code below, although the model ends with a softmax layer, so these are class probabilities). Then get the label of the most probable class (you could use the [tf.math.argmax](https://www.tensorflow.org/api_docs/python/tf/math/argmax) function), build its one-hot encoding (you could use the [tf.one_hot](https://www.tensorflow.org/api_docs/python/tf/one_hot) function) and calculate the cross-entropy between the one-hot encoding and the network's output.


We will need to compute the gradient of the cross-entropy loss with respect to the image. We are using eager execution mode, so remember to define a [gradient tape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) and to watch the *test_image* tensor.
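
The tape mechanics can be seen on a toy function that is unrelated to the network. This sketch only assumes TensorFlow 2.x with eager execution; the tensor `x` and the function $y = \sum x^2$ are made up for illustration.

```python
import tensorflow as tf

x = tf.constant([[1.0, -2.0, 3.0]])

with tf.GradientTape() as tape:
    # Constant tensors are not tracked automatically,
    # so the tape has to watch x explicitly.
    tape.watch(x)
    y = tf.reduce_sum(x ** 2)

# dy/dx = 2 * x
grad = tape.gradient(y, x)
print(grad.numpy())
```

The same pattern applies to the attack: the image plays the role of `x` and the cross-entropy loss plays the role of `y`.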

In [0]:
with tf.GradientTape() as g:
    # Watch the test image
    ###
    # Get the logits predicted by the network
    logits = ###
    # Get the label of the image predicted by the network 
    label = ###
    # Make one hot encoding from this label (you could use tf.one_hot function)
    one_hot_label = ###
    # Get the cross-entropy loss between one hot encoding and predicted logits
    image_cross_entropy = ###

Calculate the gradient of cross entropy loss with respect to the input image

In [0]:
grad = ###
grad

Make the attack! Add epsilon times the sign of the calculated gradient to your image. Don't forget to clip the result (pixels should stay in the range $[0, 1]$).
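
The arithmetic of this step can be sketched in NumPy on made-up values. The arrays `image` and `grad` below are hypothetical; with TensorFlow tensors the analogous calls would be `tf.sign` and `tf.clip_by_value`.

```python
import numpy as np

# Hypothetical 2x2 "image" and a gradient of the loss w.r.t. its pixels.
image = np.array([[0.00, 0.50],
                  [0.99, 0.30]], dtype=np.float32)
grad = np.array([[-0.3, 0.0],
                 [0.7, -1e-4]], dtype=np.float32)
eps = 0.02

# Every pixel moves by exactly eps in the direction of the gradient sign
# (pixels with zero gradient stay put), then values are clipped to [0, 1].
image_adversarial = np.clip(image + eps * np.sign(grad), 0.0, 1.0)
print(image_adversarial)
```

Only the sign of the gradient matters, so even a tiny gradient component shifts its pixel by the full `eps`.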

In [0]:
image_adversarial = ###
image_adversarial

## Plot the image after attack and check the network classification

In [0]:
plt.imshow(image_adversarial[0])

In [0]:
get_classification_report(image_adversarial, k=3)

# 2. Targeted

![](https://cv-tricks.com/wp-content/uploads/2018/05/target_fgsm.png)

Define the target class. You can print all cifar classes in the correct order to make the decision. 

In [0]:
target_class = 0
cifar_classes

Define the epsilon. You can experiment with different epsilon sizes.


In [0]:
eps = 0.04

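Define the attack. The code should look almost the same as the code for the untargeted attack, except that the loss is computed against the one-hot encoding of *target_class* and the signed gradient is subtracted from the image, so that the loss with respect to the target decreases.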
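
To see the effect of a targeted step without the Keras model, here is a NumPy sketch on a hypothetical toy classifier (2 "pixels", 3 classes, `logits = x @ W`). The weights, the input and the analytic gradient formula are assumptions made for this illustration; in the notebook the gradient would come from `tf.GradientTape` instead.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical toy linear classifier.
W = np.array([[1.0, -1.0, 0.0],
              [-1.0, 1.0, 0.0]])
x = np.array([0.6, 0.4])
target_class = 1
eps = 0.04

probas = softmax(x @ W)
one_hot_target = np.eye(3)[target_class]

# For softmax + cross-entropy on a linear model, dL/dx = W @ (p - y).
grad = W @ (probas - one_hot_target)

# Targeted attack: step against the gradient of the loss toward the target.
x_adv = np.clip(x - eps * np.sign(grad), 0.0, 1.0)
probas_adv = softmax(x_adv @ W)
print(probas[target_class], probas_adv[target_class])
```

On this toy example the probability of the target class goes up after the step, although a single step of this size does not yet change the predicted class.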

In [0]:
###

## Plot the image after attack and check the network classification

In [0]:
plt.imshow(image_adversarial[0])

In [0]:
get_classification_report(image_adversarial, k=3)

# 3. Targeted iterative

![](https://cv-tricks.com/wp-content/uploads/2018/05/iterative.png)

Define the target class. You can print all cifar classes in the correct order to make the decision. 

In [0]:
target_class = 0
cifar_classes

Define the epsilon and iterations number.

During this attack you will apply the gradient multiple times, so the epsilon value should be lower.

In [0]:
eps = 0.01
n_iters = 10

Define the attack. The code should look almost the same as the code for the FGSM attacks. The difference is that you should recompute and apply the gradient *n_iters* times, each time starting from the image produced by the previous step.
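
The loop structure can be sketched in NumPy as below. This is one possible shape of the attack, not a reference solution: `eps` is treated as the per-step size as in this notebook, the function name `iterative_targeted_attack` and the toy linear classifier are hypothetical, and the gradient is computed analytically rather than with `tf.GradientTape`.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def iterative_targeted_attack(x, grad_fn, eps, n_iters):
    """Take n_iters signed-gradient steps of size eps toward the target class."""
    x_adv = x.copy()
    for _ in range(n_iters):
        # The gradient is recomputed at the current adversarial image.
        grad = grad_fn(x_adv)
        # Step against the gradient, then keep pixels in [0, 1].
        x_adv = np.clip(x_adv - eps * np.sign(grad), 0.0, 1.0)
    return x_adv

# Hypothetical toy classifier: 2 "pixels", 3 classes, logits = x @ W.
W = np.array([[1.0, -1.0, 0.0],
              [-1.0, 1.0, 0.0]])
target = np.eye(3)[1]

def grad_fn(x):
    # Analytic gradient of softmax cross-entropy w.r.t. x for a linear model.
    return W @ (softmax(x @ W) - target)

x = np.array([0.6, 0.4])
x_adv = iterative_targeted_attack(x, grad_fn, eps=0.01, n_iters=10)
p_before = softmax(x @ W)
p_after = softmax(x_adv @ W)
print(p_before[1], p_after[1])
```

Ten steps of size 0.01 move each pixel by at most 0.1 in total, which is why the per-step epsilon is chosen smaller than in the single-step attacks.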

In [0]:
###

## Plot the image after attack and check the network classification

In [0]:
plt.imshow(image_adversarial[0])

In [0]:
get_classification_report(image_adversarial, k=3)

# 4. Retrain with adversarial examples

Defending against adversarial attacks remains an open problem: no fully reliable defense is known.


One of the easiest and most brute-force ways to defend against these attacks is to pretend to be the attacker: generate a number of adversarial examples against your own network, then explicitly train the model not to be fooled by them.

Define a function that takes an image and applies the targeted iterative attack to it. This time the target class won't be chosen arbitrarily. Instead, we will set it to the least likely class predicted by the network.
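
Picking the least likely class for a batch of predictions can be sketched in NumPy as follows. The probabilities are made up; with TensorFlow tensors, `tf.math.argmin` and `tf.one_hot` would play the same roles.

```python
import numpy as np

# Hypothetical predicted probabilities for a batch of 2 images and 3 classes.
probas = np.array([[0.70, 0.05, 0.25],
                   [0.10, 0.60, 0.30]])

# Index of the least probable class for each image.
least_likely = probas.argmin(axis=-1)

# One-hot targets to use in the targeted attack.
targets = np.eye(probas.shape[-1])[least_likely]
print(least_likely)
print(targets)
```

Because the target is derived from the network's own output, each image in a batch can get a different target class.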

In [0]:
eps = 0.01
n_iters = 10


def iterative_least_likely_method(image):
    ###

    return image_adversarial

Define a function that applies *iterative_least_likely_method* to the given dataset, in batches.
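
The batching pattern itself can be sketched independently of the attack. In this NumPy example the helper name `apply_in_batches` is hypothetical, and a trivial function stands in for the attack.

```python
import numpy as np

def apply_in_batches(X, batch_size, fn):
    """Apply fn to consecutive slices of X and stitch the results together."""
    outputs = [fn(X[start:start + batch_size])
               for start in range(0, len(X), batch_size)]
    return np.concatenate(outputs, axis=0)

X = np.arange(10, dtype=np.float32).reshape(5, 2)
# Stand-in for the attack: any function mapping a batch to a same-shaped batch.
X_out = apply_in_batches(X, batch_size=2, fn=lambda batch: batch + 1.0)
print(X_out.shape)
```

The last batch may be smaller than `batch_size`; slicing handles that case without extra code.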

In [0]:
def get_dataset_adversarials(X, batch_size):
    ###
    
    return X_adv

Check the accuracy on test adversarial dataset

In [0]:
X_test_adv = get_dataset_adversarials(X_test, 1000)

In [0]:
model.evaluate(X_test_adv, y_test)

Create the adversarial dataset using predefined functions and train the model on it.

In [0]:
X_train_adv = get_dataset_adversarials(X_train, 1000)

In [0]:
model.fit(X_train_adv, y_train, epochs=1, batch_size=128,
          validation_data=(X_test, y_test))

Check the accuracy on test adversarial dataset, after the adversarial training

In [0]:
model.evaluate(X_test_adv, y_test)

# Images sources

Images used in this notebook come from the following web pages and papers:


1.   [Explaining and Harnessing Adversarial Examples](https://arxiv.org/pdf/1412.6572.pdf)
2.   [OpenAI blog - Attacking Machine Learning with Adversarial Examples](https://openai.com/blog/adversarial-example-research/)
3.   [Breaking Deep Learning with Adversarial examples using Tensorflow](https://cv-tricks.com/how-to/breaking-deep-learning-with-adversarial-examples-using-tensorflow/)

