#  Adversarial Attack

We assume that you have read the [Adversarial Attacks Tutorial](./adversarial_attacks_tutorial.ipynb) carefully and run that notebook from scratch. 

In this notebook, you are required to process adversarial attacks for a small subset of [ImageNet Dataset](http://www.image-net.org/). We prepared 100 images from different categories (in `./input_dir/`), and the labels are encoded in `./input_dir/clean_image.list`.

For evaluation, each adversarial image generated by the attack model will be fed to an evaluation model, and we will calculate the successful rate of adversarial attacks. **The adversarial images that can fool the evaluation model and also the perturbations are less than *Max_Distance* will be considered as a success**, where the perturbations are measured by the L2 distance between the adversarial image and original image.

There are three tasks:
- **White-box attack**: the adversarial examples are crafted for the pretrained **MobileNetV2** model, and evaluated on the same **MobileNetV2** model.
- **Black-box attack**: the adversarial examples are crafted for the pretrained **MobileNetV2** model, but evaluated on the **MobileNet** model, which is different from MobileNetV2.
- **Black-box attack (after submission)**: you are required to submit the generated adversarial examples at last, and we will evaluate your adversarial examples on another model, which is invisible for you.

### Goal

We provide a simple FGSM example here, and you are required to implement your own attack methods to **achieve the attack successful rate as high as possible** (for all three tasks).

At last, you are required to submit this jupyter notebook and the generated adversarial images.
The final grade will be scored according to the **white-box successful rate**, **black-box successful rate**, **white-box (after submission) successful rate**.

In [2]:
import sys,os
from PIL import Image
import numpy as np
import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt
from time import perf_counter
from utils import *
import tensornets as nets

## Load Images
We provided 100 images from different categories in `./input_dir/`, and the labels are encoded in `./input_dir/clean_image.list`.

In [3]:
images = []
with open('./input_dir/clean_image.list', 'r') as f:
    img_lines = f.readlines()
    for img_line in img_lines:
        imgname, label = img_line.strip('\n').split(' ')
        images.append((imgname, int(label)))

## Image Processing

Each input image must be preprocessed before fed into the models, such as normalization(subtracting the mean and then dividing by the standard deviation). In addition, each generated adversarial image must be reversely processed.
Note that different pretrained models in Tensorflow require different preprocessing.
We provided several `preprocess` and `reverse_preprocess` function for different deep networks in `./utils.py`.

By default, the two functions are designed for mobilenet models.
```python
preprocess(image, model="mobilenet")
reverse_preprocess(image, model="mobilenet")
```

If you want to change to other models, see `./utils.py` for more details.

We have downloaded several popular pretrained models, you can adopt these models as the attacked model.
## Pretrained Models in tensornets (nets)
    'DenseNet121', 'DenseNet169', 'DenseNet201', 
    'Inception1', 'Inception2', 'Inception3', 'Inception4', 'InceptionResNet2',
    'MobileNet25', 'MobileNet50', 'MobileNet75', MobileNet100', 
    'MobileNet35v2', 'MobileNet50v2', 'MobileNet75v2', 'MobileNet100v2', 'MobileNet130v2', 'MobileNet140v2', 
    'NASNetAlarge', 'NASNetAmobile', 'PNASNetlarge',
    'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50v2', 'ResNet101v2', 'ResNet152v2', 'ResNet200v2', 
    'ResNeXt50c32', 'ResNeXt101c32', 'ResNeXt101c64', 'WideResNet50',
    'VGG16', 'VGG19', 
    'SqueezeNet'.

## Define the Attack Method

### TODO: implement your own attack methods.

###  Tips:
- We provide the simple FGSM attack method as an example here. You can try other attack methods (learned in this course), such as the iterative methods.
- For black-box attack, we adopt the `MobileNetV2` as the attacked model, and the generated adversarial images may failed in `MobileNet` (which indicates poor transferability). You can try other attacked models (except `MobileNet`) or model ensemble.

In [9]:
class Attack:
    def __init__(self, input_image):
        self.input_image = input_image
        
        # loss function
        self.loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
        
        # TODO: you may change your target model.
        # load the model which will be attacked
        self.attacked_model = nets.MobileNet50v2(input_image, reuse=tf.AUTO_REUSE)
        
    def generate_adversarial_example(self, input_label, attack_mode, attack_method, eps, T=0, miu=0):
        input_image = self.input_image
        prediction = self.attacked_model
        loss = self.loss_object(input_label, prediction)
        
        # TODO: implement your own attack methods.
        if attack_mode=='whitebox':
            if attack_method=='FGSM':
                # Get the gradients of the loss w.r.t to the input image.
                gradient = tf.gradients(loss, input_image)
                # Get the sign of the gradients to create the perturbation (FGSM)
                signed_grad = tf.sign(gradient)[0]
                # Epsilon in FGSM, you can try another value.
                adv_image = input_image + eps * signed_grad
                # Clip the generated image between -1 and 1. Note that different pretrained models require different ranges.
                adv_image = tf.clip_by_value(adv_image, -1, 1)

            elif attack_method=='IFGSM':
                print(str(T) + ' iterations')
                
                adv_x = input_image
                for i in range(1, T+1):
                    alpha = eps / float(i)
                    gradient = tf.gradients(loss, adv_x)
                    signed_grad = tf.sign(gradient)[0]
                    adv_x = adv_x + alpha * signed_grad
                    adv_x = tf.clip_by_value(adv_x, -1, 1)
                    prediction = nets.MobileNet50v2(adv_x, reuse=tf.AUTO_REUSE)
                    loss = self.loss_object(input_label, prediction)

                adv_image = adv_x

            elif attack_method=='MIFGSM':
                print(str(T) + ' iterations' + ' miu=' + str(miu))
                
                adv_x = input_image
                g=0
                for i in range(1,T+1):
                    alpha = eps / float(i)
                    gradient = tf.gradients(loss, adv_x)
                    norm_gradient = tf.norm(gradient, ord=1, axis=1) #TODO
                    g = miu*g + (gradient/norm_gradient)
                    
                    signed_grad = tf.sign(g)[0]
                    adv_x = adv_x + alpha * signed_grad
                    adv_x = tf.clip_by_value(adv_x, -1, 1)
                    prediction = nets.MobileNet50v2(adv_x, reuse=tf.AUTO_REUSE)
                    loss = self.loss_object(input_label, prediction)

                adv_image = adv_x
                
        elif attack_mode=='blackbox':
            if attack_method=='FGSM':
                # Get the gradients of the loss w.r.t to the input image.
                gradient = tf.gradients(loss, input_image)
                # Get the sign of the gradients to create the perturbation (FGSM)
                signed_grad = tf.sign(gradient)[0]
                # Epsilon in FGSM, you can try another value.
                adv_image = input_image + eps * signed_grad
                # Clip the generated image between -1 and 1. Note that different pretrained models require different ranges.
                adv_image = tf.clip_by_value(adv_image, -1, 1)

            elif attack_method=='IFGSM':
                print(str(T) + ' iterations')
                
                adv_x = input_image
                for i in range(1, T+1):
                    alpha = eps / float(i)
                    gradient = tf.gradients(loss, adv_x)
                    signed_grad = tf.sign(gradient)[0]
                    adv_x = adv_x + alpha * signed_grad
                    adv_x = tf.clip_by_value(adv_x, -1, 1)
                    prediction = nets.MobileNet50v2(adv_x, reuse=tf.AUTO_REUSE)
                    loss = self.loss_object(input_label, prediction)

                adv_image = adv_x

            elif attack_method=='MIFGSM':
                print(str(T) + ' iterations' + ' miu=' + str(miu))
                
                adv_x = input_image
                g=0
                for i in range(1,T+1):
                    alpha = eps / float(i)
                    gradient = tf.gradients(loss, adv_x)
                    norm_gradient = tf.norm(gradient, ord=1, axis=1) 
                    g = miu*g + (gradient/norm_gradient)
                    
                    signed_grad = tf.sign(g)[0]
                    adv_x = adv_x + alpha * signed_grad
                    adv_x = tf.clip_by_value(adv_x, -1, 1)
                    prediction = nets.MobileNet50v2(adv_x, reuse=tf.AUTO_REUSE)
                    loss = self.loss_object(input_label, prediction)

                adv_image = adv_x
                
            elif attack_method=='MIFGSM_ensemble':
                print(str(T) + ' iterations' + ' miu=' + str(miu))
                
                adv_x = input_image
                g=0
                for i in range(1,T+1):
                    alpha = eps / float(i)
                    gradient = tf.gradients(loss, adv_x)
                    norm_gradient = tf.norm(gradient, ord=1, axis=1) 
                    g = miu*g + (gradient/norm_gradient)
                    
                    signed_grad = tf.sign(gradient)[0]
                    adv_x = adv_x + alpha * signed_grad
                    adv_x = tf.clip_by_value(adv_x, -1, 1)
                    
                    prediction1 = nets.MobileNet25(input_image, reuse=tf.AUTO_REUSE)
                    prediction2 = nets.MobileNet50(input_image, reuse=tf.AUTO_REUSE)
                    prediction3 = nets.MobileNet75(input_image, reuse=tf.AUTO_REUSE)
                    prediction4 = nets.MobileNet100(input_image, reuse=tf.AUTO_REUSE)
                    prediction5 = nets.MobileNet35v2(input_image, reuse=tf.AUTO_REUSE)
                    prediction6 = nets.MobileNet50v2(input_image, reuse=tf.AUTO_REUSE)
                    prediction7 = nets.MobileNet75v2(input_image, reuse=tf.AUTO_REUSE)
                    prediction8 = nets.MobileNet100v2(input_image, reuse=tf.AUTO_REUSE)
                    prediction9 = nets.MobileNet130v2(input_image, reuse=tf.AUTO_REUSE)
                    prediction10 = nets.MobileNet140v2(input_image, reuse=tf.AUTO_REUSE)
                    
                    # Ensemble loss
                    loss1 = self.loss_object(input_label, prediction1)
                    loss2 = self.loss_object(input_label, prediction2)
                    loss3 = self.loss_object(input_label, prediction4)
                    loss4 = self.loss_object(input_label, prediction4)
                    loss5 = self.loss_object(input_label, prediction5)
                    loss6 = self.loss_object(input_label, prediction6)
                    loss7 = self.loss_object(input_label, prediction7)
                    loss8 = self.loss_object(input_label, prediction8)
                    loss9 = self.loss_object(input_label, prediction9)
                    loss10 = self.loss_object(input_label, prediction10)
                    
                    w = [0.25, 0.20, 0.35, 0.20]
                    loss = - w[0]*loss1 + w[1]*loss2 + w[2]*loss3 + w[3]*loss4 + w[4]*loss5 + w[5]*loss6 + 
                            w[6]*loss7 + w[7]*loss + w[8]*loss9 + w[9]*loss10
                
                adv_image = adv_x
        # END TODO
        
        return adv_image

# Evaluation
Define the evaluation functions for both white-box and black-box attack.
**You are not allowed to modify these codes.**

- For white-box attack, the adversarial images are evaluated on the `MobileNetv2` model.
- For black-box attack, the adversarial images are evaluated on the `MobileNet` model. Therefore, you can not use the same `MobileNet` model as the attacked model.

The `Max_Distance` equals to 5.0 here.

In [19]:
Max_Distance = 5.0

class WhiteBox_Evaluation:
    def __init__(self, adv_image):
        self.adv_image = adv_image
        self.eval_model = nets.MobileNet50v2(adv_image, reuse=tf.AUTO_REUSE)
        
    def get_adv_label(self):
        adv_probs  = self.eval_model
        adv_label = tf.argmax(adv_probs,1)
        return adv_label
    
class BlackBox_Evaluation:
    def __init__(self, adv_image):
        self.adv_image = adv_image
        self.eval_model = nets.MobileNet50(adv_image, reuse=tf.AUTO_REUSE)
        
    def get_adv_label(self):
        adv_probs  = self.eval_model
        adv_label = tf.argmax(adv_probs,1)
        return adv_label
    
# init the attacker
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.log_device_placement = True
config.allow_soft_placement = True
sess = tf.Session(config=config)

# load and preprocess image
input_path = tf.placeholder(dtype=tf.string)
input_label = tf.placeholder(shape=None, dtype=tf.int32)
image_raw = tf.io.read_file(input_path)
image = tf.image.decode_jpeg(image_raw, channels=3)
image = image[None, ...]

def Attack_Session(attack_method, attack_mode, eps, T=0, miu=0):
    print(attack_method, attack_mode, eps)
    input_image = preprocess(image)
    attacker = Attack(input_image)

    # generate adversarial example
    adv_image_t = attacker.generate_adversarial_example(input_label, attack_mode, attack_method, eps, T, miu)
    eval_model_white = WhiteBox_Evaluation(adv_image_t)
    eval_model_black = BlackBox_Evaluation(adv_image_t)

    # measured by L2 distance
    distance_t = tf.math.reduce_euclidean_norm(input_image - adv_image_t)

    adv_label_white_t = eval_model_white.get_adv_label()
    adv_label_black_t = eval_model_black.get_adv_label()

    saved_image_t = reverse_preprocess(adv_image_t)[0]

    sess.run(tf.global_variables_initializer())
    _ = sess.run([attacker.attacked_model.pretrained(), eval_model_white.eval_model.pretrained(), eval_model_black.eval_model.pretrained()])
    
    if attack_mode=='whitebox':
        success_cnt = 0
        for idx, (imgname, label) in enumerate(images):
            imgpath = './input_dir/' + imgname
            run_list = [adv_image_t, distance_t, adv_label_white_t, saved_image_t]
            feed_dict = {input_path: imgpath, input_label: label}

            adv_image, distance, adv_label, saved_image = sess.run(run_list, feed_dict)
            adv_label = adv_label[0]

            # if the adversarial image can successfully fool the attacked model, and the perturbations are less than Max_Distance
            if distance <= Max_Distance:
                success_cnt += 1 if adv_label != label else 0
            else:
                print('Max Distance Larger than '+ str(Max_Distance) + ' ' + str(distance))
                return
            
            print('{}: clean_label={:3d} adv_label={:3d} distance={:.2f}'.format(imgname,label,adv_label,distance))

            # save the generated images to './output_dir'
            saved_image = tf.image.encode_png(saved_image)
            write_ops = tf.io.write_file('./output_dir/' + imgname, saved_image)
            sess.run(write_ops)

        print()
        print('White-box attack successful rate: {}%'.format(success_cnt))
        
    elif attack_mode=='blackbox':
        success_cnt = 0
        for idx, (imgname, label) in enumerate(images):
            imgpath = './input_dir/' + imgname
            run_list = [adv_image_t, distance_t, adv_label_black_t, saved_image_t]
            feed_dict = {input_path: imgpath, input_label: label}

            adv_image, distance, adv_label, saved_image = sess.run(run_list, feed_dict)
            adv_label = adv_label[0]

            # if the adversarial image can successfully fool the attacked model, and the perturbations are less than Max_Distance
            if distance <= Max_Distance:
                success_cnt += 1 if adv_label != label else 0
            else:
                print('Max Distance Larger than '+ str(Max_Distance) + ' ' + str(distance))
                return

            print('{}: clean_label={:3d} adv_label={:3d} distance={:.2f}'.format(imgname,label,adv_label,distance))

            # save the generated images to './output_dir'
            saved_image = tf.image.encode_png(saved_image)
            write_ops = tf.io.write_file('./output_dir/' + imgname, saved_image)
            sess.run(write_ops)
    
        sess.close()
            
        print()
        print('Black-box attack successful rate: {}%'.format(success_cnt))

# White-Box Attack Evaluation

In [9]:
Attack_Session('FGSM', 'whitebox', 0.01286)

FGSM whitebox 0.01286
n02708093.JPEG: clean_label=409 adv_label=592 distance=4.24
n03000134.JPEG: clean_label=489 adv_label=353 distance=4.95
n03384352.JPEG: clean_label=561 adv_label=495 distance=4.73
n03777754.JPEG: clean_label=662 adv_label=882 distance=4.98
n03721384.JPEG: clean_label=642 adv_label=696 distance=4.94
n03424325.JPEG: clean_label=570 adv_label=518 distance=4.91
n03673027.JPEG: clean_label=628 adv_label=833 distance=4.99
n02229544.JPEG: clean_label=312 adv_label=456 distance=4.99
n07695742.JPEG: clean_label=932 adv_label=925 distance=4.98
n02018207.JPEG: clean_label=137 adv_label= 98 distance=4.98
n02107908.JPEG: clean_label=240 adv_label=178 distance=4.99
n04026417.JPEG: clean_label=748 adv_label=414 distance=4.95
n02444819.JPEG: clean_label=360 adv_label=  5 distance=4.99
n02259212.JPEG: clean_label=317 adv_label=462 distance=4.99
n02480495.JPEG: clean_label=365 adv_label=149 distance=4.97
n02095889.JPEG: clean_label=190 adv_label=212 distance=4.98
n03216828.JPEG: cl

In [35]:
Attack_Session('IFGSM', 'whitebox', eps=0.006, T=20)

IFGSM whitebox 0.006
20 iterations


KeyboardInterrupt: 

In [15]:
Attack_Session('MIFGSM', 'whitebox', eps=0.006, T=20, miu=0.45)

MIFGSM whitebox 0.006
20 iterations miu=0.45
n02708093.JPEG: clean_label=409 adv_label=606 distance=3.86
n03000134.JPEG: clean_label=489 adv_label=353 distance=4.96
n03384352.JPEG: clean_label=561 adv_label=495 distance=4.23
n03777754.JPEG: clean_label=662 adv_label=882 distance=3.90
n03721384.JPEG: clean_label=642 adv_label=696 distance=4.36
n03424325.JPEG: clean_label=570 adv_label=612 distance=4.24
n03673027.JPEG: clean_label=628 adv_label=833 distance=4.06
n02229544.JPEG: clean_label=312 adv_label=456 distance=3.86
n07695742.JPEG: clean_label=932 adv_label=925 distance=4.15
n02018207.JPEG: clean_label=137 adv_label=135 distance=4.22
n02107908.JPEG: clean_label=240 adv_label=178 distance=3.97
n04026417.JPEG: clean_label=748 adv_label= 83 distance=4.56
n02444819.JPEG: clean_label=360 adv_label=  5 distance=4.16
n02259212.JPEG: clean_label=317 adv_label=462 distance=3.83
n02480495.JPEG: clean_label=365 adv_label=341 distance=4.25
n02095889.JPEG: clean_label=190 adv_label=212 distance=

# Black-Box Attack Evaluation

In [20]:
Attack_Session('MIFGSM', 'blackbox', eps=0.006, T=20, miu=0.5)

MIFGSM blackbox 0.006
20 iterations miu=0.5
n02708093.JPEG: clean_label=409 adv_label=409 distance=3.86
n03000134.JPEG: clean_label=489 adv_label=489 distance=4.96
n03384352.JPEG: clean_label=561 adv_label=561 distance=4.23
n03777754.JPEG: clean_label=662 adv_label=662 distance=3.90
n03721384.JPEG: clean_label=642 adv_label=642 distance=4.36
n03424325.JPEG: clean_label=570 adv_label=570 distance=4.24
n03673027.JPEG: clean_label=628 adv_label=628 distance=4.06
n02229544.JPEG: clean_label=312 adv_label=872 distance=3.86
n07695742.JPEG: clean_label=932 adv_label=925 distance=4.15
n02018207.JPEG: clean_label=137 adv_label=137 distance=4.22
n02107908.JPEG: clean_label=240 adv_label=241 distance=3.97
n04026417.JPEG: clean_label=748 adv_label=748 distance=4.56
n02444819.JPEG: clean_label=360 adv_label=360 distance=4.16
n02259212.JPEG: clean_label=317 adv_label=317 distance=3.83
n02480495.JPEG: clean_label=365 adv_label=365 distance=4.25
n02095889.JPEG: clean_label=190 adv_label=190 distance=4