#  Adversarial Attack

We assume that you have read the [Adversarial Attacks Tutorial](./adversarial_attacks_tutorial.ipynb) carefully and run that notebook from scratch. 

In this notebook, you are required to process adversarial attacks for a small subset of [ImageNet Dataset](http://www.image-net.org/). We prepared 100 images from different categories (in `./input_dir/`), and the labels are encoded in `./input_dir/clean_image.list`.

For evaluation, each adversarial image generated by the attack model will be fed to an evaluation model, and we will calculate the successful rate of adversarial attacks. **The adversarial images that can fool the evaluation model and also the perturbations are less than *Max_Distance* will be considered as a success**, where the perturbations are measured by the L2 distance between the adversarial image and original image.

There are three tasks:
- **White-box attack**: the adversarial examples are crafted for the pretrained **MobileNetV2** model, and evaluated on the same **MobileNetV2** model.
- **Black-box attack**: the adversarial examples are crafted for the pretrained **MobileNetV2** model, but evaluated on the **MobileNet** model, which is different from MobileNetV2.
- **Black-box attack (after submission)**: you are required to submit the generated adversarial examples at last, and we will evaluate your adversarial examples on another model, which is invisible for you.

### Goal

We provide a simple FGSM example here, and you are required to implement your own attack methods to **achieve the attack successful rate as high as possible** (for all three tasks).

At last, you are required to submit this jupyter notebook and the generated adversarial images.
The final grade will be scored according to the **white-box successful rate**, **black-box successful rate**, **white-box (after submission) successful rate**.

In [3]:
# ! pip3 install cython
# ! pip3 install tensornets
# ! pip3 install numpy==1.16.1
! pip3 install Pillow



Collecting Pillow
[?25l  Downloading https://files.pythonhosted.org/packages/98/83/0fdb0910c909f40c090ba09184feb59001b7fcc89c676fb77986a262af85/Pillow-7.1.2-cp37-cp37m-macosx_10_10_x86_64.whl (2.2MB)
[K     |████████████████████████████████| 2.2MB 2.1MB/s eta 0:00:01
[?25hInstalling collected packages: Pillow
Successfully installed Pillow-7.1.2
You should consider upgrading via the 'pip install --upgrade pip' command.[0m


In [9]:
import sys,os
from PIL import Image
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import matplotlib as mpl
import matplotlib.pyplot as plt
from time import perf_counter
from utils import *
import tensornets as nets

In [11]:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()


Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device



## Load Images
We provided 100 images from different categories in `./input_dir/`, and the labels are encoded in `./input_dir/clean_image.list`.

In [12]:
images = []
with open('./input_dir/clean_image.list', 'r') as f:
    img_lines = f.readlines()
    for img_line in img_lines:
        imgname, label = img_line.strip('\n').split(' ')
        images.append((imgname, int(label)))

## Image Processing

Each input image must be preprocessed before fed into the models, such as normalization(subtracting the mean and then dividing by the standard deviation). In addition, each generated adversarial image must be reversely processed.
Note that different pretrained models in Tensorflow require different preprocessing.
We provided several `preprocess` and `reverse_preprocess` function for different deep networks in `./utils.py`.

By default, the two functions are designed for mobilenet models.
```python
preprocess(image, model="mobilenet")
reverse_preprocess(image, model="mobilenet")
```

If you want to change to other models, see `./utils.py` for more details.

We have downloaded several popular pretrained models, you can adopt these models as the attacked model.
## Pretrained Models in tensornets (nets)
    'DenseNet121', 'DenseNet169', 'DenseNet201', 
    'Inception1', 'Inception2', 'Inception3', 'Inception4', 'InceptionResNet2',
    'MobileNet25', 'MobileNet50', 'MobileNet75', MobileNet100', 
    'MobileNet35v2', 'MobileNet50v2', 'MobileNet75v2', 'MobileNet100v2', 'MobileNet130v2', 'MobileNet140v2', 
    'NASNetAlarge', 'NASNetAmobile', 'PNASNetlarge',
    'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50v2', 'ResNet101v2', 'ResNet152v2', 'ResNet200v2', 
    'ResNeXt50c32', 'ResNeXt101c32', 'ResNeXt101c64', 'WideResNet50',
    'VGG16', 'VGG19', 
    'SqueezeNet'.

## Define the Attack Method

### TODO: implement your own attack methods.

###  Tips:
- We provide the simple FGSM attack method as an example here. You can try other attack methods (learned in this course), such as the iterative methods.
- For black-box attack, we adopt the `MobileNetV2` as the attacked model, and the generated adversarial images may failed in `MobileNet` (which indicates poor transferability). You can try other attacked models (except `MobileNet`) or model ensemble.

In [19]:
class Attack:
    def __init__(self, input_image):
        self.input_image = input_image
        
        # loss function
        self.loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
        
        # TODO: you may change your target model.
        # load the model which will be attacked
        self.attacked_model = nets.MobileNet50v2(input_image, reuse=tf.AUTO_REUSE)
        
    def generate_adversarial_example(self, input_label):
        input_image = self.input_image
        prediction = self.attacked_model
        loss = self.loss_object(input_label, prediction)

        # TODO: implement your own attack methods.
        # Get the gradients of the loss w.r.t to the input image.
        gradient = tf.gradients(loss, input_image)
       
        # Get the sign of the gradients to create the perturbation (FGSM)
        signed_grad = tf.sign(gradient)[0]
        # Epsilon in FGSM, you can try another value.
        eps = 0.05
        adv_image = input_image + eps * signed_grad
       
        # Clip the generated image between -1 and 1. Note that different pretrained models require different ranges.
        adv_image = tf.clip_by_value(adv_image, -1, 1)
       
        # END TODO
        input_image = self.input_image
        
        adv_image = input_image
        g = 0.0
        u = 1.0
        eps = 0.001
        
        for t in range(1,16):
            prediction = nets.MobileNet50v2(adv_image, reuse=tf.AUTO_REUSE)
            loss = self.loss_object(input_label, prediction)
            gradient = tf.gradients(loss, adv_image)
            g = u*g + gradient[0]/tf.norm(gradient,ord=1)
            adv_image = tf.clip_by_value(adv_image+eps/t*tf.sign(g)[0], -1, 1)
            
        return adv_image

# Evaluation
Define the evaluation functions for both white-box and black-box attack.
**You are not allowed to modify these codes.**

- For white-box attack, the adversarial images are evaluated on the `MobileNetv2` model.
- For black-box attack, the adversarial images are evaluated on the `MobileNet` model. Therefore, you can not use the same `MobileNet` model as the attacked model.

The `Max_Distance` equals to 5.0 here.

In [20]:
Max_Distance = 5.0

class WhiteBox_Evaluation:
    def __init__(self, adv_image):
        self.adv_image = adv_image
        self.eval_model = nets.MobileNet50v2(adv_image, reuse=tf.AUTO_REUSE)
        
    def get_adv_label(self):
        adv_probs  = self.eval_model
        adv_label = tf.argmax(adv_probs,1)
        return adv_label
    
class BlackBox_Evaluation:
    def __init__(self, adv_image):
        self.adv_image = adv_image
        self.eval_model = nets.MobileNet50(adv_image, reuse=tf.AUTO_REUSE)
        
    def get_adv_label(self):
        adv_probs  = self.eval_model
        adv_label = tf.argmax(adv_probs,1)
        return adv_label
    
# init the attacker

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.log_device_placement = True
config.allow_soft_placement = True
sess = tf.Session(config=config)

# load and preprocess image
input_path = tf.placeholder(dtype=tf.string)
input_label = tf.placeholder(shape=None, dtype=tf.int32)
image_raw = tf.io.read_file(input_path)
image = tf.image.decode_jpeg(image_raw, channels=3)
image = image[None, ...]

input_image = preprocess(image)
attacker = Attack(input_image)

# generate adversarial example
adv_image_t = attacker.generate_adversarial_example(input_label)
eval_model_white = WhiteBox_Evaluation(adv_image_t)
eval_model_black = BlackBox_Evaluation(adv_image_t)

# measured by L2 distance
distance_t = tf.math.reduce_euclidean_norm(input_image - adv_image_t)

adv_label_white_t = eval_model_white.get_adv_label()
adv_label_black_t = eval_model_black.get_adv_label()

saved_image_t = reverse_preprocess(adv_image_t)[0]

sess.run(tf.global_variables_initializer())
_ = sess.run([attacker.attacked_model.pretrained(), eval_model_white.eval_model.pretrained(), eval_model_black.eval_model.pretrained()])


Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device



# White-Box Attack Evaluation

In [27]:
success_cnt = 0

for idx, (imgname, label) in enumerate(images):
    imgpath = './input_dir/' + imgname
    run_list = [adv_image_t, distance_t, adv_label_white_t, saved_image_t]
    feed_dict = {input_path: imgpath, input_label: label}
    
    
    adv_image, distance, adv_label, saved_image = sess.run(run_list, feed_dict)
    adv_label = adv_label[0]
    
    # if the adversarial image can successfully fool the attacked model, and the perturbations are less than Max_Distance
    if distance <= Max_Distance:
        success_cnt += 1 if adv_label != label else 0
    
    print('{}: clean_label={:3d} adv_label={:3d} distance={:.2f}'.format(imgname,label,adv_label,distance))
    
    # save the generated images to './output_dir'
    saved_image = tf.image.encode_png(saved_image)
    write_ops = tf.io.write_file('./output_dir/' + imgname, saved_image)
    sess.run(write_ops)

print()
print('White-box attack successful rate: {}%'.format(success_cnt))

Tensor("Placeholder_5:0", dtype=int32)


InvalidArgumentError: labels must be 1-D, but got shape []
	 [[node sparse_categorical_crossentropy_32/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]

Original stack trace for 'sparse_categorical_crossentropy_32/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits':
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/traitlets/config/application.py", line 664, in launch_instance
    app.start()
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel/kernelapp.py", line 597, in start
    self.io_loop.start()
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/platform/asyncio.py", line 149, in start
    self.asyncio_loop.run_forever()
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
    handle._run()
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/ioloop.py", line 690, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/gen.py", line 787, in inner
    self.run()
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel/kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel/kernelbase.py", line 545, in execute_request
    user_expressions, allow_stdin,
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel/ipkernel.py", line 300, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/IPython/core/interactiveshell.py", line 2858, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/IPython/core/interactiveshell.py", line 2886, in _run_cell
    return runner(coro)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/IPython/core/interactiveshell.py", line 3063, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/IPython/core/interactiveshell.py", line 3254, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/Users/rbouadjenek/Library/Python/3.7/lib/python/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-9fb37e4464ac>", line 42, in <module>
    adv_image_t = attacker.generate_adversarial_example(input_label)
  File "<ipython-input-19-ac4a45924b8e>", line 40, in generate_adversarial_example
    loss = self.loss_object(input_label, prediction)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 126, in __call__
    losses = self.call(y_true, y_pred)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 221, in call
    return self.fn(y_true, y_pred, **self._fn_kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 978, in sparse_categorical_crossentropy
    y_true, y_pred, from_logits=from_logits, axis=axis)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 4546, in sparse_categorical_crossentropy
    labels=target, logits=output)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3477, in sparse_softmax_cross_entropy_with_logits_v2
    labels=labels, logits=logits, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3397, in sparse_softmax_cross_entropy_with_logits
    precise_logits, labels, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 11842, in sparse_softmax_cross_entropy_with_logits
    labels=labels, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3360, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3429, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1751, in __init__
    self._traceback = tf_stack.extract_stack()


# Black-Box Attack Evaluation

In [None]:
success_cnt = 0

for idx, (imgname, label) in enumerate(images):
    imgpath = './input_dir/' + imgname
    run_list = [adv_image_t, distance_t, adv_label_black_t, saved_image_t]
    feed_dict = {input_path: imgpath, input_label: label}
    
    adv_image, distance, adv_label, saved_image = sess.run(run_list, feed_dict)
    adv_label = adv_label[0]
    
    # if the adversarial image can successfully fool the attacked model, and the perturbations are less than Max_Distance
    if distance <= Max_Distance:
        success_cnt += 1 if adv_label != label else 0
    
    print('{}: clean_label={:3d} adv_label={:3d} distance={:.2f}'.format(imgname,label,adv_label,distance))
    
    # save the generated images to './output_dir'
    saved_image = tf.image.encode_png(saved_image)
    write_ops = tf.io.write_file('./output_dir/' + imgname, saved_image)
    sess.run(write_ops)

print()
print('Black-box attack successful rate: {}%'.format(success_cnt))