# Task 2. Extension of ATHENA


# Introduction
[ATHENA](https://arxiv.org/abs/2001.00308), due to Meng, Su, O'Kane, and Jamshidi, is a framework for building defenses against adversarial attacks in machine learning. We aim to evaluate the effectiveness of ATHENA in the context of a white-box threat model.

# Background and Methods
## Dataset Description
For the following adversarial investigation, we use the classic [MNIST](http://yann.lecun.com/exdb/mnist/) dataset of handwritten digits. Each sample is a grayscale image of a handwritten digit, 28 pixels by 28 pixels, and each label is the corresponding digit, taken from the set {0,1,2,3,4,5,6,7,8,9}. Generating an adversarial example in the context of MNIST entails adding perturbations to the image in order to "fool" the neural network without greatly distorting the image to the human eye.
## Generating Adversarial Examples
In order to evaluate the effectiveness of ATHENA in the context of a white-box threat model, we first generate adversarial examples based on a subset of the attacks implemented by vanilla ATHENA. The following are the attacks implemented by ATHENA:

1. [FGSM](https://arxiv.org/abs/1412.6572)
2. [BIM (l2- and linf- norms)](https://arxiv.org/abs/1607.02533)
3. [CW (l2- and linf- norms)](https://ieeexplore.ieee.org/abstract/document/7958570)
4. [JSMA](https://ieeexplore.ieee.org/abstract/document/7467366)
5. [PGD](https://arxiv.org/pdf/1706.06083.pdf)
6. [MIM](https://openaccess.thecvf.com/content_cvpr_2018/papers/Dong_Boosting_Adversarial_Attacks_CVPR_2018_paper.pdf)
7. [DeepFool](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Moosavi-Dezfooli_DeepFool_A_Simple_CVPR_2016_paper.pdf)
8. [One-Pixel](https://arxiv.org/pdf/1710.08864.pdf) (black-box attack, not suitable for this task)
9. [Spatially Transformed Attack](https://arxiv.org/abs/1801.02612)
10. [Hop-Skip-Jump](https://arxiv.org/abs/1904.02144) (black-box attack, not suitable for this task)
11. [ZOO](https://arxiv.org/abs/1708.03999) (black-box attack, not suitable for this task)

We implement the Fast Gradient Sign Method ([FGSM](https://arxiv.org/abs/1412.6572)) and Projected Gradient Descent ([PGD](https://arxiv.org/pdf/1706.06083.pdf)) attacks to generate adversarial examples in the white-box context. Brief descriptions of these attacks follow:

### FGSM
The Fast Gradient Sign Method generates an adversarial example in a single step, using the gradient of the loss function with respect to the input image. The sign of this gradient tells us in which direction each input pixel should change in order to increase the loss, and we perturb every pixel by epsilon in that direction.
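As a minimal sketch of the FGSM step, assuming a toy linear softmax classifier in NumPy rather than the CNNs used by ATHENA, the perturbation can be computed as follows. The names `W`, `b`, `loss_gradient`, and `fgsm` are illustrative and are not part of the ATHENA code base.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_gradient(x, y, W, b):
    """Gradient of the cross-entropy loss w.r.t. the input x (linear softmax model)."""
    probs = softmax(x @ W + b)        # shape (n, classes)
    onehot = np.eye(W.shape[1])[y]    # shape (n, classes)
    return (probs - onehot) @ W.T     # shape (n, features)

def fgsm(x, y, W, b, eps):
    """One-step FGSM: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    grad = loss_gradient(x, y, W, b)
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy usage on random stand-ins for flattened 28x28 images (hypothetical data).
rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))
b = np.zeros(10)
x = rng.uniform(size=(4, 784))
y = np.array([0, 5, 6, 9])
x_adv = fgsm(x, y, W, b, eps=0.1)
```

Every pixel of `x_adv` differs from `x` by at most `eps`, which is why larger epsilon values produce more visibly distorted images.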

### PGD
Projected Gradient Descent is an iterative white-box attack: the attacker has knowledge of the model's weights and can therefore compute gradients of the loss with respect to the input. Starting from the original image, the attack repeatedly takes a small step in the direction that increases the loss and then projects the result back onto the set of images whose distance from the original does not exceed a threshold, epsilon. The goal is to find, within that epsilon bound, the perturbation that maximizes the loss and so causes the input to be classified incorrectly.
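The step-then-project loop can be sketched as below. This is only an illustration under stated assumptions: it uses a toy linear softmax model instead of the ATHENA CNNs, the l-infinity norm, and no random start; `input_gradient` and `pgd_linf` are hypothetical names.

```python
import numpy as np

def input_gradient(x, y, W):
    """Cross-entropy gradient w.r.t. x for a linear softmax model with weights W."""
    logits = x @ W
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0   # probs - onehot(y)
    return probs @ W.T

def pgd_linf(x, y, W, eps, step, n_iter):
    """Iterative l-inf PGD: ascend the loss, then project back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + step * np.sign(input_gradient(x_adv, y, W))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto the l-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel values
    return x_adv

# Toy usage on random stand-ins for flattened 28x28 images (hypothetical data).
rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))
x = rng.uniform(size=(4, 784))
y = np.array([0, 5, 6, 9])
x_adv = pgd_linf(x, y, W, eps=0.3, step=0.05, n_iter=10)
```

The projection is what distinguishes PGD from simply repeating FGSM: however many steps are taken, the result never leaves the epsilon-ball around the original image.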

In the [attack configurations](../configs/attack.json), we explore all combinations of 2 types of attack (FGSM, PGD), 3 values of epsilon (0.1, 0.3, 0.5), and 2 variants of distribution (translation, rotation). For each translation distribution, we keep the minimum offset at -0.2 and the maximum offset at 0.2 throughout all variants. Similarly, for each rotation distribution, we keep the minimum angle at -45 and the maximum angle at 45 throughout all variants. Thus, we count the total number of variants by enumerating all possible combinations of these parameters, i.e. 2x3x2=12 total variants. We use all combinations of attack method, attack parameter, and distribution so that we can see how each parameter change affects the error rate of the evaluated adversarial examples.
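The enumeration of the 12 variants can be sketched with `itertools.product`. The dictionary keys used here (`attack`, `eps`, `distribution`, `min_offset`, and so on) are assumptions made for illustration; the actual schema of `attack.json` may differ.

```python
from itertools import product

attacks = ["FGSM", "PGD"]
epsilons = [0.1, 0.3, 0.5]
# Hypothetical key names; the ranges are the ones stated in this report.
distributions = {
    "translation": {"min_offset": -0.2, "max_offset": 0.2},
    "rotation": {"min_angle": -45, "max_angle": 45},
}

configs = {
    f"{attack}_eps{eps}_{name}": {
        "attack": attack,
        "eps": eps,
        "distribution": {"transformation": name, **params},
    }
    for attack, eps, (name, params) in product(attacks, epsilons, distributions.items())
}

print(len(configs))  # 12
```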





## Computation Cost and Subsampling
### Computation Cost
In a preliminary investigation of execution time, we chose to use the well-known Python module [TQDM](https://tqdm.github.io/). TQDM is a progress-bar module with an overhead of approximately 60ns per iteration, which is an order-of-magnitude speedup over the well-established [ProgressBar](https://github.com/niltonvolpato/python-progressbar) with an overhead of approximately 800ns per iteration. Using TQDM is as easy as
```
from tqdm import tqdm
for i in tqdm(range(10000)):
    ...
```
which produces the following output:
```
76%|████████████████████████████         | 7568/10000 [00:33<00:10, 229.00it/s]

```
TQDM displays the percentage completed, the current iteration, the total number of iterations, the elapsed time, the estimated remaining time, and the average number of iterations per second, live during the execution of a Python script.

We infer the execution time of all 12 attack configurations as follows. First, we observed that TQDM estimated a 30-minute execution time for a subsample of size 200 over all 12 configs. Assuming that execution time scales linearly with sample size, and as 10000 is 50 times 200, we infer that a sample of size 10000 would take 30*50 minutes = 1500 minutes = 25 hours. Therefore, we decided to use a subsample of size 500, which decreases this execution time by a factor of 20, to roughly 75 minutes.
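The estimate above is a plain linear extrapolation, which can be checked in a few lines. The variable names are illustrative, and the linear-scaling assumption is ours rather than a measured property of the attacks.

```python
# Observed: about 30 minutes for a subsample of 200 over all 12 configs.
minutes_for_200 = 30
full_size = 10_000

scale = full_size // 200                  # 50
full_minutes = minutes_for_200 * scale    # 1500 minutes
full_hours = full_minutes / 60            # 25.0 hours

# A subsample of 500 is 1/20 of the full sample.
subsample = 500
sub_minutes = full_minutes * subsample / full_size  # 75.0 minutes
```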
### Subsampling
In order to achieve this subsampling, we use the following code snippet, which you can find commented at the bottom of `/Task2/scripts/craft_adversarial_examples.py`:
```
data, labels = subsampling(data = data,
                           labels = labels,
                           num_classes = 10,
                           ratio = .05,
                           filepath = "../data",
                           filename = '')
```
This function produces .npy files containing the subsamples (`/Task2/data/subsamples_0.05.npy`) and the sublabels (`/Task2/data/sublabels_0.05.npy`). We used these files to generate adversarial examples in the following investigation.
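For readers without access to the helper, the following is a rough sketch of what a class-balanced subsampling step could look like. It is not the `subsampling` function itself: it assumes integer labels and that the same fraction is drawn from every class, and `stratified_subsample` is a hypothetical name.

```python
import numpy as np

def stratified_subsample(data, labels, num_classes, ratio, seed=0):
    """Draw roughly `ratio` of the samples from every class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)          # indices of class c
        k = int(round(len(idx) * ratio))           # samples to keep for class c
        picked.append(rng.choice(idx, size=k, replace=False))
    picked = np.sort(np.concatenate(picked))
    return data[picked], labels[picked]

# Toy stand-in for the 10000-sample set: 1000 blank images per digit.
labels = np.repeat(np.arange(10), 1000)
data = np.zeros((10000, 28, 28), dtype=np.float32)
sub_data, sub_labels = stratified_subsample(data, labels, num_classes=10, ratio=0.05)
print(sub_data.shape)  # (500, 28, 28)
```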


## EOT Attack
Finally, we choose to employ an adaptive approach which computes the loss expectation over a specific distribution. We use the well-known Expectation Over Transformation (EOT) algorithm, "a general framework allowing for the construction of adversarial examples that remain adversarial over a chosen transformation distribution T" [(Athalye et al., Synthesizing Robust Adversarial Examples)](https://arxiv.org/pdf/1707.07397.pdf). The transformation distribution T is specified by the "distribution" key in each of the attack configurations described above. We use a translation distribution and a rotation distribution.
* paper: [Synthesizing Robust Adversarial Examples](https://arxiv.org/pdf/1707.07397.pdf)
* code GitHub repo: [EOT](https://github.com/prabhant/synthesizing-robust-adversarial-examples)
* ATHENA paper: [ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense](https://arxiv.org/abs/2001.00308)

We extend ``EOT`` to the ensemble setting, following ``Equation (7)`` of the ``ATHENA`` paper. Currently, ``FGSM`` and ``PGD`` are supported as the adversarial optimizer. The configuration for the chosen distribution must be provided.
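To illustrate the general EOT idea (not Equation (7) of the ATHENA paper, and not the ATHENA implementation), the sketch below estimates the expected loss gradient by averaging over randomly sampled translations and feeds it to an FGSM step. It assumes a toy linear softmax model, cyclic integer-pixel shifts via `np.roll` as a stand-in for the translation distribution, and hypothetical names and defaults (`eot_gradient`, `eot_fgsm`, `max_shift=5`).

```python
import numpy as np

def input_gradient(x, y, W):
    """Cross-entropy gradient w.r.t. a 28x28 image x for a linear softmax model."""
    logits = x.reshape(-1) @ W
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[y] -= 1.0                       # probs - onehot(y)
    return (W @ probs).reshape(x.shape)

def eot_gradient(x, y, W, max_shift, n_samples, rng):
    """Monte Carlo estimate of E_t[grad_x loss(t(x), y)] over random cyclic shifts t."""
    total = np.zeros_like(x)
    for _ in range(n_samples):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(x, (dy, dx), axis=(0, 1))               # t(x)
        grad_shifted = input_gradient(shifted, y, W)              # gradient at t(x)
        total += np.roll(grad_shifted, (-dy, -dx), axis=(0, 1))   # chain rule: undo the shift
    return total / n_samples

def eot_fgsm(x, y, W, eps, max_shift=5, n_samples=30, seed=0):
    """FGSM step driven by the EOT gradient estimate instead of a single gradient."""
    grad = eot_gradient(x, y, W, max_shift, n_samples, np.random.default_rng(seed))
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy usage on a random stand-in for an MNIST image (hypothetical data).
rng = np.random.default_rng(1)
W = rng.normal(size=(784, 10))
x = rng.uniform(size=(28, 28))
x_adv = eot_fgsm(x, y=3, W=W, eps=0.1)
```

Replacing the FGSM step with the PGD loop gives the PGD variant; only the gradient estimate changes.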

## Evaluation
We choose to evaluate the generated adversarial examples on the following three models:

- Undefended Model (UM in the results tables): no defense configuration.
- Vanilla ATHENA Average Probability Model (Ensemble in the results tables): 5 weak defenses listed in `/Task2/configs/athena-mnist.json`:
    - `/models/cnn/model-mnist-cnn-shift_bottom_left.h5`
    - `/models/cnn/model-mnist-cnn-affine_both_stretch.h5`
    - `/models/cnn/model-mnist-cnn-cartoon_mean_type3.h5`
    - `/models/cnn/model-mnist-cnn-noise_gaussian.h5`
    - `/models/cnn/model-mnist-cnn-filter_median.h5`
- PGD Model (PGD-ADT in the results tables): a provided baseline model.
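As a rough sketch of how an average-probability ensemble and an error rate could be computed, consider the following. It is written under assumptions: the report does not spell out the exact error-rate definition, so the simplest one (fraction of misclassified samples) is used, and `average_probability` and `error_rate` are hypothetical names.

```python
import numpy as np

def average_probability(prob_list):
    """Average per-model class probabilities; input shape (models, samples, classes)."""
    return np.mean(np.asarray(prob_list), axis=0)

def error_rate(probs, true_labels):
    """Fraction of samples whose arg-max prediction differs from the true label."""
    return float(np.mean(np.argmax(probs, axis=1) != true_labels))

# Two hypothetical weak defenses, three samples, three classes.
wd1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
wd2 = np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.6, 0.2]])
true_labels = np.array([0, 1, 2])

ensemble = average_probability([wd1, wd2])
rate = error_rate(ensemble, true_labels)  # one of the three samples is misclassified
```

In this toy case the second weak defense alone would misclassify two samples, while the averaged prediction misclassifies only one.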
# Results and Discussion
## Fast Gradient Sign Method (FGSM)
The following are the results for the evaluation of the FGSM adversarial examples against the models described in the Evaluation section. We first show the results for the Fast Gradient Sign Method with the translation distribution.
### FGSM with Translation Adversarial Examples
The following are images of 3 generated adversarial examples for each value of epsilon (0.1,0.3,0.5) using FGSM computing the loss expectation over a translation distribution:
<table>
<tr>
<td><img src='results/FGSM_eps0.1_translation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.1_translation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.1_translation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/FGSM_eps0.3_translation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.3_translation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.3_translation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/FGSM_eps0.5_translation/0->8.jpg'></td>
<td><img src='results/FGSM_eps0.5_translation/5->3.jpg'></td>
<td><img src='results/FGSM_eps0.5_translation/6->4.jpg'></td>
</tr>
</table>

### FGSM with Translation Results
| Adversarial Example     | UM | Ensemble | PGD-ADT |
|-------------------------|----|----------|---------|
| FGSM_eps0.1_translation | 0.00607  | 0.01012  | 0.00607 |
| FGSM_eps0.3_translation | 0.26316  | 0.30364  | 0.32389 |
| FGSM_eps0.5_translation | 0.83401  | 0.81174  | 0.70648 |

We notice that the error rate increases as epsilon increases. This is expected, since epsilon bounds the size of the perturbation added to an image.

<img src='scripts/FGSM_translation.jpg'>

First, notice that the PGD-ADT model has the lowest error rate only at epsilon 0.5 (0.706 versus 0.834 and 0.812). At epsilon 0.3 it has the highest error rate of the three models (0.324), so it is not uniformly more robust to these FGSM examples. Additionally, notice that the undefended model and the ensemble perform similarly. This might stem from using the EOT approach, the choice of epsilon values, or evaluating on a subsample rather than the entire MNIST dataset.

### FGSM with Rotation Adversarial Examples
The following are images of generated adversarial examples over 3 values of epsilon (0.1,0.3,0.5) using FGSM computing the loss expectation over a rotation distribution. 

<table>
<tr>
<td><img src='results/FGSM_eps0.1_rotation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.1_rotation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.1_rotation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/FGSM_eps0.3_rotation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.3_rotation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.3_rotation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/FGSM_eps0.5_rotation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.5_rotation/5->8.jpg'></td>
<td><img src='results/FGSM_eps0.5_rotation/6->5.jpg'></td>
</tr>
</table>

### FGSM with Rotation Results
| Adversarial Example     | UM | Ensemble | PGD-ADT |
|-------------------------|----|----------|---------|
| FGSM_eps0.1_rotation | 0.00809  | 0.00809 | 0.00607 |
| FGSM_eps0.3_rotation | 0.34211  | 0.36639 | 0.04453 |
| FGSM_eps0.5_rotation | 0.88057  | 0.85829 | 0.77732 |

We notice that the error rate increases as epsilon increases. This is expected, since epsilon bounds the size of the perturbation added to an image.

<img src='scripts/FGSM_rotation.jpg'>

First, notice that the PGD-ADT model has the lowest error rate at every value of epsilon, most markedly at epsilon 0.3 (0.045 versus 0.342 and 0.366). Additionally, notice that the undefended model and the ensemble perform similarly. This might stem from using the EOT approach, the choice of epsilon values, or evaluating on a subsample rather than the entire MNIST dataset.

## Projected Gradient Descent (PGD)
The following are the results for the evaluation of the PGD adversarial examples against the models described in the Evaluation section. We first show the results for Projected Gradient Descent with the translation distribution.

### PGD with Translation Adversarial Examples
The following are images of 3 generated adversarial examples for each value of epsilon (0.1,0.3,0.5) using PGD computing the loss expectation over a translation distribution:
<table>
<tr>
<td><img src='results/PGD_eps0.1_translation/0->0.jpg'></td>
<td><img src='results/PGD_eps0.1_translation/5->5.jpg'></td>
<td><img src='results/PGD_eps0.1_translation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/PGD_eps0.3_translation/0->2.jpg'></td>
<td><img src='results/PGD_eps0.3_translation/5->5.jpg'></td>
<td><img src='results/PGD_eps0.3_translation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/PGD_eps0.5_translation/0->2.jpg'></td>
<td><img src='results/PGD_eps0.5_translation/5->3.jpg'></td>
<td><img src='results/PGD_eps0.5_translation/6->8.jpg'></td>
</tr>
</table>

### PGD with Translation Results
| Adversarial Example    | UM      | Ensemble | PGD-ADT |
|------------------------|---------|----------|---------|
| PGD_eps0.1_translation | 0.00607 | 0.00809  | 0.00607 |
| PGD_eps0.3_translation | 0.05466 | 0.12348  | 0.01214 |
| PGD_eps0.5_translation | 0.38664 | 0.5      | 0.11741 |

We notice that the error rate increases as epsilon increases. This is expected, since epsilon bounds the size of the perturbation added to an image.

<img src='scripts/PGD_translation.jpg'>

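First, notice that the PGD-ADT model performs at a much lower error rate at epsilon 0.3 and 0.5, which makes sense as these adversarial examples were generated with PGD, the attack that, as its name suggests, this baseline was hardened against. Additionally, notice that the ensemble does not improve on the undefended model here: its error rate is higher at every epsilon (0.5 versus 0.387 at epsilon 0.5). This might stem from using the EOT approach, the choice of epsilon values, or evaluating on a subsample rather than the entire MNIST dataset.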

### PGD with Rotation Adversarial Examples
The following are images of 3 generated adversarial examples for each value of epsilon (0.1,0.3,0.5) using PGD computing the loss expectation over a rotation distribution:
<table>
<tr>
<td><img src='results/PGD_eps0.1_rotation/0->0.jpg'></td>
<td><img src='results/PGD_eps0.1_rotation/5->5.jpg'></td>
<td><img src='results/PGD_eps0.1_rotation/6->6.jpg'></td>
</tr>
</table>
<table>
<tr>
<td><img src='results/PGD_eps0.3_rotation/0->2.jpg'></td>
<td><img src='results/PGD_eps0.3_rotation/5->3.jpg'></td>
<td><img src='results/PGD_eps0.3_rotation/6->6.jpg'></td>
</tr>
</table>

<table>
<tr>
<td><img src='results/PGD_eps0.5_rotation/0->8.jpg'></td>
<td><img src='results/PGD_eps0.5_rotation/5->3.jpg'></td>
<td><img src='results/PGD_eps0.5_rotation/6->5.jpg'></td>
</tr>
</table>

### PGD with Rotation Results
| Adversarial Example     | UM | Ensemble | PGD-ADT |
|-------------------------|----|----------|---------|
| PGD_eps0.1_rotation | 0.00809  | 0.01012 | 0.00607 |
| PGD_eps0.3_rotation | 0.10729  | 0.14372 | 0.02227 |
| PGD_eps0.5_rotation | 0.75709  | 0.76923 | 0.28745 |

We notice that the error rate increases as epsilon increases. This is expected, since epsilon bounds the size of the perturbation added to an image.

<img src='scripts/PGD_rotation.jpg'>

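First, notice that the PGD-ADT model performs at a much lower error rate at epsilon 0.3 and 0.5 (0.287 versus 0.757 and 0.769 at epsilon 0.5), which makes sense as these adversarial examples were generated with PGD, the attack that, as its name suggests, this baseline was hardened against. Additionally, notice that the undefended model and the ensemble perform similarly, with the ensemble slightly worse at every epsilon. This might stem from using the EOT approach, the choice of epsilon values, or evaluating on a subsample rather than the entire MNIST dataset.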


# Discussion
First, we have seen that the epsilon value used in generating adversarial examples and the error rate of the evaluating model are directly related: in every table above, the error rate increases with epsilon. Additionally, it is worth noting that over the "translation-rotation" pairs, the rotation distribution is more effective in generating adversarial examples in most cases. The clearest exception is FGSM at epsilon 0.3 evaluated on the PGD-ADT model, where translation yields an error rate of 0.324 and rotation only 0.045.

Between Task1 and Task2, ATHENA seems to have performed approximately the same against the adaptive EOT adversarial examples. Regarding the evaluating models, the ATHENA model shows a slight improvement over the undefended model only for FGSM at epsilon 0.5; in the remaining configurations its error rate is equal to or higher than that of the undefended model. In the evaluation against the PGD-ADT model, the adversarial examples generated with the PGD attacks maintained considerably lower error rates, as expected. Interestingly, for the adversarial examples generated with the FGSM attack with an epsilon parameter of 0.5, a considerable decrease was also observed from the undefended model and the ATHENA model to the PGD-ADT model. At lower epsilon values the FGSM results are mixed: there is no such decrease with the translation distribution, while the rotation distribution at epsilon 0.3 shows the largest decrease of all.

Finally, the evaluation of generated adversarial examples in the context of the white-box threat model in conjunction with the Expectation Over Transformation algorithm has shown interesting patterns in the effectiveness of the crafted adversarial examples. In future investigations, as we have noted, it would be insightful to compare each dimension to find the parameter, whether attack or distribution, that contributes most to adversarial success. It might be beneficial to explore the principal components of the attack and distribution parameters through a method like Principal Component Analysis.

# Contribution
Andrew - Generated fork repository, contributed in programming process, ran programs locally, co-authored task 1 report

JJ - Contributed in programming process, locally generated adversarial examples, co-authored task 1 report

Stephen - Contributed in programming process, locally generated adversarial examples, co-authored task 1 report