# Task 2. Extension of ATHENA

## Option 1. Optimization-based white-box attack
- Goal: Generate adversarial examples for the vanilla ATHENA using an optimization-based white-box attack.
- Number of groups: not limited
- Bonus: 10%

In this task, students aim to generate adversarial examples based on the vanilla ATHENA in the context of the white-box threat model (Section III.F in ATHENA paper) and then evaluate the effectiveness of the crafted adversarial examples. Each group should aim to generate the adversarial examples using at most 2 attacks. For each attack, generate around 5 variants by varying tunable parameters. Evaluate the success rate of the crafted adversarial examples on the vanilla ATHENA. Compare the adversarial examples generated in Task 2 with those generated in Task 1 and the baseline adversarial examples provided by us.

### Report:
1. Introduce the approaches that are used in the task.
2. Experimental settings --- the values of the tunable parameters for each variant.
3. Evaluation results and necessary analysis.
4. Contribution of individual team members.
5. Citations to all related works.

### Optimization-based approaches (already implemented in ATHENA, no bonus):
```
1. Xuanqing Liu, Minhao Cheng, Huan Zhang, Cho-Jui Hsieh. Towards Robust Neural Networks via Random Self-ensemble. ECCV 2018.
2. Anish Athalye, Logan Engstrom, Andrew Ilyas, Kevin Kwok. Synthesizing Robust Adversarial Examples. ICML 2018.
```

### Note:
* You are encouraged to explore new approaches not listed. (**10% bonus for new attacks**)
* If you use the provided approaches, please use **EOT** approach.

# Approaches
First, we aim to generate adversarial examples based on the vanilla ATHENA in the context of the white-box threat model. We use the classic MNIST dataset of handwritten digits for the following investigation. Each sample is a 28 by 28 pixel grayscale image of a handwritten digit, and its label is the corresponding digit, taken from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Generating an adversarial example for MNIST entails adding perturbations to an image in order to "fool" the neural network without noticeably distorting the image to the human eye.

Below are some "attacks" that generate adversarial examples:
## The attacks implemented by ATHENA:
1. [FGSM](https://arxiv.org/abs/1412.6572)
2. [BIM (l2- and linf- norms)](https://arxiv.org/abs/1607.02533)
3. [CW (l2- and linf- norms)](https://ieeexplore.ieee.org/abstract/document/7958570)
4. [JSMA](https://ieeexplore.ieee.org/abstract/document/7467366)
5. [PGD](https://arxiv.org/pdf/1706.06083.pdf)
6. [MIM](https://openaccess.thecvf.com/content_cvpr_2018/papers/Dong_Boosting_Adversarial_Attacks_CVPR_2018_paper.pdf)
7. [DeepFool](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Moosavi-Dezfooli_DeepFool_A_Simple_CVPR_2016_paper.pdf)
8. [One-Pixel](https://arxiv.org/pdf/1710.08864.pdf) (black-box attack, not suitable for this task)
9. [Spatially Transformed Attack](https://arxiv.org/abs/1801.02612)
10. [Hop-Skip-Jump](https://arxiv.org/abs/1904.02144) (black-box attack, not suitable for this task)
11. [ZOO](https://arxiv.org/abs/1708.03999) (black-box attack, not suitable for this task)

## Other possible attacks:
1. [Obfuscated Gradient](https://arxiv.org/pdf/1802.00420.pdf)
2. [DAA (Distributionally Adversarial Attack)](https://www.aaai.org/ojs/index.php/AAAI/article/view/4061)
3. [EAD (Elastic-net Attack)](https://arxiv.org/abs/1709.04114)
4. [GAN-based Attacks](https://arxiv.org/abs/1801.02610)
5. etc.

In the following investigation, we implement the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD).

### FGSM
The Fast Gradient Sign Method (FGSM) generates an adversarial example in a single step, using the gradient of the loss function with respect to the input image. The gradient tells us how each input pixel contributes to the loss; FGSM then shifts every pixel by epsilon in the direction of the sign of its gradient, which increases the loss while keeping the perturbation bounded by epsilon in the L-infinity norm.
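To make the single-step update concrete, here is a minimal sketch of the FGSM rule `x_adv = clip(x + eps * sign(grad_x loss), 0, 1)`. It is not ATHENA's implementation: the "model" is a hypothetical toy logistic regression on a 3-pixel input so that the input gradient can be written analytically, and the helper names (`bce_loss`, `input_gradient`, `fgsm`) are illustrative only.

```python
import math


def sign(v):
    """Elementwise sign as a float: -1.0, 0.0 or 1.0."""
    return float((v > 0) - (v < 0))


def bce_loss(x, w, b, y):
    """Binary cross-entropy of a toy logistic-regression model p = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(x, w, b, y):
    """Analytic gradient of bce_loss with respect to the input x: (p - y) * w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(x, grad, eps):
    """One-step FGSM: shift every pixel by eps along the sign of its gradient,
    then clip back to the valid [0, 1] pixel range."""
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


# Usage on a toy 3-"pixel" input whose true label is 1.
x = [0.2, 0.8, 0.5]
w, b, y = [1.0, -2.0, 0.0], 0.0, 1
x_adv = fgsm(x, input_gradient(x, w, b, y), eps=0.1)
print(x_adv, bce_loss(x, w, b, y), bce_loss(x_adv, w, b, y))
```

Pixels whose gradient is zero (the third one here) are left unchanged, and the loss on `x_adv` is higher than on `x`.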

### PGD
Projected Gradient Descent (PGD) is a white-box attack: the attacker has access to the model's weights and can therefore compute gradients of the loss with respect to the input. For the L-infinity norm, PGD can be viewed as an iterative version of FGSM. At each step it moves the input a small step in the direction of the sign of the loss gradient, then projects the result back onto the set of images within distance epsilon of the original (the epsilon-ball), so the perturbation never exceeds the threshold epsilon. The goal is to find a perturbation, bounded by epsilon, that maximizes the model's loss and thus causes the input to be misclassified.
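The iterate-then-project loop described above can be sketched as follows. This is a minimal L-infinity PGD without the random start that Madry et al. commonly use, and it again relies on a hypothetical toy logistic-regression gradient instead of a real network; `pgd_linf` and its parameters (`step_size`, `num_steps`) are illustrative names, not ATHENA's API.

```python
import math


def sign(v):
    """Elementwise sign as a float: -1.0, 0.0 or 1.0."""
    return float((v > 0) - (v < 0))


def input_gradient(x, w, b, y):
    """Gradient of the binary cross-entropy of a toy model p = sigmoid(w . x + b)
    with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def pgd_linf(x, grad_fn, eps, step_size, num_steps):
    """L-infinity PGD (no random start): repeated signed-gradient ascent steps,
    each followed by projection onto the eps-ball around x and the [0, 1] range."""
    x_adv = list(x)
    for _ in range(num_steps):
        g = grad_fn(x_adv)
        # Ascent step: move every pixel along the sign of its loss gradient.
        x_adv = [xa + step_size * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: keep each pixel within [x - eps, x + eps] ...
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xa, xi in zip(x_adv, x)]
        # ... and within the valid pixel range.
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv


x = [0.2, 0.8, 0.5]
w, b, y = [1.0, -2.0, 0.0], 0.0, 1
x_adv = pgd_linf(x, lambda u: input_gradient(u, w, b, y),
                 eps=0.3, step_size=0.05, num_steps=10)
print(x_adv)
```

With several small steps, PGD can use up the full epsilon budget along directions where a single FGSM step of size `step_size` would not, and the projection guarantees the final perturbation stays within epsilon.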

## Adversarial Attack Configurations
For each attack, we generate 6 variants, producing a total of 12 sets of adversarial examples. The variants are defined by the following configuration:
```json
{
  "num_attacks": 12,
  "configs0": {
    "attack": "fgsm",
    "description": "FGSM_eps0.1",
    "eps": 0.1,
    "distribution": {
      "num_samples": 500,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs1": {
    "attack": "fgsm",
    "description": "FGSM_eps0.1",
    "eps": 0.1,
    "distribution": {
      "num_samples": 500,
      "transformation": "translation",
      "min_offset": -0.2,
      "max_offset": 0.2
    }
  },
  "configs2": {
    "attack": "fgsm",
    "description": "FGSM_eps0.3",
    "eps": 0.3,
    "distribution": {
      "num_samples": 500,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs3": {
    "attack": "fgsm",
    "description": "FGSM_eps0.3",
    "eps": 0.3,
    "distribution": {
      "num_samples": 500,
      "transformation": "translation",
      "min_offset": -0.2,
      "max_offset": 0.2
    }
  },
  "configs4": {
    "attack": "fgsm",
    "description": "FGSM_eps0.5",
    "eps": 0.5,
    "distribution": {
      "num_samples": 500,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs5": {
    "attack": "fgsm",
    "description": "FGSM_eps0.5",
    "eps": 0.5,
    "distribution": {
      "num_samples": 500,
      "transformation": "translation",
      "min_offset": -0.2,
      "max_offset": 0.2
    }
  },
  "configs6": {
    "attack": "pgd",
    "description": "PGD_eps0.1",
    "eps": 0.1,
    "distribution": {
      "num_samples": 500,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs7": {
    "attack": "pgd",
    "description": "PGD_eps0.1",
    "eps": 0.1,
    "distribution": {
      "num_samples": 500,
      "transformation": "translation",
      "min_offset": -0.2,
      "max_offset": 0.2
    }
  },
  "configs8": {
    "attack": "pgd",
    "description": "PGD_eps0.3",
    "eps": 0.3,
    "distribution": {
      "num_samples": 500,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs9": {
    "attack": "pgd",
    "description": "PGD_eps0.3",
    "eps": 0.3,
    "distribution": {
      "num_samples": 500,
      "transformation": "translation",
      "min_offset": -0.2,
      "max_offset": 0.2
    }
  },
  "configs10": {
    "attack": "pgd",
    "description": "PGD_eps0.5",
    "eps": 0.5,
    "distribution": {
      "num_samples": 500,
      "transformation": "rotation",
      "min_angle": -45,
      "max_angle": 45
    }
  },
  "configs11": {
    "attack": "pgd",
    "description": "PGD_eps0.5",
    "eps": 0.5,
    "distribution": {
      "num_samples": 500,
      "transformation": "translation",
      "min_offset": -0.2,
      "max_offset": 0.2
    }
  }
}
```
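Because the 12 entries are the Cartesian product of attacks, epsilon values, and distributions, the same grid can be generated programmatically, which avoids copy-paste inconsistencies between entries. The sketch below is a hypothetical helper (`make_configs` is not part of ATHENA) that rebuilds the JSON structure shown above with `itertools.product`; the iteration order reproduces `configs0` through `configs11`.

```python
import itertools
import json

ATTACKS = ["fgsm", "pgd"]
EPSILONS = [0.1, 0.3, 0.5]
DISTRIBUTIONS = [
    {"num_samples": 500, "transformation": "rotation",
     "min_angle": -45, "max_angle": 45},
    {"num_samples": 500, "transformation": "translation",
     "min_offset": -0.2, "max_offset": 0.2},
]


def make_configs():
    """Build one config per (attack, eps, distribution) combination.

    itertools.product varies the last factor fastest, so each epsilon is
    paired first with rotation and then with translation."""
    configs = {}
    combos = itertools.product(ATTACKS, EPSILONS, DISTRIBUTIONS)
    for i, (attack, eps, dist) in enumerate(combos):
        configs[f"configs{i}"] = {
            "attack": attack,
            "description": f"{attack.upper()}_eps{eps}",
            "eps": eps,
            "distribution": dict(dist),  # copy so entries do not share one dict
        }
    return {"num_attacks": len(configs), **configs}


if __name__ == "__main__":
    print(json.dumps(make_configs(), indent=2))
```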
There are 2 attacks (FGSM, PGD), 3 values of epsilon (0.1, 0.3, 0.5), and 2 transformation distributions (rotation, translation), giving 2 x 3 x 2 = 12 configurations, one per entry above. Varying one factor at a time while holding the others fixed provides a controlled comparison for each parameter, so we can isolate how each change individually affects the error rate of the adversarial examples.

## Computation Cost and Subsampling

Instead of generating adversarial examples for all 10,000 images of the MNIST test set, we use a subsample of 500 images. The subsample is selected at random while keeping the class distribution approximately uniform, so it remains representative of the full set. The main motivation for subsampling is the computational cost of generating adversarial examples for 10,000 images.
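The report does not specify how the class-balanced subsample is drawn; one common way is stratified sampling, sketched below with a hypothetical helper `stratified_subsample`. It draws the same number of indices from every class (50 per class for 500 images over 10 digits) with a seeded generator so the subsample is reproducible. If `n_total` is not divisible by the number of classes, the remainder is dropped in this sketch.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_total, seed=0):
    """Return n_total // num_classes random indices per class, shuffled.

    labels: sequence of class labels, one per image."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    per_class = n_total // len(classes)
    rng = random.Random(seed)  # seeded so the same subsample can be regenerated
    chosen = []
    for c in classes:
        chosen.extend(rng.sample(by_class[c], per_class))
    rng.shuffle(chosen)  # avoid class-ordered batches
    return chosen


# Usage with stand-in labels (not the real MNIST labels).
labels = [i % 10 for i in range(10_000)]
indices = stratified_subsample(labels, 500)
```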

We estimate the execution time of all 12 attack configurations from the progress estimates reported by the Python module tqdm. For a subsample of 200 images, tqdm estimated about 30 minutes for all 12 configurations. Since 10,000 is 50 times 200, and assuming the runtime scales linearly with the number of images, the full set would take about 30 x 50 = 1,500 minutes, or 25 hours. Using a subsample of 500 images decreases this execution time by a factor of 20 (10,000 / 500), to roughly 75 minutes.
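The estimate above is a simple linear extrapolation; the small hypothetical helper below reproduces the arithmetic under the same assumption that runtime is proportional to the number of images.

```python
def extrapolate_minutes(observed_minutes, observed_n, target_n):
    """Scale an observed runtime linearly to a different number of images."""
    return observed_minutes * target_n / observed_n


full = extrapolate_minutes(30, 200, 10_000)  # all 10,000 test images
sub = extrapolate_minutes(30, 200, 500)      # the 500-image subsample
print(f"full: {full:.0f} min ({full / 60:.0f} h), subsample: {sub:.0f} min")
# prints: full: 1500 min (25 h), subsample: 75 min
```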



## EOT Attack
Finally, we employ an adaptive approach that computes the expected loss over a chosen distribution of transformations. We use the well-known Expectation Over Transformation (EOT) algorithm, "a general framework allowing for the construction of adversarial examples that remain adversarial over a chosen transformation distribution T" [(Athalye et al., Synthesizing Robust Adversarial Examples)](https://arxiv.org/pdf/1707.07397.pdf). The transformation distribution T is specified by the "distribution" key of each entry in the configuration above; we use a rotation distribution and a translation distribution.
* paper: [Synthesizing Robust Adversarial Examples](https://arxiv.org/pdf/1707.07397.pdf)
* code GitHub repo: [EOT](https://github.com/prabhant/synthesizing-robust-adversarial-examples)
* ATHENA paper: [ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense](https://arxiv.org/abs/2001.00308)

Following Equation (7) of the ``ATHENA`` paper, ``EOT`` is extended to the ensemble. Currently, ``FGSM`` and ``PGD`` are supported as the adversarial optimizers, and the configuration of the chosen distribution must be provided.
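The core of EOT is that the gradient of the expected loss over transformations is estimated by averaging, over sampled transformations, the gradient of the loss on each transformed input, chained back through the transformation. The sketch below illustrates this under strong simplifying assumptions: "translation" is a hypothetical 1-D cyclic integer shift (real images use 2-D translation with padding), rotation is not shown, and `grad_fn` is a placeholder for the gradient of whatever model is attacked (in ATHENA this would be the ensemble loss of Equation (7), which is not reproduced here). It is not ATHENA's implementation.

```python
import random


def roll(v, s):
    """Cyclic shift of list v by s positions (positive s shifts to the right)."""
    s %= len(v)
    return v[-s:] + v[:-s] if s else list(v)


def eot_gradient(x, grad_fn, max_shift, num_samples, rng):
    """Monte-Carlo estimate of the gradient of E_s[L(shift(x, s))] w.r.t. x,
    with integer shifts s drawn uniformly from [-max_shift, max_shift]."""
    total = [0.0] * len(x)
    for _ in range(num_samples):
        s = rng.randint(-max_shift, max_shift)
        g_shifted = grad_fn(roll(x, s))  # gradient w.r.t. the transformed input
        g = roll(g_shifted, -s)          # chain rule: undo the shift
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


def eot_fgsm(x, grad_fn, eps, max_shift, num_samples, seed=0):
    """FGSM step that uses the EOT gradient instead of the single-input gradient."""
    rng = random.Random(seed)
    g = eot_gradient(x, grad_fn, max_shift, num_samples, rng)
    return [min(1.0, max(0.0, xi + eps * float((gi > 0) - (gi < 0))))
            for xi, gi in zip(x, g)]


def toy_grad(u):
    """Toy loss L(u) = u[0]; its gradient is a unit vector at position 0."""
    return [1.0] + [0.0] * (len(u) - 1)


x = [0.5, 0.5, 0.5, 0.5]
g = eot_gradient(x, toy_grad, max_shift=1, num_samples=200, rng=random.Random(0))
x_adv = eot_fgsm(x, toy_grad, eps=0.1, max_shift=1, num_samples=200)
```

Because the toy loss only looks at position 0 of the shifted input, the averaged gradient spreads over every pixel that some sampled shift moves into position 0, which is how EOT produces perturbations that remain effective across the whole transformation distribution.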

## Evaluation
We evaluate the generated adversarial examples on the following three models:
- **Undefended model**: no defense configuration.
- **Vanilla ATHENA average probability model**: an ensemble of 73 weak defenses, consisting of a clean model followed by 72 models trained on transformed input data, whose output probabilities are averaged. The full configuration of this model is located at `./Task2/configs/athena-mnist.json`.
- **PGD-ADT model**: a provided baseline model, adversarially trained with PGD.
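For reference, the two quantities behind the evaluation, the average-probability ensemble prediction and the error rate, can be sketched as below. These are hypothetical helpers, not ATHENA's API: `per_model_probs` stands in for the probability outputs of the weak defenses, and the error rate here is simply the fraction of adversarial examples whose prediction differs from the true label (we have not confirmed whether ATHENA's evaluation uses exactly this definition, for example whether it excludes samples already misclassified before the attack).

```python
def average_probability_predict(per_model_probs):
    """Average-probability ensemble.

    per_model_probs[m][i][k] is the probability that weak defense m assigns to
    class k for sample i. Probabilities are averaged over the defenses and the
    class with the highest mean probability is predicted."""
    num_models = len(per_model_probs)
    preds = []
    for i in range(len(per_model_probs[0])):
        num_classes = len(per_model_probs[0][i])
        mean = [sum(model[i][k] for model in per_model_probs) / num_models
                for k in range(num_classes)]
        preds.append(max(range(num_classes), key=mean.__getitem__))
    return preds


def error_rate(preds, labels):
    """Fraction of samples whose prediction differs from the true label."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


# Two toy weak defenses, two samples, three classes.
per_model = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
]
preds = average_probability_predict(per_model)
```

Note that averaging can overrule an individual defense: the first defense alone would predict class 0 for the first sample, but the ensemble predicts class 1.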

<h1>FGSM</h1>
<h3>eps0.1_translation</h3>
<table>
<tr>
<td><img src='results/FGSM_eps0.1_translation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.1_translation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.1_translation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.1_rotation</h3>
<table>
<tr>
<td><img src='results/FGSM_eps0.1_rotation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.1_rotation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.1_rotation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.3_translation</h3>
<table>
<tr>
<td><img src='results/FGSM_eps0.3_translation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.3_translation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.3_translation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.3_rotation</h3>
<table>
<tr>
<td><img src='results/FGSM_eps0.3_rotation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.3_rotation/5->5.jpg'></td>
<td><img src='results/FGSM_eps0.3_rotation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.5_translation</h3>
<table>
<tr>
<td><img src='results/FGSM_eps0.5_translation/0->8.jpg'></td>
<td><img src='results/FGSM_eps0.5_translation/5->3.jpg'></td>
<td><img src='results/FGSM_eps0.5_translation/6->4.jpg'></td>
</tr>
</table>
<h3>eps0.5_rotation</h3>
<table>
<tr>
<td><img src='results/FGSM_eps0.5_rotation/0->0.jpg'></td>
<td><img src='results/FGSM_eps0.5_rotation/5->8.jpg'></td>
<td><img src='results/FGSM_eps0.5_rotation/6->5.jpg'></td>
</tr>
</table>

<h1>PGD</h1>
<h3>eps0.1_translation</h3>
<table>
<tr>
<td><img src='results/PGD_eps0.1_translation/0->0.jpg'></td>
<td><img src='results/PGD_eps0.1_translation/5->5.jpg'></td>
<td><img src='results/PGD_eps0.1_translation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.1_rotation</h3>
<table>
<tr>
<td><img src='results/PGD_eps0.1_rotation/0->0.jpg'></td>
<td><img src='results/PGD_eps0.1_rotation/5->5.jpg'></td>
<td><img src='results/PGD_eps0.1_rotation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.3_translation</h3>
<table>
<tr>
<td><img src='results/PGD_eps0.3_translation/0->2.jpg'></td>
<td><img src='results/PGD_eps0.3_translation/5->5.jpg'></td>
<td><img src='results/PGD_eps0.3_translation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.3_rotation</h3>
<table>
<tr>
<td><img src='results/PGD_eps0.3_rotation/0->2.jpg'></td>
<td><img src='results/PGD_eps0.3_rotation/5->3.jpg'></td>
<td><img src='results/PGD_eps0.3_rotation/6->6.jpg'></td>
</tr>
</table>
<h3>eps0.5_translation</h3>
<table>
<tr>
<td><img src='results/PGD_eps0.5_translation/0->2.jpg'></td>
<td><img src='results/PGD_eps0.5_translation/5->3.jpg'></td>
<td><img src='results/PGD_eps0.5_translation/6->8.jpg'></td>
</tr>
</table>
<h3>eps0.5_rotation</h3>
<table>
<tr>
<td><img src='results/PGD_eps0.5_rotation/0->8.jpg'></td>
<td><img src='results/PGD_eps0.5_rotation/5->3.jpg'></td>
<td><img src='results/PGD_eps0.5_rotation/6->5.jpg'></td>
</tr>
</table>

## Consideration
We acknowledge that side-by-side comparisons along each dimension would be beneficial, but they are beyond the scope of this investigation. For each attack variant, we show three adversarial examples generated by its configuration. It might be useful, for example, to place FGSM_eps0.1_rotation next to PGD_eps0.1_rotation to isolate the effect of switching from FGSM to PGD.

## Evaluations
<table>
<tr>
<td><img src='results/undefended.jpg'></td>
</tr>
</table>

<table>
<tr>
<td><img src='results/vanilla.jpg'></td>
</tr>
</table>

<table>
<tr>
<td><img src='results/pgd-adt.jpg'></td>
</tr>
</table>

# Discussion
First, as expected, a larger epsilon when generating adversarial examples leads to a higher error rate on the evaluating model, and this relationship holds here. It is also worth noting that across the translation/rotation pairs, the rotation distribution is consistently more effective at generating adversarial examples. Between Task 1 and Task 2, ATHENA appears to perform approximately the same against the adaptive EOT adversarial examples. Regarding the evaluating models, ATHENA shows a slight improvement over the undefended model. Against the PGD-ADT model, the adversarial examples generated with the PGD attacks have considerably lower error rates, as expected. Interestingly, for the adversarial examples generated by FGSM with an epsilon of 0.5, the error rate dropped considerably from the undefended and ATHENA models to the PGD-ADT model, whereas no considerable decrease between the same evaluating models was observed for the other FGSM variants.

Finally, evaluating adversarial examples crafted under the white-box threat model with the Expectation Over Transformation algorithm has revealed interesting patterns in their effectiveness. In future investigations, as noted above, it would be insightful to compare along each dimension to find which parameter, attack or distribution, contributes most to adversarial success, for example by applying a method such as Principal Component Analysis to the attack and distribution parameters.

# Contribution
Andrew - Created the forked repository, contributed to the programming, ran programs locally, co-authored task 1 report

JJ - Contributed to the programming, locally generated adversarial examples, co-authored task 1 report

Stephen - Contributed to the programming, locally generated adversarial examples, co-authored task 1 report