# Contribution
Andrew - Created the fork repository, contributed to the programming process, ran programs locally, co-authored the task 1 report

JJ - Contributed to the programming process, locally generated adversarial examples, co-authored the task 1 report

Stephen - Contributed to the programming process, locally generated adversarial examples, co-authored the task 1 report

# Introduction
In this experiment, we perform an initial analysis of how effectively the Athena framework defends machine learning
models against adversarial attacks. Specifically, we analyse Athena's ability to defend a Convolutional Neural Network
(CNN) trained on the MNIST handwritten digit data set. We generate adversarial examples for an undefended CNN and
analyse the effectiveness of three attack methods: the Fast Gradient Sign Method, the Basic Iterative Method, and
Projected Gradient Descent. Effectiveness is measured by the error rate, which serves as our measure of adversarial
robustness. We then use the Athena framework to build an ensemble of CNNs, each with a weak defense applied to its
input data: for example, one model with a rotation applied to the input, one with a shift, one with a Gaussian filter,
and so on. By performing inference on the generated adversarial examples, we gain insight into Athena's ability to
defend a CNN against adversarial attacks. We then compare these results against a baseline PGD model to see how Athena
performs relative to another defense method. Our aim is to determine whether an ensemble of CNNs with weak
transformations applied to their input data can increase the adversarial robustness of a model, measured here as a
lower error rate when performing inference on adversarial examples.
   
# Implementation
In this report, we attack three different models: an Undefended Model, a Vanilla Athena Ensemble Model, and a
PGD-ADT model. We chose to craft adversarial examples using the Fast Gradient Sign Method (FGSM), the
Basic Iterative Method (BIM), and Projected Gradient Descent (PGD).

### FGSM
The Fast Gradient Sign Method generates an adversarial example that increases the loss by using the gradient of the loss function with respect to the input image. With this method, we find out how much each input pixel contributes to the loss and perturb each pixel by epsilon in the direction of the sign of its gradient: x_adv = x + epsilon * sign(grad_x J(theta, x, y)).
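
As a rough illustration of the single FGSM step, the sketch below applies it to a toy linear softmax classifier rather than the CNN used in this project. The helper names (`softmax`, `input_gradient`, `fgsm`), the random weights, and the random "images" are all assumptions made for the example; they are not part of the Athena code base.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def input_gradient(x, y, W, b):
    """Gradient of the cross-entropy loss with respect to the input pixels.

    Toy stand-in for the CNN: a linear softmax classifier with weights
    W (features x classes) and bias b (classes,).
    """
    probs = softmax(x @ W + b)
    one_hot = np.eye(W.shape[1])[y]
    return (probs - one_hot) @ W.T


def fgsm(x, y, W, b, eps):
    """Single-step FGSM: move every pixel by eps along the gradient sign."""
    x_adv = x + eps * np.sign(input_gradient(x, y, W, b))
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid [0, 1] range


# Hypothetical data: 4 flattened 28x28 "images" and random toy weights.
rng = np.random.default_rng(0)
x = rng.random((4, 784))
y = np.array([2, 0, 7, 1])
W = rng.normal(size=(784, 10))
b = np.zeros(10)

x_adv = fgsm(x, y, W, b, eps=0.1)
print("max pixel change:", np.abs(x_adv - x).max())
```

Because every pixel moves by at most epsilon, epsilon directly bounds the L-infinity size of the perturbation.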

### BIM
The Basic Iterative Method (BIM), also known as Iterative FGSM (I-FGSM), is an extension of FGSM in which the attack performs a simple FGSM step multiple times with a smaller step size, clipping the result after each step so that the total perturbation stays within epsilon.
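
The iteration can be sketched as follows, again on a toy linear softmax classifier instead of the project's CNN. The function names and the `step_size` and `n_steps` parameters are assumptions chosen for illustration, not values taken from our attack configuration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def input_gradient(x, y, W, b):
    """Cross-entropy gradient w.r.t. the input for a toy linear classifier."""
    probs = softmax(x @ W + b)
    one_hot = np.eye(W.shape[1])[y]
    return (probs - one_hot) @ W.T


def bim(x, y, W, b, eps, step_size, n_steps):
    """Repeat small FGSM steps, clipping back into the eps-box each time."""
    x_adv = x.copy()
    for _ in range(n_steps):
        grad = input_gradient(x_adv, y, W, b)
        x_adv = x_adv + step_size * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # total change stays <= eps
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid pixel range
    return x_adv


# Hypothetical data and weights, for demonstration only.
rng = np.random.default_rng(0)
x = rng.random((4, 784))
y = np.array([2, 0, 7, 1])
W = rng.normal(size=(784, 10))
b = np.zeros(10)

x_adv = bim(x, y, W, b, eps=0.1, step_size=0.01, n_steps=20)
print("max pixel change:", np.abs(x_adv - x).max())
```

With a single step whose size equals epsilon, this sketch reduces to the plain FGSM update.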

### PGD
Projected Gradient Descent (PGD) is a white-box attack, meaning the attacker has knowledge of the model's gradients and weights (the matrices that determine how the model weighs each feature it detects). The attack searches for the perturbation that maximises the model's loss on an input without the perturbation crossing a threshold labelled epsilon. It takes repeated gradient steps and, after each step, projects the perturbed input back into the epsilon-ball around the original input, so the goal is to find a perturbation within that bound that causes the input to be classified incorrectly.
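
A minimal sketch of an L-infinity PGD loop is shown below, assuming the random start described by Madry et al. and a toy linear softmax classifier in place of the CNN. All names and hyperparameters here (`pgd`, `step_size`, `n_steps`, the random weights) are hypothetical and only serve the illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def input_gradient(x, y, W, b):
    """Cross-entropy gradient w.r.t. the input for a toy linear classifier."""
    probs = softmax(x @ W + b)
    one_hot = np.eye(W.shape[1])[y]
    return (probs - one_hot) @ W.T


def pgd(x, y, W, b, eps, step_size, n_steps, rng):
    """L-infinity PGD: random start, signed gradient steps, projection."""
    # Start from a random point inside the eps-ball around x.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(n_steps):
        grad = input_gradient(x_adv, y, W, b)
        x_adv = x_adv + step_size * np.sign(grad)
        # Projection: the nearest point of the L-infinity ball is a clip.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


# Hypothetical data and weights, for demonstration only.
rng = np.random.default_rng(0)
x = rng.random((4, 784))
y = np.array([2, 0, 7, 1])
W = rng.normal(size=(784, 10))
b = np.zeros(10)

x_adv = pgd(x, y, W, b, eps=0.1, step_size=0.01, n_steps=40, rng=rng)
print("max pixel change:", np.abs(x_adv - x).max())
```

In this form the main difference from BIM is the random starting point; the projection step is the same clipping operation.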

# Experimentation

For task one, our group decided to plot the following attacks:

- FGSM
    - Epsilons: 0.1, 0.5, 0.9
- BIM
    - Epsilons: 0.1, 0.5, 0.9
- PGD
    - Epsilons: 0.1, 0.5, 0.9
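
The attack grid above could be written out as an attack configuration along the following lines. This is only a sketch: the key names (`num_attacks`, `configs0`, `attack`, `description`, `eps`) are assumptions about the file layout, so check them against "configs/attack-ak-mnist.json" before relying on them.

```python
import json


def build_attack_config(attacks, epsilons):
    """Enumerate every (attack, epsilon) pair as one numbered config entry."""
    config = {"num_attacks": len(attacks) * len(epsilons)}
    index = 0
    for attack in attacks:
        for eps in epsilons:
            # Hypothetical key names; the real file may use different ones.
            config[f"configs{index}"] = {
                "attack": attack,
                "description": f"{attack.upper()}_eps{eps}",
                "eps": eps,
            }
            index += 1
    return config


attack_config = build_attack_config(["fgsm", "bim", "pgd"], [0.1, 0.5, 0.9])
print(json.dumps(attack_config, indent=2))
```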
         
We used the following defense models:

- Undefended Model
    - No defense configuration.
- Vanilla Athena Average Probability Model
    - 19 weak defenses consisting of a clean model followed by 18 models trained on transformed input data. The full configuration of this model can be located at "configs/task1/athena-mnist.json".
- PGD Model
    - A provided baseline model.
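
To make the average probability strategy concrete, here is a minimal sketch of how an ensemble of weak defenses might combine its outputs: each member transforms the input, predicts class probabilities, and the ensemble averages them before taking the arg max. The function `average_probability_predict`, the stub models, and the two transformations are hypothetical and do not reproduce Athena's actual API.

```python
import numpy as np


def average_probability_predict(x, weak_defenses):
    """Average the class probabilities predicted by each weak defense.

    weak_defenses is a list of (transform, model) pairs, where transform
    maps a batch of images to a transformed batch and model maps a batch
    to an (n_samples, n_classes) array of probabilities.
    """
    probs = np.stack([model(transform(x)) for transform, model in weak_defenses])
    mean_probs = probs.mean(axis=0)
    return mean_probs.argmax(axis=1), mean_probs


def constant_model(probabilities):
    """Hypothetical stub that returns the same probabilities for every sample."""
    row = np.asarray(probabilities, dtype=float)
    return lambda batch: np.tile(row, (len(batch), 1))


# Three toy members: a clean model plus two models behind transformations.
weak_defenses = [
    (lambda batch: batch, constant_model([0.6, 0.4])),
    (lambda batch: np.roll(batch, 1, axis=2), constant_model([0.1, 0.9])),
    (lambda batch: np.rot90(batch, k=1, axes=(1, 2)), constant_model([0.2, 0.8])),
]

x = np.zeros((3, 4, 4))  # three hypothetical 4x4 "images"
labels, mean_probs = average_probability_predict(x, weak_defenses)
print(labels, mean_probs[0])
```

In this toy case the clean model alone would pick class 0, while the averaged ensemble picks class 1, which is the kind of disagreement the ensemble is meant to exploit.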
        

## Generating Adversarial Examples
In order to generate adversarial examples, reference the file "craft_adversarial_examples.py" located in the "scripts"
directory.

### CLI Interface
This file is callable through its command line interface and takes five possible arguments:

1. -m, --model_configs
    - Path to the model configuration file, which should contain all of the details needed to locate and load the model used to generate adversarial examples. This file should include the model directory, the file name of the undefended model, the weak defense prefix and postfix, and the path to the pgd_trained model. Reference our configuration file located at "configs/model-mnist.json" as an example and to see the proper structure.
2. -d, --data_configs
    - Path to the data configuration file. It should contain the path to the benign samples, the path to the labels of the benign samples, and the location to save the adversarial examples. As above, reference our configuration file located at "configs/data-mnist.json" as an example and to see the proper structure.
3. -a, --attack_configs
    - Path to the attack configuration file. It should contain the number of attacks to be performed, followed by the type of attack, description, and epsilon value for each. Reference "configs/attack-ak-mnist.json" for the proper structure.
4. -o, --output-root
    - Path to the directory in which generated adversarial examples are stored.
5. --debug
    - Debug may be useful when troubleshooting.
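
For orientation, the interface described above could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual source: the help strings, the defaults, treating `--debug` as a boolean switch, and the `results` output directory in the example call are all assumptions.

```python
import argparse


def build_parser():
    """Declare the five options described above (sketch, not the real script)."""
    parser = argparse.ArgumentParser(description="Craft adversarial examples.")
    parser.add_argument("-m", "--model_configs", help="model configuration file")
    parser.add_argument("-d", "--data_configs", help="data configuration file")
    parser.add_argument("-a", "--attack_configs", help="attack configuration file")
    parser.add_argument("-o", "--output-root", help="directory for generated examples")
    # Assumption: --debug is modelled here as a simple on/off switch.
    parser.add_argument("--debug", action="store_true", help="enable debug output")
    return parser


# Example call using the configuration paths mentioned in this report.
args = build_parser().parse_args([
    "-m", "configs/model-mnist.json",
    "-d", "configs/data-mnist.json",
    "-a", "configs/attack-ak-mnist.json",
    "-o", "results",  # hypothetical output directory
])
print(args)
```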
        
### How Adversarial Examples are generated
1. By referencing the specified configuration paths from the command line arguments, we first load the target model, the benign samples, and the benign labels. Note that we take a subset of the MNIST dataset for computational efficiency, in this case a subset of 10000 samples.
2. Call the "generate_ae" function, passing in the target model, benign data, labels, attack configurations, whether or not to save the generated adversarial examples, and the output directory to be saved to.
3. Load and prepare the dataset.
4. To generate adversarial examples, call the "generate" function from the attack module.
5. Perform inference on the generated adversarial examples and record the error rate.
6. Display results and save to file if specified.
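
The error rate recorded in step 5 can be sketched as the fraction of samples whose predicted label differs from the true label. The `error_rate` helper below is our own illustrative version, written under that assumption; the metric used by the Athena utilities may differ in detail.

```python
import numpy as np


def error_rate(predictions, true_labels):
    """Fraction of samples whose predicted label differs from the true label.

    predictions may be class labels of shape (n,) or class probabilities of
    shape (n, n_classes), in which case the arg max is taken first.
    """
    predictions = np.asarray(predictions)
    if predictions.ndim == 2:
        predictions = predictions.argmax(axis=1)
    return float(np.mean(predictions != np.asarray(true_labels)))


# Hypothetical predictions: two of the four labels are wrong.
print(error_rate([1, 2, 3, 4], [1, 2, 0, 0]))
```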
    
## Evaluating adversarial examples
In order to evaluate adversarial examples, reference the file "eval_model.py" located in the "scripts"
directory.

### CLI Interface
This file is callable through its command line interface and takes five possible arguments:

1. -t, --trans-configs
    - This is the path to the transformations configuration file. It should contain the specifications for the Athena Ensemble model. This file should include the total number of transformations in the ensemble model and the specifications for each transformation. This includes the type of transformation, a possible subtype, the id for the transformation, a short description, and any parameters that may be specific to the type. Reference the file "configs/athena-mnist.json" for proper structure.
2. -m, --model_configs
    - This is the path to the model configuration file that will be used for evaluation. See (1) under the CLI explanation for generating adversarial examples.
3. -d, --data_configs
    - This is the path to the data configuration file that will be used for evaluation purposes. See (2) under the CLI explanation for generating adversarial examples.
4. -o, --output_root
    - This is the path where experiment results will be stored.
5. -d, --debug
    - This is useful for troubleshooting purposes.
        
### How Adversarial Examples are Evaluated
1. The transformation, model, and data configurations are loaded. These are passed into the evaluate function, specified in the file.
2. Each model is loaded. This includes the baseline model, the undefended model, and the creation of the Athena Ensemble model.
3. Load the benign samples and labels and perform inference on each of the three models.
4. Load the adversarial examples and perform inference on each of the three models.
5. Record the error rates for inference on the adversarial examples for each of the models and store the results in the output directory.
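
The evaluation steps above amount to a loop over data sets and models that records one error rate per pair. The sketch below illustrates that shape with stub predictors; the `evaluate` signature, the model names, and the toy data are hypothetical and are not taken from "eval_model.py".

```python
import json

import numpy as np


def error_rate(pred_labels, true_labels):
    """Fraction of predictions that differ from the true labels."""
    return float(np.mean(np.asarray(pred_labels) != np.asarray(true_labels)))


def evaluate(models, datasets, true_labels):
    """Return {dataset name: {model name: error rate}} for every pair."""
    return {
        data_name: {
            model_name: error_rate(predict(samples), true_labels)
            for model_name, predict in models.items()
        }
        for data_name, samples in datasets.items()
    }


# Toy setup: one feature per sample, and the "label" is its integer part.
true_labels = np.array([0, 1, 2, 3])
benign = np.array([[0.0], [1.0], [2.0], [3.0]])
datasets = {"benign": benign, "adversarial": benign + 0.6}

# Stub predictors standing in for the real models.
models = {
    "undefended": lambda s: np.rint(s[:, 0]).astype(int),  # fooled by the shift
    "defended": lambda s: np.floor(s[:, 0]).astype(int),   # robust to the shift
}

results = evaluate(models, datasets, true_labels)
print(json.dumps(results, indent=2))  # could equally be written to a file
```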
        
# Results
The following shows one generated adversarial example for each epsilon value (0.1, 0.5, and 0.9, from left to right) of each attack method:

## BIM
<table>
<tr>
<td><img src='results/figures/BIM_eps0.1/2->6.jpg'></td>
<td><img src='results/figures/BIM_eps0.5/2->6.jpg'></td>
<td><img src='results/figures/BIM_eps0.9/2->6.jpg'></td>
</tr>
</table>

## FGSM
<table>
<tr>
<td><img src='results/figures/FGSM_eps0.1/2->6.jpg'></td>
<td><img src='results/figures/FGSM_eps0.5/0->8.jpg'></td>
<td><img src='results/figures/FGSM_eps0.9/0->8.jpg'></td>
</tr>
</table>

## PGD
<table>
<tr>
<td><img src='results/figures/PGD_eps0.1/2->6.jpg'></td>
<td><img src='results/figures/PGD_eps0.5/2->6.jpg'></td>
<td><img src='results/figures/PGD_eps0.9/2->6.jpg'></td>
</tr>
</table>

Notice that very noisy images begin to appear as epsilon reaches 0.9 for all three attack methods, almost to the point where the semantics are changed. It is still interesting, however, to consider how the undefended model, the ensemble, and the PGD-ADT model evaluate these images. On close inspection, the semantics of the images are not entirely lost: the true label can still be discerned with a high degree of accuracy. We acknowledge that adversarial examples should preserve human-perceived semantics to be a fair test of a model, but we assert that the semantics of these images have not been entirely lost.

## Evaluations
<table>
<tr>
<td><img src='results/figures/eval_um.png'></td>
</tr>
</table>

<table>
<tr>
<td><img src='results/figures/eval_avep.png'></td>
</tr>
</table>

<table>
<tr>
<td><img src='results/figures/eval_pgd-adt.png'></td>
</tr>
</table>

## Analysis of Results
The results above show that the Athena Ensemble framework was able to successfully defend a Convolutional Neural Network against adversarial examples with small epsilon values. An effective adversarial attack can apply small perturbations to input data, even ones imperceptible to the human eye, that decrease a deep neural network's ability to classify the input accurately. It is therefore especially important for an adversarial defense strategy to defend against these low-epsilon examples. In production, one may be able to use an adversarial classifier to filter out input data that deviates too much from the expected input data. As the changes to the data become less perceptible, however, it becomes much more difficult for such a classifier to identify adversarial examples, which creates a flaw in the adversarial robustness of the model.

Evaluating these examples with the Athena framework shows that it can decrease the error rate of inference on input data with small changes, thus increasing robustness. Compared with the PGD-ADT model, the Athena framework not only defended against these samples accurately, but did so on par with the state of the art.

# Citations
Ian J. Goodfellow, Jonathon Shlens, & Christian Szegedy. (2015). Explaining and Harnessing Adversarial Examples. https://arxiv.org/pdf/1412.6572.pdf

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, & Adrian Vladu. (2019). Towards Deep Learning Models Resistant to Adversarial Attacks. https://arxiv.org/pdf/1706.06083.pdf

Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, & Andrew B. Gardner. (2017). Detecting Adversarial Samples from Artifacts. https://arxiv.org/pdf/1703.00410.pdf

Ying Meng, Jianhai Su, Jason M. O'Kane, & Pooyan Jamshidi. (2020). ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense. https://arxiv.org/pdf/2001.00308.pdf