# Evaluating Attacks and Defenses with `mister_ed`
This tutorial contains code snippets showing how to quickly evaluate the effectiveness of attacks against (trained) networks. It's highly recommended that you walk through tutorials 1 and 2 before this one.

As usual, the first thing we'll want to do is import everything.

In [None]:
# EXTERNAL LIBRARY IMPORTS
import numpy as np 
import scipy 

import torch # Need torch version >=0.3
import torch.nn as nn 
import torch.optim as optim 
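# Compare (major, minor) as integers so multi-digit version components parse correctly
assert tuple(int(p) for p in torch.__version__.split('.')[:2]) >= (0, 3)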

In [None]:
# MISTER ED SPECIFIC IMPORT BLOCK
# (add the parent directory to sys.path so the relative imports below work)
import os
import sys 
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)


import config
import prebuilt_loss_functions as plf
import loss_functions as lf 
import utils.pytorch_utils as utils
import utils.image_utils as img_utils
import cifar10.cifar_loader as cifar_loader
import cifar10.cifar_resnets as cifar_resnets
import adversarial_training as advtrain
import adversarial_evaluation as adveval
import utils.checkpoints as checkpoints
import adversarial_perturbations as ap 
import adversarial_attacks as aa
import spatial_transformers as st


In this file we'll be looking at the techniques we'll use to evaluate both attacks and defenses. In general, the task we want to solve is this: we have a classifier trained on a dataset and wish to evaluate its accuracy against unperturbed inputs as well as various properties of an adversarial attack that has gradient access to this classifier. 

Recall that an adversarial attack here has many degrees of freedom we can choose:
- Threat model: $\ell_p$-bounded noise, rotations, translations, flow, any combination of the above
- Bounds for the threat model
- Attack technique: PGD, FGSM, Carlini-Wagner
- Attack parameters: number of iterations, step size, loss functions, etc

And we can choose to evaluate several properties of each attack on a network: 
- Top-k accuracy 
- Average loss value of successful attacks (i.e. average loss value for examples in which the attack causes the index of the maximum logit to change)
- The generated adversarial images 
- Average distance (say according to a custom function) of generated adversarial images to their originals
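
To make the first two properties concrete, here is a minimal, framework-free sketch of top-k accuracy and of what counts as a "successful" attack (the index of the maximum logit changes). The function names `top_k_accuracy` and `successful_attack_indices` and the toy logits are made up for illustration; this is not how `mister_ed` computes these quantities internally.

```python
def top_k_accuracy(logits, labels, k=1):
    """Fraction of examples whose true label is among the k largest logits."""
    correct = 0
    for row, label in zip(logits, labels):
        # Indices of the k largest logits for this example
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += int(label in top_k)
    return correct / len(labels)


def successful_attack_indices(clean_logits, adv_logits):
    """Indices of examples where the attack changed the argmax logit."""
    def argmax(row):
        return max(range(len(row)), key=lambda i: row[i])
    return [i for i, (c, a) in enumerate(zip(clean_logits, adv_logits))
            if argmax(c) != argmax(a)]


# Toy logits for three examples and three classes (hypothetical numbers)
clean = [[2.0, 1.0, 0.1], [0.2, 3.0, 0.5], [0.3, 0.2, 1.5]]
adv = [[0.5, 1.0, 0.1], [0.2, 3.0, 0.5], [1.6, 0.2, 1.5]]
labels = [0, 1, 2]

print('clean top1:', top_k_accuracy(clean, labels, k=1))
print('adv top1  :', top_k_accuracy(adv, labels, k=1))
print('adv top2  :', top_k_accuracy(adv, labels, k=2))
print('successful attacks on examples:', successful_attack_indices(clean, adv))
```

Here the attack flips the prediction on the first and third examples, so top-1 accuracy drops while top-2 accuracy is unaffected.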

All we'll be doing in this file is walking through an example of how to build objects to perform evaluations of (some of) these properties on a medley of attacks. 


## Building an `AdversarialEvaluation` Object

In [None]:
%%html
<img src="images/adversarial_evaluation.png" width=60 height=60>

The above image describes the general workflow: 
First we initialize an `AdversarialEvaluation` instance, which keeps track of the classifier we're evaluating against as well as the normalizer (which, recall, just performs some operations on raw data to make it classifier-friendly). This instance has an `evaluate_ensemble` method that takes a DataLoader and a dictionary, called the `attack_ensemble`, containing the attacks (each wrapped up in an `EvaluationResult` instance). The method outputs a dictionary that points to the same `EvaluationResult` objects, which now have the result data stored in them. Unless otherwise specified, we'll also evaluate the ground accuracy of the classifier and include that in the return value as well.
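
The data flow described above can be mimicked with a few lines of plain Python. Everything in this sketch (`ToyResult`, `toy_evaluate_ensemble`, the 1-D "classifier" and the "shift" attack) is a hypothetical stand-in invented for illustration, not the `mister_ed` API. It only shows the pattern: result holders go in, the same holders come out filled in, and a `'ground'` entry is added. Whether the real method returns the same dictionary or a new one is not something this sketch claims.

```python
class ToyResult:
    """Hypothetical stand-in for a result holder (not the mister_ed class)."""
    def __init__(self, attack_fn, to_eval):
        self.attack_fn = attack_fn
        self.results = {name: None for name in to_eval}


def toy_evaluate_ensemble(batches, ensemble, classify):
    """Run every attack in `ensemble` over `batches` and fill in its results."""
    out = dict(ensemble)  # a new dict that holds the same holder objects
    out['ground'] = ToyResult(lambda xs: xs, ['top1'])  # identity "attack"
    for holder in out.values():
        correct = total = 0
        for xs, ys in batches:
            preds = [classify(x) for x in holder.attack_fn(xs)]
            correct += sum(int(p == y) for p, y in zip(preds, ys))
            total += len(ys)
        holder.results['top1'] = correct / total
    return out


# Toy 1-D "classifier": predicts label 1 when the input is positive
def classify(x):
    return int(x > 0)

# Two minibatches of (inputs, labels)
batches = [([0.5, -0.5, 0.1], [1, 0, 1]), ([-0.2, 0.9], [0, 1])]

# Toy "attack": shift every input down by 0.3
ensemble = {'shift': ToyResult(lambda xs: [x - 0.3 for x in xs], ['top1'])}
out = toy_evaluate_ensemble(batches, ensemble, classify)

print(sorted(out.keys()))
print(out['ground'].results['top1'], out['shift'].results['top1'])
print(out['shift'] is ensemble['shift'])
```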


Let's go ahead and build up everything except the `EvaluationResult` objects and proceed from there.


In [None]:
# Load the trained model and normalizer
model, normalizer = cifar_loader.load_pretrained_cifar_resnet(flavor=20, return_normalizer=True) 

# Load the evaluation dataset 
cifar_valset = cifar_loader.load_cifar_data('val') 

# Put this into the AdversarialEvaluation object
adv_eval_object = adveval.AdversarialEvaluation(model, normalizer)

## Building an Attack Ensemble

Recall in tutorial_1 we built `AdversarialAttack` objects and used their `.attack(...)` methods to generate adversarial perturbations, where the keyword arguments to `.attack(...)` described the parameters of the attack.

Then in tutorial_2 we built `AdversarialAttackParameters` objects, which are wrappers that hold an `AdversarialAttack` object and the kwargs describing the parameters of the attack. We used these to generate attacks inside the training loop to perform adversarial training.

And finally, in this tutorial we'll build `EvaluationResult` objects which hold an `AdversarialAttackParameters` object and a dictionary storing some information about what we'll evaluate.

The following image summarizes the data structures we've built (the bullet points refer to the arguments needed upon construction)

In [None]:
%%html 
<img src="images/evaluationResult_ds.png" width=60 height=60>

In this worked example, we'll build 3 different evaluation results and evaluate them simultaneously:
- **FGSM8**: An additive noise attack, with $\ell_\infty$ bound of 8/255 (8 on the 0-255 pixel scale), attacked using FGSM
- **PGD4**: An additive noise attack, with $\ell_\infty$ bound of 4/255, attacked using PGD
- **PGD8**: An additive noise attack, with $\ell_\infty$ bound of 8/255, attacked using PGD
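
As a reminder of what these two attack techniques do under an $\ell_\infty$ bound, here is a rough sketch on a toy quadratic loss with an analytic gradient. The toy loss, the helper names and the zero initialization are assumptions made for illustration; the `aa.FGSM` and `aa.PGD` classes operate on real networks and may differ in details such as random initialization or clipping to the valid pixel range, which this sketch omits.

```python
def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def loss_grad(x, target):
    """Gradient of the toy loss sum((x_i - target_i)^2) with respect to x."""
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

def fgsm_delta(x, target, eps):
    """One signed-gradient step of size eps, which lands on the l_inf ball."""
    return [eps * sign(g) for g in loss_grad(x, target)]

def pgd_delta(x, target, eps, step_size, num_iterations):
    """Repeated signed-gradient ascent steps, each projected to |delta_i| <= eps."""
    delta = [0.0] * len(x)
    for _ in range(num_iterations):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = loss_grad(x_adv, target)
        delta = [clip(di + step_size * sign(g), -eps, eps)
                 for di, g in zip(delta, grad)]
    return delta


x = [0.2, 0.7, 0.5]        # toy "image" with three pixels
target = [0.0, 1.0, 0.4]   # toy loss is the squared distance to this point
eps = 8.0 / 255.0

fgsm = fgsm_delta(x, target, eps)
pgd = pgd_delta(x, target, eps, step_size=1.0 / 255.0, num_iterations=20)

print('max |delta| FGSM:', max(abs(d) for d in fgsm))
print('max |delta| PGD :', max(abs(d) for d in pgd))
```

On a loss this simple both attacks end up at the same corner of the $\ell_\infty$ ball; the point is only that each perturbation stays within the bound.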

In [None]:
# First let's build the attack parameters for each.
# Note: we're not doing anything new yet. These constructions are covered in the first two tutorials

# we'll reuse the loss function:
attack_loss = plf.VanillaXentropy(model, normalizer)
linf_8_threat = ap.ThreatModel(ap.DeltaAddition, {'lp_style': 'inf', 
                                                 'lp_bound': 8.0 / 255.0})
linf_4_threat = ap.ThreatModel(ap.DeltaAddition, {'lp_style': 'inf', 
                                                  'lp_bound': 4.0 / 255.0})


#------ FGSM8 Block (reuses linf_8_threat defined above)
fgsm8_attack = aa.FGSM(model, normalizer, linf_8_threat, attack_loss)
fgsm8_attack_kwargs = {'step_size': 0.05, 
                       'verbose': False}
fgsm8_attack_params = advtrain.AdversarialAttackParameters(fgsm8_attack,
                                                           attack_specific_params=
                                                           {'attack_kwargs': fgsm8_attack_kwargs})


# ------ PGD4 Block 
pgd4_attack = aa.PGD(model, normalizer, linf_4_threat, attack_loss)
pgd4_attack_kwargs = {'step_size': 1.0 / 255.0, 
                      'num_iterations': 20, 
                      'keep_best': True,
                      'verbose': False}
pgd4_attack_params = advtrain.AdversarialAttackParameters(pgd4_attack, 
                                                          attack_specific_params=
                                                          {'attack_kwargs': pgd4_attack_kwargs})

# ------ PGD8 Block 
pgd8_attack = aa.PGD(model, normalizer, linf_8_threat, attack_loss)
pgd8_attack_kwargs = {'step_size': 1.0 / 255.0, 
                      'num_iterations': 20, 
                      'keep_best': True,
                      'verbose': False}
# Wrap pgd8_attack (not pgd4_attack) so the 8/255 bound is actually used
pgd8_attack_params = advtrain.AdversarialAttackParameters(pgd8_attack, 
                                                          attack_specific_params=
                                                          {'attack_kwargs': pgd8_attack_kwargs})

In [None]:
'''
Next we'll build the EvaluationResult objects that wrap these. 
And let's say we'll evaluate the:
- top1 accuracy 
- average loss 
- average SSIM distance of successful perturbations [don't worry too much about this]

The 'to_eval' dict passed to the constructor has the structure 
 {key : shorthand_fxn}
where key is just a human-readable handle for what's being evaluated
and shorthand_fxn is either a string naming a prebuilt evaluator or a custom function to evaluate
'''

to_eval_dict = {'top1': 'top1', 
                'avg_loss_value': 'avg_loss_value', 
                'avg_successful_ssim': 'avg_successful_ssim'}

fgsm8_eval = adveval.EvaluationResult(fgsm8_attack_params, 
                                      to_eval=to_eval_dict)


pgd4_eval = adveval.EvaluationResult(pgd4_attack_params, 
                                     to_eval=to_eval_dict)

pgd8_eval = adveval.EvaluationResult(pgd8_attack_params, 
                                     to_eval=to_eval_dict)



With our `EvaluationResult` objects built, all that remains is to collect all these into a dictionary and pass them to our `AdversarialEvaluation` object and interpret the result.

In [None]:
attack_ensemble = {'fgsm8': fgsm8_eval, 
                   'pgd4' : pgd4_eval, 
                   'pgd8' : pgd8_eval
                  }
ensemble_out = adv_eval_object.evaluate_ensemble(cifar_valset, attack_ensemble, 
                                                 verbose=True, 
                                                 num_minibatches=1)


Now let's take a look at the evaluation results. First notice that the key `'ground'` has been added to the ensemble output. This stores the top1 accuracy of unperturbed inputs (and thus the accuracy of the classifier).

In general, the results of the evaluations will be stored in the `EvaluationResult.results` dictionary, with the keys being the same as the evaluation types desired. These generally will point to an `AverageMeter` object, which is a simple little object to keep track of average values. You can query its `.avg` value:

In [None]:
# First notice the keys of ensemble_out include ground:
print(ensemble_out.keys())

ensemble_out['pgd8'].results
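
For readers unfamiliar with the `AverageMeter` pattern mentioned above, here is a minimal sketch of such an object. It is written from the common running-average idiom and from how `.update(val, n=...)` and `.avg` are used in this tutorial; the actual `utils.AverageMeter` may carry extra fields or differ in detail.

```python
class AverageMeter:
    """Running weighted average (a sketch; the real utils.AverageMeter may differ)."""
    def __init__(self):
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # `val` is the mean over `n` examples, so it contributes val * n to the sum
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


meter = AverageMeter()
meter.update(0.5, n=100)   # first minibatch: mean 0.5 over 100 examples
meter.update(1.0, n=50)    # second minibatch: mean 1.0 over 50 examples
print(meter.avg)           # weighted mean over all 150 examples
```

Weighting each update by `n` matters when minibatches have different sizes, for instance when only the successful attacks in each batch are counted.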

In [None]:
# Now let's build a little helper to print things out cleanly:

sort_order = {'ground': 1, 'fgsm8': 2, 'pgd4': 3, 'pgd8': 4}
def pretty_printer(eval_ensemble, result_type):
    print('~' * 10, result_type, '~' * 10)
    for key in sorted(list(eval_ensemble.keys()), key=lambda k: sort_order[k]):
        eval_result = eval_ensemble[key]
        pad = 6 - len(key)
        if result_type not in eval_result.results:
            continue 
        avg_result = eval_result.results[result_type].avg
        print(key, pad* ' ', ': ', avg_result)
    

In [None]:
'''And then we can print out and look at the results:
This prints the accuracy. 
Ground is the unperturbed accuracy. 
If everything is done right, we should see that PGD with an l_inf bound of 4 is a stronger attack 
against undefended networks than FGSM with an l_inf bound of 8
'''
pretty_printer(ensemble_out, 'top1')

In [None]:
# We can examine the loss (noting that we seek to 'maximize' loss in the adversarial example domain)
pretty_printer(ensemble_out, 'avg_loss_value')

In [None]:
# This is actually 1 - SSIM, which can serve as a makeshift distance: 
# it gives a yardstick for how far the perturbed images are from the originals
pretty_printer(ensemble_out, 'avg_successful_ssim')

# (Advanced): Custom Evaluation Techniques
For most use cases, the predefined evaluations (accuracy, loss, etc.) should be fine. Should one want to extend this, however, it's not too hard to do. We'll walk through an example where we evaluate the average $\ell_\infty$ distance of **successful** attacks. 

First we'll need to build a function that takes in an `EvaluationResult` object, a label and the tuple that is generated from the output of `AdversarialAttackParameters.attack(...)`. 

In [None]:
def avg_successful_linf(self, eval_label, attack_out):
    
    # First set up the averageMeter to hold these results
    if self.results[eval_label] is None:
        self.results[eval_label] = utils.AverageMeter() 
    result = self.results[eval_label]
    
    # Collect the successful attacks only: 
    successful_pert, successful_orig = self._get_successful_attacks(attack_out)
    
    # Handle the degenerate case 
    if successful_pert is None or successful_pert.numel() == 0:
        return 
    
    # Compute the l_inf dist per example
    batched_norms = utils.batchwise_norm(torch.abs(successful_pert - successful_orig), 
                                         'inf', dim=0)
    # Update the result (and multiply by 255 for ease in exposition)
    batch_avg = float(torch.sum(batched_norms)) / successful_pert.shape[0]
    
    result.update(batch_avg * 255, n=successful_pert.shape[0])
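
To see what numbers this evaluator produces, here is the same arithmetic on two tiny hand-made "images" in plain Python. The helper `linf_distance` and the pixel values are hypothetical; the function above relies on `utils.batchwise_norm` for the per-example norm instead.

```python
def linf_distance(a, b):
    """Largest absolute coordinate-wise difference between two flattened images."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


# Two flattened three-pixel "images" with values in [0, 1]
originals = [[0.10, 0.50, 0.90],
             [0.20, 0.40, 0.60]]
perturbed = [[0.10 + 8 / 255, 0.50 - 4 / 255, 0.90],
             [0.20, 0.40 + 2 / 255, 0.60 - 6 / 255]]

# Per-example l_inf distance, rescaled to the 0-255 pixel scale
per_example = [linf_distance(p, o) * 255 for p, o in zip(perturbed, originals)]
batch_avg = sum(per_example) / len(per_example)

print(per_example)   # roughly [8, 6], up to floating-point error
print(batch_avg)     # roughly 7
```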
    

In [None]:
# And now let's incorporate this into our to_eval_dict
new_to_eval_dict = {'avg_successful_linf': avg_successful_linf}

# And make some new EvaluationResult objects
new_fgsm8_eval = adveval.EvaluationResult(fgsm8_attack_params, 
                                          to_eval=new_to_eval_dict)

new_pgd4_eval = adveval.EvaluationResult(pgd4_attack_params, 
                                         to_eval=new_to_eval_dict)

new_pgd8_eval = adveval.EvaluationResult(pgd8_attack_params, 
                                         to_eval=new_to_eval_dict)

new_ensemble_in = {'fgsm8': new_fgsm8_eval, 
                   'pgd4': new_pgd4_eval, 
                   'pgd8': new_pgd8_eval}

# And run through the evaluation 
new_ensemble_out = adv_eval_object.evaluate_ensemble(cifar_valset, new_ensemble_in,
                                                     verbose=True,
                                                     num_minibatches=1)


In [None]:
# Finally we can take a look at the evaluation that we've monkeypatched in
pretty_printer(new_ensemble_out, 'avg_successful_linf')

This concludes the tutorials for `mister_ed`. If anything is confusing, or there are features you want supported that aren't ready out of the box, please feel free to open an issue on the main GitHub repo and I'll do my best to cater to user requests. 

(also let me know about any bugs!)