# Improving the security of a classifier

In this notebook we will try to make classifiers more robust to adversarial evasion attacks.
First, we define a protocol for assessing the robustness of classifiers. Then, in the second part of this tutorial, we use a robust model trained with a widely used technique called adversarial training.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/maurapintor/ARTISAN/blob/HEAD/03_defenses.ipynb)



## Security Evaluation

We are often interested in evaluating the **robustness** of a classifier against increasing values of the maximum perturbation $\varepsilon$.

SecML makes it easy to produce a **Security Evaluation Curve** through the `CSecEval` class.

A `CSecEval` instance takes a `CAttack` as input and tests the classifier at each of the desired perturbation levels, so that we can see how its performance changes as $\varepsilon$ grows.

*Please note that the security evaluation process may take a while (up to a few minutes) depending on the machine the script is run on.*
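To make the idea concrete, here is a minimal, SecML-free sketch of what a security evaluation curve measures: for each budget $\varepsilon$, attack every sample and record the accuracy. It assumes a toy linear binary classifier $f(x) = w \cdot x + b$ with labels in $\{-1, +1\}$, for which the worst-case $\ell_\infty$ perturbation has a closed form (shift every feature by $\varepsilon$ against the sign of $y \cdot w$), so no iterative attack is needed. The data, the weights and the function names are hypothetical and chosen only for illustration; this is not how `CSecEval` is implemented.

```python
import numpy as np

def linear_linf_attack(X, y, w, eps):
    """Worst-case L-inf perturbation of size eps for a linear classifier with weights w.

    Labels y are in {-1, +1}. Shifting every feature by eps against y * sign(w)
    lowers the margin y * (w.x + b) by exactly eps * ||w||_1.
    Box constraints (e.g. pixels in [0, 1]) are ignored in this toy sketch.
    """
    return X - eps * y[:, None] * np.sign(w)[None, :]

def security_evaluation_curve(X, y, w, b, eps_values):
    """Accuracy under attack for each perturbation budget in eps_values."""
    accuracies = []
    for eps in eps_values:
        X_adv = linear_linf_attack(X, y, w, eps)
        y_pred = np.where(X_adv @ w + b >= 0, 1, -1)
        accuracies.append(np.mean(y_pred == y))
    return np.array(accuracies)

# toy data: two Gaussian blobs and a hand-picked (hypothetical) linear classifier
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.repeat([-1, 1], 50)
w, b = np.array([1.0, 1.0]), 0.0

eps_values = np.linspace(0.0, 1.5, 7)
curve = security_evaluation_curve(X, y, w, b, eps_values)
for eps, acc in zip(eps_values, curve):
    print(f"eps={eps:.2f}  accuracy={acc:.2f}")
```

Because this attack is optimal for a linear model, the accuracy can only decrease as $\varepsilon$ grows. With an iterative attack such as PGD on a neural network, a curve that goes back up is often taken as a sign that the attack did not converge.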

In [None]:
try:
    import secml
except ImportError:
    %pip install secml

try:
    import foolbox
except ImportError:
    %pip install foolbox

try:
    import robustbench
except ImportError:
    %pip install git+https://github.com/RobustBench/robustbench.git


In [None]:
n_ts = 20  # number of testing samples

from secml.data.loader import CDataLoaderMNIST

loader = CDataLoaderMNIST()
ts = loader.load('testing', num_samples=n_ts)

# normalize the data
ts.X /= 255

In [None]:
from secml.ml.peval.metrics import CMetricAccuracy
from secml.ml import CClassifierPyTorch
from collections import OrderedDict
from torch import nn
import torch
import os
from robustbench.utils import download_gdrive

class SmallCNN(nn.Module):
    def __init__(self, drop=0.5):
        super(SmallCNN, self).__init__()
        self.num_channels = 1
        self.num_labels = 10
        activ = nn.ReLU(True)
        self.feature_extractor = nn.Sequential(OrderedDict([
            ('conv1', nn.Conv2d(self.num_channels, 32, 3)),
            ('relu1', activ),
            ('conv2', nn.Conv2d(32, 32, 3)),
            ('relu2', activ),
            ('maxpool1', nn.MaxPool2d(2, 2)),
            ('conv3', nn.Conv2d(32, 64, 3)),
            ('relu3', activ),
            ('conv4', nn.Conv2d(64, 64, 3)),
            ('relu4', activ),
            ('maxpool2', nn.MaxPool2d(2, 2)),
        ]))
        self.classifier = nn.Sequential(OrderedDict([
            ('fc1', nn.Linear(64 * 4 * 4, 200)),
            ('relu1', activ),
            ('drop', nn.Dropout(drop)),
            ('fc2', nn.Linear(200, 200)),
            ('relu2', activ),
            ('fc3', nn.Linear(200, self.num_labels)),
        ]))

    def forward(self, input):
        features = self.feature_extractor(input)
        logits = self.classifier(features.view(-1, 64 * 4 * 4))
        return logits

PRETRAINED_FOLDER = 'pretrained'
# create folder for storing models
if not os.path.exists(PRETRAINED_FOLDER):
    os.mkdir(PRETRAINED_FOLDER)

MODEL_ID_REGULAR = '12HLUrWgMPF_ApVSsWO4_UHsG9sxdb1VJ'
filepath = os.path.join(PRETRAINED_FOLDER, f'mnist_regular.pth')
if not os.path.exists(filepath):
    # utility function to handle google drive data
    download_gdrive(MODEL_ID_REGULAR, filepath)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
regular_mnist_model = SmallCNN()
regular_mnist_model.load_state_dict(torch.load(filepath, map_location=device))

regular_mnist_clf = CClassifierPyTorch(model=regular_mnist_model, pretrained=True, input_shape=(1, 28, 28))

metric = CMetricAccuracy()
preds = regular_mnist_clf.predict(ts.X)
accuracy = metric.performance_score(y_true=ts.Y, y_pred=preds)
print(f"Accuracy on test set: {accuracy:.2%}")
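
The `64 * 4 * 4` input size of `fc1` follows from how each layer of `feature_extractor` shrinks the 28x28 input. A small pure-Python sketch of that shape arithmetic, using the standard output-size formula for unpadded, stride-1 convolutions and for 2x2 max pooling with stride 2 (the settings used in `SmallCNN`); the helper names are hypothetical:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1

size = 28                   # MNIST images are 28x28
size = conv_out(size, 3)    # conv1    -> 26
size = conv_out(size, 3)    # conv2    -> 24
size = pool_out(size)       # maxpool1 -> 12
size = conv_out(size, 3)    # conv3    -> 10
size = conv_out(size, 3)    # conv4    -> 8
size = pool_out(size)       # maxpool2 -> 4
flat_features = 64 * size * size   # 64 channels from conv4
print(size, flat_features)
```

In the notebook itself, the same check can be done directly on the model with `regular_mnist_model.feature_extractor(torch.zeros(1, 1, 28, 28)).shape`, which gives `torch.Size([1, 64, 4, 4])`.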

In [None]:
# let's define the attack we want to use for security evaluation
from secml.adv.attacks.evasion import CFoolboxPGDLinf

y_target = None
lb, ub = 0.0, 1.0
eps = 0.3  # this will be changed by the security evaluation class
alpha = 0.05
steps = 100

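# create the PGD Linf attack against the regular classifier
attack = CFoolboxPGDLinf(regular_mnist_clf, y_target,
                         lb=lb, ub=ub,
                         epsilons=eps,
                         abs_stepsize=alpha,
                         steps=steps,
                         random_start=False)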
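
The attack configured above is PGD with an $\ell_\infty$ budget: starting from the clean input, it repeatedly takes a step of size `alpha` along the sign of the loss gradient, then projects the result back onto the $\varepsilon$-ball around the original sample and onto the `[lb, ub]` box, for `steps` iterations. A simplified NumPy sketch of that loop, assuming a toy linear logistic-regression model (labels in $\{-1, +1\}$) so that the gradient can be written by hand; the model, data and function names are hypothetical, and this is not Foolbox's implementation:

```python
import numpy as np

def logistic_loss(x, y, w, b):
    """Per-sample loss log(1 + exp(-y * (w.x + b))), with labels y in {-1, +1}."""
    return np.logaddexp(0.0, -y * (x @ w + b))

def logistic_loss_grad_x(x, y, w, b):
    """Gradient of logistic_loss with respect to the input x."""
    margin = y * (x @ w + b)
    sigmoid_neg_margin = np.exp(-np.logaddexp(0.0, margin))  # sigmoid(-margin), overflow-safe
    return -(y * sigmoid_neg_margin)[:, None] * w[None, :]

def pgd_linf(x0, y, w, b, eps, alpha, steps, lb=0.0, ub=1.0):
    """Untargeted PGD within an L-inf ball of radius eps, without random start."""
    x = x0.copy()
    for _ in range(steps):
        grad = logistic_loss_grad_x(x, y, w, b)
        x = x + alpha * np.sign(grad)        # steepest-ascent step under the L-inf norm
        x = np.clip(x, x0 - eps, x0 + eps)   # project back onto the eps-ball around x0
        x = np.clip(x, lb, ub)               # keep features in the valid input range
    return x

# toy example: 5 random inputs with 4 features in [0, 1]
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(5, 4))
y = np.array([1, -1, 1, -1, 1])
w, b = np.array([2.0, -1.0, 0.5, 1.0]), -1.0

x_adv = pgd_linf(x0, y, w, b, eps=0.3, alpha=0.05, steps=100)
print("clean loss:      ", logistic_loss(x0, y, w, b))
print("adversarial loss:", logistic_loss(x_adv, y, w, b))
```

Using the sign of the gradient is what makes each step the steepest ascent direction for an $\ell_\infty$ constraint; the two `np.clip` calls implement the projection.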

In [None]:
from secml.array import CArray
from secml.adv.seceval import CSecEval

epsilon_vals = CArray.linspace(0, stop=0.5, num=10)
# create the security evaluation, varying the attack's epsilon
sec_eval = CSecEval(attack=attack, param_name="epsilon",
                    param_values=epsilon_vals)
sec_eval.run_sec_eval(ts)

In [None]:
%matplotlib inline

from secml.figure import CFigure

fig = CFigure(height=5, width=10)

fig.sp.plot_sec_eval(sec_eval.sec_eval_data, marker='o', label="Mnist regular", show_average=True)

fig.show()

We can see that this classifier is *vulnerable* to adversarial attacks, as it can be evaded even with small perturbations.

In the next part of the tutorial, we will evaluate a model that is designed to be more robust.

## Adversarial Training

Adversarial training aims to solve the following min-max optimization problem:

$$
\min _\theta \rho(\theta), \quad \text { where } \quad \rho(\theta)=\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\max _{\delta \in \mathcal{S}} L(x+\delta, y, \theta)\right]
$$

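Here, $\theta$ are the model parameters, $\mathcal{D}$ is the data distribution, $L$ is the training loss, and $\mathcal{S}$ is the set of allowed perturbations (for instance, the $\ell_\infty$ ball of radius $\varepsilon$). The inner maximization is solved by crafting adversarial examples, and the outer minimization by training the model on them.

In simpler terms, to perform adversarial training we compute adversarial examples and use them as training data for the classifier.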
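A minimal NumPy sketch of this min-max loop, assuming a toy linear logistic-regression model with labels in $\{-1, +1\}$ instead of the CNN used in this notebook. For a linear model, the inner maximization over the $\ell_\infty$ ball has a closed-form solution (so no PGD is needed; box constraints are ignored), and, following Danskin's theorem, the outer gradient is taken at the adversarial points. The data, hyperparameters and function names are hypothetical; this is only an illustration, not the procedure behind the pretrained robust model.

```python
import numpy as np

def worst_case_linf(X, y, w, eps):
    """Closed-form inner max for a linear model: shift each feature by eps against y * sign(w)."""
    return X - eps * y[:, None] * np.sign(w)[None, :]

def adversarial_training(X, y, eps, lr=0.1, epochs=200):
    """Adversarially train a linear logistic-regression model (labels in {-1, +1})."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        X_adv = worst_case_linf(X, y, w, eps)             # inner max: craft adversarial points
        margin = y * (X_adv @ w + b)
        coef = -y * np.exp(-np.logaddexp(0.0, margin))    # d loss / d score = -y * sigmoid(-margin)
        w = w - lr * (X_adv.T @ coef) / n                 # outer min: gradient step on the
        b = b - lr * coef.mean()                          # loss of the adversarial points
    return w, b

def accuracy(X, y, w, b):
    return np.mean(np.where(X @ w + b >= 0, 1, -1) == y)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.repeat([-1, 1], 100)

# eps_train=0.0 is standard training; eps_train=0.3 is adversarial training
for eps_train in (0.0, 0.3):
    w, b = adversarial_training(X, y, eps=eps_train)
    X_attacked = worst_case_linf(X, y, w, eps=0.3)
    print(f"eps_train={eps_train}: clean acc={accuracy(X, y, w, b):.2f}, "
          f"acc under eps=0.3 attack={accuracy(X_attacked, y, w, b):.2f}")
```

Each epoch pays for one extra attack computation on top of the usual gradient step; with an iterative attack such as PGD this overhead is multiplied by the number of attack steps.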

REMEMBER: adversarial training (AT) takes longer than standard training, because the adversarial examples also have to be computed during training.


In [None]:
MODEL_ID_ROBUST = '1gg7Zyly9hcrxtuDfacHXDubg0O1ddGOC'
filepath = os.path.join(PRETRAINED_FOLDER, f'mnist_robust_dnn.pth')
if not os.path.exists(filepath):
    # utility function to handle google drive data
    download_gdrive(MODEL_ID_ROBUST, filepath)

robust_net = SmallCNN()
robust_net.load_state_dict(torch.load(filepath, map_location=device))

# wrap torch model in CClassifierPyTorch class
robust_clf = CClassifierPyTorch(model=robust_net,
                                input_shape=(1, 28, 28),
                                pretrained=True)

y_pred = robust_clf.predict(ts.X)
acc = metric.performance_score(y_true=ts.Y, y_pred=y_pred)
print("Accuracy on test set: {:.2%}".format(acc))

Now let's evaluate the security of this new robust classifier. We have to compute the attacks again, since the robust model has different parameters and therefore different gradients.

In [None]:
attack_robust = CFoolboxPGDLinf(robust_clf, y_target,
                                lb=lb, ub=ub,
                                epsilons=eps,
                                abs_stepsize=alpha,
                                steps=steps,
                                random_start=False)

sec_eval_robust = CSecEval(attack=attack_robust, param_name="epsilon",
                           param_values=epsilon_vals)
sec_eval_robust.run_sec_eval(ts)

In [None]:
%matplotlib inline

from secml.figure import CFigure

fig = CFigure(height=5, width=10)

fig.sp.plot_sec_eval(sec_eval.sec_eval_data, marker='o', 
                     label="Mnist regular", show_average=True)

fig.sp.plot_sec_eval(sec_eval_robust.sec_eval_data, marker='*',
                     label="Mnist robust", show_average=True)
fig.show()


In [None]:
# let's define a convenience function to easily plot the MNIST dataset
def show_digits(samples, preds, labels, n_display=8, title=None):
    digits = list(range(10))
    samples = samples.atleast_2d()
    n_display = min(n_display, samples.shape[0])
    fig = CFigure(width=n_display * 2, height=4)
    for idx in range(n_display):
        fig.subplot(2, n_display, idx + 1)
        fig.sp.xticks([])
        fig.sp.yticks([])
        fig.sp.imshow(samples[idx, :].reshape((28, 28)), cmap='gray')
        fig.sp.title("{} ({})".format(digits[labels[idx].item()], digits[preds[idx].item()]),
                     color=("green" if labels[idx].item() == preds[idx].item() else "red"))
    if title is not None:
        fig.title(title)
    fig.show()


# take a subset of samples
samples = ts[:10, :]

# set the attacks' epsilon to the desired maximum perturbation
attack.epsilon = 0.2
attack_robust.epsilon = 0.2

y_pred_not_robust, _, adv_ds_not_robust, _ = attack.run(samples.X, samples.Y)
y_pred_robust, _, adv_ds_robust, _ = attack_robust.run(samples.X, samples.Y)

show_digits(adv_ds_not_robust.X, y_pred_not_robust, samples.Y, n_display=8, title="DNN predictions")
show_digits(adv_ds_robust.X, y_pred_robust, samples.Y, n_display=8, title="Robust DNN predictions")

## Exercise 1

For this first exercise, we are going to test the [transferability](https://www.usenix.org/conference/usenixsecurity19/presentation/demontis) of the adversarial examples.
Namely, we are going to test the adversarial examples created against one classifier on a second classifier.
Use the results of the previous cell as a starting point (we already created the adversarial examples that we need for this step).

1. Compute the accuracy of the robust classifier on the adversarial examples created with the standard classifier.
2. Compute the accuracy of the standard classifier on the adversarial examples created with the robust classifier.
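The idea behind this exercise can be sketched with two toy linear models in NumPy, both hypothetical with hand-picked weights: adversarial examples are crafted against a *source* model only and then fed to a *target* model that the attack never queried. Here the transfer works because the two models' gradients share the same sign pattern, so the perturbation that hurts the source also lowers every margin of the target. The exercise itself should instead use the SecML classifiers and adversarial examples computed above.

```python
import numpy as np

def linf_attack_linear(X, y, w, eps):
    """Worst-case L-inf perturbation against a linear model with weights w (labels in {-1, +1})."""
    return X - eps * y[:, None] * np.sign(w)[None, :]

def accuracy(X, y, w, b=0.0):
    return np.mean(np.where(X @ w + b >= 0, 1, -1) == y)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.repeat([-1, 1], 100)

w_source = np.array([1.0, 1.0])   # hypothetical model used to craft the attack
w_target = np.array([1.0, 0.2])   # hypothetical second model, never seen by the attack

X_adv = linf_attack_linear(X, y, w_source, eps=0.8)
print("target, clean:      ", accuracy(X, y, w_target))
print("source, adversarial:", accuracy(X_adv, y, w_source))
print("target, transferred:", accuracy(X_adv, y, w_target))
```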


In [None]:
# TODO write your code here


## Exercise 2

Compute the security evaluation curves of different sklearn classifiers on a random blob dataset.
1. Create a random blob dataset.
2. Create two different classifiers.
3. Train the classifiers on the dataset and test their accuracy.
4. Compute the two security evaluation curves and show them in a single plot.
5. (extra) Try to write a function to compute the security evaluation curve without using the utility from SecML.

In [None]:
# TODO write your code here
