<a href="https://colab.research.google.com/github/neohack22/IASD/blob/IA/IA/projects/presentation/robustness/ROBUST_Notebook_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## From adversarial examples to training robust models

In the previous notebooks, we focused on methods for solving the maximization problem over perturbations; that is, to finding the solution to the problem
\begin{equation}
\DeclareMathOperator*{\maximize}{maximize}
\maximize_{\|\delta\| \leq \epsilon} \ell(h_\theta(x + \delta), y).
\end{equation}

In this notebook, we will focus on training a robust classifier. More precisly, we aim at solving following minimization problem, namely Adversarial Training:
\begin{equation}
\DeclareMathOperator*{\minimize}{minimize}
\minimize_\theta \frac{1}{|S|} \sum_{x,y \in S} \max_{\|\delta\| \leq \epsilon} \ell(h_\theta(x + \delta), y).
\end{equation}
The order of the min-max operations is important here.  Specially, the max is inside the minimization, meaning that the adversary (trying to maximize the loss) gets to "move" _second_.  We assume, essentially, that the adversary has full knowledge of the classifier parameters $\theta$, and that they get to specialize their attack to whatever parameters we have chosen in the outer minimization. The goal of the robust optimization formulation, therefore, is to ensure that the model cannot be attacked _even if_ the adversary has full knowledge of the model.  Of course, in practice we may want to make assumptions about the power of the adversary but it can be difficult to pin down a precise definition of what we mean by the "power" of the adversary, so extra care should be taken in evaluating models against possible "realistic" adversaries.

## Exercice 1
1. Train a robust classifier using Adversarial Training with a specific norm
2. Evaluate your classifier on natural and adversarial examples crafted with the norm of the training and other norms
3. Make an analysis and conclude

In [1]:
import json
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision

from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torch.autograd import Variable

import time
cuda = torch.cuda.is_available()

### Train a robust classifier using Adversarial Training with a specific norm

First of all,  we are going to train our model:
- on normal images
- on attacked images

Then we willl attack both with the ℓ∞ norm.

Eventually we will tackle the use of other norms.



In [2]:
# load CIFAR10 dataset
def load_cifar(split, batch_size):
  train = True if split == 'train' else False
  dataset = datasets.CIFAR10("./docs", train=train, download=True, transform=transforms.ToTensor())
  return DataLoader(dataset, batch_size=batch_size, shuffle=train)

batch_size = 100
train_loader = load_cifar('train', batch_size)
test_loader = load_cifar('test', batch_size)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./docs/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./docs/cifar-10-python.tar.gz to ./docs
Files already downloaded and verified


In [3]:
print(len(train_loader.dataset))
print(len(test_loader.dataset))

50000
10000


In [4]:
# specify the image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

There's no need to apply the sotmax here because it will be in the cross entropy loss too.

In [5]:
class ConvModel(torch.nn.Module):
  
  def __init__(self):
    super(ConvModel, self).__init__()
    self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
    self.relu = nn.ReLU()
    self.pool = nn.MaxPool2d(kernel_size=2, stride=2,padding=0)
    self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
    self.lin1 = nn.Linear(64*64, 1000)
    self.lin2 = nn.Linear(1000, 120)
    self.lin3 = nn.Linear(120, 10)
    self.d_out = nn.Dropout(0.2)

  def forward(self, x):
    x = self.conv1(x)
    x = self.relu(x)
    x = self.pool(x)
    x = self.conv2(x)
    x = self.relu(x)
    x = self.pool(x)
    x = torch.flatten(x, 1) # flatten necessary dim

    
    x = self.lin1(x)
    x = self.relu(x)
    x=  self.d_out(x)

    x = self.lin2(x)
    x = self.relu(x)
    x=  self.d_out(x)
    
    x = self.lin3(x)
    x=  self.d_out(x)
    
    return x    

In [6]:
class ProjectedGradientDescent:
  def __init__(self, model, eps, alpha, num_iter):
    # code here ...
    self.model=model
    self.eps=eps
    self.alpha=alpha
    self.num_iter=num_iter
  def compute(self, X, y):
    """ Construct PGD adversarial pertubration on the examples x."""  
    """ Construct Projected Gradient Descent adversarial examples on the examples X"""
    delta = torch.zeros_like(X, requires_grad=True)
    for t in range(self.num_iter):
        loss = nn.CrossEntropyLoss()(self.model(X + delta), y)
        loss.backward()
        
        delta.data = (
            delta + X.shape[
                0]*self.alpha*delta.grad.detach().sign()).clamp(
                -self.eps,self.eps)
        delta.grad.zero_()
        
    
    attacked_images= X + delta.detach()
    return attacked_images

In [7]:
def train_model_normal (model, criterion, optimizer, loader, attack):
  # training with normal images
  # just to comapre it to the adversal model trained with real images
  """Function to train the model"""
  model.train()
  iter = 0
  losses = []
  valid_loss_min = np.Inf #to track the changes in the validation loss
  last_loss_to_print = [] # This is just a convenient variable to print the loss across epochs
  # Training the Model
  for epoch in range(int(epochs)):
      start_time  = time.time()

      for batch_n, (images, labels) in enumerate(train_loader):
          if cuda :
              images, labels = images.cuda(), labels.cuda()
          # Forward + Backward + Optimize
          # Gradients are set equal to 0 at each epoch
          optimizer.zero_grad()

          # Obtaining the normal images
          images_attacked=images

          # Forward
          outputs_attacked = model(images_attacked)
          loss = criterion(outputs_attacked, labels)
          # Backward
          loss.backward()
          # Optimize: We calculate the gradients with optimizer.step()
          optimizer.step()
          iter+=1
          losses.append(loss)
          last_loss_to_print = loss

In [8]:
n_iters = 10 
epochs = 10 
lr_rate = 0.01

def adversarial_train_model(model, criterion, optimizer, loader, attack):
  # adverserial training with PGD
  """Function to train the model"""
  model.train()
  iter = 0
  losses = []
  valid_loss_min = np.Inf #to track the changes in the validation loss
  last_loss_to_print = [] # This is just a convenient variable to print the loss across epochs
  # Training the Model
  for epoch in range(int(epochs)):
      start_time  = time.time()

      for batch_n, (images, labels) in enumerate(train_loader):
          if cuda :
              images, labels = images.cuda(), labels.cuda()
          # Forward + Backward + Optimize
          # Gradients are set equal to 0 at each epoch
          optimizer.zero_grad()

          # Obtaining the attacked images
          images_attacked=attack.compute(images,labels)

          # Forward
          outputs_attacked = model(images_attacked)
          loss = criterion(outputs_attacked, labels)
          # Backward
          loss.backward()
          # Optimize: We calculate the gradients with optimizer.step()
          optimizer.step()
          iter+=1
          losses.append(loss)
          last_loss_to_print = loss

### Evaluate your classifier on natural and adversarial examples crafted with the norm of the training and other norms

We look at the normal model in order to make the comparison with the other model trained with attacked images.

Alpha has to be lower than epsilon: epsilon is what we want to explore and we do it through small steps of size alpha.<br>

In [9]:
# Instantiate Model Class
model_normal = ConvModel()
# move tensors to GPU if CUDA is available
if cuda:
  model_normal = model_normal.cuda()

# define your loss
criterion = nn.CrossEntropyLoss()

# define the optimizer
optimizer = torch.optim.SGD(
    model_normal.parameters(), lr=lr_rate, momentum =0.9)


# define the attack
attack = ProjectedGradientDescent(
# We set num_iter=5 because with num_iter=20 tooks 2 min par epoch    
    model_normal, eps=0.1, alpha=0.01, num_iter=5) 

# train the model robust to attack
train_model_normal(
    model_normal, criterion, optimizer, train_loader, attack)

model_normal.eval()

ConvModel(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (lin1): Linear(in_features=4096, out_features=1000, bias=True)
  (lin2): Linear(in_features=1000, out_features=120, bias=True)
  (lin3): Linear(in_features=120, out_features=10, bias=True)
  (d_out): Dropout(p=0.2, inplace=False)
)

In [11]:
# epsilons = [0, .05, .1, .15, .2, .25, .3]
# epsilon = .05
def eval_model_underattack(model, loader, attack=None):
  """Function to evaluate your model on a specific loader"""

  # Test the Model
  # We set the model to evaluation mode with model.eval(). 
  # It disables any drop-out or batch normalization layers in model
  model.eval()
  correct = 0
  correct_under_attack = 0
  total = 0
  for images, labels in test_loader:
      if cuda :
        images, labels = images.cuda(), labels.cuda()
      
      # Set requires_grad attribute of tensor. Important for Attack
      images.requires_grad = True
      
      # Classify the initial image (with no perturbation yet)
      outputs = model(images)
      _, predicted_label = torch.max(outputs.data, 1)
      correct += (predicted_label == labels).sum().item()
      # Calculate the loss
      loss = criterion(outputs, labels)
      # Zero all existing gradients
      model.zero_grad()
      # Backward
      loss.backward()
      # Collect images gradients
      images_grad = images.grad.data
      
      # # Produce Attack
      

      # Obtaining the attacked images
      images_attacked=attack.compute(images,labels)

      # Re-classify the perturbed image
      outputs = model(images_attacked)
      _, attacked_label = torch.max(outputs.data, 1)
      correct_under_attack += (attacked_label == labels).sum().item()

      total += labels.size(0)
      
    
  print(correct)
  print(correct_under_attack)
  print(total)
  print(
      'Accuracy of the model on the test images: %d %%' % (
          100 * correct / total))
  print(
      'Accuracy of the model on the test images under attack: %d %%' % (
          100 * correct_under_attack / total))

In [12]:
def eval_model(model, loader, attack=None):
  """Function to evaluate your model on a specific loader"""
  accuracy = 0.
  n_inputs = 0.
  for n_batch, (imgs, labels) in enumerate(loader):
      if cuda:
        imgs, labels = imgs.cuda(), labels.cuda()
      if attack==None:
        outputs = model(imgs)
      else:
        outputs = model(imgs + attack.compute(imgs, labels))
      predicted = outputs.argmax(axis=1)
      n_inputs += outputs.size(0)
      accuracy += (predicted == labels).sum()
  accuracy = accuracy/n_inputs
  print("Accuracy: ", accuracy)

In [13]:
attack_normal= ProjectedGradientDescent(model_normal, 0.03, 0.01, 10)
attack = ProjectedGradientDescent(model, 0.03, 0.01, 10)

# eval the normal modedl
print("results on the normal model")
eval_model(model_normal, test_loader)
eval_model(model_normal, test_loader, attack)

# eval the attacked model
print("results on the robust model")
eval_model(model, test_loader)
eval_model(model, test_loader, attack)


attack_normal= ProjectedGradientDescent(model_normal, 0.1, 0.0005, 10)
attack = ProjectedGradientDescent(model, 0.1, 0.0005, 10)

# eval the normal modedl
print("results on the normal model")
eval_model(model_normal, test_loader)
eval_model(model_normal, test_loader, attack)

# eval the attacked model
print("results on the robust model")
eval_model(model, test_loader)
eval_model(model, test_loader, attack)

results on the normal model
Accuracy:  tensor(0.7198, device='cuda:0')
Accuracy:  tensor(0.6324, device='cuda:0')
results on the robust model
Accuracy:  tensor(0.2549, device='cuda:0')
Accuracy:  tensor(0.1732, device='cuda:0')
results on the normal model
Accuracy:  tensor(0.7198, device='cuda:0')
Accuracy:  tensor(0.5464, device='cuda:0')
results on the robust model
Accuracy:  tensor(0.2484, device='cuda:0')
Accuracy:  tensor(0.1463, device='cuda:0')


Let's train our models with the best parameters found.

In [14]:
# Deal with the normal model in order to make the comparison with the other model trained with attacked images

# Instantiate Model Class
model_normal = ConvModel()
# move tensors to GPU if CUDA is available
if cuda:
  model_normal = model_normal.cuda()

# define your loss
criterion = nn.CrossEntropyLoss()

# define the optimizer
optimizer = torch.optim.SGD(
    model_normal.parameters(), lr=lr_rate, momentum =0.9)


# define the attack
# Alpha has to be lower than epsilon: epsilon is the ball we want to explore
# and we do it through small steps of size alpha
# We set num_iter = 5 because with num_iter = 20 tooks 2 min par epoch
attack = ProjectedGradientDescent(
    model_normal, eps=0.1, alpha=0.0005, num_iter=10) 

# train the model robust to attack
train_model_normal(
    model_normal, criterion, optimizer, train_loader, attack)

model_normal.eval()

ConvModel(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (lin1): Linear(in_features=4096, out_features=1000, bias=True)
  (lin2): Linear(in_features=1000, out_features=120, bias=True)
  (lin3): Linear(in_features=120, out_features=10, bias=True)
  (d_out): Dropout(p=0.2, inplace=False)
)

In [15]:
# Instantiate Model Class
model = ConvModel()
# move tensors to GPU if CUDA is available
if cuda:
  model = model.cuda()

# define your loss
criterion = nn.CrossEntropyLoss()

# define the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=lr_rate, momentum =0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# define the attack
# Alpha has to be lower than epsilon: epsilon is the ball we want to explore
# and we do it through small steps of size alpha
# We set num_iter = 5 because with num_iter = 20 tooks 2 min par epoch
attack = ProjectedGradientDescent(model, eps=0.1, alpha=0.0005, num_iter=10) 

# train the model robust to attack
adversarial_train_model(model, criterion, optimizer, train_loader, attack)

model.eval()

ConvModel(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (lin1): Linear(in_features=4096, out_features=1000, bias=True)
  (lin2): Linear(in_features=1000, out_features=120, bias=True)
  (lin3): Linear(in_features=120, out_features=10, bias=True)
  (d_out): Dropout(p=0.2, inplace=False)
)

In [16]:
# test now the attacks
attack_normal= ProjectedGradientDescent(model_normal, 0.1, 0.0005, 10)
attack = ProjectedGradientDescent(model, 0.1, 0.0005, 10)

# eval the normal model
print("results on the normal model")
eval_model(model_normal, test_loader)
eval_model(model_normal, test_loader, attack)

# eval the attacked model
print("results on the robust model")
eval_model(model, test_loader)
eval_model(model, test_loader, attack)

results on the normal model
Accuracy:  tensor(0.7135, device='cuda:0')
Accuracy:  tensor(0.5841, device='cuda:0')
results on the robust model
Accuracy:  tensor(0.3174, device='cuda:0')
Accuracy:  tensor(0.1935, device='cuda:0')


### Make an analysis and conclude

*Analysis:* 

The smaller the step the smaller the accuracy.<br>
The greatest epsilon is, the lesser the accuray.<br>
For a given epilson, the higher the step, the less the accuracy.<br>
Increasing iterations decreases accuracy to a certain level.



```
attack_normal= ProjectedGradientDescent(model_normal, 0.03, 0.01, 10)
attack=        ProjectedGradientDescent(model,        0.03, 0.01, 10)

results on the normal model
*   Accuracy without attack:  0.7198
*   Accuracy with attack:     0.6324

results on the robust model
*   Accuracy without attack:  0.2549
*   Accuracy with attack:     0.1732

attack_normal= ProjectedGradientDescent(model_normal, 0.1, 0.0005, 10)
attack=        ProjectedGradientDescent(model,        0.1, 0.0005, 10)

results on the normal model
*   Accuracy without attack:  0.7198
*   Accuracy with attack:     0.5464

results on the robust model
*   Accuracy without attack:  0.2484
*   Accuracy with attack:     0.1463
```

- The accuracy of the robust models are lower than the normal models'.<br>
It's due to the fact that we indroduce noise when training with attacked images, which alters the weaker correlations.
- Thus the parameters choosen are resulting from a trade off between accuracy vs robustness.
- Yet when palying with parameters epsilon, alpha and iterations, we notice that the most efficient attack occurs with a smaller alpha, bringing the accuracy to 14% (for a given magnitude of epsilon being 0.1).

As a consequence, we trained our modelds with the most efficient parameters to attack.

*Conclusions:* 

**We see that the training has worked since the accuracy obtained increases from 14% to 17%.**<br>

```
results on the normal model
Accuracy:  tensor(0.7135, device='cuda:0')
Accuracy:  tensor(0.5841, device='cuda:0')
results on the robust model
Accuracy:  tensor(0.3174, device='cuda:0')
Accuracy:  tensor(0.1935, device='cuda:0')
```

With additional data, we could imagine that the accuracy could be improved (adversarial models require more data than non adversarial ones).




Using a different kind of norm is explained in different publications. It implies:

- the use of a larger epsilon values, because the volume of the ball is much smaller than with a ℓ∞ nrom
- Choosing a higher epsilon can lead to have a more visible attack. One can ask itself what are the acceptable values of eppislon.
- Litterature also says that with the ℓ∞ norm, the attack is spread within the images while with a l1 or l2 norm the attack is more on specific points or areas. This could lead also to the attack to be more noticed.

