# HW2 - A guided surrogate attack implementation

### About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.0 (27/02/2022)

**Requirements:**
- Python 3 (tested on v3.9.6)
- Matplotlib (tested on v3.5.1)
- Numpy (tested on v1.22.1)
- Pillow (tested on v9.3.0)
- Torch (tested on v1.13.0)
- Torchmetrics (tested on v0.11.0)

### 0. Prelim: Imports needed and testing for CUDA

In [None]:
# Future
from __future__ import print_function
# Matplotlib
import matplotlib.pyplot as plt
# Numpy
import numpy as np
# Pillow
from PIL import Image
# Torch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
# Torchvision
import torchvision
from torchvision import datasets, transforms
# Our custom ResNet and utils functions
from resnet import *
from utils import *

We advise running this notebook on a GPU with CUDA set up on your machine, as it can drastically reduce its running time!

In [None]:
# Define device for torch
use_cuda = True
print("CUDA is available:", torch.cuda.is_available())
device = torch.device("cuda" if (use_cuda and torch.cuda.is_available()) else "cpu")

### 1. Prelim: Dataset and Dataloader

The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class. They are split into a training set of 50,000 images (5,000 per class) and a test set of 10,000 images (1,000 per class).
We will only use the first 10,000 training samples (roughly 1,000 per class) to keep the dataset small and reduce the execution time.

**Question 1:** Based on the description for the dataset, could you describe what seems to be the machine learning problem we will be studying? More specifically,
- Is it a supervised or unsupervised task?
- Is it a regression, classification, clustering, or association problem?
- What does the training dataset consist of, in terms of inputs and outputs?
- How many input and output features seem to be present here?
- Any additional information about the task?

Feel free to explore the dataset yourself using additional functions.

In [None]:
# NOTE: This is a fix to work around the "User-agent" issue
# when downloading the CIFAR10 dataset
# (uses the standard library directly, so the six package is not needed)
import urllib.request
opener_req = urllib.request.build_opener()
opener_req.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener_req)

In [None]:
# Transform definition
# (Basic: only convert image to torch tensor)
tf = transforms.Compose([transforms.ToTensor()])

In [None]:
# CIFAR10 dataset and dataloader
# (Training set)
train_dataset = torchvision.datasets.CIFAR10(root = './data/', download = True, train = True, transform = tf)
train_dataset_reduced = torch.utils.data.Subset(train_dataset, [i for i in range(0, 10000)])
train_loader = torch.utils.data.DataLoader(dataset = train_dataset_reduced, batch_size = 256, shuffle = False)
print(len(train_loader))
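The subset above keeps the first 10,000 training images. As a result, each class is not guaranteed to have exactly 1,000 samples. If an exactly balanced subset is preferred, the plain-Python sketch below selects the first `per_class` indices of every class. `balanced_subset_indices` is a hypothetical helper, not part of *utils.py*. It assumes the labels are available as a sequence of integers; torchvision's `CIFAR10` exposes them as `train_dataset.targets`.

```python
from collections import defaultdict

def balanced_subset_indices(targets, per_class):
    """Return the indices of the first `per_class` samples of each class, in dataset order."""
    counts = defaultdict(int)  # number of samples kept so far, per class
    indices = []
    for idx, label in enumerate(targets):
        if counts[label] < per_class:
            counts[label] += 1
            indices.append(idx)
    return indices

# Toy usage on a skewed label sequence
labels = [0, 0, 0, 1, 2, 1, 0, 2, 2, 1]
print(balanced_subset_indices(labels, 2))  # [0, 1, 3, 4, 5, 7]
```

With the notebook's objects, this would become `torch.utils.data.Subset(train_dataset, balanced_subset_indices(train_dataset.targets, 1000))`.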

Similarly, the test set has 1,000 images in each class. For simplicity and reduced execution time, we will only use the first 200 test images (roughly 20 per class) to evaluate our attack functions.

In [None]:
# CIFAR10 dataset and dataloader
# (Testing set)
test_dataset = torchvision.datasets.CIFAR10(root = './data/', download = True, train = False, transform = tf)
test_dataset_reduced = torch.utils.data.Subset(test_dataset, [i for i in range(0, 200)])
test_loader = torch.utils.data.DataLoader(dataset = test_dataset_reduced, batch_size = 1, shuffle = False)
print(len(test_loader))

### 2. Prelim: Our pre-trained Model under attack

We will use a simple pre-trained ResNet model. Its architecture and trainer are stored in *resnet.py*, and its weights are stored in the file *resnet.data*.
Its baseline accuracy is 88.16%, which makes it a rather easy target for an attack.

**Question 2:** Based on the display for the model below, what seems to be the architecture for the model? What layers have been implemented?

In [None]:
# Load the pretrained model
original_model = ResNet(ResidualBlock, [2, 2, 2])
pretrained_model = "./resnet.data"
original_model.load_state_dict(torch.load(pretrained_model, map_location = 'cpu'))
original_model.to(device)
original_model.eval()
print(original_model)

### 3. Writing our attack function

**Question 3:** Write a function iugm_attack(), which performs an **untargeted iterated gradient attack**.
1. It should use Option #2, described in class. On each iteration of the iterated gradient attack, this option aims towards the least probable class according to the logits.
2. It should have a maximal number of iterations, set to 10 by default.
3. It has 5 inputs:
    - our original image,
    - the epsilon value to be used,
    - the model under attack,
    - the original label for the image,
    - and a maximal number of iterations for the attack.
4. Our attack function simply returns the attack sample to be evaluated by our test function.
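To make the expected behaviour concrete, here is a toy sketch of one reading of Option #2. It is written in NumPy rather than PyTorch so that the gradient can be written by hand. The following assumptions are ours, not the course's:
- the "model" is a linear softmax classifier with logits `W @ x + b`;
- each step moves the input by `epsilon` times the raw gradient (not its sign);
- pixels are clipped to [0, 1];
- the loop stops early once the prediction differs from `original_label`.

`toy_iugm_attack` is a hypothetical name.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def toy_iugm_attack(x, epsilon, W, b, original_label, iter_num=10):
    """Iterated gradient attack on a linear softmax model, aiming at the least probable class."""
    if epsilon == 0:
        return x.copy()
    num_classes = len(b)
    x_adv = x.copy()
    for _ in range(iter_num):
        logits = W @ x_adv + b
        # Option #2 (as assumed here): re-select the least probable class at every iteration
        target = int(np.argmin(logits))
        # Gradient w.r.t. the input of the cross-entropy loss -log p_target
        grad = W.T @ (softmax(logits) - np.eye(num_classes)[target])
        # Descend that loss (move towards the target class), keep pixels in [0, 1]
        x_adv = np.clip(x_adv - epsilon * grad, 0.0, 1.0)
        # Stop as soon as the prediction no longer matches the original label
        if int(np.argmax(W @ x_adv + b)) != original_label:
            break
    return x_adv

# Toy 2-pixel "image" and 3-class linear model
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 0.5])
x = np.array([0.6, 0.4])                       # classified as class 0
x_adv = toy_iugm_attack(x, 0.5, W, b, original_label=0)
print(x_adv, int(np.argmax(W @ x_adv + b)))    # [0. 0.] 2
```

In PyTorch, autograd provides the hand-written gradient. You would enable `requires_grad` on the image, compute `F.cross_entropy` between the model output and the least probable class, and call `loss.backward()`. Replacing `epsilon * grad` with `epsilon * grad.sign()` gives the FGSM-style variant.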

In [None]:
def iugm_attack(image, epsilon, model, original_label, iter_num = 10):
    # Skip if epsilon is 0
    if epsilon == 0:
        return image
    else:
        eps_image = None
        return eps_image

### 4. Testing your attack on your model

We will consider two different attack strategies (one-shot and iterated), with different epsilon values and maximal numbers of iterations.

#### First attack: one-shot untargeted gradient attack

Setting max_iter = 1 turns our iterated attack into a one-shot attack.

#### Second attack: iterated untargeted gradient attack

We fix max_iter = 10.

**Question 4:** Can you suggest some epsilon values to put in the *epsilons* and *epsilons2* list below? Explain your choice of values briefly.

In [None]:
# Test First attack: one-shot untargeted gradient attack
epsilons = [0, None]
accuracies, examples = run_attacks_for_epsilon(epsilons, \
                                               original_model, \
                                               iugm_attack, \
                                               device, \
                                               test_loader, \
                                               max_iter = 1)

In [None]:
# Test Second attack: iterated untargeted gradient attack
epsilons2 = [0, None]
accuracies2, examples2 = run_attacks_for_epsilon(epsilons2, \
                                                 original_model, \
                                                 iugm_attack, \
                                                 device, \
                                                 test_loader, \
                                                 max_iter = 10)

### 5. Visualization of the attacks' effects on the pre-trained model

As usual, we will display the accuracy vs. epsilon graph for both attacks, as well as some adversarial samples for the chosen epsilon values.

The adversarial samples, displayed by the last two cells below, can help determine the plausibility threshold value for $ \epsilon $.
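The accuracy vs. epsilon curve is computed as follows: for each epsilon, attack every test sample and count how often the model still predicts the correct label. The plain-Python sketch below illustrates that idea. `accuracy_vs_epsilon` is a hypothetical helper, not the implementation of `run_attacks_for_epsilon` in *utils.py*. The 1D threshold "model" and the attack are toy assumptions.

```python
def accuracy_vs_epsilon(samples, epsilons, predict, attack):
    """For each epsilon, return the fraction of attacked samples that are still classified correctly."""
    accuracies = []
    for eps in epsilons:
        correct = sum(predict(attack(x, eps, y)) == y for x, y in samples)
        accuracies.append(correct / len(samples))
    return accuracies

# Toy 1D classifier: class 1 if x > 0.5, else class 0
def predict(x):
    return int(x > 0.5)

# Toy attack: push each sample by eps towards the other class
def attack(x, eps, label):
    return x + eps if label == 0 else x - eps

samples = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
print(accuracy_vs_epsilon(samples, [0, 0.25, 0.5], predict, attack))  # [1.0, 0.5, 0.0]
```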

**Question 5:** Looking at the displays below, which attack seems to be more efficient on our model: the one-shot or the iterated one? Is that an expected result? Discuss.

In [None]:
# Show attack curves for both attacks
display_attack_curves(epsilons, epsilons2, accuracies, accuracies2)

In [None]:
# Display adversarial samples for first attack
display_adv_samples(epsilons, examples)

In [None]:
# Display adversarial samples for second attack
display_adv_samples(epsilons2, examples2)

### 6. Creating a surrogate model

Let us now pretend that the original model from Section 2 is protected by some form of black-boxing that prevents us from accessing its gradients. Using the IUGM attack from earlier directly on this model is therefore forbidden from now on.

Let us create a surrogate version of the model, which has exactly the same architecture but does not reuse the pre-trained parameters stored in the *resnet.data* file.

**Question 6:** Below, we show how to create a surrogate model. It uses exactly the same layers as the original ResNet model from Section 2, albeit with different weights and biases in each layer, since it is currently untrained. Is it important for the surrogate to follow exactly the same architecture as the original model, or could we use a different architecture?

In [None]:
# Make a surrogate version of the model, untrained
surrogate_model = ResNet(ResidualBlock, [2, 2, 2])
surrogate_model.to(device)
surrogate_model.train()
print(surrogate_model)

### 7. Training our surrogate model

We will train the surrogate model on the training dataloader from earlier. Instead of the original labels in the dataset, we will use the outputs of the original model as ground truth.
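The core idea is to fit a model to another model's predictions rather than to dataset labels. The small NumPy sketch below illustrates it, under toy assumptions:
- the "original model" is a hidden linear classifier that only returns labels;
- the surrogate is a linear softmax classifier trained by full-batch gradient descent on the cross-entropy loss;
- `train_surrogate_toy` is a hypothetical name.

The reported number is the agreement between surrogate and original predictions, not accuracy against the true labels.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift each row for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_surrogate_toy(oracle, X, num_classes, epochs=500, lr=0.5):
    """Fit a linear softmax surrogate to the labels returned by a black-box oracle."""
    n, d = X.shape
    y = oracle(X)                     # "ground truth" = oracle predictions, not dataset labels
    Y = np.eye(num_classes)[y]        # one-hot targets
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        P = softmax_rows(X @ W + b)
        G = (P - Y) / n               # gradient of the mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    agreement = float(np.mean(np.argmax(X @ W + b, axis=1) == y))
    return W, b, agreement

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(5, 3))    # parameters we are not allowed to see

def oracle(X):
    # Black box: only predicted labels are exposed, no gradients
    return np.argmax(X @ W_hidden, axis=1)

X = rng.normal(size=(500, 5))
W_s, b_s, agreement = train_surrogate_toy(oracle, X, num_classes=3)
print(f"Agreement with the original model: {agreement:.2%}")
```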

**Question 7:** Have a look at the trainer function below, whose objective is to train the surrogate model. Why is the for loop *for batch_number, (inputs, _) in enumerate(train_loader):* not fetching the ground truth labels from the dataset? Is that normal?

**Question 8:** Fill in the blanks in the function below. Show your code in your report and explain it.

In [None]:
def train_surrogate(surrogate_model, original_model, train_loader, epochs = 25, lr = 0.001):
    # Use Adam optimizer to update surrogate model parameters
    '''
    Something needs to be changed here?
    '''
    optimizer = optim.Adam(None, lr = lr)
    
    # Use cross-entropy loss function
    criterion = nn.CrossEntropyLoss()
    
    # Performance curves data
    train_losses = []
    train_accuracies = []
    
    for epoch in range(epochs):
        # Initialize epoch loss and accuracy
        epoch_loss = 0.0
        correct = 0
        total = 0
        
        # Iterate over training data
        for batch_number, (inputs, _) in enumerate(train_loader):
            # Get from dataloader and send to device
            inputs = inputs.to(device)
            
            # Use original model outputs and predictions as ground truth
            '''
            Something needs to be changed here?
            '''
            orig_outputs = None
            _, orig_predicted = torch.max(orig_outputs.data, 1)
            
            # Zero out gradients
            optimizer.zero_grad()
            
            # Compute output for surrogate model
            '''
            Something needs to be changed here?
            '''
            surrog_outputs = None
            _, surrog_predicted = torch.max(surrog_outputs.data, 1)
            
            # Compute loss for training surrogate
            '''
            Something needs to be changed here?
            '''
            loss = criterion(None, None)
            
            # Backpropagate loss and update surrogate model weights
            loss.backward()
            optimizer.step()
            
            # Accumulate loss and correct predictions for epoch
            epoch_loss += loss.item()
            total += inputs.size(0)
            '''
            Something needs to be changed here?
            '''
            correct += None
            
        # Calculate epoch loss and accuracy
        epoch_loss /= len(train_loader)
        epoch_acc = correct/total
        train_losses.append(epoch_loss)
        train_accuracies.append(epoch_acc)
        print(f'--- Epoch {epoch+1}/{epochs}: Train loss: {epoch_loss:.4f}, Train accuracy: {epoch_acc:.4f}')
    
    return train_losses, train_accuracies

**Question 9:** Is our model training? We should obtain a final accuracy of around 97%. Does that mean that our surrogate model is better at predicting than the original model, whose baseline accuracy was only 88.16% (see Section 2)?

In [None]:
# Train surrogate from scratch
surrogate_model = ResNet(ResidualBlock, [2, 2, 2])
surrogate_model.to(device)
surrogate_model.train()
train_losses, train_accuracies = train_surrogate(surrogate_model, original_model, \
                                                 train_loader, epochs = 25, lr = 0.001)

### 8. Rework your attack to use the surrogate and transfer attack samples to the original model

**Question 9:** Write a function iugm_attack_surr(), which performs an **untargeted iterated gradient attack**, but uses the **surrogate to generate attack samples**, before **testing them on the original (protected) model**.
1. It should use Option #2, described in class. On each iteration of the iterated gradient attack, this option aims towards the least probable class according to the logits.
2. It should have a maximal number of iterations, set to 10 by default.
3. It has 6 inputs:
    - our original image,
    - the epsilon value to be used,
    - the original (protected) model,
    - the surrogate model used to generate the attack samples,
    - the original label for the image,
    - and a maximal number of iterations for the attack.
4. Our attack function simply returns the attack sample to be evaluated by our test function.

Reuse your iugm_attack() function and implement the changes as needed!
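The transfer idea can be illustrated on a toy setup. The gradient comes from the surrogate only, while the original model is queried only for its predicted label. The black-box setting still allows this, since those predictions were already used to train the surrogate. This NumPy sketch assumes:
- linear softmax models given as `(W, b)` pairs;
- a raw-gradient step of size `epsilon`;
- clipping to [0, 1];
- early stopping on the original model's prediction.

`toy_iugm_attack_surr` and `predict` are hypothetical names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def predict(model, x):
    W, b = model
    return int(np.argmax(W @ x + b))

def toy_iugm_attack_surr(x, epsilon, original_model, surrogate_model, original_label, iter_num=10):
    """Least-probable-class gradient attack computed on the surrogate, checked on the original."""
    if epsilon == 0:
        return x.copy()
    W_s, b_s = surrogate_model
    num_classes = len(b_s)
    x_adv = x.copy()
    for _ in range(iter_num):
        logits = W_s @ x_adv + b_s
        target = int(np.argmin(logits))   # least probable class according to the surrogate
        grad = W_s.T @ (softmax(logits) - np.eye(num_classes)[target])
        x_adv = np.clip(x_adv - epsilon * grad, 0.0, 1.0)
        # The protected model is only queried for its label, never for gradients
        if predict(original_model, x_adv) != original_label:
            break
    return x_adv

# Original (protected) model and an imperfect surrogate of it
original = (np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]), np.array([0.0, 0.0, 0.5]))
surrogate = (np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]]), np.array([0.0, 0.0, 0.4]))
x = np.array([0.6, 0.4])                                 # original model predicts class 0
x_adv = toy_iugm_attack_surr(x, 0.5, original, surrogate, original_label=0)
print(predict(original, x), predict(original, x_adv))    # 0 2
```

The gradients and the least probable class come from the surrogate. Only the success check involves the original model.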

In [None]:
def iugm_attack_surr(image, epsilon, original_model, surrogate_model, original_label, iter_num = 10):
    # Skip if epsilon is 0
    if epsilon == 0:
        return image
    else:
        eps_image = None
        return eps_image

Try your function below. We suggest using the following $ \epsilon $ values, but feel free to adjust them if needed.

In [None]:
# Test First attack: one-shot untargeted gradient attack
epsilons_surr = [0, .025, .05, .25, 1, 2.5, 5, 10]
accuracies_surr, examples_surr = run_attacks_for_epsilon_surr(epsilons_surr, \
                                                              original_model, \
                                                              surrogate_model, \
                                                              iugm_attack_surr, \
                                                              device, \
                                                              test_loader, \
                                                              max_iter = 1)

In [None]:
# Test Second attack: iterated untargeted gradient attack
epsilons_surr2 = [0, .005, .01, .05, .1, .2, .5, 1, 2, 5]
accuracies_surr2, examples_surr2 = run_attacks_for_epsilon_surr(epsilons_surr2, \
                                                                original_model, \
                                                                surrogate_model, \
                                                                iugm_attack_surr, \
                                                                device, \
                                                                test_loader, \
                                                                max_iter = 10)

### 9. Final visualization for surrogate attack

As before, let us compare the accuracy vs. epsilon curves of our attacks on both models.

**Question 10:** Discuss the curves and pictures below. Is the surrogate attack working? Were we able to train a surrogate model and transfer its attack samples to the original model?

In [None]:
# Show attack curves for original attack and surrogate attack
display_attack_curves_surr(epsilons, epsilons2, \
                           accuracies, accuracies2, \
                           epsilons_surr, accuracies_surr, \
                           epsilons_surr2, accuracies_surr2)

In [None]:
# Display adversarial samples for first attack on surrogate
display_adv_samples(epsilons_surr, examples_surr)

In [None]:
# Display adversarial samples for second attack on surrogate
display_adv_samples(epsilons_surr2, examples_surr2)

### Challenge question

**Question 11:** Can you suggest directions to improve the performance of our surrogate attack even more? Implementing these operations is not required.