Assignment 11: Adversarial Training
===================================


Microsoft Forms Document: https://forms.office.com/r/e3n9gYXsL5

In this assignment, we will show that adversarial training improves stability against adversarial attacks on the MNIST dataset.
We will compare three different types of training procedures:

1. Train with only the original samples
2. Train with original samples and added random noise
3. Train with original samples and adversarial samples

Note that the results of this experiment might not translate well to other datasets.

Task 1: Dataset
---------------

We will make use of the default implementation of the MNIST dataset in `torchvision`.
As usual, we will need the training and validation set splits of MNIST, including data loaders.
Select appropriate batch sizes for the training and validation sets.

In [None]:
import torch
import torchvision

# training set and data loader
train_loader = ...

# validation set and data loader
validation_loader = ...

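# use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")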

Task 2: Classification Network
------------------------------
We use the same small-scale network as in Assignment 8.

In [None]:
class Network (torch.nn.Module):
  def __init__(self, Q1, Q2, K, O):
    # call base class constructor
    super(Network,self).__init__()
    ...
  
  def forward(self,x):
    return ...

Task 3: Fast Gradient Sign
--------------------------

For adversarial training, we need to implement a function that generates an adversarial sample for a given input.
Here, we implement the Fast Gradient Sign (FGS) method, which modifies the given input by adding a scaled version of the sign of the gradient of the loss $\mathcal J$ with respect to the input:

$$\mathbf X_{\mathrm{FGS}} = \mathbf X + \alpha\,\mathrm{sign}(\nabla_{\mathbf X} \mathcal J)$$

Finally, the result needs to be clamped to be in range $[0,1]$.

Note that this function will be used with batches of samples.
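
To make the formula concrete, here is a minimal sketch of the FGS step on a toy model. The name `fgs_sketch`, the linear stand-in network, and the random inputs are assumptions made for this illustration; the sketch is one possible way to realize the equation, not necessarily the intended solution.

```python
import torch

def fgs_sketch(x, t, network, loss, alpha=0.3):
    # work on a detached copy so that the caller's tensor is left untouched
    x = x.clone().detach().requires_grad_(True)
    # forward the input and compute the loss
    J = loss(network(x), t)
    # gradient of the loss with respect to the input only
    gradient, = torch.autograd.grad(J, x)
    # step along the sign of the gradient, then clamp to the valid pixel range
    return torch.clamp(x.detach() + alpha * gradient.sign(), 0., 1.)

# toy stand-in for the classification network (assumption, not the assignment's network)
torch.manual_seed(0)
toy_network = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
toy_loss = torch.nn.CrossEntropyLoss()

# a random batch with the shape of MNIST images and random labels
x = torch.rand(4, 1, 28, 28)
t = torch.randint(0, 10, (4,))
x_adv = fgs_sketch(x, t, toy_network, toy_loss, alpha=0.1)
```

Using `torch.autograd.grad` here returns the gradient for the input without accumulating anything into the `.grad` fields of the network parameters. Calling `J.backward()` and reading `x.grad` would also work, but it additionally fills the parameter gradients, which then need to be cleared before the next optimizer step.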

In [None]:
def FGS(x, t, network, loss, alpha=0.3):
  # tell autograd that we need the gradient for the input
  ...
  # forward input
  z = ...
  # compute loss and gradient
  J = ...

  # get the gradient
  gradient = ...
  # create FGS adversarial sample
  adversarial_sample = ...
  
  return adversarial_sample

Task 4: Random Noise
--------------------
For comparison, we implement a function that adds random noise of the same magnitude as the FGS perturbation to the image:

$$\mathbf X_{\mathrm{noise}} = \mathbf X + \alpha\,\mathbf N, \qquad \mathbf N \in \{-1,1\}^{D\times E}$$

Here, each element of $\mathbf N$ is sampled independently for each pixel, with $-1$ and $1$ being equally likely.

Again, we clamp the results to be in range $[0,1]$.

Note that this function will also be used with batches of samples.
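
One way to draw the $\pm 1$ pattern is to sample integers from $\{0,1\}$ and map them to $\{-1,1\}$. The sketch below assumes this approach; the name `noise_sketch` and the constant test image are chosen for illustration only.

```python
import torch

def noise_sketch(x, alpha=0.3):
    # draw 0 or 1 with equal probability for every pixel, then map to -1 or +1
    N = torch.randint(0, 2, x.shape, device=x.device).to(x.dtype) * 2 - 1
    # add the scaled noise and clamp to the valid pixel range
    return torch.clamp(x + alpha * N, 0., 1.)

# a constant gray batch makes the effect easy to inspect:
# with alpha=0.3, every pixel moves by 0.3 up or down and no clamping occurs
x_gray = torch.full((2, 1, 28, 28), 0.5)
x_noisy = noise_sketch(x_gray, alpha=0.3)
```

Because the noise has the same per-pixel magnitude $\alpha$ as the FGS perturbation, any difference between the two training variants can be attributed to the direction of the perturbation rather than its size.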

In [None]:
def noise(x, alpha=0.3, **kwargs):
  # generate noise 
  N = ...
  # Add noise and clamp
  noisy_sample = ...

  return noisy_sample

Task 5: Training Loop
---------------------

Implement a training loop that optionally includes training with adversarial or noisy samples.
This loop iterates over all training batches once, i.e., we implement one epoch of training here.
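
One detail that matters when a loop performs two backward passes per batch: PyTorch accumulates gradients across `backward()` calls until they are cleared. The toy example below illustrates this behavior; the parameter `w` and the two stand-in losses are hypothetical, and whether to combine both losses in one optimizer step or to take two separate steps is a design choice left open here.

```python
import torch

# a single toy parameter vector and an optimizer for it
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# clear old gradients before processing the batch
optimizer.zero_grad()

# first backward pass (stand-in for the loss on the clean samples)
J_clean = (3 * w).sum()
J_clean.backward()
grad_after_first = w.grad.clone()    # gradient is [3, 3]

# second backward pass (stand-in for the loss on the modified samples)
J_modified = (w ** 2).sum()
J_modified.backward()
grad_after_second = w.grad.clone()   # gradients were added: [3, 3] + [2, 4] = [5, 7]

# one optimizer step uses the accumulated gradient of both losses
optimizer.step()
```

If the adversarial samples are generated with a call to `backward()` between the two passes, that call also adds to the parameter gradients, so the order of operations and the placement of `zero_grad()` deserve some care.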


In [None]:
def training_loop(network, loss, optimizer, add_additional_samples = None, alpha=0.3):
  for x,t in train_loader:
    # compute output for current batch
    z = ...
    # compute loss
    J = ...
    # compute gradient
    ...

    if add_additional_samples is not None:
      # compute modified samples for batch
      if add_additional_samples == "FGS":
        # create FGS adversarial samples
        x_hat = ...
      else:
        # create noisy samples
        x_hat = ...

      # compute output for modified samples
      z_hat = ...
      # compute loss on modified samples
      J = ...
      # compute gradient
      ...

Task 6: Validation Loop
-----------------------

We need to compute both the classification accuracy and the adversarial stability for the validation set.
For each batch, we first select the correctly classified images.
For these, we generate FGS adversarial samples.
Finally, we test whether these adversarial samples are still classified as their original classes.

Compute classification accuracy and adversarial accuracy on the whole validation set.
Think about how to normalize the adversarial accuracy.
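
To see why the normalization matters, consider the following toy example with made-up predictions for six samples. It contrasts two possible denominators for the adversarial accuracy; all tensors and variable names are hypothetical and serve only to illustrate the arithmetic.

```python
import torch

# toy targets and clean predictions: 4 of 6 samples are classified correctly
t = torch.tensor([0, 1, 2, 3, 4, 5])
clean_prediction = torch.tensor([0, 1, 2, 3, 0, 0])
correct = clean_prediction == t

# toy predictions on the adversarial versions of the 4 correctly classified samples
adversarial_prediction = torch.tensor([0, 1, 9, 9])
adversarial_correct = (adversarial_prediction == t[correct]).sum().item()

clean_correct = correct.sum().item()
clean_accuracy = clean_correct / len(t)

# option 1: relative to the samples that were attacked at all
adversarial_accuracy_relative = adversarial_correct / clean_correct
# option 2: relative to all validation samples
adversarial_accuracy_absolute = adversarial_correct / len(t)
```

With the first option, the value measures how many correct decisions survive the attack, independently of the clean accuracy. With the second option, the value can never exceed the clean accuracy, which makes both curves directly comparable in one plot.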

In [None]:
def validation_loop(network, loss, alpha=0.3):
  total, correct_clean_count, correct_adversarial_count = 0,0,0

  # iterate over validation set samples
  for x,t in validation_loader:
    with torch.no_grad():
      # classify original samples
      z = ...

      # compute classification accuracy on original samples
      correct_clean_count += ...

    # select the correctly classified samples
    ...
    
    # create adversarial samples using FGS
    x_FGS = ...

    # check how many are correctly classified
    with torch.no_grad():
      # classify adversarial samples
      z_FGS = ...

      # compute classification accuracy on adversarial samples
      correct_adversarial_count += ...

  # compute clean and adversarial accuracy and return them
  clean_accuracy = ...
  adversarial_accuracy = ...
  return clean_accuracy, adversarial_accuracy

Task 7: Training of Three Networks
----------------------------------

Instantiate three separate networks with identical architecture.
Instantiate a corresponding optimizer for each of these networks.
Train each network for 10 epochs.
The first network will be trained on clean samples only.
The second network will be trained using adversarial samples.
The third network will be trained with noise samples.

Evaluate all three networks on the validation set, and record clean and adversarial classification accuracies.

Note that training takes longer than usual, since creating adversarial samples requires an additional forward and backward pass for each batch.
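
The bookkeeping for three independently trained networks could look roughly like this. The sketch uses a toy linear model as a stand-in for the `Network` class, and the dictionary keys, the SGD optimizer, and the learning rate are assumptions made for illustration.

```python
import torch

def make_toy_network():
    # stand-in for the assignment's Network class (assumption)
    return torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

names = ("clean", "FGS", "noise")

# one network per training procedure; each call creates its own parameters
networks = {name: make_toy_network() for name in names}

# one optimizer per network, each bound to the parameters of its own network
optimizers = {
    name: torch.optim.SGD(networks[name].parameters(), lr=0.01)
    for name in names
}

# one list of accuracies per training procedure, to be filled once per epoch
clean_accuracies = {name: [] for name in names}
adversarial_accuracies = {name: [] for name in names}
```

Each optimizer only updates the parameters it was constructed with, so sharing one optimizer instance across the three networks would not train them independently.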

In [None]:
# define one network for each training procedure
networks = ...
# define the optimizers, one for each network
optimizer = ...

# define loss function
loss = ...

# store accuracies on clean and adversarial samples for the three cases
clean_accuracies = ...
adversarial_accuracies = ...


# iterate over 10 epochs (or more)
for epoch in range(10):
  # perform training loop
  ...

  # compute and store validation set accuracies
  ...

Task 8: Plotting of Accuracies
------------------------------

Plot the clean accuracies and the adversarial accuracies of the three networks over the training epochs.

In [None]:
from matplotlib import pyplot

pyplot.figure(figsize=(6,6))

# plot clean accuracies
pyplot.subplot(211)
...

# plot adversarial accuracies
pyplot.subplot(212)
...