<a href="https://colab.research.google.com/github/its-safi/DeepLearning_Homeworks/blob/main/code04_AdvAttacks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Adversarial Attacks**

One of the key capabilities PyTorch provides is automatic differentiation: with the right commands, we can obtain the partial derivative of any output with respect to any input.

During training, we use these derivatives to update the network's internal weights. Here we use them instead to perturb the input image so that the network misclassifies it. This is called an *adversarial attack*. Note that unlike *poisoning attacks*, adversarial attacks target test images, not training images.
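
As a minimal sketch of differentiating with respect to an input rather than a weight, the snippet below uses a toy tensor and a toy sum-of-squares "loss". Both are assumptions made for illustration and are not part of the CIFAR-10 model.

```python
import torch

# A toy "input" tensor; requires_grad=True asks autograd to track it.
x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)

# A toy scalar "loss" that depends on the input: the sum of squares.
loss = (x ** 2).sum()

# Backpropagate: this fills x.grad with d(loss)/dx, which here equals 2 * x.
loss.backward()

print(x.grad)
```

The same mechanism applies to an image tensor fed through a network: once `requires_grad` is set on the input, `backward()` leaves the gradient of the loss with respect to every pixel in `.grad`.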

We use these ideas to construct one specific kind of adversarial attack, the Fast Gradient Sign Method (FGSM), which nudges every pixel by a small amount $\epsilon$ in the direction that increases the loss.
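
In the notation commonly used for FGSM, the perturbed input can be written as below. The symbols $J$, $\theta$, $x$ and $y$ are introduced here for illustration and do not appear elsewhere in this notebook.

$$x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)$$

Here $x$ is the input image, $y$ its true label, $\theta$ the (fixed) model weights, $J$ the loss, and $\epsilon$ the maximum change allowed in any single pixel.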

We create FGSM attacks on CIFAR-10, building on what we've done in Notebooks 2 and 3 and on this tutorial:

https://pytorch.org/tutorials/beginner/fgsm_tutorial.html

In [None]:
import torchvision
import torchvision.transforms as transforms
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
"""
We define the same CIFAR10 model we've been working with, 
and load the model we've saved, as in Notebook 2b.
"""

# Define the Conv Net
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x
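
The input size of `fc1`, `16 * 5 * 5`, follows from how the convolution and pooling layers shrink a 32x32 CIFAR-10 image. The sketch below traces the shapes using the same layer hyperparameters as `ConvNet`; the dummy all-zeros image is an assumption for illustration.

```python
import torch
import torch.nn as nn

# Same layer hyperparameters as the ConvNet above.
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

# One dummy CIFAR-10-sized image: (batch, channels, height, width).
x = torch.zeros(1, 3, 32, 32)

x = pool(torch.relu(conv1(x)))   # 32 -> 28 (5x5 conv) -> 14 (2x2 pool)
x = pool(torch.relu(conv2(x)))   # 14 -> 10 (5x5 conv) -> 5 (2x2 pool)
features = torch.flatten(x, 1)   # flatten all dimensions except batch

print(x.shape)         # torch.Size([1, 16, 5, 5])
print(features.shape)  # torch.Size([1, 400])
```

If your saved model uses different kernel sizes or channel counts, the flattened size changes, and `load_state_dict` will report a shape mismatch.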



<All keys matched successfully>

## Problem 1: Saving and Loading Files

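Go back to your previous notebook and train your best CIFAR-10 network. Then save its weights using the command:

torch.save(model2.state_dict(), './drive/MyDrive/Colab\ Notebooks')

(choose whatever path you like; it should name a file rather than an existing directory. Inside a Python string the backslash is kept as a literal character, not treated as a shell-style escape for the space, so use exactly the same string when loading). 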

Make sure that the ConvNet defined above has the same structure as the model you saved. Then modify the command below as appropriate to initialize a new model of the same structure and load in the saved weights.
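
The save-then-load round trip can be sketched on a tiny stand-in model. The `nn.Linear` model, the temporary directory and the file name `toy_model.pth` are all hypothetical choices for illustration; only `torch.save`, `torch.load` and `load_state_dict` correspond to the commands used in this notebook.

```python
import os
import tempfile

import torch
import torch.nn as nn

# A tiny stand-in model (hypothetical; any nn.Module works the same way).
saved_model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    # The path names a file, not a directory.
    path = os.path.join(tmp_dir, "toy_model.pth")
    torch.save(saved_model.state_dict(), path)

    # A fresh model with the same structure but different random weights.
    loaded_model = nn.Linear(4, 2)
    # map_location="cpu" lets weights saved on a GPU load on a CPU-only machine.
    result = loaded_model.load_state_dict(torch.load(path, map_location="cpu"))

print(result)  # reports which keys, if any, failed to match
```

When the two structures agree, the returned object lists no missing or unexpected keys, and the loaded weights equal the saved ones.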



In [None]:
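# Now load your saved model.
# Google Drive must be mounted before running this command.
model = ConvNet()
# map_location='cpu' lets weights saved on a GPU load on a CPU-only runtime
model.load_state_dict(torch.load('./drive/MyDrive/Colab\ Notebooks', map_location='cpu'))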

In [None]:
# Now giving the same commands we've seen in previous notebooks
# we load the data and define the training and validation
# data loaders we need. 
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

batch_size = 1

# the tutorial calls the dataloader twice -- this code defines a function
# that will do this for the train/test data.

def fetch_dataloader(batch_size, transform=None, is_train=True):
    """
    Loads data from disk and returns a data_loader.
    A DataLoader is similar to a list of (image, label) tuples.
    It is used to avoid loading all of the data into memory.
    This is particularly useful for large data sets.
    """
    data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

    # Custom train/val split: every 10th image (i % 10 == 0) goes to
    # validation, and the remaining images go to training.
    indices = [i for i in range(len(data)) if (i%10 > 0) == is_train]

    data = torch.utils.data.Subset(data, indices)
    loader = torch.utils.data.DataLoader(data, batch_size=batch_size, shuffle=True, num_workers=2)
    return loader


train_transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

val_transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

data_train = fetch_dataloader(batch_size, train_transform, is_train=True)
data_val = fetch_dataloader(batch_size, val_transform, is_train=False)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
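
The index rule inside `fetch_dataloader` can be checked without downloading anything. The sketch below mirrors that rule in plain Python; the helper name `split_indices` is hypothetical, and the count of 50,000 is the size of the CIFAR-10 training set.

```python
# CIFAR-10's training set has 50,000 images, indexed 0..49999.
num_images = 50_000

def split_indices(num_items, is_train):
    """Mirror the index rule used in fetch_dataloader."""
    return [i for i in range(num_items) if (i % 10 > 0) == is_train]

train_indices = split_indices(num_images, is_train=True)
val_indices = split_indices(num_images, is_train=False)

print(len(train_indices), len(val_indices))  # 45000 5000
print(val_indices[:3])                       # [0, 10, 20]
```

So `data_val` holds 5,000 images taken from the official training set, not the 10,000 images of the official CIFAR-10 test set.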


In [None]:
# Let's again check how good our model is on the held-out validation data
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in data_val:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = model(images)
        # the class with the highest score is what we choose as the prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the %d validation images: %d %%' % (
    total, 100 * correct / total))

Accuracy of the network on the 5000 validation images: 60 %


In [None]:
# FGSM attack code
def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by moving each pixel by epsilon in that direction
    perturbed_image = image + epsilon*sign_data_grad
    # Clip to [-1, 1], the range of the inputs after Normalize(0.5, 0.5)
    perturbed_image = torch.clamp(perturbed_image, -1, 1)
    # Return the perturbed image
    return perturbed_image
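
To see what the clamp and the choice of $\epsilon$ mean in practice, the sketch below applies the same update to a stand-in image and a made-up gradient (both random, purely for illustration). It checks how far any pixel moves, both in the normalized [-1, 1] range and after undoing the normalization.

```python
import torch

def fgsm_attack(image, epsilon, data_grad):
    # Same update as above: step by epsilon along the gradient sign, then clamp.
    perturbed_image = image + epsilon * data_grad.sign()
    return torch.clamp(perturbed_image, -1, 1)

torch.manual_seed(0)
image = torch.rand(1, 3, 32, 32) * 2 - 1   # stand-in image in [-1, 1]
data_grad = torch.randn(1, 3, 32, 32)      # stand-in gradient (made up)
epsilon = 0.05

perturbed = fgsm_attack(image, epsilon, data_grad)

# Largest per-pixel change in the normalized [-1, 1] range.
max_change = (perturbed - image).abs().max().item()

# Undo Normalize((0.5, ...), (0.5, ...)) to express the change in [0, 1] pixel units.
max_change_pixels = ((perturbed / 2 + 0.5) - (image / 2 + 0.5)).abs().max().item()

print(max_change, max_change_pixels)
```

No pixel moves by more than `epsilon` in normalized units, which corresponds to at most `epsilon / 2` on the original [0, 1] pixel scale because the normalization doubles the range.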

In [None]:
def test( model, device, test_loader, epsilon ):

    # Accuracy counter
    correct = 0
    adv_examples = []

    # Loop over all examples in test set
    for data, target in test_loader:

        # Send the data and label to the device
        data, target = data.to(device), target.to(device)

        # Set requires_grad attribute of tensor. Important for Attack
        data.requires_grad = True

        # Forward pass the data through the model
        output = model(data)
        init_pred = output.max(1, keepdim=True)[1] # get the index of the max logit (the model outputs raw scores)

        # If the initial prediction is wrong, don't bother attacking, just move on
        if init_pred.item() != target.item():
            continue

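        # Calculate the loss. The model returns raw logits, so use
        # cross_entropy (log_softmax + nll_loss) rather than nll_loss alone
        loss = F.cross_entropy(output, target)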

        # Zero all existing gradients
        model.zero_grad()

        # Calculate gradients of model in backward pass
        loss.backward()

        # Collect datagrad
        data_grad = data.grad.data

        # Call FGSM Attack
        perturbed_data = fgsm_attack(data, epsilon, data_grad)

        # Re-classify the perturbed image
        output = model(perturbed_data)

        # Check for success
        final_pred = output.max(1, keepdim=True)[1] # get the index of the max logit
        if final_pred.item() == target.item():
            correct += 1
            # Special case for saving 0 epsilon examples
            if (epsilon == 0) and (len(adv_examples) < 5):
                # Unnormalize to [0, 1] and move to the CPU before converting to NumPy
                adv_ex = (perturbed_data / 2 + 0.5).detach().cpu().numpy()
                adv_examples.append( (init_pred.item(), final_pred.item(), adv_ex) )
        else:
            # Save some adv examples for visualization later
            if len(adv_examples) < 5:
                # Unnormalize to [0, 1] and move to the CPU before converting to NumPy
                adv_ex = (perturbed_data / 2 + 0.5).detach().cpu().numpy()
                adv_examples.append( (init_pred.item(), final_pred.item(), adv_ex) )

    # Calculate final accuracy for this epsilon. len(test_loader) counts
    # batches, which equals the number of images because batch_size is 1
    final_acc = correct/float(len(test_loader))
    print("Epsilon: {}\tTest Accuracy = {} / {} = {}".format(epsilon, correct, len(test_loader), final_acc))

    # Return the accuracy and an adversarial example
    return final_acc, adv_examples
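
The core of the loop above (forward pass, loss, gradient with respect to the input, one signed step) can be tried on a toy problem. In this sketch the linear "classifier", the random input and the label are hypothetical stand-ins, and `F.cross_entropy` is used because it expects raw scores like the ones `ConvNet` returns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a linear "classifier" and one random input.
toy_model = nn.Linear(8, 3)
x = torch.randn(1, 8, requires_grad=True)
y = torch.tensor([1])
epsilon = 0.1

# Loss on the clean input, then its gradient with respect to the input.
clean_loss = F.cross_entropy(toy_model(x), y)
toy_model.zero_grad()
clean_loss.backward()

# One FGSM step (no clamping here, since this toy input has no pixel range).
x_adv = (x + epsilon * x.grad.sign()).detach()
adv_loss = F.cross_entropy(toy_model(x_adv), y)

print(clean_loss.item(), adv_loss.item())
```

For a linear model the cross-entropy loss is convex in the input, so this step cannot lower the loss. For a deep network that guarantee no longer holds, which is why the attack succeeds on many images but not all.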

In [None]:
# We set the list of epsilons for the attack
#epsilons = [0, .05, .1, .15, .2, .25, .3]
epsilons = [0, .05]
# We set the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Now we run the attack and see the accuracy
accuracies = []
examples = []

# Run test for each epsilon
for eps in epsilons:
    acc, ex = test(model, device, data_val, eps)
    accuracies.append(acc)
    examples.append(ex)

Epsilon: 0	Test Accuracy = 3038 / 5000 = 0.6076
Epsilon: 0.05	Test Accuracy = 533 / 5000 = 0.1066


## Problem 2: Print some images of successful attacks

That is, show some images that were correctly classified before the attack and incorrectly classified after it.

## Problem 3 (Optional):

Try creating targeted attacks. For a given $\epsilon$, see what fraction of the data you can change into a particular label of your choice, as opposed to just any wrong label. For example, what fraction can you turn into a frog?