# 0. Introduction

In this lab we will use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, which contains 60,000 32x32 colour images in 10 classes.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
from torchvision import datasets
from torchvision import transforms
from torchvision import models

import matplotlib.pyplot as plt
import numpy as np

## Enabling and testing the GPU

First, you'll need to enable GPUs for the notebook:

- Navigate to Edit→Notebook Settings
- Select GPU from the Hardware Accelerator drop-down

Next, we'll confirm that PyTorch can see the GPU:

In [None]:
torch.cuda.is_available()
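
Once the check returns `True`, a common pattern is to store the chosen device in a variable and move tensors and models to it. The following is a minimal sketch; the variable name `device` and the CPU fallback are assumptions, not something this notebook defines elsewhere.

```python
import torch

# Use the GPU when PyTorch can see one; otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors and models are moved with .to(device); the same code runs on either backend.
x = torch.ones(2, 3).to(device)
print(device, x.sum().item())
```

Writing `.to(device)` instead of `.cuda()` keeps the rest of the notebook runnable on machines without a GPU.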

# Loading the dataset

Read the code below and make sure you understand what it does.

We use the built-in CIFAR-10 dataset from torchvision. The training set gets random-crop and horizontal-flip augmentation to reduce overfitting, while the test set is not augmented.

In [None]:
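transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    # torchvision's AlexNet needs inputs of at least 63x63 pixels, so the
    # 32x32 CIFAR images are upsampled (64 is an assumed choice; 224 matches
    # the ImageNet pre-training resolution but is slower)
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])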

trainset = datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=128, shuffle=True, num_workers=2)

testset = datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=128, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


In [None]:
def imshow(img):
    img = img / 2 + 0.5     # de-normalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

dataiter = iter(trainloader)
images, labels = next(dataiter)

imshow(torchvision.utils.make_grid(images[:10]))
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(10)))
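
The `img / 2 + 0.5` step in `imshow` inverts the `Normalize` transform, which computes `(x - mean) / std` per channel and therefore maps pixel values from [0, 1] to [-1, 1] when both mean and std are 0.5. The sketch below reproduces that arithmetic by hand on a made-up tensor, so it needs neither the dataset nor torchvision.

```python
import torch

# A fake 3-channel "image" with pixel values in [0, 1], as produced by ToTensor().
img = torch.tensor([0.0, 0.25, 1.0]).view(1, 1, 3).repeat(3, 1, 1)

mean, std = 0.5, 0.5
normalized = (img - mean) / std   # same arithmetic as Normalize: [0, 1] -> [-1, 1]
restored = normalized / 2 + 0.5   # the de-normalize step used in imshow

print(normalized[0, 0].tolist())
print(torch.allclose(restored, img))
```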

# AlexNet

We will use the AlexNet [1] architecture for this task.

The architecture is loaded from `torchvision.models`; you can review it below.

In [None]:
model = models.alexnet(num_classes=10)
model

We will train the network using Stochastic Gradient Descent ([SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html)).

Create the optimizer below, choosing an appropriate learning rate, momentum and weight decay.
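
As a starting point, one possible configuration is sketched here. The values `lr=0.01`, `momentum=0.9` and `weight_decay=5e-4` are assumed, commonly used starting points rather than tuned settings for this lab, and the `nn.Linear` stand-in (`toy_model`) only keeps the snippet self-contained; in the notebook you would pass `model.parameters()` instead.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model so the snippet runs on its own; replace with the AlexNet `model`.
toy_model = nn.Linear(8, 10)

# Assumed starting values, to be tuned: learning rate, momentum and L2 weight decay.
optimizer = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

# The chosen hyperparameters are stored per parameter group.
group = optimizer.param_groups[0]
print(group['lr'], group['momentum'], group['weight_decay'])
```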


In [None]:
### your code here

Now write a training loop (use previous labs as an example, or consult the PyTorch documentation) and train the network for 10 epochs on the training set. Then produce a confusion matrix for the predictions on the test set.
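
The overall shape of such a loop, and one way to accumulate a confusion matrix, can be sketched on synthetic data. Everything here is an illustrative assumption: the random 3-class dataset, the `toy_model`, and the helper names `train_one_epoch` and `confusion_matrix`. For brevity the sketch evaluates on the data it trained on and stays on the CPU; in the lab you would iterate over `trainloader` and `testloader` and move each batch to the GPU.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
num_classes = 3

# Synthetic, linearly separable data standing in for CIFAR-10.
x = torch.randn(300, 4)
y = x[:, :num_classes].argmax(dim=1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

toy_model = nn.Linear(4, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(toy_model.parameters(), lr=0.1, momentum=0.9)

def train_one_epoch(model, loader):
    model.train()
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

@torch.no_grad()
def confusion_matrix(model, loader, num_classes):
    model.eval()
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for inputs, targets in loader:
        preds = model(inputs).argmax(dim=1)
        # Encode each (true, predicted) pair as one index; row = true, column = predicted.
        pairs = targets * num_classes + preds
        cm += torch.bincount(pairs, minlength=num_classes ** 2).view(num_classes, num_classes)
    return cm

for epoch in range(10):
    train_one_epoch(toy_model, loader)

cm = confusion_matrix(toy_model, loader, num_classes)
accuracy = cm.diag().sum().item() / cm.sum().item()
print(cm)
print('accuracy:', accuracy)
```

The diagonal of the matrix counts correct predictions, so the off-diagonal entries show which classes are confused with each other.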

In [None]:
### your code here

# Transfer Learning




Next, we will approach the same task with transfer learning and fine-tuning. Specifically, we will take the AlexNet architecture pre-trained on ImageNet, keep its feature extractor, and fine-tune only the classifier.

In [None]:
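# AlexNet with ImageNet weights, and a fresh 10-class AlexNet
pretrained = models.alexnet(weights='DEFAULT')
model = models.alexnet(num_classes=10)
# Copy only the feature extractor. Loading into `model.features` keeps the
# state-dict keys aligned; loading into `model` with strict=False would skip
# every tensor silently, because the keys lack the "features." prefix.
model.features.load_state_dict(pretrained.features.state_dict())
# Freeze the copied layers (the attribute is `requires_grad`)
for param in model.features.parameters():
    param.requires_grad = False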

We loaded the pretrained weights on the feature extraction part of the architecture and have frozen them, so that they do not update during training.
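
A quick sanity check after a step like this is to confirm that the weights were really copied and that only the classifier remains trainable. The sketch below does so on a hypothetical `TinyNet` that mimics AlexNet's `features` / `classifier` split, so it runs without downloading any weights; the same checks apply to the real `model`.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split as AlexNet."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU())
        self.classifier = nn.Linear(4, num_classes)

source = TinyNet(num_classes=1000)  # plays the role of the pretrained network
target = TinyNet(num_classes=10)    # plays the role of the new 10-class network

# Loading into the `features` submodule keeps the key names aligned ("0.weight", ...).
target.features.load_state_dict(source.features.state_dict())

# Freeze the copied layers; the attribute is `requires_grad` (with an "s").
for param in target.features.parameters():
    param.requires_grad = False

copied = torch.equal(target.features[0].weight, source.features[0].weight)
trainable = [name for name, p in target.named_parameters() if p.requires_grad]
print(copied, trainable)
```

Setting a misspelled attribute such as `require_grad` raises no error in Python, which is why listing the trainable parameter names is a useful check.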

Create a new optimizer and repeat the training process for the new model. Then produce a confusion matrix for the new model and compare the results with training from scratch.

In [None]:
### your code here

# Unsupervised tasks

In the lecture you saw several unsupervised paradigms for deep learning. In this section, we build on contrastive learning [2], [3] using a supervised contrastive approach [4], and then evaluate the learned features on the downstream classification task.
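
For reference, the supervised contrastive loss of [4] can be written as follows. The notation is taken from [4] rather than from this notebook: $z_i$ is the normalized embedding of sample $i$, $A(i)$ is the set of all other samples in the batch, $P(i) \subseteq A(i)$ is the set of samples sharing the label of $i$, and $\tau$ is the temperature.

```latex
\mathcal{L}^{\mathrm{sup}}
  = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
    \log \frac{\exp(z_i \cdot z_p / \tau)}
              {\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}
```

The implementation below averages over anchors instead of summing, and additionally scales the result by `temperature / base_temperature`.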

In [None]:
# Code borrowed from https://github.com/HobbitLong/SupContrast
class SupConLoss(nn.Module):
    """Supervised Contrastive Learning: https://arxiv.org/pdf/2004.11362.pdf.
    It also supports the unsupervised contrastive loss in SimCLR"""
    def __init__(self, temperature=0.07, contrast_mode='all',
                 base_temperature=0.07):
        super(SupConLoss, self).__init__()
        self.temperature = temperature
        self.contrast_mode = contrast_mode
        self.base_temperature = base_temperature

    def forward(self, features, labels=None, mask=None):
        """Compute loss for model. If both `labels` and `mask` are None,
        it degenerates to SimCLR unsupervised loss:
        https://arxiv.org/pdf/2002.05709.pdf
        Args:
            features: hidden vector of shape [bsz, n_views, ...].
            labels: ground truth of shape [bsz].
            mask: contrastive mask of shape [bsz, bsz], mask_{i,j}=1 if sample j
                has the same class as sample i. Can be asymmetric.
        Returns:
            A loss scalar.
        """
        # run on whatever device the features already live on
        device = features.device

        if len(features.shape) < 3:
            raise ValueError('`features` needs to be [bsz, n_views, ...], '
                             'at least 3 dimensions are required')
        if len(features.shape) > 3:
            features = features.view(features.shape[0], features.shape[1], -1)

        batch_size = features.shape[0]
        if labels is not None and mask is not None:
            raise ValueError('Cannot define both `labels` and `mask`')
        elif labels is None and mask is None:
            mask = torch.eye(batch_size, dtype=torch.float32).to(device)
        elif labels is not None:
            labels = labels.contiguous().view(-1, 1)
            if labels.shape[0] != batch_size:
                raise ValueError('Num of labels does not match num of features')
            mask = torch.eq(labels, labels.T).float().to(device)
        else:
            mask = mask.float().to(device)

        contrast_count = features.shape[1]
        contrast_feature = torch.cat(torch.unbind(features, dim=1), dim=0)
        if self.contrast_mode == 'one':
            anchor_feature = features[:, 0]
            anchor_count = 1
        elif self.contrast_mode == 'all':
            anchor_feature = contrast_feature
            anchor_count = contrast_count
        else:
            raise ValueError('Unknown mode: {}'.format(self.contrast_mode))

        # compute logits
        anchor_dot_contrast = torch.div(
            torch.matmul(anchor_feature, contrast_feature.T),
            self.temperature)
        # for numerical stability
        logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
        logits = anchor_dot_contrast - logits_max.detach()

        # tile mask
        mask = mask.repeat(anchor_count, contrast_count)
        # mask-out self-contrast cases
        logits_mask = torch.scatter(
            torch.ones_like(mask),
            1,
            torch.arange(batch_size * anchor_count).view(-1, 1).to(device),
            0
        )
        mask = mask * logits_mask

        # compute log_prob
        exp_logits = torch.exp(logits) * logits_mask
        log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

        # compute mean of log-likelihood over positive
        # (clamp avoids 0/0 = NaN for anchors that have no positive in the batch)
        mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1).clamp(min=1.0)

        # loss
        loss = - (self.temperature / self.base_temperature) * mean_log_prob_pos
        loss = loss.view(anchor_count, batch_size).mean()

        return loss

We will first train the feature-extraction part of the network with the supervised contrastive loss (the labels are passed to `SupConLoss`) and then train the classifier on top of it.
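
To see what the loss rewards, it can help to look at a stripped-down version. The function `supcon_single_view` below is a hypothetical simplification of the `SupConLoss` class above, not a replacement for it: it assumes one view per sample, sets `base_temperature` equal to `temperature`, and skips anchors that have no positive in the batch.

```python
import torch
import torch.nn.functional as F

def supcon_single_view(z, labels, temperature=0.07):
    """Supervised contrastive loss for one view per sample (simplified sketch)."""
    z = F.normalize(z, dim=1)                       # unit-length embeddings
    logits = z @ z.T / temperature                  # pairwise similarities
    not_self = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other samples in the batch (the anchor itself is excluded).
    logits = logits.masked_fill(~not_self, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Average log-probability of the positives, per anchor.
    pos_count = positives.sum(dim=1)
    pos_log_prob = log_prob.masked_fill(~positives, 0.0).sum(dim=1)
    mean_log_prob_pos = pos_log_prob / pos_count.clamp(min=1)

    # Anchors without any positive do not contribute.
    return -mean_log_prob_pos[pos_count > 0].mean()

labels = torch.tensor([0, 0, 1, 1])
# Same-class samples point in similar directions ...
clustered = torch.tensor([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
# ... versus same-class samples pointing in different directions.
mixed = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])

loss_clustered = supcon_single_view(clustered, labels)
loss_mixed = supcon_single_view(mixed, labels)
print(loss_clustered.item(), loss_mixed.item())
```

The loss is small when samples of the same class are close in embedding space and large when they are not. With the `SupConLoss` class above, the equivalent single-view input would have shape `[bsz, 1, dim]`.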

In [None]:
feat_model = models.alexnet(num_classes=10).features
feat_model

In [None]:
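# select an appropriate lr and number of epochs
epochs = 50
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
feat_model = feat_model.to(device)
feat_model.train()
criterion = SupConLoss()
# the optimizer needs the parameters it should update
optimizer = optim.SGD(feat_model.parameters(), lr=0.001, weight_decay=5e-4, momentum=0.9)
for e in range(epochs):
    for idx, (images, labels) in enumerate(trainloader):
        images = images.to(device)
        labels = labels.to(device)
        # one L2-normalized feature vector per image
        features = F.normalize(torch.flatten(feat_model(images), 1), dim=1)
        # SupConLoss expects [bsz, n_views, dim]; each image contributes a single view here
        loss = criterion(features.unsqueeze(1), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()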

In [None]:
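model = models.alexnet(num_classes=10)
# load into `model.features` so that the state-dict keys match
model.features.load_state_dict(feat_model.state_dict())
# freeze the contrastively trained layers (the attribute is `requires_grad`)
for param in model.features.parameters():
    param.requires_grad = False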

Train the model on the classification task and compare its performance with the previous experiments.
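
When comparing experiments, overall accuracy and per-class recall can both be read off the confusion matrices. The sketch below assumes rows are true classes and columns are predictions. The helper `summarize` is hypothetical, and the two 3-class matrices are made-up numbers that only illustrate the computation, not results from this lab.

```python
import torch

def summarize(cm):
    """Overall accuracy and per-class recall from a confusion matrix (rows = true class)."""
    cm = cm.float()
    overall = (cm.diag().sum() / cm.sum()).item()
    per_class = (cm.diag() / cm.sum(dim=1).clamp(min=1)).tolist()
    return overall, per_class

# Made-up confusion matrices for two hypothetical runs.
cm_a = torch.tensor([[8, 1, 1], [2, 6, 2], [0, 2, 8]])
cm_b = torch.tensor([[9, 1, 0], [1, 8, 1], [1, 1, 8]])

for name, cm in [('run A', cm_a), ('run B', cm_b)]:
    overall, per_class = summarize(cm)
    print(name, round(overall, 3), [round(r, 2) for r in per_class])
```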

In [None]:
### your code here

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. "ImageNet classification with deep convolutional neural networks." NIPS 2012.

[2] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. "A simple framework for contrastive learning of visual representations." In International conference on machine learning. PMLR, 1597–1607.

[3] Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. 2020. "Big self-supervised models are strong semi-supervised learners." Advances in neural information processing systems 33 (2020), 22243–22255.

[4] P. Khosla et al. 2020. "Supervised contrastive learning." Advances in neural information processing systems 33 (2020), 18661–18673.