# Scoring

- The maximum number of points for this assignment is 10, the minimum number of points is 0.
- You have one week to complete the assignment. Once the assignment is submitted you are not allowed to change it.
- Each started week of delay is penalized with 1 point.
  - Example. The assignment is issued on January 1st. The deadline without penalty is 23:59 on January 7th (anywhere on Earth). Student A submits at 22:51 on January 6th and is not penalized; student B submits at 01:13 on January 8th and is penalized with 1 point; student C submits at 03:56 on January 16th and is penalized with 2 points, and so on.
- You have at most three weeks in total, penalized delay included, to submit the assignment. After three weeks we will not accept solutions.

# ResNet18

In this task you will write a piece of code that
creates [ResNet18](https://arxiv.org/abs/1512.03385).
ResNet18 is a deep neural network devised for image classification.
It is a fundamental architecture in the history of
computer vision.
The main idea of the ResNet family of architectures is to add
the input $x$ of a computational block to the block's output $\mathcal{F}(x)$.
This type of connection is called a "residual connection" (a "shortcut connection" in the paper's terminology).
![Residual connection](residual_connection.png)
You will encounter the idea of combining a modified signal with the
original one later in our course, for instance in U-Nets.
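
The residual connection described above can be sketched in a few lines. This is only an illustration under assumptions: `ResidualSketch` is a hypothetical wrapper and is not part of the assignment API, and a single shape-preserving convolution stands in for the block $\mathcal{F}$.

```python
import torch
from torch import nn


class ResidualSketch(nn.Module):
    """Hypothetical wrapper: returns F(x) + x for a shape-preserving block F."""

    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x):
        # The block must keep the shape of x, otherwise the sum is undefined.
        return self.block(x) + x


torch.manual_seed(0)
channels = 8
block = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
residual = ResidualSketch(block)

x = torch.randn(2, channels, 16, 16)
y = residual(x)
print(y.shape)  # same shape as x

# With F(x) = 0 the residual connection reduces to the identity mapping.
nn.init.zeros_(block.weight)
identity_out = residual(x)
```

When the block changes the number of channels or the spatial size, the input has to be projected before the sum; that is the role of the downsample block later in this assignment.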

## Import required packages

In [4]:
import math

import numpy as np
import torch
from torch import nn
from torchvision import datasets, transforms

from tqdm.auto import tqdm
import matplotlib.pyplot as plt

## Preliminaries

In [None]:
# 1 point for both

def conv1x1(in_planes, out_planes, stride=1):
    """
    Args:
        in_planes  (int): the number of input channels;
        out_planes (int): the number of output channels;
        stride     (int, default=1): stride.
        
    Return:
        A two-dimensional convolutional layer with
        `in_planes` input channels, `out_planes` output channels,
        kernel size 1, stride size `stride`, 0 padding and
        without bias parameter.
    """
    
    """ 
    Your code here. 
    """

def conv3x3(in_planes, out_planes, stride=1):
    """
    Args:
        in_planes  (int): the number of input channels;
        out_planes (int): the number of output channels;
        stride     (int, default=1): stride.
        
    Return:
        A two-dimensional convolutional layer with
        `in_planes` input channels, `out_planes` output channels,
        kernel size 3, stride size `stride`, padding equal to 1 and
        without bias parameter.
    """
    
    """ 
    Your code here. 
    """
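
To see why the two helpers use these kernel sizes and paddings, it can help to compute the spatial output size by hand. The sketch below uses the standard output-size formula for convolution and pooling layers; `conv_output_size` is a hypothetical helper introduced only for this illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution or pooling layer along one axis."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


# conv3x3 as specified above: kernel 3, padding 1.
same_3x3 = conv_output_size(32, kernel_size=3, stride=1, padding=1)    # 32
halved_3x3 = conv_output_size(32, kernel_size=3, stride=2, padding=1)  # 16

# conv1x1 as specified above: kernel 1, padding 0.
same_1x1 = conv_output_size(32, kernel_size=1, stride=1, padding=0)    # 32
halved_1x1 = conv_output_size(32, kernel_size=1, stride=2, padding=0)  # 16

print(same_3x3, halved_3x3, same_1x1, halved_1x1)
```

With stride 1 both layers preserve the spatial size, and with stride 2 both halve it, so their outputs can be summed in a residual block.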


## ResNet18 Basic Block

![resnet_bb.svg](resnet_bb.svg)

In [None]:
class BasicBlock(nn.Module):
    """
    Write a piece of code that creates a Basic block for the ResNet
    architecture.
    A Basic block has two computational paths: a feedforward path and
    a residual path (see the picture above).
    The residual path uses the `downsample` module passed to the
    constructor (it is created by `downsample_basic_block` of the
    `ResNet` class below); when `downsample` is None, the residual
    path is the identity.
    The feedforward path consists of the consecutive
    application of the following five layers:
    1. conv3x3 (use `in_planes`, `planes` and `stride` as 
                parameters for the conv3x3);
    2. Batch normalisation layer with `planes` features;
    3. Activation function;
    4. conv3x3 (use `planes`, `planes` as input parameters
                to conv3x3, keep `stride` at its default);
    5. Batch normalisation layer with `planes` features.
    
    Then sum the outputs of the residual and feedforward paths 
    as the picture suggests and return the activated sum (i.e.
    apply the activation function to the sum).
    
    Make it possible to use either ReLU, LeakyReLU or PReLU
    inside a Basic block.
    
    Hint:
        When you use the ReLU function from PyTorch you can
        specify `inplace=True`. In that case the result of the
        activation function is stored in the input tensor, 
        i.e. you do not need to explicitly assign the result of
        the in-place operation to a variable. This can sometimes
        decrease memory consumption.
    """
    def __init__(self, in_planes, planes, stride=1, downsample=None, relu_type='relu'):
        """
        Args:
            in_planes  (int): number of input channels to the block;
            planes     (int): number of output channels of the block;
            stride     (int, default=1): stride for the first convolutional layer;
            downsample (nn.Module, default=None): Convolutional layer used
                                                  to downsample the residual connection;
            relu_type  (str, default='relu'): Type of activation function.
        
        """
        
        super(BasicBlock, self).__init__()

        assert relu_type in ['relu', 'leaky_relu', 'prelu']
        
        self.downsample = downsample
        
        # 1 point
        """
        Your code here.
        """

    def forward(self, x):
        # 1 point
        """
        Your code here.
        """

        return out
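
The `inplace=True` hint in the docstring can be checked on a tiny tensor. This is a minimal sketch with made-up values; it only shows what "in place" means and is not part of the solution.

```python
import torch
from torch import nn

x = torch.tensor([-1.0, 0.0, 2.0])

# Out-of-place: a new tensor is allocated and x is left untouched.
out_of_place = nn.ReLU(inplace=False)(x)
x_before = x.clone()

# In-place: the values of x are overwritten and no new storage is allocated.
in_place = nn.ReLU(inplace=True)(x)
shares_memory = in_place.data_ptr() == x.data_ptr()

print(x_before, x, shares_memory)
```

Because an in-place operation overwrites its input, it should not be applied to a tensor that is still needed elsewhere, for instance the block input that is later added back on the residual path.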

## Activation functions

![Activation functions](activations.drawio.svg)
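
The three activation functions differ only in how they treat negative inputs. The sketch below evaluates them on a few sample values, assuming PyTorch's default arguments (`negative_slope=0.01` for `nn.LeakyReLU`, a single learnable slope initialised to 0.25 for `nn.PReLU`).

```python
import torch
from torch import nn

x = torch.tensor([-2.0, 0.0, 3.0])

relu = nn.ReLU()
leaky_relu = nn.LeakyReLU()  # default negative_slope=0.01
prelu = nn.PReLU()           # one learnable slope, initialised to 0.25

relu_out = relu(x)             # negatives are zeroed
leaky_out = leaky_relu(x)      # negatives are scaled by 0.01
prelu_out = prelu(x).detach()  # negatives are scaled by the current slope

print(relu_out, leaky_out, prelu_out)
```

Unlike the other two, `nn.PReLU` holds a trainable parameter, so each PReLU instance in a network learns its own slope.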

## ResNet18

In [None]:
class ResNet(nn.Module):
    def __init__(
        self, 
        block, 
        layers, 
        in_planes=64,
        num_classes=100, 
        relu_type='relu'
    ):
        """
        ResNet18 comprises an input layer and four consecutive
        groups of Basic blocks (layers) that are the outputs of the
        `self._make_layer` method.
        The first layer (not the input one!) has 64 input channels
        and twice as many, 128, output channels, the next one has 128
        input channels and 256 output channels, etc.
        When constructing the network use
        for layer1: stride=1
        for layer{2,3,4}: stride=2.
        The next layer should average the input tensor over
        the spatial dimensions. Suppose a batch has shape
        [B,C,H,W],  B -- the number of elements in the batch;
                    C -- the number of output channels;
                    H, W -- height and width, the spatial sizes.
        After averaging, your tensor should have size [B,C,1,1].
        The final layer is a linear projection from the C-dimensional space
        into the `num_classes`-dimensional one.
        
        Hint:
            1. You may want to use nn.AdaptiveAvgPool2d;
        """
        
        super(ResNet, self).__init__()

        self.in_planes = in_planes
        self.relu_type = relu_type
        self.num_classes = num_classes
        # `downsample_basic_block` is a method of this class (defined below)
        self.downsample_block = self.downsample_basic_block
        
        # 2 points
        """
        Your code here.
        """

        # Initialise modules
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        # 2 points
        """
        Args:
            block  (nn.Module): Basic block to use;
            planes (int): The number of output channels;
            blocks (int): How many Basic blocks to repeat;
            stride (int, default=1): stride.
            
        Return:
            torch.nn.modules.container.Sequential
            
        * The `_make_layer` method creates `blocks` copies of the `block`.
        * The first `block` has `self.in_planes` input channels
            and `planes` output channels;
        * The other `blocks-1` blocks have the same number of
            input and output channels, namely `planes`.
        * Apply a non-unit stride ONLY to the first block!
        
        Hints:
            1. Do not forget to use the downsample block.
                When do you need to use it?
            2. Use the list `layers` to store the
                required blocks;
            3. Once the layer is created do not forget to
                change the value of `self.in_planes` since
                the number of input channels of the
                next layer is the same as the number of 
                output channels of the current layer.
        """
        
        # Define downsample
        """
        Your code here.
        """
        
        # Define layers
        layers = []
        """
        Your code here.
        """

        return nn.Sequential(*layers)
    
    def downsample_basic_block(self, in_planes, out_planes, stride):
        # 1 point
        """
        Args:
            in_planes  (int): the number of input channels;
            out_planes (int): the number of output channels;
            stride     (int): stride.

        Return:
            Downsample block comprises two layers:
            1. conv1x1(in_planes, out_planes, stride),
            2. Batch normalisation block with `out_planes` features.
        Hint:
            Use nn.Sequential
        """

        """ 
        Your code here. 
        """

    def make_input_layer(self, in_channels, out_channels, relu_type):
        # 1 point
        """
        Args:
            in_channels  (int): the number of input channels;
            out_channels (int): the number of output channels;
            relu_type    (str): type of an activation function.
            
        Return:
            A sequence of layers:
            1. 2D convolution layer with `in_channels` input channels,
                `out_channels` output channels, `kernel_size` 7,
                `stride` 2, `bias` False. 
                * * * 
                What padding should one use in order to halve the
                spatial sizes of the input?
                E.g. 
                inp = torch.randn(B, C, 64, 64)
                conv = nn.Conv2d(C, out_channels, 7, stride=2, padding=???, bias=False)
                conv(inp).shape # equals [B, out_channels, 32, 32]
                * * * 
            2. Batch normalisation layer with `out_channels` features;
            3. Activation function;
            4. Maximum pooling with `kernel_size` 3, `stride` 2,
                `dilation` 1, `ceil_mode` False.
                * * * 
                What padding should one use in order to halve the
                spatial sizes of the input?
                E.g. 
                inp = torch.randn(B, C, 32, 32)
                pooling = nn.MaxPool2d(
                    kernel_size=3, stride=2, padding=???, 
                    dilation=1, ceil_mode=False)
                pooling(inp).shape # equals [B, C, 16, 16]
                * * * 
        """
        
        """
        Your code here.
        """

    def forward(self, x):
        # 1 point
        """
        Your code here.
        """
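
The last two stages described in the `ResNet` docstring, spatial averaging followed by a linear projection, can be tried in isolation. This is a sketch under assumptions: the tensor sizes (4 samples, 512 channels, a 2x2 feature map) are chosen only for illustration, and flattening with `torch.flatten` is one possible way to turn `[B,C,1,1]` into the `[B,C]` input that `nn.Linear` expects.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch = torch.randn(4, 512, 2, 2)  # [B, C, H, W], illustrative sizes

pool = nn.AdaptiveAvgPool2d((1, 1))
pooled = pool(batch)                       # [4, 512, 1, 1]
flat = torch.flatten(pooled, start_dim=1)  # [4, 512]

fc = nn.Linear(512, 100)                   # projection into num_classes=100
logits = fc(flat)                          # [4, 100]

# Adaptive average pooling to 1x1 is the mean over the spatial dimensions.
same_as_mean = torch.allclose(
    pooled, batch.mean(dim=(2, 3), keepdim=True), atol=1e-6
)
print(pooled.shape, flat.shape, logits.shape, same_as_mean)
```

Without the flattening step, `nn.Linear` would be applied along the last dimension of size 1 and raise a shape error.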

In [None]:
# Helper functions

def calc_accuracy(trues, logits):
    preds = np.argmax(logits, axis=1)
    return (trues == preds).mean()
  
def train_epoch(net, dl, criterion, optimizer, device='cuda'):
    net.train()
    losses = list()
    for batch in dl:
        images, trues = batch

        images = images.to(device)
        trues = trues.to(device)

        logits = net(images)  

        loss = criterion(logits, trues)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())
    return losses


def inference_dl(net, dl, device):
    net.eval()
    all_trues = list()
    all_logits = list()
    with torch.no_grad():
        for batch in dl:
            images, trues = batch
            images = images.to(device)
            logits = net(images)

            all_trues.append(trues)
            all_logits.append(logits)

    all_trues = torch.cat(all_trues)
    all_logits = torch.cat(all_logits)

    return all_trues, all_logits
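
A quick way to convince yourself that `calc_accuracy` does what you expect is to run it on a hand-made example. The logits and labels below are made up for illustration; the function is copied from the cell above so that the snippet is self-contained.

```python
import numpy as np


def calc_accuracy(trues, logits):
    preds = np.argmax(logits, axis=1)
    return (trues == preds).mean()


logits = np.array([
    [2.0, 0.1, -1.0],  # predicted class 0
    [0.3, 0.2, 1.5],   # predicted class 2
    [0.0, 4.0, 0.0],   # predicted class 1
    [1.0, 0.9, 0.8],   # predicted class 0
])
trues = np.array([0, 2, 1, 2])

accuracy = calc_accuracy(trues, logits)  # 3 of the 4 predictions are correct
print(accuracy)
```

Both arguments must be NumPy arrays, which is why the training loop converts the tensors with `.cpu().numpy()` before calling the function.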

In [None]:
# These hyperparameters might work.
# Adjust them according to your computational environment.

batch_size = 128
num_workers = 0
lr = 1e-3
num_epochs = 20
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
# Define normalisation;

all_transforms = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
    ])

# Datasets and Dataloaders;
train_ds = datasets.CIFAR100('../data', train=True, download=True, transform=all_transforms)
test_ds = datasets.CIFAR100('../data', train=False, download=True, transform=all_transforms)

train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_dl = torch.utils.data.DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=num_workers)

# Create a network

relu_type = "prelu"
net = ResNet(BasicBlock, [2, 2, 2, 2], relu_type=relu_type)
net.to(device)

# Define a loss function and optimiser
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
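
`nn.CrossEntropyLoss` expects raw logits and integer class indices. A useful sanity check, sketched below with dummy tensors rather than the real network, is that uniform predictions over 100 classes give a loss of $\ln 100 \approx 4.6$. A freshly initialised network is not exactly uniform, so its first loss values should only be roughly of that order.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
num_classes = 100
criterion = nn.CrossEntropyLoss()

logits = torch.zeros(8, num_classes)            # identical logits: uniform prediction
targets = torch.randint(0, num_classes, (8,))   # integer class indices

loss = criterion(logits, targets).item()
expected = math.log(num_classes)
print(loss, expected)
```

If the training loss starts far above this value or does not decrease from it, the architecture or the learning rate is worth re-checking.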

In [None]:
# Train your network

valid_accuracy = list()
train_losses = list()
data = list()

print('Epoch:', -1)
trues, logits = inference_dl(net, test_dl, device)

trues = trues.cpu().numpy()
logits = logits.detach().cpu().numpy()

accuracy = calc_accuracy(trues, logits)
valid_accuracy.append(accuracy)

print('Train loss:', 0)
print('Valid accuracy:', accuracy)

data.append((trues, logits))


for epoch in range(num_epochs):

    print('Epoch:', epoch)

    train_epoch_losses = train_epoch(
        net, 
        train_dl, 
        criterion=criterion, 
        optimizer=optimizer,
        device=device
    )
    train_losses += train_epoch_losses

    trues, logits = inference_dl(net, test_dl, device)

    trues = trues.cpu().numpy()
    logits = logits.detach().cpu().numpy()

    accuracy = calc_accuracy(trues, logits)
    valid_accuracy.append(accuracy)

    print('Train loss:', np.mean(train_epoch_losses))
    print('Valid accuracy:', accuracy)

    data.append((trues, logits))

In [None]:
# Plot a loss curve and an accuracy curve

plt.figure(figsize=(30, 5))
plt.plot(train_losses)
plt.grid()
plt.show()

plt.figure(figsize=(30, 5))
plt.plot(valid_accuracy)
plt.grid()
plt.show()
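
`train_losses` holds one value per batch, so the raw curve tends to be noisy. One optional way to read the trend is a moving average. The sketch below uses a hypothetical helper `moving_average` and made-up loss values; the window size is an assumption you would tune.

```python
import numpy as np


def moving_average(values, window=50):
    """Average every run of `window` consecutive values."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(values, kernel, mode="valid")


noisy = [4.0, 2.0, 3.0, 1.0, 2.0, 0.0]
smoothed = moving_average(noisy, window=3)
print(smoothed)
```

The smoothed curve is shorter than the original by `window - 1` points, which should be kept in mind when aligning it with epochs on the plot.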

In [None]:
# Plot and analyse the confusion matrix

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

trues = data[-1][0]
preds = np.argmax(data[-1][1], axis=1)

cn = confusion_matrix(trues, preds)

fig, ax = plt.subplots(figsize=(10, 10))
ConfusionMatrixDisplay(cn).plot(ax=ax)
plt.show()
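
With 100 classes the plotted matrix is hard to read by eye. One possible way to analyse it is to list the largest off-diagonal entries, i.e. the class pairs that are confused most often. The sketch below uses a hypothetical helper `most_confused_pairs` and a made-up 3x3 matrix; it assumes the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np


def most_confused_pairs(cm, top_k=3):
    """Return (true_class, predicted_class, count) for the largest off-diagonal entries."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c])) for r, c in zip(rows, cols)]


toy_cm = np.array([
    [50, 2, 8],
    [1, 40, 9],
    [12, 0, 45],
])
pairs = most_confused_pairs(toy_cm, top_k=2)
print(pairs)
```

Applied to `cn` from the cell above, the returned indices can be mapped to class names to check whether the confused classes are visually similar.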
