# Implementing a convolutional layer

## Introduction

The objective of this exercise is to gain an in-depth understanding of the convolutional layer, a crucial layer in deep neural networks, especially when applied to image datasets. To achieve this, you will implement your own custom convolutional layer ``MyConv2d`` and make sure that it yields the same results as [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d).


## Contents:

1. Utils
2. Implement a custom layer in Pytorch: ``MyConv2d``
3. Use ``MyConv2d`` inside a neural network model
4. Test ``MyConv2d`` by comparing it to [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d)

## Andrew's videos related to this exercise:
- [C4W1L02 Edge Detection Examples](https://www.youtube.com/watch?v=XuD4C8vJzEQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=2)
- [C4W1L04 Padding](https://www.youtube.com/watch?v=smHa2442Ah4&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=4)
- [C4W1L05 Strided Convolutions](https://www.youtube.com/watch?v=tQYZaDn_kSg&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=5)
- [C4W1L06 Convolutions Over Volumes](https://www.youtube.com/watch?v=KTB_OFoAQcc&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=6)
- [C4W1L07 One Layer of a Convolutional Net](https://www.youtube.com/watch?v=jPOAS7uCODQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=7)



In [1]:
import torch
from torch import nn, optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from datetime import datetime
from typing import Sequence
from torch.utils.data import random_split

torch.manual_seed(123)
# We use torch.double to get the same results as Pytorch
torch.set_default_dtype(torch.double)

## 1. Utils

Nothing to see in the cell below, just the definitions of functions we'll need later. You don't even need to read them; just know that there are 3 functions:

- ``train``: Train a model and save the weight and bias values of the convolutional layer after each epoch of training. Return them, stored in 2 lists: ``weight_values`` and ``bias_values``.
- ``int_to_pair(n)``: Return `(n, n)` if `n` is an int, or `n` if `n` is already a tuple of length 2.
- ``relative_error(a, b)``: Compute the relative error of ``b`` with respect to ``a``, i.e. ``norm(a - b) / norm(a)``.

In [2]:
device = torch.device('cpu')
print(f"Device {device}.")

def train(n_epochs, optimizer, model, loss_fn, train_loader):
    """
    Train our model and save weight values
    """
    n_batch = len(train_loader)
    losses_train = []
    model.train()
    optimizer.zero_grad(set_to_none=True)
    
    weight_values = []
    bias_values = []
    
    for epoch in range(1, n_epochs + 1):
        
        loss_train = 0.0
        for imgs, labels in train_loader:

            # We use torch.double to get the same results as Pytorch
            imgs = imgs.to(device=device, dtype=torch.double) 
            labels = labels.to(device=device)

            outputs = model(imgs)
            
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

            loss_train += loss.item()
            
        losses_train.append(loss_train / n_batch)
        
        # Here we store weight and bias values after each epoch of the training process
        with torch.no_grad():
            weight_values.append(model.conv1.weight.data.clone().detach())
            if model.conv1.bias is not None:
                bias_values.append(model.conv1.bias.data.clone().detach())

        print('{}  |  Epoch {}  |  Training loss {:.5f}'.format(
            datetime.now().time(), epoch, loss_train / n_batch))
    return weight_values, bias_values

def int_to_pair(n):
    """
    Return `(n, n)` if `n` is an int or `n` if it is already a tuple of length 2
    """
    # If n is a float or integer
    if not isinstance(n, Sequence):
        return (int(n), int(n))
    elif len(n) == 1:
        return (int(n[0]), int(n[0]))
    elif len(n) == 2:
        return ( int(n[0]), int(n[1]) )
    else:
        raise ValueError("Please give an int or a pair of int")
    

def relative_error(a, b):
    return (torch.norm(a - b) / torch.norm(a))

Device cpu.


The objective this week is not to solve a classification task, only to implement a convolutional layer. We will therefore use a smaller version of the MNIST dataset in order to reduce computation time.

## TODO:

Write a ``load_MNIST`` function that:
- loads and preprocesses the MNIST training dataset such that images are
  - center-cropped from 28x28 to 24x24 pixels,
  - transformed to tensors,
  - normalized using appropriate mean and standard deviation values;
- returns a small portion (10%) of the MNIST training dataset.

In [3]:
def load_MNIST(data_path='../data/', preprocessor=None):
    
    if preprocessor is None:
        preprocessor = transforms.Compose([
            transforms.CenterCrop(24),
            transforms.ToTensor(),
            transforms.Normalize([0.1306], [0.3080]),
        ])
    
    # load datasets
    data_train_val = datasets.MNIST(
        data_path,       
        train=True,      
        download=True,
        transform=preprocessor)

    # keep only a few samples
    n_kept = int(len(data_train_val)*0.1)
    n_thrown =  len(data_train_val) - n_kept

    data_train, _ = random_split(
        data_train_val, 
        [n_kept, n_thrown],
        generator=torch.Generator().manual_seed(123)
    )

    print("Size of the train dataset:        ", len(data_train))
    
    return data_train

data_train = load_MNIST()

Size of the train dataset:         6000


## 2. Implement a custom layer in Pytorch: MyConv2d 

In the cell below, there is a template of a ``MyConv2d`` class that would re-create a [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d) layer. By solving the 4 problems below you will complete this class step by step.

First of all, defining a custom layer in PyTorch is very similar to defining a custom neural network or a custom block of layers: they all consist of subclassing the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=nn%20module#torch.nn.Module) class (see the 3rd tutorial for more details). As usual, we have to create a class that subclasses ``nn.Module`` and implements a [forward](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward) method that defines what happens to a given input.
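
To make the subclassing pattern concrete, here is a minimal sketch of a toy layer. The class ``Scale`` and its attributes are hypothetical names used only for illustration (they are not part of this exercise); the point is that attributes wrapped in ``nn.Parameter`` are registered as trainable, while plain attributes are not.

```python
import torch
from torch import nn

class Scale(nn.Module):
    """Hypothetical toy layer: multiplies its input by one learnable scalar."""

    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered as a trainable parameter
        self.factor = nn.Parameter(torch.ones(1))
        # Plain Python attribute: NOT registered as a parameter
        self.label = "scale"

    def forward(self, x):
        # Defines what happens to a given input
        return self.factor * x

layer = Scale()
print([name for name, _ in layer.named_parameters()])  # ['factor']
print(layer(torch.tensor([1.0, 2.0])))
```

``MyConv2d`` follows the same pattern, with ``weight`` (and optionally ``bias``) as its parameters.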


## TODO:

### 1. Compute the output shape of a convolutional layer

Reminder about Pytorch notations and dimensions from the 2nd tutorial:

- ``N``: batch size,         (how many inputs do you feed at the same time)
- ``C``: number of channels, (number of color channels RGB=3, RGBA=4, etc. if referring to an image, or the number of filters if referring to a convolutional layer)
- ``H``: height of the image
- ``W``: width of the image


Take a look at the [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d) documentation and scroll down to examine the output shape formula. To get an illustration of this formula, you can also watch Andrew's video  [C4W1L07 One Layer of a Convolutional Net](https://www.youtube.com/watch?v=jPOAS7uCODQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=7) 

Write a method ``get_output_size(self, x)`` (so inside the ``MyConv2d``  class), such that:
- ``x`` is an input batch of dimension ``(N, C_in, H_in, W_in)``
- it returns the output shape of a convolutional layer ``(N, C_out, H_out, W_out)`` (following Pytorch's notations). Note that we have:
  - ``N = x.shape[0]``
  - ``C_in = self.in_channels``
  - ``H_in = x.shape[-2]``
  - ``W_in = x.shape[-1]``
  - ``kernel_size = self.kernel_size`` (Note: it's a tuple (kernel_size_height, kernel_size_width))
  - ``padding = self.padding`` (Note: it's a tuple (padding_height, padding_width))
  - ``stride = self.stride`` (Note: it's a tuple (stride_height, stride_width))
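
As a quick sanity check of the shape formula, here is a minimal pure-Python sketch. The helper name ``conv_output_hw`` is hypothetical (it is not part of the exercise), and it assumes a dilation of 1, which is the only case handled in this notebook.

```python
def conv_output_hw(h_in, w_in, kernel_size, stride=(1, 1), padding=(0, 0)):
    """Output height and width of a 2D convolution, assuming dilation = 1."""
    h_out = (h_in + 2 * padding[0] - kernel_size[0]) // stride[0] + 1
    w_out = (w_in + 2 * padding[1] - kernel_size[1]) // stride[1] + 1
    return h_out, w_out

# 24x24 input, 3x4 kernel, stride (2, 1), padding (1, 2)
print(conv_output_hw(24, 24, kernel_size=(3, 4), stride=(2, 1), padding=(1, 2)))
# (12, 25)
```

For the height: ``(24 + 2*1 - 3) // 2 + 1 = 23 // 2 + 1 = 12``. For the width: ``(24 + 2*2 - 4) // 1 + 1 = 25``.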

### 2. Apply the padding transform to an image

In this video [C4W1L04 Padding](https://www.youtube.com/watch?v=smHa2442Ah4&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=4), you learned how to apply a padding transform to an image.

Write a method ``apply_padding(self, x)`` (so inside the ``MyConv2d`` class), such that: 
- ``x`` is an input batch of dimension ``(N, C_in, H_in, W_in)``
- it returns a tensor ``x_pad``, whose center values are the same as ``x`` values but with extra zeros on the border (the numbers of zeros to add are defined by ``self.padding``). **Note:**  ``x_pad`` is then of dimension ``(N, C_in, H_in + 2*self.padding[0], W_in + 2*self.padding[1])``
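
The expected result can be illustrated on a tiny tensor. The sketch below pads by hand and compares against ``torch.nn.functional.pad``, which is used here only as a reference; the input values and padding sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.arange(6.0).reshape(1, 1, 2, 3)   # (N=1, C_in=1, H_in=2, W_in=3)
pad_h, pad_w = 1, 2                          # arbitrary padding values

# Manual zero padding: allocate zeros, then copy x into the center
x_pad = torch.zeros(1, 1, 2 + 2 * pad_h, 3 + 2 * pad_w, dtype=x.dtype)
x_pad[:, :, pad_h:pad_h + 2, pad_w:pad_w + 3] = x

# F.pad lists the pads for the LAST dimension first: (left, right, top, bottom)
x_ref = F.pad(x, (pad_w, pad_w, pad_h, pad_h))

print(x_pad.shape)                 # torch.Size([1, 1, 4, 7])
print(torch.equal(x_pad, x_ref))   # True
```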

### 3. Apply the convolution operation to an image

You learned how to apply convolution to an image, in the following videos:

- [C4W1L02 Edge Detection Examples](https://www.youtube.com/watch?v=XuD4C8vJzEQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=2)
- [C4W1L05 Strided Convolutions](https://www.youtube.com/watch?v=tQYZaDn_kSg&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=5)
- [C4W1L06 Convolutions Over Volumes](https://www.youtube.com/watch?v=KTB_OFoAQcc&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=6)
- [C4W1L07 One Layer of a Convolutional Net](https://www.youtube.com/watch?v=jPOAS7uCODQ&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=7)

Write a method ``apply_conv(self, x_pad)`` (so inside the ``MyConv2d``  class), such that 
- ``x_pad`` is the padded version of the input batch ``x``.
- it returns ``out`` (whose shape is computed by ``self.get_output_size``), the output of the convolution operation applied to ``x_pad``. **Note:** since we follow PyTorch's implementation of convolution, it has to be possible to choose whether to have a bias or not. So somewhere in your ``apply_conv`` there must be a condition ``if self.bias is None: ...... else: ....... ``

**Note**: A few words about vectorization: in Python you should always favour vectorized computations because they are much faster. However, the most important thing is still that you get the right result. So try to vectorize computations as much as possible, but it is fine if a few for loops remain.
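
To illustrate what a fully vectorized convolution can look like, here is one possible sketch based on ``torch.nn.functional.unfold`` (the "im2col" trick). This is not the approach required by the exercise; the function name ``conv2d_unfold`` is hypothetical, and the sketch assumes a dilation of 1.

```python
import torch
import torch.nn.functional as F

def conv2d_unfold(x, weight, bias=None, stride=(1, 1), padding=(0, 0)):
    """Convolution as a single matrix product (im2col); assumes dilation = 1."""
    n, _, h_in, w_in = x.shape
    c_out, _, k_h, k_w = weight.shape
    h_out = (h_in + 2 * padding[0] - k_h) // stride[0] + 1
    w_out = (w_in + 2 * padding[1] - k_w) // stride[1] + 1
    # cols: (N, C_in * k_h * k_w, H_out * W_out), one column per kernel position
    cols = F.unfold(x, kernel_size=(k_h, k_w), padding=padding, stride=stride)
    # (C_out, C_in*k_h*k_w) @ (N, C_in*k_h*k_w, L) -> (N, C_out, L)
    out = weight.reshape(c_out, -1) @ cols
    if bias is not None:
        out = out + bias.view(1, -1, 1)
    return out.reshape(n, c_out, h_out, w_out)

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 9, dtype=torch.double)
w = torch.randn(4, 3, 3, 4, dtype=torch.double)
b = torch.randn(4, dtype=torch.double)
ours = conv2d_unfold(x, w, b, stride=(2, 1), padding=(1, 2))
ref = F.conv2d(x, w, b, stride=(2, 1), padding=(1, 2))
print(ours.shape == ref.shape, torch.allclose(ours, ref))
```

The loops over output positions disappear because every kernel-sized patch is laid out as a column first, at the cost of extra memory for the unfolded tensor.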

### 4. Implementing a Convolutional layer (forward method)

In the [forward](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward) method, combine your previously defined methods to:

1. Figure out the expected output shape ``(N, C_out, H_out, W_out)`` by calling ``self.get_output_size``
2. Apply padding to ``x`` by calling ``self.apply_padding``
3. Compute ``out``, the result of the convolution operation, by calling ``self.apply_conv`` (depending on how you solved problem 3, you might need some for loops here)
4. Return ``out``



In [4]:
class MyConv2d(nn.Module):
    """
    Custom convolutional 2d layer
    """
    
    def __init__(
        self, 
        in_channels:int,
        out_channels:int,
        kernel_size, 
        stride = (1, 1), 
        padding = (0, 0), 
        bias:bool = False,
    ):
        """
        in_channels: Number of input channels 
        out_channels: Number of output channels (number of filters)
        kernel_size: Filter's size (tuple of int)
        stride: Length of the kernel's jumps  (tuple of int)
        padding: Number of pixels to add around the image (tuple of int)
        bias: Should a bias parameter be added to each filter or not?
        """

        super().__init__()
        # Will NOT be automatically added to the list
        # of trainable parameter (see doc of nn.Parameter)
        self.in_channels = int(in_channels)
        self.out_channels = int(out_channels)
        self.padding = int_to_pair(padding)
        self.stride = int_to_pair(stride)
        self.kernel_size = int_to_pair(kernel_size)

        # Will be automatically added to the list of 
        # model's trainable parameters (see doc quoted above)
        # Dim = (C_out, C_in, kernel_height, kernel_width)
        self.weight = nn.Parameter(torch.Tensor(self.out_channels, self.in_channels, self.kernel_size[0], self.kernel_size[1]))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(self.out_channels))
        else:
            self.bias = None
            #print(self.bias) # Uncomment this to make sure that the printed output starts with  "Parameter containing: ..."
        #print(self.weight) # Uncomment this to make sure that the printed output starts with  "Parameter containing: ..."


    def get_output_size(self, x):
        # shape of x: (N, C_in, H_in, W_in)
        # Note: batch_size is the only dimension that is not influenced
        # by the convolution operation
        H_out = (x.shape[-2] + 2*self.padding[0] - self.kernel_size[0]) // self.stride[0] + 1 
        W_out = (x.shape[-1] + 2*self.padding[1] - self.kernel_size[1]) // self.stride[1] + 1 
        return (int(x.shape[0]), self.out_channels, H_out, W_out)

    def apply_padding(self, x):
        # shape of x: (N, C_in, H_in, W_in)
        # shape of x_pad: (N, 1, C_in, H_in + 2*pad_height, W_in + 2*pad_width)
        x_pad = torch.zeros(x.shape[0], 1, x.shape[1], x.shape[2]+2*self.padding[0], x.shape[3]+2*self.padding[1])
        x_pad[:, 0, :, self.padding[0]:(x.shape[2]+self.padding[0]), self.padding[1]:(x.shape[3]+self.padding[1])] = torch.clone(x)
        return x_pad

    def apply_conv(self, x_pad, i, j):
        start_h = i*self.stride[0]
        end_h = min(start_h + self.kernel_size[0], x_pad.shape[-2])
        start_w = j*self.stride[1]
        end_w = min(start_w + self.kernel_size[1], x_pad.shape[-1])
        # shape of x_pad: (N, 1, C_in, H_in + 2*pad_height, W_in + 2*pad_width)
        # shape of weight: (C_out, C_in, kernel_height, kernel_width)
        # shape of tmp_mul: (N, C_out, C_in, kernel_height, kernel_width)
        # shape of output: (N, C_out)
        if self.bias is not None:
            return torch.sum(
                    self.weight[:, :, 0:(end_h-start_h), 0:(end_w-start_w)] * x_pad[:, :, :, start_h:end_h, start_w:end_w],
                    dim=(2,3,4) ,keepdim=False
                ) + self.bias
        else:
            return torch.sum(
                    self.weight[:, :, 0:(end_h-start_h), 0:(end_w-start_w)] * x_pad[:, :, :, start_h:end_h, start_w:end_w],
                    dim=(2,3,4) ,keepdim=False
                )

        
    def forward(self, x):
        """
        Required method for any nn.Module class
        """
        # shape of x: (N, C_in, H_in, W_in)
        (batch_size, C_out, H_out, W_out) = self.get_output_size(x)
        x_pad = self.apply_padding(x)
        out = torch.empty((batch_size, C_out, H_out, W_out))
        for i in range(H_out):
            for j in range(W_out):
                out[:,:,i, j] = self.apply_conv(x_pad, i, j)
        return out

    def __str__(self):
        """
        Standard Python method to implement if you want to customize ``print(MyConv2d(...))``

        This method is not mandatory, it's just so that you can print MyConv2d
        in a way similar to how a Conv2d layer is printed
        """
        string = (
            "MyConv2d(" + str(self.in_channels) + ", " + str(self.out_channels)
            + ", kernel_size=" + str(self.kernel_size) + ", stride=" + str(self.stride)
            + ", padding=" + str(self.padding) + ", bias=" + str(self.bias is not None) + ")"
        )
        return string



## 3. Use MyConv2d inside a neural network model

Below is a very basic neural network used to test ``MyConv2d`` on MNIST data. It consists of:

- a 2D-Convolutional layer (``MyConv2d`` or ``nn.Conv2d``) (see ``conv_type`` parameter)
- a tanh activation function
- a 2D-MaxPooling layer, to reduce the size of the image and therefore reduce the number of parameters
- a fully connected layer with 10 outputs for the 10 classes of the MNIST dataset

You don't have to do anything here; simply run the cell.

In [5]:
class MyNet(nn.Module):
    """
    Simple net with only one conv layer and one fc layer. 
    
    The convolutional layer can be ``MyConv2d`` or ``nn.Conv2d``
    """

    def __init__(
        self,
        in_channels:int,
        out_channels:int,
        kernel_size = (3, 3), 
        stride = (1, 1), 
        padding = (0, 0), 
        bias:bool = False,
        conv_type:str = 'custom',
    ):
        """
        in_channels: Number of input channels 
        out_channels: Number of output channels (number of filters)
        kernel_size: Filter's size (tuple of int)
        stride: Length of the kernel's jumps  (tuple of int)
        padding: Number of pixels to add around the image (tuple of int)
        bias: Should a bias parameter be added to each filter or not?
        conv_type: Should MyConv2d or nn.Conv2d be used? 
        """
        
        super().__init__()
        # Make sure these parameters are all pairs of int
        kernel_size = int_to_pair(kernel_size)
        stride = int_to_pair(stride)
        padding = int_to_pair(padding)
        # Use MyConv2d
        if conv_type == 'custom':
            self.conv1 = MyConv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=bias)
        # or use pytorch's Conv2d
        else:
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=bias)
        # Output shape
        H_out = (24 + 2*padding[0] - kernel_size[0]) // stride[0] + 1 
        W_out = (24 + 2*padding[1] - kernel_size[1]) // stride[1] + 1 
        
        # Divide by 2 here because we will apply a pooling layer
        self.fc2 = nn.Linear((H_out//2)*(W_out//2)*out_channels, 10)

    def forward(self, x):
        # x shape: (batch_size, C_in, H, W)
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = out.view(-1, out.shape[-3]*out.shape[-2]*out.shape[-1])
        out = self.fc2(out)
        return out
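
As a worked example of the size computation in ``MyNet.__init__``, the sketch below plugs in the settings used in section 4 of this notebook (``c_out = 8``, kernel ``(3, 4)``, stride ``(2, 1)``, padding ``(1, 2)``) on the 24x24 cropped MNIST images.

```python
# Settings taken from the test cell in section 4
c_out, kernel, stride, padding = 8, (3, 4), (2, 1), (1, 2)

# Spatial size after the convolutional layer (input images are 24x24)
h_out = (24 + 2 * padding[0] - kernel[0]) // stride[0] + 1   # 12
w_out = (24 + 2 * padding[1] - kernel[1]) // stride[1] + 1   # 25

# max_pool2d with kernel size 2 halves each spatial dimension (rounding down)
fc_in_features = (h_out // 2) * (w_out // 2) * c_out
print(h_out, w_out, fc_in_features)   # 12 25 576
```

So with these settings the fully connected layer maps 576 features to the 10 classes.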

## 4. Test MyConv2d by comparing it to nn.Conv2d

MyConv2d and nn.Conv2d are compared by storing after each epoch:

- Weight and bias values of the convolutional layer
- Training loss

**NOTE**

- Your implementation will probably be much slower than nn.Conv2d.
- If your implementation is correct, the printed training losses should be identical for both models, and the relative error should be around 1e-16.
- As repeated in the code cell below, you can play with the following parameters if you want, especially for debugging purposes, but before submitting your notebook:
  - set ``n_epochs`` to ``4`` 
  - set ``c_out`` to  ``>= 2`` 
  - set ``kernel = (n1, n2)`` such that ``n1 != n2`` 
  - set ``stride = (n3, n4)`` such that ``n3 != n4``
  - set ``padding = (n5, n6)`` such that ``n5 != n6``
  - set ``bias`` to ``True``
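
To see why a relative error around 1e-16 is about the best one can hope for in double precision, here is a small pure-Python sketch. The helper ``relative_error_py`` is a hypothetical list-based analogue of the notebook's ``relative_error``, written only for this illustration.

```python
import sys

# Machine epsilon: spacing between 1.0 and the next representable float64
eps = sys.float_info.epsilon
print(eps)   # 2.220446049250313e-16

def relative_error_py(a, b):
    """norm(a - b) / norm(a) for two lists of floats (illustration only)."""
    num = sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    den = sum(ai ** 2 for ai in a) ** 0.5
    return num / den

a = [0.1, 0.2, 0.3]
b = [0.1, 0.2, 0.1 + 0.2]   # 0.1 + 0.2 differs from 0.3 by a rounding error
print(relative_error_py(a, b))   # tiny but non-zero, on the order of eps
```

Two mathematically equivalent computations that merely sum terms in a different order can therefore differ at this level without either being wrong.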

## TODO

1. In your opinion, why is your implementation slower? (There could be multiple reasons.)

In [6]:
# Don't use a GPU! It would require using register_buffer to move the model correctly to
# the GPU. (See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_buffer)
# And you probably don't want to spend the entire week learning how to use this properly.
device = torch.device('cpu')
print(f"Training on device {device}.")

# We set shuffle to False so that we can compare both models more accurately
train_loader = torch.utils.data.DataLoader(data_train, batch_size=512, shuffle=False)
loss_fn = nn.CrossEntropyLoss()

# These parameters don't matter much in this assignment
n_epochs = 4    
lr = 0.1
c_in = 1        # Grey scale images so c_in = 1

# You can play with the following parameters if you want, especially for debugging purposes, but before submitting your notebook:
# - set c_out to  >= 2 
# - set kernel = (n1, n2) such that n1 != n2 
# - set stride = (n3, n4) such that n3 != n4
# - set padding = (n5, n6) such that n5 != n6
# - set bias to True
c_out = 8    
kernel = (3,4)
stride = (2,1)
padding = (1,2)
bias = True 

torch.manual_seed(265)
# Using MyConv2d
model01 = MyNet(c_in, c_out, kernel_size=kernel, padding=padding, stride=stride, bias=bias, conv_type='custom').to(device=device) 
# Using nn.Conv2d
model02 = MyNet(c_in, c_out, kernel_size=kernel, padding=padding, stride=stride, bias=bias, conv_type='pytorch').to(device=device) 

# Make both models start with the same weight values
# Note that, contrary to project 1, setting a manual seed before each
# model instantiation is not enough because the convolutional layers
# are implemented differently
with torch.no_grad():
    model01.conv1.weight.data = model02.conv1.weight.data.clone()
    if model01.conv1.bias is not None:
        model01.conv1.bias.data = model02.conv1.bias.data.clone()
    model01.fc2.bias.data = model02.fc2.bias.data.clone()
    model01.fc2.weight.data = model02.fc2.weight.data.clone()

print("\n ========= Training using MyConv2d =========")

optimizer = optim.SGD(model01.parameters(), lr=lr)
weights01, biases01 = train(
    n_epochs = n_epochs,
    optimizer = optimizer, 
    model = model01,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

print("\n ========= Training using nn.Conv2d =========")


optimizer = optim.SGD(model02.parameters(), lr=lr)
weights02, biases02 = train(
    n_epochs = n_epochs,
    optimizer = optimizer, 
    model = model02,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

print("\n ======= Relative error:     nn.Conv2d    VS    MyConv2d   =========")
print("weight:  ", [relative_error(weights02[i], weights01[i]) for i in range(len(weights02))] )
if bias:
    print("bias:    ", [relative_error(biases02[i], biases01[i]) for i in range(len(biases02))] )

Training on device cpu.





11:04:36.283716  |  Epoch 1  |  Training loss 1.65041
11:04:49.787356  |  Epoch 2  |  Training loss 0.87041
11:05:02.959676  |  Epoch 3  |  Training loss 0.63724
11:05:17.267263  |  Epoch 4  |  Training loss 0.53621

11:05:18.885795  |  Epoch 1  |  Training loss 1.65041
11:05:20.759910  |  Epoch 2  |  Training loss 0.87041
11:05:22.560617  |  Epoch 3  |  Training loss 0.63724
11:05:24.338566  |  Epoch 4  |  Training loss 0.53621

weight:   [tensor(1.2782e-16), tensor(1.2495e-16), tensor(1.5564e-16), tensor(1.4324e-16)]
bias:     [tensor(2.0313e-16), tensor(1.9168e-16), tensor(1.5030e-16), tensor(1.9176e-16)]


In [7]:
print(weights01)
print(biases01)

[tensor([[[[ 0.0919,  0.1978, -0.1818,  0.1813],
          [ 0.0879,  0.0478,  0.1608, -0.2375],
          [ 0.3274,  0.0829,  0.1484,  0.0470]]],


        [[[-0.1470,  0.2617,  0.1812,  0.3049],
          [ 0.3262,  0.0980,  0.2830,  0.0120],
          [ 0.2943, -0.1640, -0.2505, -0.0157]]],


        [[[-0.2092, -0.0616, -0.0547,  0.0385],
          [ 0.1728,  0.2239, -0.0740,  0.1061],
          [-0.0549,  0.1786, -0.0521, -0.0992]]],


        [[[ 0.1670, -0.0755, -0.1079,  0.0503],
          [ 0.0337, -0.0064,  0.0045,  0.2014],
          [ 0.2909,  0.2476,  0.3405,  0.0170]]],


        [[[ 0.2229,  0.1066, -0.3072, -0.1484],
          [-0.1369, -0.2767,  0.0317, -0.2690],
          [ 0.0960, -0.0063, -0.2061,  0.0387]]],


        [[[ 0.2009,  0.2604, -0.0370,  0.1221],
          [ 0.1912, -0.0198, -0.1543,  0.2744],
          [-0.1815,  0.2016, -0.1290,  0.1012]]],


        [[[-0.1920,  0.0951, -0.2677,  0.1269],
          [-0.2961, -0.1194, -0.1663, -0.0071],
          [ 0.1