## Problem 1 (2 Points)

In this homework we will implement the forward pass of a convolutional layer. We will not touch the backward pass or the calculation of derivatives.

Let's recall how a convolutional layer works:

* an array of images, also called a batch, is supplied as input
* zeros are added along the borders of each image (padding)
* each filter of the convolutional layer “slides” over each image

***Let's start with a warm-up - we'll implement a function that adds padding.***

Suppose input_images is a batch of two three-channel (RGB) images of size 3*3. Recall that the input of a convolutional layer has the following dimensions:

* batch size
* number of channels
* height
* width

In the case under consideration, the input dimension is (2, 3, 3, 3).
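As a small illustration of this layout (a sketch, not part of the assignment), a tensor with the same shape and the same values 0..53 as the input_images batch used below can be generated with `torch.arange` and `reshape`. The variable name `batch` is made up for this example.

```python
import torch

# Build a (batch, channels, height, width) tensor holding the values 0..53.
batch = torch.arange(2 * 3 * 3 * 3).reshape(2, 3, 3, 3)

print(batch.shape)   # torch.Size([2, 3, 3, 3])
print(batch[1, 0])   # first channel of the second image: values 27..35
```

Indexing follows the dimension order: `batch[1, 0]` selects image 1, channel 0, and returns a 3*3 matrix.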

If we add a padding of one zero around each image, each image becomes 3 + 2 * 1 = 5 pixels wide and 5 pixels high (one zero is added on each side of the image).

Write any working implementation.

![](https://ucarecdn.com/b4f16f35-13a7-4740-9760-075a708382c3/-/crop/463x266/0,168/-/preview/)

In [35]:
import torch

# Create an input array from two RGB 3*3 images
input_images = torch.tensor(
      [[[[0,  1,  2],
         [3,  4,  5],
         [6,  7,  8]],

        [[9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23],
         [24, 25, 26]]],


       [[[27, 28, 29],
         [30, 31, 32],
         [33, 34, 35]],

        [[36, 37, 38],
         [39, 40, 41],
         [42, 43, 44]],

        [[45, 46, 47],
         [48, 49, 50],
         [51, 52, 53]]]])


def get_padding2d(input_images, padding_size=1):
    batch_size, num_channels, input_height, input_width = input_images.shape

    # Allocate the zero-filled output, enlarged by padding_size on every side.
    padded_images = torch.zeros((batch_size, num_channels,
                                 input_height + 2 * padding_size,
                                 input_width + 2 * padding_size))
    # Copy the original pixels into the centre of the enlarged tensor.
    padded_images[:, :, padding_size:padding_size + input_height,
                  padding_size:padding_size + input_width] = input_images

    return padded_images


correct_padded_images = torch.tensor(
       [[[[0.,  0.,  0.,  0.,  0.],
          [0.,  0.,  1.,  2.,  0.],
          [0.,  3.,  4.,  5.,  0.],
          [0.,  6.,  7.,  8.,  0.],
          [0.,  0.,  0.,  0.,  0.]],

         [[0.,  0.,  0.,  0.,  0.],
          [0.,  9., 10., 11.,  0.],
          [0., 12., 13., 14.,  0.],
          [0., 15., 16., 17.,  0.],
          [0.,  0.,  0.,  0.,  0.]],

         [[0.,  0.,  0.,  0.,  0.],
          [0., 18., 19., 20.,  0.],
          [0., 21., 22., 23.,  0.],
          [0., 24., 25., 26.,  0.],
          [0.,  0.,  0.,  0.,  0.]]],


        [[[0.,  0.,  0.,  0.,  0.],
          [0., 27., 28., 29.,  0.],
          [0., 30., 31., 32.,  0.],
          [0., 33., 34., 35.,  0.],
          [0.,  0.,  0.,  0.,  0.]],

         [[0.,  0.,  0.,  0.,  0.],
          [0., 36., 37., 38.,  0.],
          [0., 39., 40., 41.,  0.],
          [0., 42., 43., 44.,  0.],
          [0.,  0.,  0.,  0.,  0.]],

         [[0.,  0.,  0.,  0.,  0.],
          [0., 45., 46., 47.,  0.],
          [0., 48., 49., 50.,  0.],
          [0., 51., 52., 53.,  0.],
          [0.,  0.,  0.,  0.,  0.]]]])


print(torch.allclose(get_padding2d(input_images), correct_padded_images))

True
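As a cross-check, the same zero padding can be produced with PyTorch's built-in `torch.nn.functional.pad`. The snippet below is a sketch on a stand-in batch (the name `images` is chosen for this example only); it confirms the padded shape and that the border contains only zeros.

```python
import torch
import torch.nn.functional as F

# A stand-in batch: 2 images, 3 channels, 3x3 pixels.
images = torch.arange(54, dtype=torch.float32).reshape(2, 3, 3, 3)

# The pad tuple is (left, right, top, bottom) for the last two dimensions.
padded = F.pad(images, (1, 1, 1, 1), mode="constant", value=0.0)

print(padded.shape)                                    # torch.Size([2, 3, 5, 5])
# The interior still holds the original pixels.
print(torch.equal(padded[:, :, 1:-1, 1:-1], images))   # True
# The top border row of every channel sums to zero.
print(padded[:, :, 0, :].abs().sum().item())           # 0.0
```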


## Problem 2 (3 Points)

In this task, we will consider what the convolutional layer consists of.

A convolutional layer is an array of filters.

Each filter has the following dimensions:
* number of layers (channels) in the input image (for RGB this is 3)
* filter height
* filter width

All filters in the kernel have the same dimensions, so the width and height of the filters are called the kernel width and height. Most often the kernel width equals the kernel height, in which case both are referred to as the kernel size (kernel_size).


The layer also has the following parameters:

* padding - how many pixels are added to the input image on each side.

* stride - how many pixels the filter shifts at each step when computing the convolution.



Try to derive the formula for the output dimension of the convolutional layer yourself, knowing the input and kernel parameters.

Check the correctness of the formula by comparing it with the formula from the [documentation](https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d).
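For reference, with zero padding $p$ on each side, a square kernel of size $k$ and stride $s$, the output height works out as follows (the width is analogous, with $W$ in place of $H$). The symbols $p$, $k$, $s$ and $d$ are introduced here only for this formula.

$$
H_{out} = \left\lfloor \frac{H_{in} + 2p - k}{s} \right\rfloor + 1
$$

The PyTorch documentation states the same formula with an extra dilation term $d$:

$$
H_{out} = \left\lfloor \frac{H_{in} + 2p - d\,(k - 1) - 1}{s} + 1 \right\rfloor
$$

With $d = 1$, which is what this homework assumes throughout, the second expression reduces to the first. For example, $H_{in} = 10$, $p = 0$, $k = 3$, $s = 1$ gives $H_{out} = 8$.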


To make sure your formula is correct, write a function that takes as input:
* input dimensions (number of images in a batch * number of layers in one image * image height * image width)
* number of filters
* filter size (we assume that the height is the same as the width)
* padding
* stride

The function must return the output dimension.

In [34]:
import numpy as np


def calc_out_shape(input_matrix_shape, out_channels, kernel_size, stride, padding):
    out_shape = [
        input_matrix_shape[0],
        out_channels,
        1 + (input_matrix_shape[2] + 2 * padding - (kernel_size - 1) - 1) // stride,
        1 + (input_matrix_shape[3] + 2 * padding - (kernel_size - 1) - 1) // stride
    ]
    return out_shape

print(np.array_equal(
    calc_out_shape(input_matrix_shape=[2, 3, 10, 10],
                   out_channels=10,
                   kernel_size=3,
                   stride=1,
                   padding=0),
    [2, 10, 8, 8]))

True
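A single test case cannot catch every mistake in the formula, so it may be worth comparing against `torch.nn.Conv2d` on a few more configurations. The sketch below redefines `calc_out_shape` so that it runs on its own; the list `cases` is a hypothetical test grid chosen for illustration.

```python
import torch


def calc_out_shape(input_matrix_shape, out_channels, kernel_size, stride, padding):
    batch_size, _, height, width = input_matrix_shape
    out_height = (height + 2 * padding - kernel_size) // stride + 1
    out_width = (width + 2 * padding - kernel_size) // stride + 1
    return [batch_size, out_channels, out_height, out_width]


# Hypothetical test grid: (height, width, kernel_size, stride, padding).
cases = [(10, 10, 3, 1, 0), (7, 9, 3, 2, 1), (8, 8, 5, 3, 2), (4, 4, 3, 2, 0)]

results = []
for height, width, kernel_size, stride, padding in cases:
    layer = torch.nn.Conv2d(3, 10, kernel_size, stride=stride,
                            padding=padding, bias=False)
    with torch.no_grad():
        actual = list(layer(torch.zeros(2, 3, height, width)).shape)
    expected = calc_out_shape([2, 3, height, width], 10,
                              kernel_size, stride, padding)
    results.append(actual == expected)

print(all(results))
```

Non-square inputs and strides greater than 1 are included because they exercise the floor division, which the 10*10, stride 1 case does not.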


## Problem 3 (5 Points)

Let's reuse the code from the previous step to test our implementation of the convolutional layer.

Consider convolving a batch of one single-layer 3*3 image with a kernel of one 2*2 filter and stride = 1; the output should then be one 2*2 matrix. Written out in full, the output dimension is (1, 1, 2, 2): 1 image in the batch, 1 filter in the kernel, output height 2, and output width 2.

Let W be the kernel weights, X the input, Y the output.
![](https://ucarecdn.com/fe231533-bcb9-40c6-a589-46b515991c35/)

You can calculate the output in a loop:
![](https://ucarecdn.com/2eef930e-2afe-420a-8965-96acc95a6139/)

At each iteration of the loop, the filter is multiplied pixel by pixel by part of the image, and then the 4 resulting numbers are added up to produce one pixel of output.

This case requires 4 iterations: the kernel can take 2 positions horizontally and 2 positions vertically, and the total number of iterations is the product of these counts, that is, 2*2 = 4.
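The four iterations can be written out directly. The following is a minimal sketch with made-up values for X and W (the figures above use symbolic values); it checks the loop against `torch.nn.functional.conv2d`.

```python
import torch
import torch.nn.functional as F

X = torch.arange(9, dtype=torch.float32).reshape(3, 3)  # example input values
W = torch.tensor([[1., 2.], [3., 4.]])                   # example kernel weights

Y = torch.zeros(2, 2)
for row in range(2):        # 2 vertical kernel positions
    for col in range(2):    # 2 horizontal kernel positions
        window = X[row:row + 2, col:col + 2]
        Y[row, col] = (W * window).sum()  # 4 products summed into one pixel

print(Y)
reference = F.conv2d(X.reshape(1, 1, 3, 3), W.reshape(1, 1, 2, 2))
print(torch.allclose(Y, reference[0, 0]))
```

For these values the top-left output pixel is 0*1 + 1*2 + 3*3 + 4*4 = 27.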

Let's move from the simple case to the general one.

* ***If the image were multi-layered***, for example a three-layer RGB image, the filters in the kernel would also have to be three-layered. Each filter layer is multiplied pixel by pixel by the corresponding layer of the input image. In this case the multiplication yields 4 * 3 = 12 products, which are summed to give the value of one output pixel.

* ***If there were more than one filter in the kernel***, an outer loop over the filters would be added, inside which we compute the convolution for each filter.

* ***If there were more than one image in the input batch***, another outer loop over the images in the batch would be added.
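The multi-layer case can be illustrated on a single output pixel. This is a sketch with example values (a filter of all ones is an arbitrary choice): the top-left output pixel is the sum of 3 * 4 = 12 products.

```python
import torch
import torch.nn.functional as F

X = torch.arange(27, dtype=torch.float32).reshape(1, 3, 3, 3)  # one RGB 3x3 image
W = torch.ones(1, 3, 2, 2)                                     # one three-layer 2x2 filter

# Top-left output pixel: 3 layers * 4 weights = 12 products, summed together.
products = W[0] * X[0, :, 0:2, 0:2]
pixel = products.sum()

print(products.numel())
print(pixel.item())
print(torch.allclose(pixel, F.conv2d(X, W)[0, 0, 0, 0]))
```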

Reminder: In all steps of this tutorial, we consider the bias in the convolution layers to be zero.

This problem requires implementing a convolutional layer through loops.

Please note that the code handles the general case: the input batch does not necessarily consist of one image, and the kernel may contain several layers.

In [31]:
import torch
from abc import ABC, abstractmethod


def calc_out_shape(input_matrix_shape, out_channels, kernel_size, stride, padding):
    batch_size, channels_count, input_height, input_width = input_matrix_shape
    output_height = (input_height + 2 * padding - (kernel_size - 1) - 1) // stride + 1
    output_width = (input_width + 2 * padding - (kernel_size - 1) - 1) // stride + 1

    return batch_size, out_channels, output_height, output_width

# abstract class for convolution layer
class ABCConv2d(ABC):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride

    def set_kernel(self, kernel):
        self.kernel = kernel

    @abstractmethod
    def __call__(self, input_tensor):
        pass

# wrapper class over torch.nn.Conv2d to unify the interface
class Conv2d(ABCConv2d):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                                      stride, padding=0, bias=False)

    def set_kernel(self, kernel):
        self.conv2d.weight.data = kernel

    def __call__(self, input_tensor):
        return self.conv2d(input_tensor)

# function that creates an object of class conv2d_layer_class and returns the convolution of input_matrix
def create_and_call_conv2d_layer(conv2d_layer_class, stride, kernel, input_matrix):
    out_channels = kernel.shape[0]
    in_channels = kernel.shape[1]
    kernel_size = kernel.shape[2]

    layer = conv2d_layer_class(in_channels, out_channels, kernel_size, stride)
    layer.set_kernel(kernel)

    return layer(input_matrix)

# Function that tests the conv2d_layer_class class.
# Returns True if its output matches the output of torch.nn.Conv2d.
def test_conv2d_layer(conv2d_layer_class, batch_size=2,
                      input_height=4, input_width=4, stride=2):
    kernel = torch.tensor(
                      [[[[0., 1, 0],
                         [1,  2, 1],
                         [0,  1, 0]],

                        [[1, 2, 1],
                         [0, 3, 3],
                         [0, 1, 10]],

                        [[10, 11, 12],
                         [13, 14, 15],
                         [16, 17, 18]]]])

    in_channels = kernel.shape[1]

    input_tensor = torch.arange(0, batch_size * in_channels *
                                input_height * input_width,
                                out=torch.FloatTensor()) \
        .reshape(batch_size, in_channels, input_height, input_width)

    custom_conv2d_out = create_and_call_conv2d_layer(
        conv2d_layer_class, stride, kernel, input_tensor)
    conv2d_out = create_and_call_conv2d_layer(
        Conv2d, stride, kernel, input_tensor)

    return torch.allclose(custom_conv2d_out, conv2d_out) \
             and (custom_conv2d_out.shape == conv2d_out.shape)


# Convolutional layer implemented with explicit loops.
class Conv2dLoop(ABCConv2d):
    def __call__(self, input_tensor):
        kernel_size = self.kernel.shape[2]
        # Output shape for zero padding: (batch, filters, height, width).
        batch_size, output_channels, output_height, output_width = calc_out_shape(
            input_tensor.shape, self.out_channels, kernel_size, self.stride, 0)
        output_tensor = torch.zeros((batch_size, output_channels, output_height, output_width))

        for n in range(batch_size):
            for f in range(output_channels):
                for y in range(output_height):
                    for x in range(output_width):
                        # The window origin moves by `stride` pixels per output step.
                        top, left = y * self.stride, x * self.stride
                        window = input_tensor[n, :, top:top + kernel_size, left:left + kernel_size]
                        # Multiply all filter layers by the window and sum the products.
                        output_tensor[n, f, y, x] = (self.kernel[f] * window).sum()
        return output_tensor

print(test_conv2d_layer(Conv2dLoop))

True
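Note that the default test uses a 4*4 input, a 3*3 kernel and stride 2, so the output is 1*1 and only one kernel position is ever evaluated. A stride mistake would therefore go unnoticed. The sketch below is a standalone check on larger inputs; `conv2d_loop` is a hypothetical helper written for this example, not part of the assignment code.

```python
import torch
import torch.nn.functional as F


def conv2d_loop(input_tensor, kernel, stride):
    """Loop-based convolution without padding or bias (illustrative helper)."""
    batch_size, _, height, width = input_tensor.shape
    out_channels, _, kernel_height, kernel_width = kernel.shape
    out_height = (height - kernel_height) // stride + 1
    out_width = (width - kernel_width) // stride + 1

    output = torch.zeros(batch_size, out_channels, out_height, out_width)
    for n in range(batch_size):
        for f in range(out_channels):
            for y in range(out_height):
                for x in range(out_width):
                    # The window origin moves by `stride` pixels per step.
                    top, left = y * stride, x * stride
                    window = input_tensor[n, :, top:top + kernel_height,
                                          left:left + kernel_width]
                    output[n, f, y, x] = (kernel[f] * window).sum()
    return output


torch.manual_seed(0)
images = torch.randn(2, 3, 7, 7)
kernel = torch.randn(4, 3, 3, 3)

checks = {}
for stride in (1, 2, 3):
    ours = conv2d_loop(images, kernel, stride)
    reference = F.conv2d(images, kernel, stride=stride)
    checks[stride] = (tuple(ours.shape), torch.allclose(ours, reference, atol=1e-4))
    print(stride, checks[stride])
```

A 7*7 input gives 5*5, 3*3 and 2*2 outputs for strides 1, 2 and 3, so every stride is checked at several kernel positions.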


## Problem 4 (5 Points)

Let's reuse the code from the third task to test our implementation of the convolutional layer.

The implementation through loops is very inefficient. There are actually two ways to do the same thing using matrix multiplication.

This step will implement the first of them.

Consider the convolution of one single-channel image of size 4*4 pixels (pixel values are denoted by X).

We will convolve it with a kernel of one 3*3 filter; the weights are denoted by W.

For simplicity, let's assume stride = 1.

Then the output Y will have dimension 1*1*2*2 (the first 1 is the single input image, and the second 1 is the single filter in the kernel).
![](https://ucarecdn.com/1845714a-9187-4dca-83ef-6fe44f030391/-/crop/760x275/0,198/-/preview/)

It turns out that the convolution output can be obtained by matrix multiplication as shown below.
![](https://ucarecdn.com/e2a38490-d886-47d6-97b0-dc65f8906ba1/)

***We recommend checking this by multiplying the matrices on a piece of paper.***
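The same check can be done numerically. The sketch below builds W' for the 4*4 image and 3*3 filter, assuming the image is flattened row by row into a column of 16 values and using made-up numbers for X and W; each of the 4 rows of W' corresponds to one output pixel.

```python
import torch
import torch.nn.functional as F

X = torch.arange(16, dtype=torch.float32).reshape(4, 4)      # example pixel values
W = torch.arange(1, 10, dtype=torch.float32).reshape(3, 3)   # example weights

# W' has one row per output pixel (2*2 = 4) and one column per input pixel (16).
W_prime = torch.zeros(4, 16)
for out_row in range(2):
    for out_col in range(2):
        for i in range(3):
            for j in range(3):
                row = out_row * 2 + out_col
                col = (out_row + i) * 4 + (out_col + j)
                W_prime[row, col] = W[i, j]

Y = (W_prime @ X.reshape(16, 1)).reshape(2, 2)
reference = F.conv2d(X.reshape(1, 1, 4, 4), W.reshape(1, 1, 3, 3))[0, 0]
print(torch.allclose(Y, reference))
```

Each row of W' contains the 9 filter weights, placed at the columns of the pixels the filter covers at that output position; the remaining 7 entries of the row are zero.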

Let's move from the simple case to the general one:

* ***If there is more than one filter in the kernel.*** Note that for each filter, the W' matrix will be multiplied by the same image vector. This means that it is possible to concatenate the kernel filter matrices vertically and, in one multiplication, obtain the answer for all filters.
![](https://ucarecdn.com/91757315-13b9-439c-a59d-9fa14629ce52/)

* ***If there is more than one image at the input:*** note that the matrix W' is the same for all images in the batch, so you can first stretch each image into a column and then concatenate these columns horizontally.
![](https://ucarecdn.com/e8a10b8d-876e-44cb-ad5c-2a56abafa974/)

* ***If there is more than one layer in the image:*** first we transform the input and the kernel for each layer separately, and then we concatenate: the vectors of the different input layers into one long vector, and the corresponding kernel matrices into one wide matrix. The per-layer outputs are then summed automatically during the matrix multiplication.
![](https://lh5.googleusercontent.com/2M0cSgkwRnEMQ8y2mrnD-D2alEYn3vVsX7UrgRNLV9BbYv6nswIWesOpKjjNpPMgUl0ixOZoUVyeZXHy5Jlfy1bS4lLkrLuo2ZmOH1gYh88aMKgKa_mjrZHAWzYbBtWihg8GDrxK)

***That is, even in the most general case, we can get the answer with one matrix multiplication.***

However, the output computed this way does not have the same shape as the output of the standard PyTorch layer, so it has to be reshaped.


The code already implements:

* transforming the input batch of images
* multiplying the kernel matrix by the input matrix
* reshaping the result

Reminder: In all steps of this tutorial we consider the bias in the convolution layers to be zero.

***All you have to do is convert the kernel to the format described above.***

***Please note that the code handles the general case: the input consists of several multi-layer images, and the kernel has several layers.***

In [22]:
import torch
from abc import ABC, abstractmethod


def calc_out_shape(input_matrix_shape, out_channels, kernel_size, stride, padding):
    batch_size, channels_count, input_height, input_width = input_matrix_shape
    output_height = (input_height + 2 * padding - (kernel_size - 1) - 1) // stride + 1
    output_width = (input_width + 2 * padding - (kernel_size - 1) - 1) // stride + 1

    return batch_size, out_channels, output_height, output_width


class ABCConv2d(ABC):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride

    def set_kernel(self, kernel):
        self.kernel = kernel

    @abstractmethod
    def __call__(self, input_tensor):
        pass


class Conv2d(ABCConv2d):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                                      stride, padding=0, bias=False)

    def set_kernel(self, kernel):
        self.conv2d.weight.data = kernel

    def __call__(self, input_tensor):
        return self.conv2d(input_tensor)


def create_and_call_conv2d_layer(conv2d_layer_class, stride, kernel, input_matrix):
    out_channels = kernel.shape[0]
    in_channels = kernel.shape[1]
    kernel_size = kernel.shape[2]

    layer = conv2d_layer_class(in_channels, out_channels, kernel_size, stride)
    layer.set_kernel(kernel)

    return layer(input_matrix)


def test_conv2d_layer(conv2d_layer_class, batch_size=2,
                      input_height=4, input_width=4, stride=2):
    kernel = torch.tensor(
                      [[[[0., 1, 0],
                         [1,  2, 1],
                         [0,  1, 0]],

                        [[1, 2, 1],
                         [0, 3, 3],
                         [0, 1, 10]],

                        [[10, 11, 12],
                         [13, 14, 15],
                         [16, 17, 18]]]])

    in_channels = kernel.shape[1]

    input_tensor = torch.arange(0, batch_size * in_channels *
                                input_height * input_width,
                                out=torch.FloatTensor()) \
        .reshape(batch_size, in_channels, input_height, input_width)

    custom_conv2d_out = create_and_call_conv2d_layer(
        conv2d_layer_class, stride, kernel, input_tensor)
    conv2d_out = create_and_call_conv2d_layer(
        Conv2d, stride, kernel, input_tensor)

    return torch.allclose(custom_conv2d_out, conv2d_out) \
             and (custom_conv2d_out.shape == conv2d_out.shape)


class Conv2dMatrix(ABCConv2d):
    # Function to convert the kernel into a matrix of the desired type.
    def _unsqueeze_kernel(self, torch_input, output_height, output_width):
        input_height = torch_input.shape[2]
        input_width = torch_input.shape[3]
        kernel_height = self.kernel.shape[2]
        kernel_width = self.kernel.shape[3]

        cols = self.in_channels * input_height * input_width
        rows = self.out_channels * output_height * output_width
        kernel_unsqueezed = torch.zeros((rows, cols))

        for out_channel in range(self.out_channels):
            for in_channel in range(self.in_channels):
                for i in range(kernel_height):
                    for j in range(kernel_width):
                        for x in range(output_height):
                            for y in range(output_width):
                                if x * self.stride + i < input_height and y * self.stride + j < input_width:
                                    row = (out_channel * output_height + x) * output_width + y
                                    col = (in_channel * input_height + x * self.stride + i) * input_width + y * self.stride + j
                                    kernel_unsqueezed[row, col] = self.kernel[out_channel, in_channel, i, j]

        return kernel_unsqueezed

    def __call__(self, torch_input):
        batch_size, out_channels, output_height, output_width\
            = calc_out_shape(
                input_matrix_shape=torch_input.shape,
                out_channels=self.kernel.shape[0],
                kernel_size=self.kernel.shape[2],
                stride=self.stride,
                padding=0)

        kernel_unsqueezed = self._unsqueeze_kernel(torch_input, output_height, output_width)
        result = kernel_unsqueezed @ torch_input.view((batch_size, -1)).permute(1, 0)
        return result.permute(1, 0).view((batch_size, self.out_channels,
                                          output_height, output_width))

print(test_conv2d_layer(Conv2dMatrix))

True


## Problem 5 (5 Points)

In the previous problem, the matrix W' contains many zeros, which wastes both memory and computation and reduces the effectiveness of the method.
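To put a number on this: without padding, every row of W' holds the k*k weights of one filter layer among height*width columns for that layer, so the share of zeros is 1 - k*k / (height * width), provided the weights themselves are non-zero. The helper `zero_fraction` below is a hypothetical name used only for this back-of-the-envelope sketch.

```python
def zero_fraction(input_height, input_width, kernel_size):
    """Share of zero entries in W' (no padding, all filter weights non-zero)."""
    nonzero_per_row = kernel_size * kernel_size    # per input layer
    columns_per_row = input_height * input_width   # per input layer
    return 1 - nonzero_per_row / columns_per_row


print(zero_fraction(4, 4, 3))       # the 4*4 example from the previous problem
print(zero_fraction(32, 32, 3))     # a small real-world image size
print(zero_fraction(224, 224, 3))   # a typical classification input size
```

For the 4*4 example the share is 1 - 9/16 = 0.4375, and it approaches 1 quickly as the image grows.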

In this task we implement convolution through matrices in a different, more efficient way.

This time, let the input be a batch of one three-layer (RGB) image of size 3*3.

Let the kernel have 2 filters with a width and a height of 2 pixels.

Then the output should have dimension 1*2*2*2.

Let W be the kernel weights, X the values of the input matrix, and Y the output values.

For clarity, the image layers and the kernel filter layers are colored.

***Please note:*** the “blue” X0, for example, does not have to be equal to the “red” X0, and the same applies to the values in the kernel filters. Variables with the same name but different colors can have different values; this notation was chosen so as not to clutter the figure with complex indices.
![](https://ucarecdn.com/ddc6ccbe-2aef-4c7b-a47b-a15e67d3f6ec/)

If in the first matrix method we stretched images into columns, now we will stretch kernel filters into rows.
![](https://ucarecdn.com/afbc3c3c-a347-4248-b0de-cd614e637fc3/)

***We recommend checking on a piece of paper that the result of the product of such matrices gives the same result as the convolution.***

Let's move from the simple case to the general one:

* ***If there is more than one image in a batch***, the kernel transformation does not change, and the transformed input image matrices are concatenated horizontally.

However, the output computed this way does not have the same shape as the output of the standard PyTorch layer, so it has to be reshaped.
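PyTorch ships a built-in for extracting the sliding windows, `torch.nn.functional.unfold`, which can serve as a reference for this method. The sketch below uses example sizes chosen for illustration and, unlike the description above, keeps the batch as a separate leading dimension instead of concatenating the images horizontally, so the final reshape is a plain split of the last dimension.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(2, 3, 5, 5)   # batch of 2 RGB 5x5 images (example sizes)
W = torch.randn(4, 3, 2, 2)   # 4 filters of size 2x2
stride = 1

# Each column of `patches` is one flattened (layers, height, width) window.
patches = F.unfold(X, kernel_size=2, stride=stride)   # shape (2, 12, 16)
W_rows = W.reshape(4, -1)                             # shape (4, 12), one row per filter

out = W_rows @ patches                                # shape (2, 4, 16)
out = out.reshape(2, 4, 4, 4)                         # (batch, filters, height, width)

print(torch.allclose(out, F.conv2d(X, W, stride=stride), atol=1e-5))
```

The matrix product broadcasts over the batch, so the kernel matrix is built once and reused for every image.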

The matrix multiplication function has already been implemented.

Reminder: In all steps of this tutorial we consider the bias in the convolution layers to be zero.

***You need to write kernel and input conversion functions.***

In [36]:
import torch
from abc import ABC, abstractmethod


def calc_out_shape(input_matrix_shape, out_channels, kernel_size, stride, padding):
    batch_size, channels_count, input_height, input_width = input_matrix_shape
    output_height = (input_height + 2 * padding - (kernel_size - 1) - 1) // stride + 1
    output_width = (input_width + 2 * padding - (kernel_size - 1) - 1) // stride + 1

    return batch_size, out_channels, output_height, output_width


class ABCConv2d(ABC):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride

    def set_kernel(self, kernel):
        self.kernel = kernel

    @abstractmethod
    def __call__(self, input_tensor):
        pass


def create_and_call_conv2d_layer(conv2d_layer_class, stride, kernel, input_matrix):
    out_channels = kernel.shape[0]
    in_channels = kernel.shape[1]
    kernel_size = kernel.shape[2]

    layer = conv2d_layer_class(in_channels, out_channels, kernel_size, stride)
    layer.set_kernel(kernel)

    return layer(input_matrix)


class Conv2d(ABCConv2d):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                                      stride, padding=0, bias=False)

    def set_kernel(self, kernel):
        self.conv2d.weight.data = kernel

    def __call__(self, input_tensor):
        return self.conv2d(input_tensor)


def test_conv2d_layer(conv2d_layer_class, batch_size=2,
                      input_height=4, input_width=4, stride=2):
    kernel = torch.tensor(
                      [[[[0., 1, 0],
                         [1,  2, 1],
                         [0,  1, 0]],

                        [[1, 2, 1],
                         [0, 3, 3],
                         [0, 1, 10]],

                        [[10, 11, 12],
                         [13, 14, 15],
                         [16, 17, 18]]]])

    in_channels = kernel.shape[1]

    input_tensor = torch.arange(0, batch_size * in_channels *
                                input_height * input_width,
                                out=torch.FloatTensor()) \
        .reshape(batch_size, in_channels, input_height, input_width)

    custom_conv2d_out = create_and_call_conv2d_layer(
        conv2d_layer_class, stride, kernel, input_tensor)
    conv2d_out = create_and_call_conv2d_layer(
        Conv2d, stride, kernel, input_tensor)

    return torch.allclose(custom_conv2d_out, conv2d_out) \
             and (custom_conv2d_out.shape == conv2d_out.shape)


class Conv2dMatrixV2(ABCConv2d):
    # Function for converting the kernel to the required format.
    def _convert_kernel(self):
        filters_number = self.kernel.shape[0]
        converted_kernel = []
        for f in range(filters_number):
            converted_kernel_per_filter = self.kernel[f, :, :, :].reshape((1, -1))
            converted_kernel.append(converted_kernel_per_filter)
        converted_kernel = torch.cat(converted_kernel, 0)

        return converted_kernel

    # Function for converting input to the desired format.
    def _convert_input(self, torch_input, output_height, output_width):
        batch_size = torch_input.shape[0]
        kernel_height = self.kernel.shape[2]
        kernel_width = self.kernel.shape[3]

        # One column per (image, output position). Each column is the flattened
        # (layers, kernel_height, kernel_width) window under the filter, in the
        # same order as the rows produced by _convert_kernel.
        converted_input = []
        for batch in range(batch_size):
            for h in range(output_height):
                for w in range(output_width):
                    # The window origin moves by `stride` pixels per output step.
                    top = h * self.stride
                    left = w * self.stride
                    window = torch_input[batch, :, top:top + kernel_height, left:left + kernel_width]
                    converted_input.append(window.reshape((-1, 1)))
        converted_input = torch.cat(converted_input, 1)

        return converted_input

    def __call__(self, torch_input):
        batch_size, out_channels, output_height, output_width\
            = calc_out_shape(
                input_matrix_shape=torch_input.shape,
                out_channels=self.kernel.shape[0],
                kernel_size=self.kernel.shape[2],
                stride=self.stride,
                padding=0)

        converted_kernel = self._convert_kernel()
        converted_input = self._convert_input(torch_input, output_height, output_width)

        # Shape: (out_channels, batch_size * output_height * output_width).
        conv2d_out_alternative_matrix_v2 = converted_kernel @ converted_input
        # Split the columns back into (batch, height, width) and move batch first.
        return conv2d_out_alternative_matrix_v2.reshape(out_channels, batch_size,
                                                        output_height, output_width).transpose(0, 1)



print(test_conv2d_layer(Conv2dMatrixV2))

True
