# Deep Learning for Computer Vision

---

**Goethe University Frankfurt am Main**

Winter Semester 2022/23

<br>

## *Assignment 3 (Layer)*

---

**Points:** 70<br>
**Due:** 16.11.2022, 10 am<br>
**Contact:** Matthias Fulde ([fulde@cs.uni-frankfurt.de](mailto:fulde@cs.uni-frankfurt.de))<br>

---

**Your Name:**

<br>

<br>

## Table of Contents

---

- [1 Vectorization Layer](#1-Vectorization-Layer)
- [2 Linear Layer](#2-Linear-Layer)
- [3 Convolutional Layer](#3-Convolutional-Layer)
- [4 Max Pooling](#4-Max-Pooling)


<br>

## Setup

---

Besides the NumPy and Matplotlib libraries, we import the definitions of the network layers and the corresponding test cases.

```python
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

from modules.layer import *
from modules.layer_test import *

%load_ext autoreload
%autoreload 2
```

<br>

## Exercises

---

<br>

### 1 Vectorization Layer (5 Points)

---

We want to be able to use linear and convolutional layers in the same network. Hence we need a function to convert tensor inputs into vectors and vice versa.

<br>

### 1.1 Implementation

---

Complete the definition of the `Vector` class in the `modules/layer.py` file.

In the `forward` method, store the shape of the input to be used in the backward pass. Convert the inputs into vectors such that the result is a matrix where each row is one input. Store the result in the `out` variable that is returned from the method.

In the `backward` method, apply the reverse operation: restore the original input shape for the gradient of the loss that the method receives from the following layer. Store the result in the `in_grad` variable that is returned from the method.

Your implementation should be fully vectorized, so no loops are allowed.
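
The reshaping described above can be illustrated with plain NumPy. The following is only a minimal sketch using standalone functions; the names `vector_forward` and `vector_backward` are hypothetical and do not reflect the interface of the `Vector` class in `modules/layer.py`, which is not shown here.

```python
import numpy as np

def vector_forward(x):
    """Flatten each sample into a row vector and remember the input shape."""
    in_shape = x.shape
    # -1 lets NumPy infer the product of all non-batch dimensions.
    out = x.reshape(in_shape[0], -1)
    return out, in_shape

def vector_backward(out_grad, in_shape):
    """Give the upstream gradient the shape of the original input."""
    in_grad = out_grad.reshape(in_shape)
    return in_grad

# Hypothetical batch: 2 samples, 3 channels, 4x4 feature maps.
x = np.arange(2 * 3 * 4 * 4, dtype=float).reshape(2, 3, 4, 4)
out, in_shape = vector_forward(x)                        # shape (2, 48)
in_grad = vector_backward(np.ones_like(out), in_shape)   # shape (2, 3, 4, 4)
```

Because `reshape` only reinterprets the layout of the data, no loop over samples is needed.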

<br>

#### 1.1.1 Test

To test your implementation, run the following code cell.

```python
Vector_test()
```

<br>

### 2 Linear Layer (15 Points)

---

A fully connected neural network is composed of linear or affine layers that map a batch of input vectors with shape $(\text{num_samples},\text{in_features})$ to a batch of output vectors with shape $(\text{num_samples},\text{out_features})$. The only difference between the two is that an affine map adds a bias to the result of the linear map. In this exercise we want to implement such a layer as described below.

<br>

### 2.1 Implementation

---

Complete the definition of the `Linear` class in the `modules/layer.py` file.

In the `__init__` method, store a flag indicating if a linear or affine transformation should be used.<br>
Initialize the parameters with random values as follows:<br>

- The parameters are stored in a dictionary already created in the base class. Save the weights and, conditionally, the bias, as in
  ```python
  self.param['weight'] = ...
  self.param['bias'] = ...
  ```
- The weights should have the shape $(\text{in_features},\text{out_features})$ and the bias should have the shape $(\text{out_features})$, where $\text{in_features}$ is the length of the input vectors and $\text{out_features}$ is the length of the output vectors.
- Initialize each parameter value from $\text{Uniform}(-\sqrt{k},\sqrt{k})$ where

  $$
      k = \frac{1}{\text{in_features}}.
  $$
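
As an illustration of the initialization rule, here is a minimal sketch that draws parameters from $\text{Uniform}(-\sqrt{k},\sqrt{k})$. The helper `uniform_init`, its `fan_in` argument, and the use of `np.random.default_rng` are assumptions made for this example; the test cases may expect a different random number source.

```python
import numpy as np

def uniform_init(shape, fan_in, rng):
    """Sample an array of the given shape from Uniform(-sqrt(k), sqrt(k)), k = 1 / fan_in."""
    k = 1.0 / fan_in
    bound = np.sqrt(k)
    return rng.uniform(-bound, bound, size=shape)

# Hypothetical layer sizes, chosen only for illustration.
rng = np.random.default_rng(0)
in_features, out_features = 64, 10

weight = uniform_init((in_features, out_features), in_features, rng)
bias = uniform_init((out_features,), in_features, rng)
```

With `in_features = 64` the bound is $\sqrt{1/64} = 0.125$, so all sampled values lie in $[-0.125, 0.125)$.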
   
<br>

In the `forward` method, store the received inputs for gradient computation in the backward pass. Depending on the stored flag, apply a linear or affine transformation to the inputs and store the result in the `out` variable that is returned from the method.
   
<br>

In the `backward` method, compute the gradient of the loss with respect to the parameters and inputs. The method receives the gradient of the loss with respect to the layer output.
   
- The gradient of the loss with respect to the weights and, if required, the bias, is stored in a dictionary inherited from the base class. Save the gradients using the same keys that were used for the parameter dictionary, as in
  ```python
  self.grad['weight'] = ...
  self.grad['bias'] = ...
  ```
- Store the gradient of the loss with respect to the layer inputs in the `in_grad` variable that is returned from the method.

Your implementation should be fully vectorized, that is, no loops are allowed.
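
For an affine map $\text{out} = xW + b$ with upstream gradient $G$, the gradients are $x^\top G$ for the weights, the sum of $G$ over the samples for the bias, and $G W^\top$ for the inputs. The sketch below writes this out as standalone functions; the names `linear_forward` and `linear_backward` and their signatures are hypothetical and are not the interface of the `Linear` class.

```python
import numpy as np

def linear_forward(x, weight, bias=None):
    """Apply a linear map, or an affine map if a bias is given."""
    out = x @ weight
    if bias is not None:
        out = out + bias          # broadcast over the sample axis
    return out

def linear_backward(out_grad, x, weight, use_bias=True):
    """Return the input gradient and a dictionary of parameter gradients."""
    grad = {'weight': x.T @ out_grad}            # (in_features, out_features)
    if use_bias:
        grad['bias'] = out_grad.sum(axis=0)      # (out_features,)
    in_grad = out_grad @ weight.T                # (num_samples, in_features)
    return in_grad, grad

# Hypothetical data: 5 samples, 4 input features, 3 output features.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
weight = rng.normal(size=(4, 3))
bias = rng.normal(size=3)

out = linear_forward(x, weight, bias)
out_grad = rng.normal(size=out.shape)
in_grad, grad = linear_backward(out_grad, x, weight)
```

Each gradient has the same shape as the quantity it belongs to, which is a quick sanity check before running the tests.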

<br>

#### 2.1.1 Test

To test your implementation, run the following code cell.

```python
Linear_test()
```

<br>

### 3 Convolutional Layer (25 Points)

---

In order to implement a CNN, we need convolutional layers. For inputs with shape $(\text{num_samples},\text{in_channels},\text{in_height},\text{in_width})$ the layer should generate output with shape $(\text{num_samples},\text{out_channels},\text{out_height},\text{out_width})$, with the spatial dimensions of the output depending on the size of the input, the kernel size, and the padding and stride applied in the convolution.
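
The dependence of the output size on these quantities can be made concrete. The sketch below assumes the common convention of symmetric zero padding, no dilation, and rounding down when the stride does not divide evenly; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(in_size, kernel_size, padding=0, stride=1):
    """Output length along one spatial axis: floor((in + 2p - k) / s) + 1."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

# A 3x3 kernel with padding 1 and stride 1 preserves the spatial size.
out_height = conv_output_size(32, kernel_size=3, padding=1, stride=1)  # 32
# A 5x5 kernel without padding and with stride 2 roughly halves it.
out_width = conv_output_size(32, kernel_size=5, padding=0, stride=2)   # 14
```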

<br>

### 3.1 Implementation

---

Complete the definition of the `Conv2d` class in the `modules/layer.py` file.

In the `__init__` method, store the given values for padding and stride, and a flag indicating if a bias should be used or not.<br>
Initialize the parameters with random values as follows:<br>

- The parameters are stored in a dictionary already created in the base class. Save the weights and, conditionally, the bias, as in
  ```python
  self.param['weight'] = ...
  self.param['bias'] = ...
  ```
- The weights should have the shape $(\text{out_channels},\text{in_channels},\text{kernel_size},\text{kernel_size})$ and the bias should have the shape $(\text{out_channels})$, thus you can assume that the filters are always square.
- Initialize each parameter value from $\text{Uniform}(-\sqrt{k},\sqrt{k})$ where

  $$
      k = \frac{1}{\text{in_channels} \:\cdot\: \text{kernel_size}^2}.
  $$
   
<br>

In the `forward` method, store the received inputs for gradient computation in the backward pass. Convolve the inputs with the filters using the stored values for padding and stride and conditionally add the bias, then store the result in the `out` variable that is returned from the method.
   
<br>

In the `backward` method, compute the gradient of the loss with respect to the parameters and inputs. The method receives the gradient of the loss with respect to the layer output.
   
- The gradient of the loss with respect to the weights and, if required, the bias, is stored in a dictionary inherited from the base class. Save the gradients using the same keys that were used for the parameter dictionary, as in
  ```python
  self.grad['weight'] = ...
  self.grad['bias'] = ...
  ```
- Store the gradient of the loss with respect to the layer inputs in the `in_grad` variable that is returned from the method.

Your implementation should be at least partly vectorized, that is, you're allowed to use at most *four* loops each for the forward and backward pass.
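
One way to stay within the loop budget is to loop only over the output positions and handle samples and channels with array operations. The following is a rough sketch under several assumptions: zero padding, no kernel flipping (cross-correlation, as is usual in deep learning libraries), and standalone functions `conv2d_forward` and `conv2d_backward` whose names and signatures are hypothetical rather than the interface of the `Conv2d` class.

```python
import numpy as np

def conv2d_forward(x, weight, bias=None, padding=0, stride=1):
    n, _, h, w = x.shape
    f, _, k, _ = weight.shape
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    # Zero-pad only the two spatial axes.
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out = np.zeros((n, f, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract channel and kernel axes: (n, c, k, k) x (f, c, k, k) -> (n, f).
            out[:, :, i, j] = np.tensordot(patch, weight, axes=([1, 2, 3], [1, 2, 3]))
    if bias is not None:
        out += bias[None, :, None, None]
    return out

def conv2d_backward(out_grad, x, weight, padding=0, stride=1, use_bias=True):
    k = weight.shape[2]
    h, w = x.shape[2:]
    _, _, out_h, out_w = out_grad.shape
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    xp_grad = np.zeros_like(xp)
    grad = {'weight': np.zeros_like(weight)}
    for i in range(out_h):
        for j in range(out_w):
            rows = slice(i * stride, i * stride + k)
            cols = slice(j * stride, j * stride + k)
            g = out_grad[:, :, i, j]                                   # (n, f)
            # Sum patch times upstream gradient over the samples: (f, c, k, k).
            grad['weight'] += np.tensordot(g, xp[:, :, rows, cols], axes=([0], [0]))
            # Distribute the filters, weighted by the gradient, onto the patch: (n, c, k, k).
            xp_grad[:, :, rows, cols] += np.tensordot(g, weight, axes=([1], [0]))
    if use_bias:
        grad['bias'] = out_grad.sum(axis=(0, 2, 3))
    # Drop the padded border to recover the gradient for the original input.
    in_grad = xp_grad[:, :, padding:padding + h, padding:padding + w]
    return in_grad, grad

# Hypothetical data: 2 samples, 3 input channels, 5x5 maps, 4 filters of size 3x3.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 5, 5))
weight = rng.normal(size=(4, 3, 3, 3))
bias = rng.normal(size=4)
out = conv2d_forward(x, weight, bias, padding=1, stride=2)          # shape (2, 4, 3, 3)
out_grad = rng.normal(size=out.shape)
in_grad, grad = conv2d_backward(out_grad, x, weight, padding=1, stride=2)
```

This version uses two loops per pass, so it stays below the limit of four. Whether it matches the reference outputs in `Conv2d_test` depends on the conventions of the test cases, which are not shown here.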

<br>

#### 3.1.1 Test

To test your implementation, run the following code cell.

```python
Conv2d_test()
```

<br>

### 4 Max Pooling (25 Points)

---

In order to reduce the size of the feature maps in a CNN, we want to use max pooling. For each channel separately, the input is filtered such that only the maximum activation at each filter position is selected.

<br>

### 4.1 Implementation

---

Complete the definition of the `MaxPool` class in the `modules/layer.py` file.

In the `__init__` method, store the given values for kernel size and stride. If no stride is given, set the value to the kernel size, so that non-overlapping windows are used for pooling.
   
<br>

In the `forward` method, store the received inputs for gradient computation in the backward pass. Apply max pooling to the input using the given kernel size and stride, then store the result in the `out` variable that is returned from the method.
   
<br>

In the `backward` method, compute the gradient of the loss with respect to the inputs. The method receives the gradient of the loss with respect to the layer output. Store the gradient of the loss with respect to the layer inputs in the `in_grad` variable that is returned from the method.

Your implementation should be at least partly vectorized, that is, you're allowed to use at most *four* loops each for the forward and backward pass.
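
The forward and backward pass of max pooling can be sketched with two loops over the output positions. The functions `maxpool_forward` and `maxpool_backward` below are hypothetical standalone helpers, not the interface of the `MaxPool` class. The sketch also assumes that, if a window contains several equal maxima, the gradient is routed to the first one only; the test cases may follow a different convention.

```python
import numpy as np

def maxpool_forward(x, kernel_size, stride=None):
    stride = kernel_size if stride is None else stride
    n, c, h, w = x.shape
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = np.zeros((n, c, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            rows = slice(i * stride, i * stride + kernel_size)
            cols = slice(j * stride, j * stride + kernel_size)
            # Maximum over the two spatial axes of the window, per sample and channel.
            out[:, :, i, j] = x[:, :, rows, cols].max(axis=(2, 3))
    return out

def maxpool_backward(out_grad, x, kernel_size, stride=None):
    stride = kernel_size if stride is None else stride
    n, c = x.shape[:2]
    _, _, out_h, out_w = out_grad.shape
    in_grad = np.zeros_like(x)
    for i in range(out_h):
        for j in range(out_w):
            rows = slice(i * stride, i * stride + kernel_size)
            cols = slice(j * stride, j * stride + kernel_size)
            window = x[:, :, rows, cols].reshape(n, c, -1)
            idx = window.argmax(axis=2)                    # first maximum per window
            mask = np.zeros_like(window)
            np.put_along_axis(mask, idx[:, :, None], 1.0, axis=2)
            mask = mask.reshape(n, c, kernel_size, kernel_size)
            # Only the selected position receives the upstream gradient.
            in_grad[:, :, rows, cols] += mask * out_grad[:, :, i, j][:, :, None, None]
    return in_grad

# One sample with one channel and a 4x4 feature map.
x = np.array([[[[1., 2., 5., 3.],
                [4., 0., 1., 2.],
                [7., 8., 2., 1.],
                [3., 6., 0., 9.]]]])
out = maxpool_forward(x, kernel_size=2)     # [[4, 5], [8, 9]]
in_grad = maxpool_backward(np.ones_like(out), x, kernel_size=2)
```

In this example the windows do not overlap, so exactly one entry per 2x2 window of `in_grad` is nonzero. With a stride smaller than the kernel size, the `+=` accumulates contributions from overlapping windows.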

<br>

#### 4.1.1 Test

To test your implementation, run the following code cell.

```python
MaxPool_test()
```