<a href="https://colab.research.google.com/github/jroessler/ai-im/blob/main/03_Convolutional_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lecture 03 Notebook - Convolutional Neural Networks

This notebook accompanies Lecture 03 - Introduction to Convolutional Neural Networks.

This and the following notebooks accompany the lecture to show you how to implement neural networks and the like using PyTorch & Co with relatively little effort. You will see that most of the concepts, techniques and functions are already available and it is often very easy to make use of them. All notebooks will be connected to [Colab](https://colab.research.google.com/notebooks/intro.ipynb) such that you can directly execute the code and play with it by yourself. We encourage you to not just execute the code but to also think about it in detail. 

Further, we implemented the code such that each subchapter can be executed without running the previous subchapters. As a result, imports and helper functions are repeated and may appear redundant.

This notebook introduces Computer Vision and Convolutional Neural Networks (CNNs). We'll first show different color spaces. Next, we'll implement different building blocks (i.e., Convolutional Operation, Padding and Stride, Nonlinearity, Pooling, Channels). Finally, we will develop a small CNN using PyTorch.

Have fun & keep coding!

Authors: Johannes Melsbach & Jannik Rößler

## 1. Color Spaces

First, let's have a look at how the computer sees images, and what different color spaces look like.

#### Imports

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt
import pandas as pd
import requests

#### Helper Functions

In [None]:
def plot_image(img, figsize=(4, 4)):
    """
    Plot an image
    
    :param img: Image which should be plotted
    :param figsize: Size of the figure
    """
    fig, ax = plt.subplots(1, 1, figsize=figsize)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(img, cmap='gray')

Download Images

In [None]:
!wget -q -O five.png https://raw.githubusercontent.com/jroessler/ai-im/main/img/five.png
!wget -q -O four.png https://raw.githubusercontent.com/jroessler/ai-im/main/img/four.png
!wget -q -O happy_child.jpg https://raw.githubusercontent.com/jroessler/ai-im/main/img/happy_child.jpg
!wget -q -O nine.png https://raw.githubusercontent.com/jroessler/ai-im/main/img/nine.png
!wget -q -O one.png https://raw.githubusercontent.com/jroessler/ai-im/main/img/one.png

### 1.1 Single Channel

Images with just one channel such as 1-bit monochrome or 8-bit grayscale.

*Recap*: A channel is a single basic color in an image.

#### 1.1.1 Binary image / 1-bit-monochrome

* Pixel values are either 0 (black) or 1 (white). No values in between.

In [None]:
img_raw = cv2.imread("/content/one.png")
img_gray = cv2.cvtColor(img_raw, cv2.COLOR_BGR2GRAY)  # Cast into 8-bit grayscale image
img_monochrome = (img_gray/255).astype('uint8')  # Turn into 1-bit monochrome (the cast truncates, so only pixels equal to 255 become 1)

In [None]:
plot_image(img_monochrome)
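
Dividing by 255 and casting to an integer type truncates toward zero, so only pixels that are exactly 255 end up as 1. As a minimal sketch on a made-up array (the pixel values and the threshold of 127 are assumptions for illustration, not part of the lecture), the following compares truncation with explicit thresholding:

```python
import numpy as np

# Hypothetical 2x3 grayscale patch with values in the 8-bit range [0, 255]
gray = np.array([[0, 100, 254],
                 [255, 128, 30]], dtype=np.uint8)

# Integer truncation: only pixels that are exactly 255 become 1
truncated = (gray / 255).astype("uint8")

# Thresholding at the midpoint: every pixel above 127 becomes 1
thresholded = (gray > 127).astype("uint8")

print(truncated)
print(thresholded)
```

For images that contain only pure black and pure white pixels, both variants give the same result.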

#### 1.1.2 8-bit-grayscale

* Pixel values are between 0 (black) and 255 (white)

In [None]:
img_gray = cv2.resize(img_gray, (25,25))  # Resize to 25x25 (for illustration purposes)

In [None]:
plot_image(img_gray)

Let's plot the image with the values for each pixel.

In [None]:
df = pd.DataFrame(img_gray)
df = df.iloc[1:]  # For some reason we have to drop the first row. Otherwise the colormap does not work properly!
df.reset_index(inplace=True, drop=True)
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('gray', axis=1)

### 1.2 Multiple Channels

Images with multiple channels such as RGB (red, green, blue) with three channels and HSV (hue, saturation, value) with three channels.

#### 1.2.1 BGR Color Space

In [None]:
# BGR is the default color space when reading images with cv2
img_raw = cv2.imread("/content/happy_child.jpg")
img_rgb = cv2.cvtColor(img_raw, cv2.COLOR_BGR2RGB)  # Cast into RGB image
img_hsv = cv2.cvtColor(img_raw, cv2.COLOR_BGR2HSV)  # Cast into HSV image

In [None]:
plot_image(img_raw, figsize=(6, 6))

#### 1.2.2 RGB Color Space

RGB color space is based on the theory that all visible colors can be created using red, green, and blue. Each pixel has three values (called RGB value), where each value is between 0 (no color) and 255 (full saturation). 

In [None]:
plot_image(img_rgb, figsize=(6, 6))
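
For a three-channel image, converting between BGR and RGB amounts to reversing the order of the channel axis. The following sketch illustrates this with plain NumPy slicing on a made-up 1x2 image (the pixel values are assumptions for illustration):

```python
import numpy as np

# Hypothetical 1x2 image in BGR order: one pure-blue and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channel (BGR <-> RGB)
rgb = bgr[..., ::-1]

print(bgr.shape)   # (height, width, channels)
print(rgb[0, 0])   # the blue pixel, now in RGB order
print(rgb[0, 1])   # the red pixel, now in RGB order
```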

#### 1.2.2.1 Red channel

In [None]:
red = img_rgb.copy()
# Keep only the red channel: set green and blue channels to 0
red[:, :, 1] = 0  # Set green channel to 0
red[:, :, 2] = 0  # Set blue channel to 0
plot_image(red, figsize=(6, 6))

#### 1.2.2.2 Green channel

In [None]:
green = img_rgb.copy()
# Keep only the green channel: set red and blue channels to 0
green[:, :, 0] = 0  # Set red channel to 0
green[:, :, 2] = 0  # Set blue channel to 0
plot_image(green, figsize=(6, 6))

#### 1.2.2.3 Blue channel

In [None]:
blue = img_rgb.copy()
# Keep only the blue channel: set red and green channels to 0
blue[:, :, 0] = 0  # Set red channel to 0
blue[:, :, 1] = 0  # Set green channel to 0
plot_image(blue, figsize=(6, 6))

#### 1.2.3 HSV Color Space

Each pixel has three values: hue, saturation and value (HSV)

* Hue: Hue is the color portion of the model, expressed as a number from 0 to 360 degrees. For example, red falls between 0 and 60 degrees.
* Saturation: Saturation describes the amount of gray in a color, from 0 to 100 percent.
* Value (or brightness): Value describes the brightness of the color, from 0 to 100 percent.
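
To make the three components concrete, here is a small sketch using Python's standard `colorsys` module (not used elsewhere in this notebook) that converts pure red, green and blue to hue in degrees and saturation/value in percent. Note that OpenCV stores 8-bit HSV images differently: hue is halved to the range 0 to 179, and saturation and value are scaled to 0 to 255.

```python
import colorsys

# Pure red, green and blue with RGB components scaled to [0, 1]
colors = {"red": (1.0, 0.0, 0.0), "green": (0.0, 1.0, 0.0), "blue": (0.0, 0.0, 1.0)}

hsv_table = {}
for name, (r, g, b) in colors.items():
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # all three results lie in [0, 1]
    # Rescale to degrees (hue) and percent (saturation, value)
    hsv_table[name] = (round(h * 360), round(s * 100), round(v * 100))
    print(name, hsv_table[name])
```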

In [None]:
plot_image(img_hsv, figsize=(6, 6))

## 2. Building Block: Convolutional Operation

*Recap:* 

* The convolutional operation measures the overlap between two functions, that is (informally), whether a part of the input contains a specific feature. The latter is represented by the kernel.

*CNN terminology:*
* X is called the input (e.g., the image)
* W is called the kernel, sometimes called filter
* Y is the output, sometimes referred to as feature map or activation map

In PyTorch we can implement a convolutional layer which applies the convolutional operation. However, in PyTorch other operations such as nonlinearity (applying activation function) and pooling have to be applied separately using other classes.

We'll use the same example as in the lecture (see Slide #17): a black bar at the left of the image and a vertical edge kernel (going from darker pixels on the left to brighter pixels on the right).

#### Imports

In [None]:
import cv2
import requests
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch import nn
import torch

#### Helper Functions

In [None]:
def plot_image(img, figsize=(4, 4)):
    """
    Plot an image
    
    :param img: Image which should be plotted
    :param figsize: Size of the figure
    """
    fig, ax = plt.subplots(1, 1, figsize=figsize)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(img, cmap='gray')

In [None]:
def image_to_tensor(img):
    """
    Cast a (cv2) image into a tensor with shape (1,1,height,width)
    
    :param img: Image which should be cast into a tensor
    
    :return: Tensor
    """
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_tensor = torch.from_numpy(img_gray)
    img_tensor = img_tensor.reshape(1,1,img_tensor.shape[0],img_tensor.shape[1]).float()  # Reshape image: PyTorch expects 4-dimensional input
    return img_tensor

### 2.1 Create Image

* 1-bit monochrome image (0 is black, 1 is white)
* Vertical black bar at the very left of the image, rest is white

In [None]:
img_tensor=torch.zeros(1,1,4,4)  # Shape (number of samples, number of channels, height, width)
img_tensor[0,0,:,:] = 1
img_tensor[0,0,:,0] = 0
print(f"Image shape: {img_tensor.shape}")
print(f"Image: \n{img_tensor} \n")
plot_image(img_tensor.reshape(img_tensor.shape[2], img_tensor.shape[3]))

### 2.2 Create Convolutional Layer

In PyTorch we can create a convolutional layer (which applies the convolutional operation) simply by creating a Conv2d object. We only need to provide some parameters such as the number of input and output channels, the kernel size and the size of stride and padding.

In our example the parameters are the following:
* number of input channels: 1 (Number of channels in the input. Here: 1)
* number of output channels: 1 (Number of channels in the output (feature map). Here: 1)
* kernel size: 2 (Often we use homogeneous kernels with $w_h=w_w$. Thus we don't have to provide width and height but only a single integer. Here: 2)
* stride: 1 (Refers to the number of rows and columns we move the kernel. Here: 1).
* padding: 0 (Adding extra pixels around the boundary of the input image. Here: No padding. 0)

**Important**: In our example we'll use the kernel from the lecture with no bias. Usually, in PyTorch kernel and bias are randomly initialized and you don't have to modify them! This is just for illustration purposes.

Components:

* [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d): Creates a convolutional layer which applies the convolutional operation
* [state_dict](https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html): Dictionary which contains the parameters of the neural network

In [None]:
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, stride = 1, padding = 0)  # Conv2d object

"""
In practice you would not do that

PyTorch randomly initializes the kernel. In order to illustrate the example from the lecture, we have to set the kernel's parameters ourselves.
"""
kernel_parameters = torch.tensor([[-1.0, 1.0],[-1.0, 1.0]])  # Define parameter
conv.state_dict()['weight'][0][0]=kernel_parameters  # Set kernel
conv.state_dict()['bias'][0]=0.0  # Set bias to zero.
"""
Until here!

In practice you would not do that
"""

print(f"Kernel shape: {conv.state_dict()['weight'].shape}")
print(f"Kernel: \n{conv.state_dict()['weight']} \n")

### 2.3 Calculate Feature Map

After creating the convolutional layer, we can forward propagate the image through this layer in order to calculate the feature map.

In [None]:
feature_map=conv(img_tensor)  # Provide the input to the conv2d object to calculate the feature map
print(f"Feature map shape: {feature_map.shape}")
print(f"Feature map: \n{feature_map} \n")
plot_image(feature_map.detach().numpy().reshape(feature_map.shape[2], feature_map.shape[3]))
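
The feature map can also be reproduced by hand. The sketch below slides the 2x2 kernel over the 4x4 image with explicit loops (stride 1, no padding) and compares the result with `F.conv2d`. Note that, like most deep learning libraries, PyTorch does not flip the kernel, so strictly speaking it computes a cross-correlation. The variable names `manual` and `reference` are only used for this illustration.

```python
import torch
import torch.nn.functional as F

# Same 4x4 image (black bar on the left) and 2x2 vertical edge kernel as above
image = torch.ones(4, 4)
image[:, 0] = 0
kernel = torch.tensor([[-1.0, 1.0], [-1.0, 1.0]])

# Slide the kernel over the image (stride 1, no padding)
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
manual = torch.zeros(out_h, out_w)
for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + 2, j:j + 2]
        manual[i, j] = (window * kernel).sum()  # elementwise product, then sum

# Reference result from PyTorch's functional convolution (no bias)
reference = F.conv2d(image.reshape(1, 1, 4, 4), kernel.reshape(1, 1, 2, 2))

print(manual)
print(torch.allclose(manual, reference[0, 0]))
```

Only the first column of the result is nonzero, because that is the only position where the window covers the transition from the black bar to the white area.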

### 2.4 Vertical Edge Detector in Practice

We'll use the convolutional layer from above (vertical edge detector) to illustrate how the convolutional operation measures the overlap between a vertical edge detector and different images.

We'll use a function which expects the path to the image as an input. You can call the function with different paths, to see how the feature map changes depending on the input.

In [None]:
def detect_vertical_edges(img_path):
    """
    Applies the convolutional operation with a vertical edge detector and plots the feature map.
    
    :param img_path: Path of the image
    """
    img_raw = cv2.imread(img_path)  # cv2 reads images in BGR order
    img_tensor = image_to_tensor(img_raw)  # image_to_tensor expects a BGR image and converts it to grayscale
    
    # Plot input
    plot_image(img_tensor.detach().numpy().reshape(img_tensor.shape[2],img_tensor.shape[3]))
    
    # Calculate feature map
    feature_map = conv(img_tensor)
    print(f"Feature map shape: {feature_map.shape}")
    
    # Plot feature map
    plot_image(feature_map.detach().numpy().reshape(feature_map.shape[2],feature_map.shape[3]))

In [None]:
# You can test these images by yourself. Alternatively, use your own images!
detect_vertical_edges("/content/one.png")
#detect_vertical_edges("/content/four.png")
#detect_vertical_edges("/content/five.png")
#detect_vertical_edges("/content/nine.png")

## 3. Building Block: Padding and Stride

*Recap:*
* Padding: Add extra pixels (typically zero) around the boundary of the input image
* Stride: Refers to the number of rows and columns we move the kernel

In practice, we rarely use inhomogeneous padding and stride, that is, we usually have $p_h=p_w$ and $s_h=s_w$

By default (at least in PyTorch), $p_h=p_w=0$ and $s_h=s_w=1$

#### Imports

In [None]:
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch import nn
import torch

#### Helper Functions

In [None]:
def plot_image(img, figsize=(4, 4)):
    """
    Plot an image
    
    :param img: Image which should be plotted
    :param figsize: Size of the figure
    """
    fig, ax = plt.subplots(1, 1, figsize=figsize)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(img, cmap='gray')

### 3.1 Create Image

* 1-bit monochrome image (0 is black, 1 is white)
* Vertical black line at the very left of the image, rest is white

In [None]:
img_tensor=torch.zeros(1,1,4,4)  # Shape (number of samples, number of channels, height, width)
img_tensor[0,0,:,:] = 1
img_tensor[0,0,:,0] = 0
print(f"Image shape: {img_tensor.shape}")
print(f"Image: \n{img_tensor} \n")
plot_image(img_tensor.reshape(img_tensor.shape[2], img_tensor.shape[3]))

### 3.2 Create Convolutional Layer

We'll use a function which expects two parameters, namely stride and padding. You can call the function with different values for both parameters to test various values.

In [None]:
def create_conv_layer(stride=1, padding=0):
    """
    Create a convolutional layer with given stride and padding
    
    :param stride: Stride
    :param padding: Padding
    """
    conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, stride = stride, padding = padding)

    # Again, set the parameters of the kernel to the one from the lecture
    kernel_parameters = torch.tensor([[-1.0, 1.0],[-1.0, 1.0]])  # Define parameter
    conv.state_dict()['weight'][0][0]=kernel_parameters  # Set kernel
    conv.state_dict()['bias'][0]=0.0  # Set bias to zero
    
    return conv

#### Example: Stride = 1 Padding = 1

In [None]:
STRIDE = 1
PADDING = 1

In [None]:
conv_1_1 = create_conv_layer(STRIDE, PADDING)
feature_map_1_1=conv_1_1(img_tensor)
print(f"Feature map shape: {feature_map_1_1.shape}")
print(f"Feature map: \n{feature_map_1_1} \n")

#### Tasks

**Task 1**: Stride=2, Padding=1. What's the size of the feature map? Before calculating, do the math by yourself.<br>
**Task 2**: Stride=3, Padding=2. What's the size of the feature map? Before calculating, do the math by yourself.

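Remember, the output shape is defined as (rounded down if the division leaves a remainder):

$$\left(\left\lfloor\frac{x_h-w_h+2p_h+s_h}{s_h}\right\rfloor,\left\lfloor\frac{x_w-w_w+2p_w+s_w}{s_w}\right\rfloor\right)$$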

with $(x_h,x_w)$ being the shape of the input, $(w_h,w_w)$ being the shape of the kernel, $s_h$ and $s_w$ being the height and width of the stride, and $p_h$ and $p_w$ being the height and width of padding
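
The formula can be written as a small helper to check your own calculations. This is only a sketch: the function name `conv_output_size` is made up for this notebook, and the integer division `//` rounds down, which matches how PyTorch handles strides that do not divide evenly. It is applied here to the Stride = 1, Padding = 1 example, so the tasks are left to you.

```python
def conv_output_size(x, w, p, s):
    """Output length along one dimension: floor((x - w + 2p + s) / s)."""
    return (x - w + 2 * p + s) // s

# Stride = 1, Padding = 1 example: 4x4 input, 2x2 kernel
height = conv_output_size(x=4, w=2, p=1, s=1)
width = conv_output_size(x=4, w=2, p=1, s=1)
print((height, width))  # (5, 5)
```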

In [None]:
# Implement Task 1 and Task 2 here

## 4. Building Block: Nonlinearity

*Recap:*
* To model nonlinear relationships we need to apply nonlinear functions, called activation functions
* Activation functions are applied elementwise
* Typically, CNNs use piecewise linear activation functions, e.g., ReLU or generalizations of ReLU such as LeakyReLU.

#### Imports

In [None]:
import torch.nn.functional as F
from torch import nn
import torch

### 4.1 Create Feature Map

We'll use the example from the lecture (see S. #42)

In [None]:
feature_map = torch.zeros(1,1,4,4)
feature_map[0,0,:,0:2] = 2
feature_map[0,0,1:3,0:2] = 3
feature_map[0,0,:,3] = -2
feature_map[0,0,1:3,3] = -3
print(f"Feature map shape: {feature_map.shape}")
print(f"Feature map: \n{feature_map} \n")

### 4.2 Apply Activation Function

PyTorch provides various activation functions including ReLU and LeakyReLU. You can find all activation functions [here](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)

Further, note that there are actually **two** ways of applying an activation function in PyTorch. Both result in the same output.
1. Create an object (a nn.Module class) and call it
2. Functional (stateless) approach (torch.nn.functional). More information [here](https://pytorch.org/docs/stable/nn.functional.html)

Components:

* [torch.nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU)
* [torch.nn.LeakyReLU](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html#torch.nn.LeakyReLU)
* [torch.nn.functional.relu](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions)
* [torch.nn.functional.leaky_relu](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions)

In [None]:
nonlinear_feature_map = nn.ReLU()(feature_map)  # Create a nn.ReLU object and call it
#nonlinear_feature_map = F.relu(feature_map)  # Functional (stateless) approach
print(f"Nonlinear feature map shape: {nonlinear_feature_map.shape}")
print(f"Nonlinear feature map: \n{nonlinear_feature_map} \n")

In [None]:
nonlinear_feature_map = nn.LeakyReLU()(feature_map)  # Create a nn.LeakyReLU object and call it
#nonlinear_feature_map = F.leaky_relu(feature_map)  # Functional (stateless) approach
print(f"Nonlinear feature map shape: {nonlinear_feature_map.shape}")
print(f"Nonlinear feature map: \n{nonlinear_feature_map} \n")
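
Both activation functions are simple elementwise rules. As a sketch, the following writes them out by hand on a small made-up tensor and compares the results with the functional API. The slope of 0.01 is the default `negative_slope` of LeakyReLU in PyTorch.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[2.0, 0.0, -2.0], [3.0, 0.0, -3.0]])
slope = 0.01  # default negative_slope of nn.LeakyReLU / F.leaky_relu

relu_manual = torch.clamp(x, min=0.0)             # max(0, x)
leaky_manual = torch.where(x >= 0, x, slope * x)  # x if x >= 0, else slope * x

print(torch.equal(relu_manual, F.relu(x)))
print(torch.allclose(leaky_manual, F.leaky_relu(x, negative_slope=slope)))
print(leaky_manual)
```

ReLU sets negative entries to zero, whereas LeakyReLU keeps a small fraction of them.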

## 5. Building Block: Pooling Operation

*Recap:*
* The pooling operation replaces the input at a certain location with a summary statistic of nearby values.

Similar to *4. Building Block: Nonlinearity*, we can apply pooling operations in **two** ways. 

1. Creating and using a nn.Module class, or
2. Applying a stateless function. 

We'll use the nn.Module approach in the following example.

Different pooling functions (from the lecture):
* Maximum pooling
* Average pooling
* Global maximum pooling
* Global average pooling

#### Imports

In [None]:
import torch.nn.functional as F
from torch import nn
import torch

### 5.1 Create Nonlinear Feature Map

We'll use the example from the lecture (see S. #45, #47, #48)

In [None]:
nonlinear_feature_map = torch.zeros(1,1,4,4)
nonlinear_feature_map[0,0,:,0:2] = 2
nonlinear_feature_map[0,0,1:3,0:2] = 3
print(f"Nonlinear feature map shape: {nonlinear_feature_map.shape}")
print(f"Nonlinear feature map: \n{nonlinear_feature_map} \n")

### 5.2 Pooling

In PyTorch, we can implement pooling layers by creating an nn.Module object. For Maximum and Average Pooling we only need to provide the following parameters: kernel_size (in the lecture we called this parameter **window size**) and stride. For Global Maximum and Global Average Pooling we only need to provide the size of the output (in the lecture we produced a 1x1 output, but other output sizes are possible as well).

Components:

* [torch.nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d): Maximum Pooling
* [torch.nn.AvgPool2d](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html#torch.nn.AvgPool2d): Average Pooling
* [torch.nn.AdaptiveMaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html#torch.nn.AdaptiveMaxPool2d): Global maximum pooling
* [torch.nn.AdaptiveAvgPool2d](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html#torch.nn.AdaptiveAvgPool2d): Global average pooling

#### 5.2.1 Maximum Pooling

* Return the maximum value within a rectangular neighborhood

In [None]:
max_pooling = nn.MaxPool2d(kernel_size=2, stride=2)
pooled_feature_map_1 = max_pooling(nonlinear_feature_map)
pooled_feature_map_1

#### 5.2.2 Average Pooling

* Return the average value within a rectangular neighborhood

In [None]:
avg_pooling = nn.AvgPool2d(kernel_size=2, stride=2)
pooled_feature_map_2 = avg_pooling(nonlinear_feature_map)
pooled_feature_map_2

#### 5.2.3 Global Maximum Pooling

* Return the maximum value for the entire feature map

In [None]:
global_max_pooling = nn.AdaptiveMaxPool2d(output_size=1)
pooled_feature_map_3 = global_max_pooling(nonlinear_feature_map)
pooled_feature_map_3

#### 5.2.4 Global Average Pooling

* Return the average value for the entire feature map

In [None]:
global_avg_pooling = nn.AdaptiveAvgPool2d(output_size=1)
pooled_feature_map_4 = global_avg_pooling(nonlinear_feature_map)
pooled_feature_map_4
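
The four pooling variants can be cross-checked with plain tensor operations. The sketch below splits the 4x4 map into non-overlapping 2x2 windows via `reshape` and `permute` (a trick that only works here because window size and stride are equal and divide the input evenly) and reduces each window. The global variants reduce over the whole spatial extent.

```python
import torch
from torch import nn

fmap = torch.zeros(1, 1, 4, 4)
fmap[0, 0, :, 0:2] = 2
fmap[0, 0, 1:3, 0:2] = 3

# Split into 2x2 windows: (batch, channel, window_row, window_col, row_in_window, col_in_window)
windows = fmap.reshape(1, 1, 2, 2, 2, 2).permute(0, 1, 2, 4, 3, 5)

max_manual = windows.amax(dim=(-2, -1))  # maximum per window
avg_manual = windows.mean(dim=(-2, -1))  # average per window

# Global pooling reduces over the entire height and width
global_max_manual = fmap.amax(dim=(2, 3), keepdim=True)
global_avg_manual = fmap.mean(dim=(2, 3), keepdim=True)

print(torch.equal(max_manual, nn.MaxPool2d(kernel_size=2, stride=2)(fmap)))
print(torch.allclose(avg_manual, nn.AvgPool2d(kernel_size=2, stride=2)(fmap)))
print(torch.equal(global_max_manual, nn.AdaptiveMaxPool2d(output_size=1)(fmap)))
print(torch.allclose(global_avg_manual, nn.AdaptiveAvgPool2d(output_size=1)(fmap)))
```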

## 6. Building Block: Channels

*Recap:*
* CNNs can also cope with multiple input channels
* CNNs can also produce outputs (feature maps) with multiple channels

Three scenarios
1. Multiple input channels and single output channel
2. One input channel and multiple output channels
3. Multiple input channels and multiple output channels

#### Imports

In [None]:
import torch.nn.functional as F
from torch import nn
import torch

### 6.1 Multiple (3) input channels and single (1) output channel

In [None]:
NUM_INPUT_CHANNELS = 3
NUM_OUTPUT_CHANNELS = 1

####  6.1.1 Create an Input Image with Three Channels

We'll use the example from the lecture (see S. #52)

In [None]:
image=torch.zeros(1,NUM_INPUT_CHANNELS,4,4)
# First channel
image[0,0,:,:] = 1
image[0,0,:,0] = 0
# Second channel
image[0,1,:,:] = 1
image[0,1,:,1] = 0
image[0,1,1,:] = 0
# Third channel
image[0,2,:,:] = 1
image[0,2,:,3] = 0
image[0,2,1,2] = 0
print(f"Image shape: {image.shape}")
print(f"Image: \n{image} \n")

#### 6.1.2 Create a Convolutional Layer

As we expect an **input** image with **three channels** and an **output** (feature map) with **one channel**, we will apply a kernel which has **three input channels** and **one output channel**. 

Intuitively, you can think of such a convolutional operation as learning a single feature across all the channels in the input. The problem is that the feature might be different from input channel to input channel, and thus, we apply a different feature for each input channel and summarize the overlap in the end. For example, while each channel in the kernel might be a vertical edge feature, they might differ for each channel of the input. The vertical edge feature for the red channel might be different than the vertical edge feature for the blue channel etc. Finally, to detect whether the whole image contains a vertical edge, we need to summarize the overlap between each feature and each channel of the input. Thus, we don't need a single channel kernel, but a multiple channel kernel (for each input channel a different channel in the kernel)

We only need to change the *in_channels* and *out_channels* parameters. Here, we'll set the former to 3 and the latter to 1.

In [None]:
conv = nn.Conv2d(in_channels=NUM_INPUT_CHANNELS, out_channels=NUM_OUTPUT_CHANNELS, kernel_size=2, stride = 2, padding = 0)

"""
In practice you would not do that

PyTorch randomly initializes the kernel. In order to illustrate the example from the lecture, we have to set the kernel's parameters ourselves.
"""
kernel = torch.tensor([[[-1.0, 1.0],[-1.0, 1.0]], [[1.0, 1.0],[1.0, -1.0]], [[1.0, 1.0],[-1.0, 1.0]]])
conv.state_dict()['weight'][0] = kernel
conv.state_dict()['bias'][0] = 0.0
"""
Until here!

In practice you would not do that
"""

print(f"Kernel shape: {conv.state_dict()['weight'].shape}")
print(f"Kernel: \n{conv.state_dict()['weight']} \n")

#### 6.1.3 Calculate Feature Map

We can now use the convolutional layer from above to calculate the feature map given an input with three channels.

What are you expecting? What is the output (feature map) shape?

In [None]:
feature_map=conv(image)
print(f"Activation Map shape: {feature_map.shape}")
print(f"Activation Map: \n{feature_map} \n")
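
The idea of "summarizing the overlap across channels" can be verified directly: a convolution with three input channels and one output channel equals the sum of three single-channel convolutions, one per input channel. The sketch below uses random values (an assumption for illustration, not the lecture kernel) and the functional `F.conv2d` without bias.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.rand(1, 3, 4, 4)   # hypothetical random 3-channel input
kernel = torch.rand(1, 3, 2, 2)  # one output channel, three input channels

full = F.conv2d(image, kernel, stride=2)

# Convolve each input channel with its own kernel slice, then add the results
per_channel = [
    F.conv2d(image[:, c:c + 1], kernel[:, c:c + 1], stride=2) for c in range(3)
]
summed = torch.stack(per_channel).sum(dim=0)

print(full.shape)
print(torch.allclose(full, summed, atol=1e-6))
```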

### 6.2 Single (1) input channel and multiple (3) output channels

In [None]:
NUM_INPUT_CHANNELS = 1
NUM_OUTPUT_CHANNELS = 3

####  6.2.1 Create an Input Image with One Channel

We'll use the example from the lecture (see S. #53)

In [None]:
image=torch.zeros(1,NUM_INPUT_CHANNELS,4,4)
# First channel
image[0,0,:,:] = 1
image[0,0,:,0] = 0
image[0,0,1,2] = 0
image[0,0,3,3] = 0
print(f"Image shape: {image.shape}")
print(f"Image: \n{image} \n")

#### 6.2.2 Create a Convolutional Layer

As we expect an **input** image with **one channel** and an **output** (feature map) with **three channels**, we will apply a kernel which has **one input channel** and **three output channels**. 

Intuitively, you can think of such a convolutional operation as learning *multiple, different* features in a channel simultaneously e.g., vertical edge feature, horizontal edge feature, diagonal feature etc. We measure the overlap between each feature (channel of the kernel) and the channel of the input and thus, we produce multiple channels in the feature map (for each combination of feature (channel in the kernel) and input channel we produce one channel in the feature map)

We only need to change the *in_channels* and *out_channels* parameters. Here, we'll set the former to one and the latter to three.

In [None]:
conv = nn.Conv2d(in_channels=NUM_INPUT_CHANNELS, out_channels=NUM_OUTPUT_CHANNELS, kernel_size=2, stride = 2, padding = 0)

"""
In practice you would not do that

PyTorch randomly initializes the kernel. In order to illustrate the example from the lecture, we have to set the kernel's parameters ourselves.
"""
kernel = []
kernel.append(torch.tensor([[1.0, 1.0],[-1.0, 1.0]]))
kernel.append(torch.tensor([[1.0, 1.0],[1.0, -1.0]]))
kernel.append(torch.tensor([[-1.0, 1.0],[-1.0, 1.0]]))
for channel in range(NUM_OUTPUT_CHANNELS):
    conv.state_dict()['weight'][channel][0]=kernel[channel]
    conv.state_dict()['bias'][channel]=0.0
"""
Until here!

In practice you would not do that
"""

print(f"Kernel shape: {conv.state_dict()['weight'].shape}")
print(f"Kernel: \n{conv.state_dict()['weight']} \n")

#### 6.2.3 Calculate Feature Map

We can now use the convolutional layer from above to calculate the feature map with three channels given an input with one channel.

What are you expecting? What is the output (feature map) shape?

In [None]:
feature_map=conv(image)
print(f"Activation Map shape: {feature_map.shape}")
print(f"Activation Map: \n{feature_map} \n")

### 6.3 Multiple (2) input channels and multiple (2) output channels

It turns out to be essential to have multiple channels at each layer. In the most popular neural network architectures, we actually increase the channel dimension as we go higher up in the neural network, typically downsampling to trade off spatial resolution for greater channel depth.

In [None]:
NUM_INPUT_CHANNELS = 2
NUM_OUTPUT_CHANNELS = 2

####  6.3.1 Create an Input Image with Two Channels

We'll use the example from the lecture (see S. #54)

In [None]:
image=torch.zeros(1,NUM_INPUT_CHANNELS,4,4)
# First channel
image[0,0,:,:] = 1
image[0,0,:,0] = 0
# Second channel
image[0,1,:,:] = 1
image[0,1,:,1] = 0
image[0,1,1,:] = 0
print(f"Image shape: {image.shape}")
print(f"Image: \n{image} \n")

#### 6.3.2 Create a Convolutional Layer

As we expect an **input** image with **two channels** and an **output** (feature map) with **two channels**, we will apply a kernel which has **two input channels** and **two output channels**. 

This is a combination of 6.1 and 6.2. Intuitively, you can think of such a convolutional operation as learning *multiple, different* features simultaneously (6.2) *across* multiple input channels (6.1). For example, for each input channel (2) we detect two different features (e.g., vertical and horizontal edge)

We only need to change the *in_channels* and *out_channels* parameters. Here, we'll set the former to two and the latter to two.

In [None]:
conv = nn.Conv2d(in_channels=NUM_INPUT_CHANNELS, out_channels=NUM_OUTPUT_CHANNELS, kernel_size=2, stride = 2, padding = 0)

"""
In practice you would not do that

PyTorch randomly initializes the kernel. In order to illustrate the example from the lecture, we have to set the kernel's parameters ourselves.
"""
kernel = []
output_kernels_0 = []
# Kernels for the first output channel (one per input channel)
output_kernels_0.append(torch.tensor([[-1.0, 1.0],[-1.0, 1.0]]))
output_kernels_0.append(torch.tensor([[1.0, 1.0],[1.0, -1.0]]))
kernel.append(output_kernels_0)
# Kernels for the second output channel (one per input channel)
output_kernels_1 = []
output_kernels_1.append(torch.tensor([[-1.0, 1.0],[1.0, -1.0]]))
output_kernels_1.append(torch.tensor([[1.0, 1.0],[1.0, 1.0]]))
kernel.append(output_kernels_1)

for out_channel in range(NUM_OUTPUT_CHANNELS):
    for in_channel in range(NUM_INPUT_CHANNELS):
        conv.state_dict()['weight'][out_channel][in_channel]=kernel[out_channel][in_channel]
        conv.state_dict()['bias'][out_channel]=0.0
"""
Until here!

In practice you would not do that
"""

print(f"Kernel shape: {conv.state_dict()['weight'].shape}")
print(f"Kernel: \n{conv.state_dict()['weight']} \n")

#### 6.3.3 Calculate Feature Map

We can now use the convolutional layer from above to calculate the feature map with two channels given an input with two channels.

What are you expecting? What is the output (feature map) shape?

In [None]:
feature_map=conv(image)
print(f"Activation Map shape: {feature_map.shape}")
print(f"Activation Map: \n{feature_map} \n")
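
The number of channels directly determines the number of learnable parameters. As a sketch, the weight tensor of a `Conv2d` layer has shape (out_channels, in_channels, kernel_height, kernel_width), plus one bias value per output channel:

```python
from torch import nn

conv = nn.Conv2d(in_channels=2, out_channels=2, kernel_size=2, stride=2, padding=0)

print(conv.weight.shape)  # (out_channels, in_channels, kernel_height, kernel_width)
print(conv.bias.shape)    # one bias per output channel

# Count all learnable parameters and compare with the formula
num_params = sum(p.numel() for p in conv.parameters())
expected = 2 * 2 * 2 * 2 + 2  # out * in * kernel_h * kernel_w + out (bias)
print(num_params, expected)
```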

## 7. Building Block: Classification

*Recap:*
* Flatten the output into a two-dimensional representation and apply (multiple) fully-connected layer(s) to produce an n-dimensional output which corresponds to the number of possible output classes

#### Imports

In [None]:
import torch.nn.functional as F
from torch import nn
import torch

####  7.1 Create a Pooled Feature Map

We'll use the example from the lecture (see S. #60)

In [None]:
pooled_feature_map = torch.tensor([[[2.5, 3.5], [2.5, 1.5]], [[0.7, 1.7], [0.7, 3.7]], [[2.3, 1.3], [2.3, -0.7]]])
pooled_feature_map = pooled_feature_map.reshape(1, pooled_feature_map.shape[0], pooled_feature_map.shape[1], pooled_feature_map.shape[2])
print(f"Image shape: {pooled_feature_map.shape}")
print(f"Image: \n{pooled_feature_map} \n")

####  7.2 Flatten

Transform the input such that the shape is two-dimensional

Components:

* [torch.flatten](https://pytorch.org/docs/stable/generated/torch.flatten.html)

In [None]:
flatten_pooled_feature_map = torch.flatten(pooled_feature_map, 1)
print(f"Flatten pooled feature map shape: {flatten_pooled_feature_map.shape}")

#### 7.3 Fully-Connected Layers

Let's assume you want to have one fully-connected layer which produces two output neurons. 

Note that in PyTorch fully-connected layers are called Linear Layers.

Components:

* [torch.nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear): Fully-connected layer

In [None]:
num_outputs = 2

fc = nn.Linear(in_features=flatten_pooled_feature_map.shape[1], out_features=num_outputs)

#### 7.4 Calculate Output

In [None]:
output = fc(flatten_pooled_feature_map)
print(f"Output shape: {output.shape}")
print(f"Output: \n{output} \n")
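
Under the hood, a fully-connected layer is a matrix multiplication followed by adding a bias. The following sketch checks this on a random tensor (an assumption for illustration; the weights of `nn.Linear` are randomly initialized as well):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(1, 3, 2, 2)  # hypothetical pooled feature map
flat = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest

fc = nn.Linear(in_features=flat.shape[1], out_features=2)
manual = flat @ fc.weight.T + fc.bias  # y = x W^T + b

print(flat.shape)
print(torch.allclose(fc(flat), manual, atol=1e-6))
```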

## 8. Simple CNN

Now we'll put all the building blocks together!

#### Imports

In [None]:
import torch.nn.functional as F
from torch import nn
import torch
from torchvision import transforms
import random
import numpy as np
import cv2
import requests

#### Set random seeds

To make the PyTorch results reproducible, we need to set seeds.

In [None]:
torch.manual_seed(123)
random.seed(123)
np.random.seed(123)
torch.cuda.manual_seed_all(123) 
torch.backends.cudnn.deterministic = True
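
The effect of the seed can be observed on the random initialization of a layer. In this sketch, resetting the seed before creating a layer reproduces the same initial kernel, whereas a layer created without resetting the seed is expected to get different values:

```python
import torch
from torch import nn

torch.manual_seed(123)
first = nn.Conv2d(1, 1, kernel_size=2).weight.detach().clone()

torch.manual_seed(123)  # reset the seed: same random numbers again
second = nn.Conv2d(1, 1, kernel_size=2).weight.detach().clone()

# No reseed here, so the random number generator continues from its current state
third = nn.Conv2d(1, 1, kernel_size=2).weight.detach().clone()

print(torch.equal(first, second))
print(torch.equal(first, third))
```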

### 8.1 Load image

Components

* [transforms.ToTensor](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ToTensor): Convert an array to a tensor

In [None]:
# BGR is the default color space when reading images with cv2
img_raw = cv2.imread("/content/happy_child.jpg")
img_rgb = cv2.cvtColor(img_raw, cv2.COLOR_BGR2RGB)  # Cast into RGB image

tran = transforms.ToTensor() 
img_tensor = tran(img_rgb)
img_tensor = img_tensor.reshape(1,3,img_rgb.shape[0],img_rgb.shape[1])  # Keep in mind, the shape is (number of images, number of channels, height, width)

### 8.2 Create CNN

We'll implement the simple CNN from the lecture (see S. #62) using the building blocks described above. Note that the values of the parameters (within the kernels) are not the same as the ones from the lecture!

In [None]:
class SimpleCnn(nn.Module):
    def __init__(self):
        super(SimpleCnn, self).__init__()
        # Create first convolutional layer
        self.conv_1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride = 2, padding = 0)
        # Create second convolutional layer
        self.conv_2 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride = 2, padding = 0)
        # Create max pooling layer
        self.pooling = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        # Fully-connected Layer
        self.fc = nn.Linear(in_features=6936, out_features=2)

    def forward(self, X):
        
        feature_map_1 = self.conv_1(X)                                          #3x277x281
        nonlinear_feature_map_1 = F.relu(feature_map_1)                         #3x277x281
        pooled_nonlinear_feature_map_1 = self.pooling(nonlinear_feature_map_1)  #3x138x140
        feature_map_2 = self.conv_2(pooled_nonlinear_feature_map_1)             #6x68x69
        nonlinear_feature_map_2 = F.relu(feature_map_2)                         #6x68x69
        pooled_nonlinear_feature_map_2 = self.pooling(nonlinear_feature_map_2)  #6x34x34
        feature_map_flattened = pooled_nonlinear_feature_map_2.flatten(1)       #1x6936
        output = self.fc(feature_map_flattened)                                 #1x2
        
        return output

In [None]:
cnn = SimpleCnn()  # Create object from class SimpleCnn
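
The value `in_features=6936` depends on the size of the input image. One way to avoid calculating it by hand is to pass a dummy tensor through the convolutional part and read off the flattened size. The sketch below rebuilds that part with `nn.Sequential`. The input size of 555x563 is an assumption chosen to be consistent with the shape comments in `SimpleCnn`, not a value taken from the image file.

```python
import torch
from torch import nn

# Same layer configuration as the convolutional part of SimpleCnn
features = nn.Sequential(
    nn.Conv2d(3, 3, kernel_size=3, stride=2), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(3, 6, kernel_size=3, stride=2), nn.ReLU(), nn.MaxPool2d(2, 2),
)

# Hypothetical input size (assumption): 1 image, 3 channels, 555x563 pixels
dummy = torch.zeros(1, 3, 555, 563)
with torch.no_grad():
    num_features = features(dummy).flatten(1).shape[1]

print(num_features)  # 6 channels * 34 * 34
```

If the input image has a different size, the same dummy pass yields the matching value for `in_features`.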

### 8.3 Forward Propagation

In [None]:
output = cnn(img_tensor)
print(output.detach().numpy()[0])

## 9. Questions

1. What is a "feature"?
2. Write out the convolutional kernel matrix for a top edge detector.
3. What is padding?
4. What is stride?
5. What is Flatten? Where is it needed?