# Convolution

Deep learning often works with images, and image models rely on more advanced mathematical operations such as convolution. Convolution is an operation on two functions that expresses how one function modifies the shape of the other. In deep learning and image processing, convolution is used to detect features such as lines, edges, and higher-level structures.

Before we look at how to apply convolution, let's focus on how images are represented in computers:
1. **How are images encoded and stored in computers?**  
2. **When training neural networks, it is often necessary to normalize input data to a certain range (most commonly 0 to 1). How can data representing images be normalized for deep neural networks?**
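
As a hint for the second question, here is a minimal sketch. It assumes an 8-bit grayscale image stored as a `uint8` array, which is a common but not the only encoding; the array `raw` is made up for illustration. Dividing by the largest possible pixel value, 255, maps the intensities to the range 0 to 1.

```python
import numpy as np

# Hypothetical 8-bit grayscale image: each pixel is an integer in 0..255.
raw = np.array([
    [0, 64, 128],
    [192, 255, 32],
], dtype=np.uint8)

# Convert to float first, then scale by the largest representable value.
normalized = raw.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())
```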


In deep neural networks, convolution is performed by applying a filter to an image (the filter is usually much smaller than the image, and both are often square). At each position, the filter and the image patch beneath it are multiplied element-wise and the products are summed. The output is another matrix whose dimensions depend on additional convolution parameters such as *padding* and *stride*.

The simplest case of applying convolution is demonstrated in the following example, where the input image has a size of *5 x 5* and the filter has dimensions of *3 x 3*:

![Convolution example](lab02/conv_example_1.jpg)


We start applying the filter in the top-left corner:

![Convolution step 1](lab02/conv_first_step.jpg)

where the result of the operation will be:

\begin{equation*}
3 \cdot 0 + 2 \cdot 1 + 1 \cdot 2 + 0 \cdot 2 + 4 \cdot 2 + 2 \cdot 0 + 0 \cdot 0 + 2 \cdot 1 + 3 \cdot 2 = 20
\end{equation*}
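
The same first step can be checked with `numpy`. This is only a sketch: the arrays `image` and `kernel` are typed in by hand under the assumption that the figure shows the same values as the equation above.

```python
import numpy as np

image = np.array([
    [3, 2, 1, 2, 0],
    [0, 4, 2, 0, 1],
    [0, 2, 3, 1, 1],
    [1, 3, 4, 0, 0],
    [2, 1, 2, 1, 0],
])
kernel = np.array([
    [0, 1, 2],
    [2, 2, 0],
    [0, 1, 2],
])

# Top-left 3 x 3 window of the image.
window = image[0:3, 0:3]

# Element-wise product followed by the sum of all nine products.
first_value = int(np.sum(window * kernel))
print(first_value)  # 20
```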


In the next step, we move the filter to the right (according to the *stride*, for now by 1), and if we reach the end of the row, we move the filter one step down and return it to the beginning of the row. We continue this way until we reach the bottom-right corner of the image.

**Calculate the remaining values after applying the filter (the output will be a 3 x 3 matrix).**


## Convolution - Real Application

To better understand the essence and usefulness of convolution, let's look at an example of applying a filter that detects horizontal edges. In the early development of convolutional neural networks, existing hand-designed filters such as this one were simply plugged into the network. Deep learning later made it possible to train the filters themselves. **What will be the result of applying the filter to the image in the following case? What effect does it have on the image? How is the detected feature, i.e., the edge, represented?**

![Convolution edge](lab02/conv_edge.jpg)
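
To check your answer, the sketch below applies a horizontal edge filter to an image whose top half is bright and bottom half is dark. It assumes the figure corresponds to the `image2` and `filter2` arrays from the implementation part of this lab, which is not confirmed; `apply_filter` is a helper name introduced only for this illustration.

```python
import numpy as np

# Assumed to match the figure: bright top half, dark bottom half.
image = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])
# Horizontal edge filter: positive weights on top, negative at the bottom.
kernel = np.array([
    [1, 2, 1],
    [0, 0, 0],
    [-1, -2, -1],
])


def apply_filter(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    result = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + kh, col:col + kw]
            result[row, col] = np.sum(window * kernel)
    return result


edges = apply_filter(image, kernel)
print(edges)
```

Uniform regions produce 0, while the windows that straddle the boundary between the bright and dark rows produce a non-zero response.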


## Pooling

In convolutional networks, besides convolution itself, the pooling operation is also often used. Its aim is to reduce the dimensions of the input while preserving as much of the relevant information as possible. During pooling, we traverse the image in a similar manner as in convolution, but we do not use a filter. Instead, we take a small "window" from the image and compute the output according to the specific type of pooling. There are different types of pooling, and today we will look at the two most common:

- **max pooling** - selects the maximum value from the window  
- **average pooling** - calculates the average of the values in the window.  

**Apply both types of pooling with dimensions *2 x 2* and a *stride* of 1 to the previous examples. What differences do you observe in the results?**
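
A minimal sketch of both pooling types with a *2 x 2* window and a *stride* of 1, useful for checking hand calculations. The helper `pool` and its parameter names are introduced here for illustration only, and the input is assumed to be the *5 x 5* image from the first convolution example.

```python
import numpy as np

image = np.array([
    [3, 2, 1, 2, 0],
    [0, 4, 2, 0, 1],
    [0, 2, 3, 1, 1],
    [1, 3, 4, 0, 0],
    [2, 1, 2, 1, 0],
])


def pool(image, size, reduce_fn):
    """Apply reduce_fn to every size x size window (stride 1, no padding)."""
    out_h = image.shape[0] - size + 1
    out_w = image.shape[1] - size + 1
    result = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + size, col:col + size]
            result[row, col] = reduce_fn(window)
    return result


max_result = pool(image, 2, np.max)   # max pooling
avg_result = pool(image, 2, np.mean)  # average pooling

print(max_result)
print(avg_result)
```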


# Implementation of Convolution and Pooling

In this exercise, we will implement the basic operations of convolutional neural networks, namely convolution and *pooling*. We will program in Python. For the implementation, we will use the `numpy` library, which is very commonly used in deep learning projects; libraries such as `tensorflow` and `pytorch` are not built on top of it, but they offer similar array operations and interoperate with it.


## 1. Solution Structure

[You can find the skeleton of today's solution here](lab02/lab02.py) or you can download this Jupyter notebook and work directly in it.

At the beginning, we will import the `numpy` library and define a few sample images and filters:


import numpy as np


image1 = np.array([
    [3, 2, 1, 2, 0],
    [0, 4, 2, 0, 1],
    [0, 2, 3, 1, 1],
    [1, 3, 4, 0, 0],
    [2, 1, 2, 1, 0]
])

filter1 = np.array([
    [0, 1, 2],
    [2, 2, 0],
    [0, 1, 2]
])

image2 = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0]
])

filter2 = np.array([
    [1, 2, 1],
    [0, 0, 0],
    [-1, -2, -1]
])

The script declares six functions that we will implement in the following steps: three helper functions and three that implement convolution, max pooling, and average pooling. You can find their detailed description in the comments, but we can summarize the purpose of these functions as follows:

- `get_result_array` - calculates the dimensions of the result map after applying convolution or *pooling* and returns an array of that shape with all values initialized to 0.
- `get_padding_value` - calculates the required padding based on the parameters of convolution/*pooling*.
- `get_padded_image` - adds *padding* to the original input image according to the *padding* value.
- `convolve` - performs convolution with the given parameters.
- `max_pool` - performs *max pooling* with the given parameters.
- `avg_pool` - performs *average pooling* with the given parameters.


## 2. `get_result_array`

Before we dive into implementing the convolution and *pooling* operations, we need to understand the role of the *stride* and *padding* parameters.

*Stride* specifies how far we move the convolution or *pooling* window in one step of the computation. It most commonly has a value of 1, especially when using *padding*. In today's exercise, we will assume its value is 1; later, you can extend your solution to support other *stride* values.

*Padding* is the adjustment of the input image before applying convolution or *pooling*. Although there are several variants, the most commonly used is *zero padding*, which means adding zero values around the edges of the image in all directions. Of the possible ways to use *padding*, today we will implement two: *no padding* (i.e., we do not pad the image at all) and *same padding*, which adds the same number of zeros on every side so that, with a *stride* of 1, the filter covers every part of the image and the output feature map has the same dimensions as the input image.

If the above assumptions hold, we can calculate the dimensions of the output feature map using the following formula:

\begin{equation*}
m_{d} = \frac{I_{d} - k_{d} + 2 \cdot P}{S} + 1,
\end{equation*}

where $m$ is the output feature map, $I$ is the image, $k$ is the filter (or *kernel*), $P$ is the *padding* value (the number of zeros added on each side), $S$ is the *stride*, and $d$ is the dimension (width or height). The formula is applied to each dimension separately; for a square image and a square filter, both dimensions give the same result.
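
As a worked example, take the *5 x 5* image and *3 x 3* filter from the first part with $S = 1$. Without *padding* ($P = 0$) and with one ring of zeros ($P = 1$), the formula gives:

\begin{equation*}
m_{d} = \frac{5 - 3 + 2 \cdot 0}{1} + 1 = 3, \qquad m_{d} = \frac{5 - 3 + 2 \cdot 1}{1} + 1 = 5.
\end{equation*}

Not every combination of parameters is valid. For instance, $I_{d} = 5$, $k_{d} = 2$, $S = 2$, $P = 0$ gives $\frac{5 - 2}{2} + 1 = 2.5$, which is not an integer.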

**Implement the `get_result_array` function to return a two-dimensional array of values initialized to 0.** The input parameters of the function have the following meanings:

- `image_shape` - a pair of integer values representing the dimensions of the input image (height x width) ($I$)  
- `kernel_shape` - a pair of integer values representing the dimensions of the filter (height x width) ($k$)  
- `stride` - an integer value representing the shift after applying the operation ($S$)  
- `P` - *padding* value, an integer ($P$)  

In the function, you should perform the following steps:

1. Calculate the dimensions of the output feature map according to the above formula.  
2. Check if the resulting dimensions are integers; if not, raise an appropriate error.  
3. The function should return a two-dimensional array of zeros with the calculated dimensions (height x width).


def get_result_array(image_shape, kernel_shape, stride, P):
    return None
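
One possible way to fill in this stub is sketched below. It is not the reference solution; the choice of `ValueError` and the wording of the messages are assumptions.

```python
import numpy as np


def get_result_array(image_shape, kernel_shape, stride, P):
    """Return a zero-filled array with the shape of the output feature map."""
    dims = []
    for image_dim, kernel_dim in zip(image_shape, kernel_shape):
        # Numerator of the output-size formula: I - k + 2 * P
        numerator = image_dim - kernel_dim + 2 * P
        if numerator < 0:
            raise ValueError("The kernel is larger than the padded image.")
        if numerator % stride != 0:
            raise ValueError("The output dimension is not an integer.")
        dims.append(numerator // stride + 1)
    return np.zeros(tuple(dims))


print(get_result_array((5, 5), (3, 3), stride=1, P=0).shape)  # (3, 3)
print(get_result_array((5, 5), (3, 3), stride=1, P=1).shape)  # (5, 5)
```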

## 3. `get_padding_value`

In the next step, we will implement the `get_padding_value` function, which calculates the *padding* value, i.e., the number of zeros added on each side for the given parameter settings. As mentioned in section 2, today we will use only two types of *padding*: *no padding* and *same padding*. In the function, we also need to check whether the parameter settings result in a valid operation when using *padding*.

In the case of *no padding*, we do not need to add zeros, so the *padding* value is zero. For *same padding*, we will assume that *stride* is 1, and then the *padding* value can be calculated using the formula (derived from the above formula, where $m = I$):

\begin{equation*}
P_{d} = \frac{k_{d} - 1}{2},
\end{equation*}

where $P_{d}$ is the *padding* value in dimension $d$ and $k_{d}$ is the filter size in that dimension. The result is an integer only when $k_{d}$ is odd. If we are working with a square filter, the *padding* value will be the same in both directions.

**Implement the `get_padding_value` function to calculate the *padding* value and also check if all parameter constraints are met:**

- The `padding` parameter can take the values `'none'` and `'same'`.
- In the case of *same padding*, the `stride` parameter must have a value of `1`.

The function will return a single integer and has the following parameters:

- `kernel_shape` - a pair of integer values representing the dimensions of the filter (height x width) ($k$)
- `stride` - an integer value representing the shift after applying the operation ($S$)
- `padding` - a string with the value `'none'` for *no padding* or `'same'` for *same padding*


def get_padding_value(kernel_shape, stride, padding):
    return 0
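
A possible sketch of this function follows. Because it returns a single integer, the sketch assumes a square filter with an odd side length for *same padding* and rejects anything else; those two checks, like the use of `ValueError`, are assumptions rather than requirements stated above.

```python
def get_padding_value(kernel_shape, stride, padding):
    """Return the number of zeros to add on each side of the image."""
    if padding == 'none':
        return 0
    if padding == 'same':
        if stride != 1:
            raise ValueError("Same padding requires a stride of 1.")
        kernel_h, kernel_w = kernel_shape
        # Assumption: one padding value is only meaningful for square kernels.
        if kernel_h != kernel_w:
            raise ValueError("Same padding here expects a square kernel.")
        # (k - 1) / 2 is an integer only for odd kernel sizes.
        if (kernel_h - 1) % 2 != 0:
            raise ValueError("Same padding here expects an odd kernel size.")
        return (kernel_h - 1) // 2
    raise ValueError("padding must be 'none' or 'same'.")


print(get_padding_value((3, 3), 1, 'same'))  # 1
print(get_padding_value((5, 5), 1, 'same'))  # 2
print(get_padding_value((3, 3), 2, 'none'))  # 0
```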

## 4. `get_padded_image`

The next function applies *padding* to the image, i.e., it adds the required number of zeros in all directions so that the original image ends up in the center of the expanded image. The function has two parameters:

- `image` - the original image
- `P` - the *padding* value, the number of zeros added.

**Implement the `get_padded_image` function so that it returns a numpy array with the expanded image. Do not modify the original variable.**

def get_padded_image(image, P):
    return image
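
A short sketch using `np.pad`, which returns a new array and leaves its input untouched. Writing the zeros into a larger array by hand would work equally well; the sample array `small` is made up for the demonstration.

```python
import numpy as np


def get_padded_image(image, P):
    """Return a copy of image surrounded by P rows/columns of zeros."""
    return np.pad(image, pad_width=P, mode='constant', constant_values=0)


small = np.array([
    [1, 2],
    [3, 4],
])
padded = get_padded_image(small, 1)
print(padded)
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]
```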

## 5. `convolve`

In this step, we will implement convolution. To preprocess the image and prepare the output feature map, we can use the helper functions we have already created; our only task is to fill the prepared feature map with values by applying the convolution. The function has four parameters:

- `image` - an n-dimensional array representing the input image
- `kernel` - an n-dimensional array representing the convolution filter
- `stride` - an integer value representing the shift after applying the operation
- `padding` - a string with the value `'none'` for *no padding* or `'same'` for *same padding*

**Implement the `convolve` function so that it returns the feature map. Do not modify the original variables `image` and `kernel`.**

After implementation, you can test your solution with predefined examples or with an image of your choice (make sure to adjust the parameter values in the `main` function).


def convolve(image, kernel, stride=1, padding='none'):
    return image
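
The sketch below shows one way the pieces can fit together. To keep it self-contained, the padding value and output size are computed inline instead of calling the helper functions from the previous steps, and a square kernel with an odd side length is assumed for *same padding*. As in the rest of this lab, the kernel is applied without flipping.

```python
import numpy as np


def convolve(image, kernel, stride=1, padding='none'):
    """Return the feature map; image and kernel are left unchanged."""
    kernel_h, kernel_w = kernel.shape
    if padding == 'none':
        P = 0
    elif padding == 'same':
        # Assumption: square kernel with an odd side length.
        if stride != 1 or kernel_h != kernel_w or kernel_h % 2 == 0:
            raise ValueError("Unsupported parameters for same padding.")
        P = (kernel_h - 1) // 2
    else:
        raise ValueError("padding must be 'none' or 'same'.")

    # np.pad returns a new array, so the input image is not modified.
    padded = np.pad(image, P, mode='constant', constant_values=0)
    if (padded.shape[0] - kernel_h) % stride or (padded.shape[1] - kernel_w) % stride:
        raise ValueError("The output dimension is not an integer.")
    out_h = (padded.shape[0] - kernel_h) // stride + 1
    out_w = (padded.shape[1] - kernel_w) // stride + 1

    result = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            window = padded[top:top + kernel_h, left:left + kernel_w]
            # Element-wise product of window and kernel, then sum.
            result[row, col] = np.sum(window * kernel)
    return result


image1 = np.array([
    [3, 2, 1, 2, 0],
    [0, 4, 2, 0, 1],
    [0, 2, 3, 1, 1],
    [1, 3, 4, 0, 0],
    [2, 1, 2, 1, 0],
])
filter1 = np.array([
    [0, 1, 2],
    [2, 2, 0],
    [0, 1, 2],
])
feature_map = convolve(image1, filter1)
print(feature_map.shape)   # (3, 3)
print(feature_map[0, 0])   # 20.0
```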

## 6. `max_pool` and `avg_pool`

In the final step, we need to implement the functions for *pooling*. The script declares two functions, one for *max pooling* and one for *average pooling*, and their structure is very similar to the convolution solution. The only difference is in how the values in the feature map are calculated. Both functions have four parameters:

- `image` - an n-dimensional array representing the input image
- `kernel_size` - a pair of integer values representing the dimensions of the pooling window (height x width)
- `stride` - an integer value representing the shift after applying the operation
- `padding` - a string with the value `'none'` for *no padding* or `'same'` for *same padding*

**Implement and test the `max_pool` and `avg_pool` functions so that they return the feature map after applying the *pooling* operation. Do not modify the values of the input variables.**


def max_pool(image, kernel_size, stride=1, padding='none'):
    return image


def avg_pool(image, kernel_size, stride=1, padding='none'):
    return image
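
Because the two functions differ only in how a window is reduced to a single value, one option is to share the traversal in a common helper. The helper name `_pool` is an assumption introduced for this sketch, as is the restriction of *same padding* to square windows with an odd side length.

```python
import numpy as np


def _pool(image, kernel_size, stride, padding, reduce_fn):
    """Shared traversal: reduce every window of the image with reduce_fn."""
    kernel_h, kernel_w = kernel_size
    if padding == 'none':
        P = 0
    elif padding == 'same':
        # Assumption: square window with an odd side length.
        if stride != 1 or kernel_h != kernel_w or kernel_h % 2 == 0:
            raise ValueError("Unsupported parameters for same padding.")
        P = (kernel_h - 1) // 2
    else:
        raise ValueError("padding must be 'none' or 'same'.")

    # np.pad returns a new array, so the input image is not modified.
    padded = np.pad(image, P, mode='constant', constant_values=0)
    if (padded.shape[0] - kernel_h) % stride or (padded.shape[1] - kernel_w) % stride:
        raise ValueError("The output dimension is not an integer.")
    out_h = (padded.shape[0] - kernel_h) // stride + 1
    out_w = (padded.shape[1] - kernel_w) // stride + 1

    result = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            window = padded[top:top + kernel_h, left:left + kernel_w]
            result[row, col] = reduce_fn(window)
    return result


def max_pool(image, kernel_size, stride=1, padding='none'):
    return _pool(image, kernel_size, stride, padding, np.max)


def avg_pool(image, kernel_size, stride=1, padding='none'):
    return _pool(image, kernel_size, stride, padding, np.mean)


image1 = np.array([
    [3, 2, 1, 2, 0],
    [0, 4, 2, 0, 1],
    [0, 2, 3, 1, 1],
    [1, 3, 4, 0, 0],
    [2, 1, 2, 1, 0],
])
print(max_pool(image1, (2, 2)))
print(avg_pool(image1, (2, 2)))
```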

## Additional Tasks

1. The example problems used grayscale images. Modify your solution to support working with images that have multiple color channels.
2. Modify the *padding* so that, in the case of an odd number of zeros that need to be added, the input image is primarily extended to the right and downward.
3. Modify the *padding* calculation so that it supports values of *stride* other than 1.


## Used Sources and Additional Links

- [A guide to convolution arithmetic for deep learning](https://arxiv.org/pdf/1603.07285.pdf)
- [Gentle Dive into Math Behind Convolutional Neural Networks](https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9)
- [Convolution Image Size, Filter Size, Padding and Stride](https://jamesmccaffrey.wordpress.com/2018/05/30/convolution-image-size-filter-size-padding-and-stride/)