# Week 10: Convolutional Neural Networks

<hr style="border:2px solid gray">

# Outline

1. [Section One: Introduction to convolutional neural networks](#section-one-introduction-to-convolutional-neural-networks)
1. [Section Two: Convolutional neural networks in PyTorch](#section-two-convolutional-neural-networks-in-pytorch)
1. [Section Three: A convolutional neural network example with Fashion MNIST](#section-three-a-convolutional-neural-network-example-with-fashion-mnist)
1. [Section Four: The CIFAR-10 dataset](#section-four-the-cifar-10-dataset)
1. [Section Five: Exercises](#section-five-exercises)

<hr style="border:2px solid gray">

# Section One: Introduction to convolutional neural networks [^](#outline)

So far, we have tackled problems we can solve with linear layers and nonlinear activation functions. Let us consider an image classification problem, for example using the MNIST dataset (which is a dataset of 28 by 28 grayscale images). 

If we consider a binary classification problem with a single output, flatten our 28 by 28 image into an input vector of length 784, and construct a fully connected network with a single hidden layer of 512 neurons, the total number of weights (ignoring biases) is $784 \times 512 + 512 \times 1 = 401920$. With just one hidden layer we already have more than 400000 parameters! As we construct deeper networks, the number of parameters quickly grows out of hand for any image classification problem. 

If we now consider the same network construction for a full HD image that is 1920 by 1080, we find we have over a billion parameters for this network. A colour image with three colour channels has three times as many input values, so the count grows to over three billion. Clearly, it quickly becomes infeasible to train a standard neural network for image classification tasks. With so many parameters, we are also likely to overfit when we train such a network. How can we solve this problem? 
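The weight counts quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the helper name `dense_weight_count` is made up for this example, biases are ignored as in the text, and the colour image is assumed to have three channels.

```python
def dense_weight_count(layer_sizes):
    """Count the weights (no biases) in a fully connected network.

    layer_sizes lists the number of values in each layer, input first.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 28 x 28 grayscale image -> 512 hidden neurons -> 1 output
mnist_weights = dense_weight_count([28 * 28, 512, 1])

# Full HD grayscale image with the same hidden layer and output
hd_weights = dense_weight_count([1920 * 1080, 512, 1])

# Assumption: a colour image has 3 channels, so 3 times as many inputs
hd_colour_weights = dense_weight_count([3 * 1920 * 1080, 512, 1])

print(mnist_weights)      # 401920
print(hd_weights)         # 1061683712
print(hd_colour_weights)  # 3185050112
```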

When we look at an image, a given pixel is likely to share structure with other pixels near it in the image. We can use this to inform how we structure our model; we want to somehow use this shared **local** structure to develop a good classifier. 

While our standard neural networks are made of linear layers and activation functions, convolutional neural networks add a new type of layer: convolutional layers. 



## Convolutional layers

You will be familiar with convolutions from previous courses on Fourier transforms, but we will briefly review the fundamental concepts before we discuss how they are applied in machine learning.

A convolution between two functions produces a third function that expresses how the shape of one of the functions is modified by the other. The convolution of two functions $f$ and $g$ can be written as 

\begin{equation*}
(f * g)(t) = \int_{-\infty}^\infty f(t - \tau) g(\tau) d\tau,
\end{equation*}
where $\tau$ is a dummy variable introduced for the integration. We can see this as "sliding" the function $f(t - \tau)$ over the function $g(\tau)$ as we vary $t$: for each value of $t$, the integral measures how much the shifted, reversed copy of $f$ overlaps with $g$. 

If instead our functions $f$ and $g$ are functions of discrete variables, we can write this expression as a sum instead of an integral:

\begin{equation*}
(f * g)(t) = \sum_{\tau = -\infty}^\infty f(t - \tau) g(\tau)
\end{equation*}
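As a minimal sketch of the discrete sum, the snippet below evaluates it directly for two short arrays, treating every value outside the arrays as zero. The function name `discrete_convolution` and the example arrays are made up for illustration; the result is compared against NumPy's `np.convolve`, which computes the same quantity.

```python
import numpy as np


def discrete_convolution(f, g):
    """Evaluate (f * g)(t) = sum_tau f(t - tau) g(tau) for finite arrays."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(g)):
            # Only include terms where f(t - tau) lies inside the array
            if 0 <= t - tau < len(f):
                out[t] += f[t - tau] * g[tau]
    return out


f = np.array([1.0, 0.0, -1.0])      # plays the role of the kernel
g = np.array([0.0, 1.0, 2.0, 3.0])  # plays the role of the input

result = discrete_convolution(f, g)
print(result)             # [ 0.  1.  2.  2. -2. -3.]
print(np.convolve(f, g))  # same values
```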


When we apply convolutions in convolutional neural networks, we refer to the function $g(\tau)$ as the input, and the function $f(t - \tau)$ as the **kernel**. The output function $(f * g)(t)$ can be referred to as the **feature map**. Typically, our input is some multidimensional array of data (for example, a colour image with 3 colour channels corresponding to red, green and blue) and the kernel is some multidimensional array of parameters which are learned in the training process. 

Because our inputs are in general finite numerical arrays, instead of summing from $-\infty$ to $\infty$ we can sum over a finite number of array elements. We also often use convolutions over more than one axis at a time. For example, for a two-dimensional image $I$ as the input and a two-dimensional kernel $K$, we can write the convolution as

\begin{equation*}
S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(m, n) K(i - m, j - n)
\end{equation*}
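The two-dimensional sum can be sketched in NumPy as follows. This is an illustrative implementation rather than how a deep learning library does it: the name `conv2d_full` is made up, values outside the arrays are assumed to be zero, and the output therefore covers every position where the image and kernel overlap at all.

```python
import numpy as np


def conv2d_full(image, kernel):
    """Evaluate S(i, j) = sum_m sum_n I(m, n) K(i - m, j - n)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    height, width = image.shape
    k_height, k_width = kernel.shape
    out = np.zeros((height + k_height - 1, width + k_width - 1))
    for m in range(height):
        for n in range(width):
            # I(m, n) contributes to S(i, j) with weight K(i - m, j - n),
            # i.e. to the block of outputs starting at (m, n)
            out[m:m + k_height, n:n + k_width] += image[m, n] * kernel
    return out


image = np.arange(16.0).reshape(4, 4)      # made-up 4 x 4 input
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])      # made-up 2 x 3 kernel

feature_map = conv2d_full(image, kernel)
print(feature_map.shape)  # (5, 6)
```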

To put it simply, when we apply a kernel of size $N \times M$:

* We take all of the pixels in an $N \times M$ grid around a point in the image and perform a weighted sum, where the kernel contains the weights we use.
<br>

* Then, we move the kernel one pixel over and take another weighted sum (with the same weights) of the new set of pixels. 
<br>

* We repeat this process for the whole image.

The resulting array is called the feature map and effectively encodes relationships between pixels that are close together, depending on the parameters in the kernel. 

The figure below is an animation showing how we build a feature map from a $4 \times 4$ input and a $2 \times 3$ kernel:

<img src='Week10_plots/conv.gif' align='center' width = 800>
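The sliding procedure can be sketched in a few lines of NumPy, using the same $4 \times 4$ input and $2 \times 3$ kernel sizes as the animation. The pixel and kernel values, and the function name `sliding_weighted_sum`, are made up for illustration. Note that this sketch slides the kernel without flipping it, which is strictly a cross-correlation; because the kernel weights are learned, the flip makes no practical difference.

```python
import numpy as np


def sliding_weighted_sum(image, kernel):
    """Slide the kernel over the image one pixel at a time and take a
    weighted sum at each position where the kernel fits entirely inside."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k_height, k_width = kernel.shape
    out_height = image.shape[0] - k_height + 1
    out_width = image.shape[1] - k_width + 1
    out = np.zeros((out_height, out_width))
    for i in range(out_height):
        for j in range(out_width):
            patch = image[i:i + k_height, j:j + k_width]
            out[i, j] = np.sum(patch * kernel)  # same weights at every position
    return out


image = np.arange(16.0).reshape(4, 4)   # pixel values increase by 1 per column
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])   # left column minus right column

feature_map = sliding_weighted_sum(image, kernel)
print(feature_map.shape)  # (3, 2)
print(feature_map)        # every entry is -4.0
```

Every entry of the feature map is the same here because the made-up image has a constant horizontal gradient, and this particular kernel responds to the difference between pixels two columns apart.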

## Features of convolutional layers



Convolutional layers learn from data by changing the parameters kept in the kernel. This results in a few key features for this type of network:

1. Sparse connectivity
1. Re-use of parameters
1. Translational invariance

We will discuss each of these in turn.



### Sparse connectivity

Consider an $m \times n$ linear layer. All $m$ input values are connected to all $n$ output values, so the array of weights we have to learn is $m \times n$ in size. In contrast, for a convolutional layer, the weight array we learn is just the size of the kernel: for a kernel of shape $i \times j$, we only have to learn $i \times j$ parameters, even when the number of inputs $m$ is much larger than $i$ and $j$. This greatly reduces the number of weights we have to learn for the same input and output sizes.
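To make the comparison concrete, the sketch below counts the weights for one assumed case: a flattened 28 by 28 image mapped to an output of the same size, either with a linear layer or with a single $3 \times 3$ kernel. The sizes and helper names are chosen purely for illustration, and biases are ignored.

```python
def linear_layer_weights(m, n):
    """Weights in a linear layer connecting m inputs to n outputs."""
    return m * n


def conv_kernel_weights(i, j):
    """Weights in a single i x j convolution kernel."""
    return i * j


n_pixels = 28 * 28  # assumed input size: a flattened 28 x 28 image

dense = linear_layer_weights(n_pixels, n_pixels)
conv = conv_kernel_weights(3, 3)  # assumed kernel shape

print(dense)  # 614656
print(conv)   # 9
```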



<hr style="border:2px solid gray">

# Section Two: Convolutional neural networks in PyTorch [^](#outline)

<hr style="border:2px solid gray">


# Section Three: A convolutional neural network example with Fashion MNIST [^](#outline) 

<hr style="border:2px solid gray">

# Section Four: The CIFAR-10 dataset [^](#outline)

<hr style="border:2px solid gray">

# Section Five: Exercises [^](#outline)