![](https://www.domsoria.com/wp-content/uploads/2019/11/keras.png)

# Convolution (again)

__Why do we use convolution to analyse images?__

Consider a well-known CNN architecture: [LeNet-5](https://en.wikipedia.org/wiki/LeNet)
![](https://miro.medium.com/max/4308/1*1TI1aGBZ4dybR6__DI9dzA.png)

Consider the first layer.


<img src="images/LeNet5-1.png" style="width:500px;height:500px;">

__Question 1__

> How many weights does this first convolutional layer have?


_Answer_

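$$\left(k \times k \times n_C + b \right) \times n_f = \left(5 \times 5 \times 3 + 1\right) \times 6 = 456$$

where $k$ is the kernel size, $n_C$ the number of input channels, $b = 1$ the bias term of each filter, and $n_f$ the number of filters.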

![](images/LeNet5-weights.png)
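The count above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `conv2d_param_count` and its arguments are illustrative and not part of Keras.

```python
def conv2d_param_count(kernel_size, in_channels, n_filters, use_bias=True):
    """Number of trainable parameters of a square-kernel convolutional layer.

    Hypothetical helper for illustration only.
    """
    # Each filter has kernel_size x kernel_size weights per input channel
    weights_per_filter = kernel_size * kernel_size * in_channels
    # ... plus one bias term, if biases are used
    bias_per_filter = 1 if use_bias else 0
    return (weights_per_filter + bias_per_filter) * n_filters


# First layer as described above: 5x5 kernels, 3 input channels, 6 filters
first_layer_params = conv2d_param_count(kernel_size=5, in_channels=3, n_filters=6)
print(first_layer_params)  # 456
```

Note that the count does not depend on the image size: the same filters are used for a $32 \times 32$ image or a much larger one.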


__Question 2__

> If, instead of convolution, we used a classical dense (fully connected) layer with $32\times 32 \times 3 = 3072$ inputs and $28\times 28 \times 6 = 4704$ outputs, how many weights would we need?


_Answer_

$\sim 14$ million weights: $3072 \times 4704 \approx 1.4 \cdot 10^7$, plus $4704$ biases.

**Note** The whole _LeNet-5_ network has $\sim 6 \cdot 10^4$ weights.
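The same comparison can be done numerically. The sketch below assumes one bias per output unit in the dense layer; `dense_param_count` is an illustrative helper name, not a library function.

```python
def dense_param_count(n_inputs, n_outputs, use_bias=True):
    """Number of trainable parameters of a fully connected layer.

    Hypothetical helper for illustration only.
    """
    # One weight per (input, output) pair, plus one bias per output unit
    return n_inputs * n_outputs + (n_outputs if use_bias else 0)


n_inputs = 32 * 32 * 3    # flattened input image: 3072 values
n_outputs = 28 * 28 * 6   # flattened output volume: 4704 values

dense_params = dense_param_count(n_inputs, n_outputs)
conv_params = (5 * 5 * 3 + 1) * 6  # convolutional layer from Question 1

print(dense_params)                 # 14455392
print(dense_params // conv_params)  # 31700
```

Under these assumptions, the dense layer needs roughly thirty thousand times more parameters than the convolutional layer that produces an output of the same size.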


## Other Reasons to use Convolution

### Parameter sharing 

A feature detector (for example, an edge detector) that is useful in one part of the image is likely to be useful in other parts too, so the same filter weights are reused at every position of the image. Shallower layers can then detect low-complexity features, while deeper layers combine them to encode more complex ones.

### Sparsity of connections

In each layer, each output depends only on a small number of inputs.


Both properties are numerically useful, because they give a CNN far fewer parameters than a dense (plain) neural network.
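A small NumPy sketch can make both properties concrete. The function `conv2d_valid` below is a hypothetical, deliberately naive implementation (stride 1, no padding), not the way Keras computes convolutions, and the random image and filters are placeholder data with the shapes of the first layer discussed above.

```python
import numpy as np


def conv2d_valid(image, filters, biases):
    """Naive convolution layer (stride 1, no padding), for illustration only.

    image:   (H, W, C) input volume
    filters: (k, k, C, F) filter bank, shared across all positions
    biases:  (F,) one bias per filter
    """
    H, W, C = image.shape
    k, _, _, F = filters.shape
    out = np.zeros((H - k + 1, W - k + 1, F))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # Sparsity: only a k x k x C patch contributes to out[i, j, :]
            patch = image[i:i + k, j:j + k, :]
            # Parameter sharing: the same `filters` are used at every (i, j)
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2])) + biases
    return out


rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))
filters = rng.normal(size=(5, 5, 3, 6))
biases = rng.normal(size=6)

out = conv2d_valid(image, filters, biases)
print(out.shape)  # (28, 28, 6)

# Perturb a single input pixel and see which output positions change
perturbed = image.copy()
perturbed[20, 20, :] += 1.0
out_perturbed = conv2d_valid(perturbed, filters, biases)
changed = np.any(out != out_perturbed, axis=-1)

# At most 5 x 5 = 25 of the 28 x 28 output positions can be affected
print(int(changed.sum()))
```

In a dense layer, the same perturbation would in general change every output value.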

## Other important classical CNN configurations

1. [LeNet-5](https://www.mitpressjournals.org/doi/10.1162/neco.1989.1.4.541): [Keras implementation](https://github.com/TaavishThaman/LeNet-5-with-Keras)
![](https://miro.medium.com/max/4308/1*1TI1aGBZ4dybR6__DI9dzA.png)
2. [AlexNet](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf): [Keras implementation](https://github.com/duggalrahul/AlexNet-Experiments-Keras)
![](https://andreaprovino.it/wp-content/uploads/2020/02/alexnet-architecture-deep-learning-engineer-italia-cnn-network-example-architecture-diagram.png)
3. [VGG-16](https://arxiv.org/abs/1409.1556): [Keras implementation](https://github.com/keras-team/keras-applications/blob/master/keras_applications/vgg16.py)
![](https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png)

## Residual Networks

A really important improvement in neural networks is the __Residual Block__. In principle it can be applied to any kind of network, not only to CNNs, but because of the amount of data involved in computer vision, residual networks are especially important for performing complex tasks at a sustainable cost.

![](https://cdn-images-1.medium.com/max/1000/1*aqmUx_ONo8KqKNEYsjM8eA.png)

### The working principle

Consider the following case.

![](images/ResBlock.png)

In a classic plain (fully connected) NN, information flows from $a^{[l]}$ to $a^{[l+2]}$ through the following path:

$$a^{[l]} \rightarrow z^{[l+1]} \rightarrow \mathrm{ReLU} \rightarrow a^{[l+1]} \rightarrow z^{[l+2]} \rightarrow \mathrm{ReLU} \rightarrow a^{[l+2]}$$

This scheme is called the __main path__.

In a __residual block__ we modify the path as follows

![](images/ResConnection.png)

One copies the value $a^{[l]}$ and adds it to $z^{[l+2]}$, just before the second activation function is applied. This extra route is called the shortcut.

The _feedforward equation_ becomes

$$a^{[l+2]} = \varphi(z^{[l+2]} + a^{[l]})$$

where $\varphi$ is the activation function (ReLU in the path above).

![](https://miro.medium.com/max/1400/1*kanYOsFl0MmaPk5ZWDjJmw.png)
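The difference between the main path and the residual block can be sketched in NumPy for the fully connected case. The names `plain_block` and `residual_block`, the layer width `n = 4` and the random parameters are assumptions made for illustration; the sketch also assumes that $a^{[l]}$ and $z^{[l+2]}$ have the same shape.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def plain_block(a_l, W1, b1, W2, b2):
    """Main path only: a[l] -> z[l+1] -> ReLU -> a[l+1] -> z[l+2] -> ReLU."""
    a1 = relu(W1 @ a_l + b1)
    z2 = W2 @ a1 + b2
    return relu(z2)


def residual_block(a_l, W1, b1, W2, b2):
    """Same two layers, but a[l] is added to z[l+2] before the last ReLU."""
    a1 = relu(W1 @ a_l + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_l)  # shortcut: a[l+2] = ReLU(z[l+2] + a[l])


n = 4  # illustrative layer width, identical for all layers of the block
rng = np.random.default_rng(42)
a_l = relu(rng.normal(size=n))  # non-negative, like the output of a ReLU
W1, b1 = rng.normal(size=(n, n)), rng.normal(size=n)
W2, b2 = rng.normal(size=(n, n)), rng.normal(size=n)

a_plain = plain_block(a_l, W1, b1, W2, b2)
a_res = residual_block(a_l, W1, b1, W2, b2)
print(a_plain)
print(a_res)
```

The two functions share exactly the same parameters; the residual version only adds one element-wise sum.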

A Residual Network is a stack of several residual blocks.

![](https://miro.medium.com/max/1400/1*uyXEvYztiv3fGGCCPbm8Jg.png)
<center> When x and x_shortcut have the same shape. </center>

![](https://miro.medium.com/max/1400/1*U5wkA4O1IpY-ekXqFh0tUQ.png)
<center> When you need to reshape x and x_shortcut. </center>

### Why is this useful?

![](images/ResPerformance.png)

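In this way we add two extra layers at almost no cost in training. If the weights and bias of layer $l+2$ shrink to zero, then $z^{[l+2]} = 0$ and $a^{[l+2]} = \mathrm{ReLU}(a^{[l]}) = a^{[l]}$, since $a^{[l]} \geq 0$ is itself the output of a ReLU. The block can therefore easily learn the identity function, so the extra layers increase the ability to learn complex features without hurting performance much.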
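One way to see why the extra layers are cheap to add is to check what a residual block computes when its last layer has zero weights and zero bias. The following NumPy sketch assumes ReLU activations, a fully connected block and a non-negative input $a^{[l]}$ (as produced by a previous ReLU); the function name and the sizes are illustrative.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(a_l, W1, b1, W2, b2):
    """a[l+2] = ReLU(z[l+2] + a[l]) for a two-layer fully connected block."""
    a1 = relu(W1 @ a_l + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_l)


n = 5  # illustrative layer width
rng = np.random.default_rng(0)
a_l = relu(rng.normal(size=n))  # non-negative input, like a ReLU output
W1, b1 = rng.normal(size=(n, n)), rng.normal(size=n)

# Last layer of the block with all parameters set to zero
W2_zero, b2_zero = np.zeros((n, n)), np.zeros(n)

# With the shortcut, the block reduces to the identity function
a_out = residual_block(a_l, W1, b1, W2_zero, b2_zero)
is_identity = np.array_equal(a_out, a_l)
print(is_identity)  # True

# Without the shortcut, the same parameters wipe out the signal
plain_out = relu(W2_zero @ relu(W1 @ a_l + b1) + b2_zero)
print(plain_out)  # all zeros
```

In this setting the residual block passes $a^{[l]}$ through unchanged, whereas the plain block with the same parameters loses it entirely.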

## Code implementation