# Convolutional Neural Networks 
---

Convolutional neural networks (CNNs) are a type of neural network architecture that is particularly well suited to images and computer vision tasks.

What are some use cases / examples of computer vision you come across daily or have heard of?
+ object recognition, maybe implemented on a robot
+ facial recognition
+ self-driving cars (also very useful for parking if you are not that great at it)
+ brand recognition
+ detecting cancer from X-rays/tomography


## Warmup

Spend some time looking at [this explanation of Image Kernels](https://setosa.io/ev/image-kernels/) (feel free to play around with all the settings!) and discuss the following questions:
 - What does the number at each pixel location in a greyscale image mean?
 - What is a kernel?
 - How is a kernel applied to an image?
 - How does changing the numbers in the kernel affect the output image?
 - What features of an image can we highlight using a kernel?

Let's imagine we have this very simple 5x5 image of a face. Do you see the eyes, the nose and the big smile?

<img src='input_image.jpeg' width=600>

The _kernel_ (or _filter_) is one of the main concepts in CNNs. As an example, let's take this 3x3 image to be our filter.

<img src='filter.jpeg' width=600>

We _convolve_ the filter with our image by sliding the 3x3 filter over the 5x5 image. In each position we multiply the overlapping "pixels" element by element, sum the products, and use that sum as the value of the resulting image in that position. We then move the filter one cell to the right and repeat the process. When we're done with the first row, we move down to the second row, and so on. Let's calculate a couple of values together and then you'll do the rest on your own.

<img src= 'Convolution_operation.jpeg' width=600>

<img src='cnn.gif' width=600>

 Now continue on your own until you have covered your whole original image with your filter.

<img src='feature_map.jpeg' width=500>

<img src='vertical_edge_example.png'>

What do you notice about the resulting image?
* the convolved image is smaller than the original
* the filter has extracted a feature (here, the edges)

**Q**: What is the size of the resulting image, if the size of the original image is `n x n` and the size of the filter is `f x f`?

**A**: `(n-f+1) x (n-f+1)`
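The sliding-window procedure described above can be sketched in plain Python. This is a minimal illustration, not how a deep learning library implements it: the image and the vertical-edge kernel are made-up values, and the function name `convolve2d` is hypothetical. As in deep learning libraries, the kernel is applied as-is (it is not flipped).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    out_rows, out_cols = n_rows - f_rows + 1, n_cols - f_cols + 1

    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply the overlapping cells element by element and sum them up.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(f_rows)
                for dj in range(f_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Made-up 5x5 image: bright columns (1) on the left, dark columns (0) on the right.
image = [[1, 1, 1, 0, 0] for _ in range(5)]

# Made-up 3x3 kernel that responds to vertical edges.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 3, 3]
```

The output is 3x3, i.e. `5 - 3 + 1` along each side, and the non-zero values mark where the bright region meets the dark one.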

You just ran your first edge-detection algorithm! 

That is basically how convolutional neural networks work in very simple terms.

It is important to understand that in a CNN, the kernels in the layers are determined during training using backpropagation. We do not tell the network what they are; it learns them on its own! This can also mean that some of the features it detects are not really interpretable by humans.

So we don't specify the filters for the network to use: the filters are the parameters (weights) that it learns (your `w`s).




Let's now look at a couple of other concepts that are important for CNNs.

#### Padding

When performing a convolution on an image, we end up losing the pixels on the edge. In many cases, this is not a huge issue, as we expect the important information (e.g. the face or the object we are trying to detect and classify) to be somewhere nearer the middle. Also, smaller images mean fewer parameters for the network, i.e. lower computational cost.

However, if we want to keep the outside pixels after all, we can use **padding**:
 - we add `p` layers of pixels (often just one) around the outside of the original image in order to extend it
 - this new, slightly larger image is now convolved with the kernel
 - **with the right amount of padding, the output is the same size as the original image** (for a 3x3 kernel and a stride of 1, a single layer is enough)

The two most common kinds of padding are:
 - **same** - the image is extended with fake new pixels set to zero, so that the image size is preserved (`padding = same`)
 - **valid** - no padding at all (`padding = valid`)

No padding is called **valid** because only valid pixels (i.e. those present in the original image) are used.

![same_padding_no_strides.gif](same_padding_no_strides.gif)

**Q**: What is the size of the resulting image, if the size of the original image is `n x n`, the size of the filter is `f x f`, and we use `p` pixels of padding on each side?

**A**: `(n-f+1+2p) x (n-f+1+2p)`
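As a small illustration of zero padding, here is a plain-Python sketch. The helper name `zero_pad` and the 3x3 example image are made up for this example.

```python
def zero_pad(image, p):
    """Return a copy of `image` surrounded by `p` layers of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]             # top rows of zeros
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)     # zeros left and right
    padded.extend([0] * width for _ in range(p))         # bottom rows of zeros
    return padded


# Made-up 3x3 image.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

padded = zero_pad(image, p=1)
for row in padded:
    print(row)

# Size check for a 5x5 image, a 3x3 filter and one layer of padding:
n, f, p = 5, 3, 1
output_size = n - f + 1 + 2 * p
print(output_size)  # 5, the same as the input size
```

With `p = (f - 1) / 2` and a stride of 1, the output has the same size as the input, which is why one layer of zeros is enough for a 3x3 filter.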

#### Stride

The stride (step size) controls how the filter moves across the original image. Sometimes you may want to "skip" around the original image instead of sliding pixel by pixel.

![padding_strides.gif](padding_strides.gif)

[(Source)](https://github.com/vdumoulin/conv_arithmetic)

**Q**: What is the size of the resulting image, if the size of the original image is `n x n`, the size of the filter is `f x f`, we use padding of `p` and a stride of `s`?

**A**: `floor((n-f+2p)/s) + 1` along each side (rounded down if the stride does not divide evenly)
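The size formulas from this section can be collected in one small helper. This is a sketch with a hypothetical function name, assuming that a stride which does not divide evenly is rounded down (integer division).

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the output for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1


print(conv_output_size(5, 3))            # no padding, stride 1 -> 3
print(conv_output_size(5, 3, p=1))       # one layer of padding -> 5
print(conv_output_size(7, 3, s=2))       # stride 2 -> 3
print(conv_output_size(6, 3, p=1, s=2))  # (6 + 2 - 3) // 2 + 1 -> 3
```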

#### Color (RGB) image

In the case of a color (RGB) image we start with three filters of the same type, one for each color channel, perform the convolution on each channel like before, and then sum up the three results to form the resulting image, which is now 4x4x1.

(By filters of the same type we mean that they are specialized in the same task, e.g. extracting horizontal edges.)

#### One layer of CNN

![rgb.jpeg](rgb.jpeg)

In CNNs we generally use more than one filter in each layer, so the third dimension of the resulting image will be equal to the number of filters used.

![rgb_2.png](rgb_2.png)

After the convolution, we add a separate bias to each resulting feature map and then apply an activation function to it.

How do we calculate the number of parameters to be learned in a convolutional layer?

$n_{parameters} = (f \times f \times n_{channels}) \times n_{filters} + n_{bias}$

where:
+ $n_{filters}$ is the number of different filters in the layer
+ $n_{bias} = n_{filters}$
+ $n_{channels}$ is the number of channels of the image/feature map being convolved

So for the above example:
$(3 \times 3 \times 3) \times 2 + 2 = 56$
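The parameter count can also be checked with a few lines of Python. The function name `conv_layer_params` is made up for this sketch.

```python
def conv_layer_params(f, n_channels, n_filters):
    """Number of learnable parameters in a convolutional layer with f x f filters."""
    weights = f * f * n_channels * n_filters  # one f x f x n_channels kernel per filter
    biases = n_filters                        # one bias per filter
    return weights + biases


print(conv_layer_params(f=3, n_channels=3, n_filters=2))  # 56, the example above
```

Note that the count does not depend on the width or height of the input image, only on the filter size, the number of input channels and the number of filters.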



In diagrams of CNNs, this is usually represented like this:

![one_layer.png](one_layer.png)

[(Source)](http://datahacker.rs)

#### Pooling

To make our feature space even smaller, we can reduce the convolved images further by **pooling**, where each small window of the convolved image is summarized by a single number. This reduces the resolution of the image and makes the network more robust to slight differences in the locations of features. Pooling is also known as **subsampling**.

In the examples below we have:
- **max pooling** where we take 2x2 windows ($f=2$) of our convolved image, take the **maximum value of each** and **return an even smaller image**. We also have a **stride** of 2 ($s=2$), i.e. we don't have overlapping subsections.
    - max pooling ends up highlighting the most strongly present features in the sections of the image 
- **average pooling** where we take the **mean value** of each 2x2 window.
    - average pooling shows the presence of a feature on average in a section of the image


In addition to the convolutional layers we saw above, CNNs also have _pooling layers_. Pooling accumulates (pools together) the features created in the convolutional layers.

Max pooling, the most common pooling method, selects only the highest value in a given pooling window.

![max_pooling.png](max_pooling.png)
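Max and average pooling with a 2x2 window and a stride of 2 can be sketched in plain Python as follows. The function names and the 4x4 feature map are made up for illustration.

```python
def pool2d(image, f=2, s=2, reducer=max):
    """Apply `reducer` to every f x f window of `image`, moving with stride s."""
    pooled = []
    for i in range(0, len(image) - f + 1, s):
        row = []
        for j in range(0, len(image[0]) - f + 1, s):
            # Collect the values inside the current window.
            window = [image[i + di][j + dj] for di in range(f) for dj in range(f)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

max_pooled = pool2d(feature_map, reducer=max)
avg_pooled = pool2d(feature_map, reducer=mean)

print(max_pooled)  # [[6, 5], [7, 9]]
print(avg_pooled)  # [[3.5, 2.0], [2.5, 6.0]]
```

Both results are 2x2: pooling with `f=2` and `s=2` halves the width and the height.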

Let's pull (pool, heh) together everything we've learned and look at an example convolutional neural network.

#### Example CNN

The typical CNN architecture has a couple of convolutional layers followed by _fully connected_ layers, i.e. a feed-forward NN.

In some of the most famous CNN architectures it is common to alternate convolutional and pooling layers. Shown below is one of the classic CNN architectures, LeNet-5.

<img src='cnn_with_poolin.jpeg' >

#### Why convolutions?

Let's look at the first layer of this last network only. What is the number of parameters we have here?


For a convolutional layer:
* `(5*5*3 + 1)*6 = 456`

The use of filters:
* lowers the number of parameters
* makes sense because a filter used for detecting a particular feature could be useful in multiple areas of an image

Pixels close by are related:
* flattening the image would lose some of this information
* pixels far away aren't, so we don't need to connect every pixel of the input with every pixel of the output
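To get a feeling for how much the filters save, we can compare the convolutional layer with a fully connected layer that maps the same input to the same output. The shapes below are assumptions for illustration: a 32x32x3 input and a 28x28x6 output (six 5x5 filters, no padding, stride 1).

```python
# Assumed shapes (not taken from the figure): 32x32x3 input, 28x28x6 output.
input_units = 32 * 32 * 3    # 3072 input values
output_units = 28 * 28 * 6   # 4704 output values

# Fully connected: every output is connected to every input, plus one bias per output.
dense_params = input_units * output_units + output_units

# Convolutional: six 5x5x3 filters, plus one bias per filter.
conv_params = (5 * 5 * 3 + 1) * 6

print(dense_params)  # 14455392
print(conv_params)   # 456
```

Under these assumptions the fully connected version needs over 14 million parameters, while the convolutional layer needs only 456.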

## CNNs in Keras

In [4]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from tensorflow.keras import backend as K
import tensorflow.keras as tk
from tensorflow.keras.datasets import mnist, fashion_mnist,cifar10
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, Dense, MaxPooling2D, Flatten, BatchNormalization,Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd  # used below to plot the training history


### Preprocess the data:

In [None]:
# Load data:
(xtrain,ytrain),(xtest,ytest) = mnist.load_data()

In [None]:
# Check shapes:
xtrain.shape, ytrain.shape, xtest.shape, ytest.shape

In [None]:
# To feed the images into a CNN we need to reshape our X data to the format
# (batch/sample, height, width, channels).
# The MNIST data are grayscale images, so we need just 1 color channel.

Xtrain = xtrain.reshape(60000, 28, 28,1)
Xtest = xtest.reshape(10000, 28, 28,1)

In [None]:
# Always keep checking shapes:
print('Xtrain shape:', Xtrain.shape)
print(Xtrain.shape[0], 'train samples')
print(Xtest.shape[0], 'test samples')
print(Xtrain[0].shape, 'image shape')
print('ytrain shape:', ytrain.shape)

In [None]:
# plot the train images  with the labels
plt.figure(figsize=(16,16))
for i in range(25):
    plt.subplot(5, 5, i+1,title=f'Number: {ytrain[i]}')
    plt.imshow(xtrain[i],cmap='gray')
    plt.axis('off')

In [None]:
ytrain

In [None]:
# The labels are the digits 0...9; we one-hot encode them into 10 classes
# (ytrain, ytest: from number to categorical/dummies)

ytrain_cat = to_categorical(ytrain)
ytest_cat = to_categorical(ytest)

In [None]:
ytrain_cat

In [None]:
# from categorical to number
np.argmax(ytrain_cat,axis=1),np.argmax(ytest_cat,axis=1)
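To make the round trip between numbers and categories concrete, here is a plain-Python sketch of what one-hot encoding and `argmax` decoding do. The helper names and the three example labels are made up; in the notebook itself `to_categorical` and `np.argmax` do this work.

```python
def one_hot(labels, num_classes):
    """Turn each integer label into a vector with a single 1.0 at the label's index."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)] for label in labels]


def argmax_rows(rows):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


labels = [5, 0, 4]                # made-up example labels
encoded = one_hot(labels, num_classes=10)
decoded = argmax_rows(encoded)

print(encoded[0])  # a 1.0 at index 5, zeros elsewhere
print(decoded)     # [5, 0, 4]
```

The same `argmax` decoding works on the softmax output of the network, where each row contains class probabilities instead of exact zeros and ones.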

In [None]:
print('After preprocessing also y:\n')
print('Xtrain shape:', Xtrain.shape)
print(Xtrain.shape[0], 'train samples')
print(Xtest.shape[0], 'test samples')
print(Xtrain[0].shape, 'image shape')
print('ytrain cat shape:', ytrain_cat.shape)

#### LeNet-5 in Keras

Implement LeNet-5 architecture from above. 

Use `relu` activation function for convolutional and fully-connected (dense) layers, and `softmax` for the output layer.

Q: How do we deal with the fact that our images (28x28) are smaller than the input images in LeNet-5 (32x32)?
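One possible answer, sketched as shape arithmetic rather than Keras code: pad the first convolution so that its output is 28x28, which is the size LeNet-5 reaches after its first convolution on a 32x32 input. The sketch assumes the standard LeNet-5 layer sizes (6 and 16 filters of size 5x5, 2x2 pooling with a stride of 2); the helper `conv_out` is hypothetical.

```python
def conv_out(n, f, p=0, s=1):
    """Side length after a convolution or pooling step."""
    return (n + 2 * p - f) // s + 1


c1 = conv_out(28, f=5, p=2)   # first convolution with 2 pixels of padding ('same')
s2 = conv_out(c1, f=2, s=2)   # first 2x2 pooling with stride 2
c3 = conv_out(s2, f=5)        # second convolution, no padding ('valid')
s4 = conv_out(c3, f=2, s=2)   # second 2x2 pooling with stride 2
flattened = s4 * s4 * 16      # 16 feature maps are flattened for the dense layers

print(c1, s2, c3, s4)  # 28 14 10 5
print(flattened)       # 400
```

From the first pooling layer onwards, the shapes match those of the original architecture.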

Links to layers documentation:
+ [Conv2D](https://keras.io/api/layers/convolution_layers/convolution2d/)
+ [MaxPooling2D](https://keras.io/api/layers/pooling_layers/max_pooling2d/)
+ [Flatten](https://keras.io/api/layers/reshaping_layers/flatten/)
+ [Dense](https://keras.io/api/layers/core_layers/dense/)
+ [Weight Initializers](https://keras.io/api/layers/initializers/)
+ [Activation Functions](https://keras.io/api/layers/activations/)


<img src='cnn_with_poolin.jpeg' width=600 heigth=600>

In [None]:
### Implement LeNet-5 architecture 
K.clear_session()
model = Sequential([
    
    ### Convolutional layers
    Conv2D(filters = ..., kernel_size = (...,...), 
           strides = (...,...), padding = ... , 
           activation = tk.activations... ,input_shape = (28,28,1)),  # use relu
           
    
    MaxPooling2D(pool_size = (...,...), strides=(...,...), padding = ...),
    
    Conv2D(filters = ..., kernel_size = (...,...), 
           strides = (...,...), padding = ... , activation = tk.activations...) ,  # use relu
            
    
    MaxPooling2D(pool_size = (...,...), strides=(...,...), padding = ...),
    
    ### Fully connected layers for classification
    
    Flatten(),
    
    Dense(units=..., activation= tk.activations...),  # use relu
    
    Dense(units=..., activation=tk.activations...), # use relu
          
    
    
    # Output layer
    Dense(units=..., activation=tk.activations...)
    
])

In [None]:
# Check the summary:
model.summary()

In [None]:
# Compile
model.compile(optimizer=tk.optimizers.Adam(), # Adam
              loss=tk.losses.categorical_crossentropy , # multiclass: 'categorical_crossentropy' 
              metrics=['accuracy'])

In [None]:
# Train the model
history = model.fit(Xtrain, ytrain_cat, epochs=20,
                    batch_size=60,
                    validation_split = 0.2,
                   )

### Examine the model

In [None]:
history.history

In [None]:
# Plot loss and accuracy curves
pd.DataFrame(data=history.history).plot()
plt.grid(True)
plt.xlabel('epochs')


What do you observe in the loss and the accuracy curves? <br>
Do you see overfitting? If so, try to use some regularization:
+ [BatchNormalization](https://keras.io/api/layers/normalization_layers/batch_normalization/)
+ [Dropout](https://keras.io/api/layers/regularization_layers/dropout/)
    

Once you have overcome the overfitting problem, you can try increasing the `epochs` hyperparameter in `model.fit()`. Use
`EarlyStopping(monitor='val_loss', patience=5)` as a callback ([link](https://keras.io/api/callbacks/early_stopping/)).
Remember to pass it via the `callbacks` argument: `model.fit(..., callbacks=[callback])`.
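To see what `patience` means, here is a simplified plain-Python sketch of the stopping rule. It is not the Keras implementation, just the basic idea: stop once the validation loss has not improved for `patience` epochs in a row. The function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the (1-based) epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached in epoch 3.
val_losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.16, 0.18, 0.19, 0.14]

stopped_at = early_stopping_epoch(val_losses, patience=5)
print(stopped_at)  # 8: five epochs without improvement after epoch 3
```

In this made-up run, stopping at epoch 8 means the improvement in epoch 9 is never seen, which is the trade-off of choosing a small `patience`.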

**Further ideas to examine the model**
+ decode y_pred and ytest 
    + look above at the cell  `from categorical to number` 
+ calculate the confusion matrix and plot it with seaborn.heatmap() 
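As a starting point for the confusion matrix, here is a minimal plain-Python sketch with made-up labels for three classes. The function name is hypothetical; rows are the true classes and columns the predicted classes.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up decoded labels for three classes.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
# [2, 0, 0]
# [0, 1, 1]
# [0, 0, 2]

# The diagonal holds the correct predictions.
accuracy = sum(cm[k][k] for k in range(3)) / len(y_true)
print(accuracy)
```

A nested list like `cm` can be passed directly to `seaborn.heatmap()` for plotting.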

## (Bonus if there is time) Visualisation of a CNN:

There is an excellent visual explanation of how a CNN works [here](http://www.cs.cmu.edu/~aharley/nn_vis/cnn/2d.html).

(Credit: A. W. Harley, "An Interactive Node-Link Visualization of Convolutional Neural Networks," in ISVC, pages 867-877, 2015)

## References:

- [Nice visual explanation of convolutions on TDS](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)
- [Another good guide to CNNs](https://towardsdatascience.com/beginners-guide-to-understanding-convolutional-neural-networks-ae9ed58bb17d)
- [A very comprehensive paper on CNN arithmetic](https://arxiv.org/pdf/1603.07285.pdf)
- [Great 15 minute video introduction to CNNs](https://www.youtube.com/watch?v=py5byOOHZM8)
- [Explanation of pooling layers](https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/)
- [Keras API documentation on Pooling Layers](https://keras.io/api/layers/pooling_layers/)
- [GitHub Page for A. W. Harley's visualisation of a CNN](https://github.com/aharley/nn_vis)
- [Good, in-depth paper about CNNs](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
- Some information and examples on [Batch Normalisation in CNNs](https://www.baeldung.com/cs/batch-normalization-cnn)
- [Categorical Cross Entropy](https://gombru.github.io/2018/05/23/cross_entropy_loss/)