# Crash Course in CNN

- **Convolutional Neural Networks** are a powerful artificial neural network technique.

- These networks preserve the spatial structure of the problem and were developed for object recognition tasks, such as handwritten digit recognition. 

- They are popular because people are achieving state-of-the-art results on difficult computer vision and natural language processing tasks.

- **Convolutional layers**, **Pooling layers**, and **Dense layers** are the building blocks of CNNs.

- Advantages of CNNs over MLPs in the case of images:

  - They use fewer parameters (weights) to learn than a fully connected network.

  - They are designed to be invariant to object position and distortion in the scene.

  - They automatically learn and generalize features from the input domain.
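
To make the first advantage concrete, the sketch below counts the weights of a fully connected layer and of a convolutional layer applied to the same input. The 28 x 28 grayscale image, the 32 units or filters, and the 3 x 3 receptive field are illustrative assumptions, not values taken from the notes above.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(field_h: int, field_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus one bias per filter; the image size does not appear."""
    return field_h * field_w * in_channels * n_filters + n_filters


# Assumed example: 28 x 28 grayscale input, 32 dense units vs. 32 filters of 3 x 3.
dense_count = dense_params(28 * 28 * 1, 32)  # 25120
conv_count = conv_params(3, 3, 1, 32)        # 320
print(dense_count, conv_count)
```

Because each filter reuses the same weights at every position, the convolutional count stays the same if the image grows, while the dense count grows with the number of pixels.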

## Convolutional Layers

<img src="https://i0.wp.com/developersbreach.com/wp-content/uploads/2020/08/cnn_banner.png" 
alt="drawing" width="1000"/>




### Filters

- The filters are essentially the neurons of the layer. 

- They have weighted inputs and generate an output value, like a neuron.

- The input size is a fixed square called a _receptive field_.
  
  - If the convolutional layer is an input layer, then the input patch will be pixel values.
  
  - If the convolutional layer is deeper in the network architecture, then it will take input from a _feature map_ from the previous layer.
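
As a rough illustration of a filter acting like a neuron, the sketch below computes one activation from a single receptive field. The patch values, the `vertical_edge` weights, and the ReLU nonlinearity are assumptions chosen for the example; the notes above do not prescribe a particular activation function.

```python
def filter_activation(patch, weights, bias=0.0):
    """Weighted sum over one receptive field, followed by a ReLU (assumed)."""
    total = bias
    for patch_row, weight_row in zip(patch, weights):
        for value, weight in zip(patch_row, weight_row):
            total += value * weight
    return max(0.0, total)


# Assumed 3 x 3 patch of pixel values with a bright right-hand column.
patch = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]
activation = filter_activation(patch, vertical_edge)
print(activation)  # 3.0
```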

<img src="https://www.researchgate.net/publication/316950618/figure/fig4/AS:495826810007552@1495225731123/The-receptive-field-of-each-convolution-layer-with-a-3-3-kernel-The-green-area-marks.png" 
alt="drawing" width="300"/>


### Feature Maps

- The _feature map_ is the output of one filter applied to the previous layer.

- A given _filter_ is drawn across the entire previous layer and moved one pixel at a time. Each position results in an activation of the neuron and the output is collected in the _feature map_. 

- **Stride**: The distance the filter moves across the input from the previous layer between successive activations is called the stride. If the receptive field is moved one pixel from activation to activation, then the field will overlap with its previous position.

- **Padding**: If the size of the previous layer is not cleanly divisible by the size of the filter's receptive field and the size of the stride, the receptive field may attempt to read off the edge of the input feature map. In this case, techniques like _zero padding_ can be used to invent mock inputs with zero values for the receptive field to read.

<img src="https://miro.medium.com/max/790/1*1okwhewf5KCtIPaFib4XaA.gif" 
alt="drawing" width="300"/>

## Pooling Layers

- The pooling layers **down-sample** the previous layer's feature map.

- Pooling layers follow a sequence of one or more convolutional layers and are intended to **consolidate the features learned** and expressed in the previous layer's feature map.

- As such, pooling may be considered a technique to **compress or generalize feature representations** and generally reduce the overfitting of the training data by the model.

- Receptive field and stride:

  - Their receptive field is usually much smaller than that of a convolutional layer.

  - The stride is often equal to the size of the receptive field to avoid any overlap.

- Pooling layers are often very simple, taking the _average_ or the _maximum_ of the input values to create their own feature map.
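
A minimal sketch of max and average pooling, assuming a 2 x 2 receptive field with a stride of 2. The function name `pool2d`, its `combine` parameter, and the 4 x 4 feature map are hypothetical and only serve the example.

```python
def pool2d(fmap, size=2, stride=2, combine=max):
    """Down-sample a 2D feature map by reducing each window to one value."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                fmap[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(combine(window))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


# Assumed 4 x 4 feature map from a previous convolutional layer.
fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
max_pooled = pool2d(fmap)                    # [[6, 4], [8, 9]]
avg_pooled = pool2d(fmap, combine=average)   # [[3.75, 2.25], [4.5, 4.0]]
print(max_pooled, avg_pooled)
```

The 16 input activations become 4 outputs, so this configuration keeps one value in four.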

<img src="https://developers.google.com/machine-learning/practica/image-classification/images/maxpool_animation.gif"
alt="drawing" width="500"/>

## Fully Connected Layers

- Fully connected layers are the normal flat feedforward neural network layers.

- These layers may have a nonlinear activation function or a softmax activation in order to output probabilities of class predictions. 

- Fully connected layers are used at the end of the network after feature extraction and consolidation have been performed by the convolutional and pooling layers.

- They are used to create final nonlinear combinations of features and for making predictions by the network.
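
The final prediction step might look like the sketch below: one fully connected layer followed by a softmax that turns raw scores into class probabilities. The four input features, the weight values, and the three classes are made-up numbers for illustration.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: every unit sees every input."""
    return [
        sum(x * w for x, w in zip(inputs, unit_weights)) + b
        for unit_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed: 4 flattened features from the pooling stage and 3 output classes.
features = [0.5, 1.0, 0.0, 2.0]
weights = [
    [0.1, 0.2, 0.0, 0.4],
    [0.0, 0.1, 0.3, 0.1],
    [0.2, 0.0, 0.1, 0.0],
]
biases = [0.0, 0.0, 0.0]

probabilities = softmax(dense(features, weights, biases))
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The first unit receives the largest score here, so `predicted_class` is 0.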

# Best Practices

- **Input Receptive Field Dimensions**: The default is 2D for images, but it could be 1D, such as for words in a sentence or signals in a time series, or 3D for video, which adds a time dimension.

- **Receptive Field Size**: The patch should be as small as possible, but large enough to see features in the input data. It is common to use 3 x 3 on small images and 5 x 5 or 7 x 7 and more on larger image sizes.

- **Stride Width**: Use the default stride of 1. It is easy to understand and you don't need padding to handle the receptive field falling off the edge of your images. This could be increased to 2 or larger for larger images.

- **Number of Filters**: Filters are the feature detectors. Generally fewer filters are used at the input layer and increasingly more filters used at deeper layers.

- **Padding**: Values read from outside the input are set to zero, which is called zero padding. This is useful when you cannot or do not want to standardize input image sizes or when you want to use receptive field and stride sizes that do not neatly divide up the input image size.

- **Pooling**: Pooling is a destructive or generalization process to reduce overfitting. Receptive field size is almost always set to 2 x 2 with a stride of 2 to discard 75% of the activations from the output of the previous layer.

- **Data Preparation**: Consider standardizing input data, both the dimensions of the images and pixel values.

- **Pattern Architecture**: It is common to pattern the layers in your network architecture. This might be one, two or some number of convolutional layers followed by a pooling layer. This structure can then be repeated one or more times. Finally, fully connected layers are often only used at the output end and may be stacked one, two or more deep.

- **Dropout**: CNNs have a habit of overfitting, even with pooling layers. Consider using dropout, for example between fully connected layers and perhaps after pooling layers.
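
To see how several of these choices combine, the sketch below traces feature map shapes through a repeated "two convolutional layers, then one pooling layer" pattern. The 32 x 32 RGB input, the filter counts, and the `layers` tuple format are assumptions made for the example, not a recommended architecture.

```python
def output_size(size, field, stride=1, padding=0):
    """Number of receptive field positions along one dimension."""
    return (size + 2 * padding - field) // stride + 1


# Hypothetical layer list: (kind, receptive field, stride, padding, filters).
layers = [
    ("conv", 3, 1, 1, 32),
    ("conv", 3, 1, 1, 32),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 64),
    ("conv", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
]


def trace_shapes(input_size, input_channels, layers):
    """Return the (height, width, channels) shape after each layer."""
    size, channels = input_size, input_channels
    shapes = [(size, size, channels)]
    for kind, field, stride, padding, filters in layers:
        size = output_size(size, field, stride, padding)
        if kind == "conv":
            channels = filters  # one feature map per filter
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(32, 3, layers)
for shape in shapes:
    print(shape)

final_h, final_w, final_c = shapes[-1]
flattened = final_h * final_w * final_c  # inputs to the first fully connected layer
print(flattened)  # 4096
```

Each 2 x 2 pooling layer with a stride of 2 halves the height and width, which is where the 75% reduction in activations comes from, while the number of filters grows in the deeper block.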