<a href="https://colab.research.google.com/github/foroughkarandish/Convolutional-Neural-Network-Architecture/blob/master/Convolutional_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project is not finished yet; I am still working out the best way to explain and implement it. Stay tuned for updates!

In this project I will explain convolutional neural networks (CNNs) and use them to build an image classifier.

**Table of Contents**

1. [Introduction](#1)<br>
2. [Load Packages](#2)<br>
    A. [Import](#21)<br>
    B. [Setup](#22)<br>
    C. [Version](#23)<br>
3. [CNN Architecture](#3)<br>
  A. [Convolutional Layers](#31)<br>
  B. [Spatial arrangement](#32)<br>
  C. [Pooling layers](#33) <br>
  D. [Fully connected layer](#34) <br>
  E. [A simple convolution network example](#37) <br>
  F. [Pooling layers](#38) <br>
  G. [Convolutional neural network example](#39) <br>
  H. [Why convolutions?](#399) <br>
5. [Model development](#5)<br>
    A. [Keras](#51)<br>   
6. [References](#6)<br>

<a id="1"></a>
## 1. Introduction

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics. They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.<br>


Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.<br>


CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage [1].<br>

<a id="2"></a>
## 2. Load Packages

<a id="21"></a>
### A. Import

In [2]:
from sklearn import model_selection, preprocessing, metrics
from sklearn.metrics import mean_squared_error
from pandas.plotting import scatter_matrix
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
from pandas import get_dummies
import lightgbm as lgb
import xgboost as xgb
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib
import sklearn
import warnings


<a id="22"></a>
### B. Setup

I'm setting up essential visualisation here.

In [3]:
%matplotlib inline
%precision 4
plt.style.use('ggplot')
np.set_printoptions(suppress=True)
pd.set_option("display.precision", 15)

#ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Graphics in retina format are sharper and more legible
%config InlineBackend.figure_format = 'retina'

<a id="23"></a>
### C. Version

Checking the versions of my main libraries that I'm going to use in this project.

In [4]:
print('matplotlib: {}'.format(matplotlib.__version__))
print('sklearn: {}'.format(sklearn.__version__))
print('seaborn: {}'.format(sns.__version__))
print('pandas: {}'.format(pd.__version__))
print('numpy: {}'.format(np.__version__))

matplotlib: 3.2.2
sklearn: 0.22.2.post1
seaborn: 0.10.1
pandas: 1.0.5
numpy: 1.18.5


<a id="3"></a>
## 3. CNN Architecture

A convolutional neural network consists of an input layer, an output layer, and multiple hidden layers between them. The hidden layers typically begin with a series of convolutional layers, each of which computes a sliding dot product between its kernels and its input. The activation function is commonly ReLU, and the convolutions are followed by further layers such as pooling layers, fully connected layers and normalization layers. These are called hidden layers because their inputs and outputs are masked by the activation function and the final convolution.

Although these layers are colloquially called convolutions, that is only a convention. Mathematically, the operation is a sliding dot product, or cross-correlation: unlike a true convolution, the kernel is not flipped before it is applied. This matters for the indices of the matrix, because it determines which weight is applied at a given index position [1].
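To make the distinction concrete, here is a minimal NumPy sketch of both operations. The function names `cross_correlate2d` and `convolve2d` are my own for illustration (they are not library calls), and the sketch assumes a single-channel input, stride 1 and no padding.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Sliding dot product over a 2-D array; the kernel is NOT flipped."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """True convolution: flip the kernel on both axes, then slide it."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

xcorr = cross_correlate2d(image, kernel)  # every entry is -5
conv = convolve2d(image, kernel)          # every entry is +5
```

The two results differ only because this kernel is not symmetric under a 180-degree rotation. Since a CNN learns its kernels, the choice between the two operations does not limit what the network can represent.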

<a id="31"></a>
### A. Convolutional Layers

The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field, but extend through the full depth of the input volume. During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map of that filter. As a result, the network learns filters that activate when it detects some specific type of feature at some spatial position in the input.<br>

Stacking the activation maps for all filters along the depth dimension forms the full output volume of the convolution layer. Every entry in the output volume can thus also be interpreted as an output of a neuron that looks at a small region in the input and shares parameters with neurons in the same activation map.
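The forward pass described above can be sketched in plain NumPy. This is an illustrative, unoptimized version under assumptions of my own: a channels-last layout `(height, width, depth)`, stride 1, no padding, no bias term, and a hypothetical function name `conv_layer_forward`.

```python
import numpy as np

def conv_layer_forward(volume, filters):
    """Naive convolutional layer forward pass.

    volume:  input of shape (H, W, C)
    filters: F filters of shape (F, K, K, C); each spans the full input depth C
    returns: output volume of shape (H - K + 1, W - K + 1, F)
    """
    H, W, C = volume.shape
    F, K, _, _ = filters.shape
    out_h, out_w = H - K + 1, W - K + 1
    output = np.zeros((out_h, out_w, F))
    for f in range(F):                 # one 2-D activation map per filter
        for i in range(out_h):
            for j in range(out_w):
                patch = volume[i:i + K, j:j + K, :]
                # Dot product between the filter and the patch it covers
                output[i, j, f] = np.sum(patch * filters[f])
    return output

rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 8, 3))      # e.g. a tiny 8x8 RGB image
filters = rng.standard_normal((4, 3, 3, 3))  # four 3x3 filters

activation_maps = conv_layer_forward(volume, filters)
print(activation_maps.shape)  # (6, 6, 4)
```

Each slice `activation_maps[:, :, f]` is the activation map of one filter, and every entry in that slice is computed with the same weights, which is the parameter sharing mentioned above.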

<img src="https://lh3.googleusercontent.com/proxy/vwhvX8emIF533r5aLt90gF9nNyYbGwx0LAj-heVwcdRb3q3XugTymFohgtRqKLClE4MFd1pC-yQt27Wor5jhOJQi0I-qdRaukls9pI-_yq9SdoKg85JlD0IEZXiEUytrbcgLOqzByfzggzQ" width="800" height="400">

<a id="32"></a>
### B. Spatial arrangement

- **Depth:**<br>
 The depth of the output volume controls the number of neurons in a layer that connect to the same region of the input volume. These neurons learn to activate for different features in the input. For example, if the first convolutional layer takes the raw image as input, then different neurons along the depth dimension may activate in the presence of various oriented edges, or blobs of color.
- **Stride:**<br>
 Stride controls how depth columns around the spatial dimensions (width and height) are allocated. When the stride is 1, the filters move one pixel at a time. This leads to heavily overlapping receptive fields between the columns, and also to large output volumes. When the stride is 2, the filters jump 2 pixels at a time as they slide around. Similarly, for any integer $S > 0$, a stride of $S$ causes the filter to be translated by $S$ units at a time per output. In practice, stride lengths of $S \geq 3$ are rare. As the stride length increases, the receptive fields overlap less and the resulting output volume has smaller spatial dimensions.
- **Padding:**<br>
 Sometimes it is convenient to pad the input with zeros on the border of the input volume. The size of this padding is a third hyperparameter. Padding provides control of the output volume's spatial size. In particular, it is sometimes desirable to preserve the spatial size of the input volume exactly.

The spatial size of the output volume can be computed as a function of the input volume size $W$, the kernel field size of the convolutional layer neurons $K$, the stride with which they are applied $S$, and the amount of zero padding $P$ used on the border. The number of neurons that "fit" in a given volume is given by

$$\frac{W - K + 2P}{S} +1$$

If this number is not an integer, then the strides are incorrect and the neurons cannot be tiled to fit across the input volume in a symmetric way. In general, setting the zero padding to $P = \frac{K-1}{2}$ (for odd $K$) when the stride is $S=1$ ensures that the input volume and output volume have the same spatial size. However, it is not always necessary to use all of the neurons of the previous layer. For example, a network designer may decide to use only a portion of the padding.
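As a quick check of the formula, the helper below evaluates it for a few settings. The name `conv_output_size` is hypothetical, and raising an error when the result is not an integer is my own choice for the sketch; deep learning libraries differ in how they handle that case.

```python
def conv_output_size(W, K, S=1, P=0):
    """Spatial output size (W - K + 2P) / S + 1 along one dimension."""
    span = W - K + 2 * P
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    if span % S != 0:
        # The filter cannot be tiled evenly across the input with this stride
        raise ValueError(f"stride {S} does not fit: ({W} - {K} + 2*{P}) / {S} is not an integer")
    return span // S + 1

same_size = conv_output_size(W=32, K=5, S=1, P=2)  # P = (K - 1) / 2 keeps 32
no_padding = conv_output_size(W=7, K=3, S=1, P=0)  # 5
strided = conv_output_size(W=7, K=3, S=2, P=0)     # 3

try:
    conv_output_size(W=7, K=3, S=3, P=0)           # (7 - 3) / 3 is not an integer
    stride_fits = True
except ValueError:
    stride_fits = False
```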

<img src="https://miro.medium.com/max/325/1*b77nZmPH15dE8g49BLW20A.png" width="300" height="300">

<a id="33"></a>
### C. Pooling layers

Another important concept of CNNs is pooling, which is a form of non-linear down-sampling. There are several non-linear functions to implement pooling among which max pooling is the most common. It partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum.<br>

Intuitively, the exact location of a feature is less important than its rough location relative to other features. This is the idea behind the use of pooling in convolutional neural networks. The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters, memory footprint and amount of computation in the network, and hence to also control overfitting. It is common to periodically insert a pooling layer between successive convolutional layers (each one typically followed by a ReLU layer) in a CNN architecture. The pooling operation also provides another form of translation invariance.<br>

<img src="https://upload.wikimedia.org/wikipedia/commons/e/e9/Max_pooling.png" width="500" height="500">
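Max pooling over non-overlapping windows can be sketched with a reshape. The function name `max_pool2d` and the example values are my own, and the sketch assumes a single 2-D activation map whose sides are divisible by the window size.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max pooling with a size x size window and a stride equal to the window size."""
    h, w = x.shape
    if h % size or w % size:
        raise ValueError("input sides must be divisible by the pooling size")
    # Split the map into (h/size) x (w/size) blocks of size x size each
    blocks = x.reshape(h // size, size, w // size, size)
    # Take the maximum inside every block
    return blocks.max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

pooled = max_pool2d(x)
print(pooled)
# [[6 8]
#  [3 4]]
```

The 4x4 map shrinks to 2x2, so three quarters of the activations are discarded while the strongest response in each region is kept.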


In addition to max pooling, pooling units can use other functions, such as **average pooling** or **ℓ2-norm pooling**.<br> Average pooling was often used historically but has recently fallen out of favor compared to max pooling, which performs better in practice.
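The alternatives differ only in the function applied to each window. A small sketch, using a hypothetical helper `pool2d` with a `mode` argument of my own design and the same non-overlapping-window assumption as max pooling:

```python
import numpy as np

def pool2d(x, size=2, mode="average"):
    """Pool non-overlapping size x size windows with the chosen reduction."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size).astype(float)
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    if mode == "l2":
        # l2-norm pooling: square root of the sum of squares in each window
        return np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    raise ValueError(f"unknown mode: {mode}")

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

avg = pool2d(x, mode="average")  # [[3.25, 5.25], [2.0, 2.0]]
l2 = pool2d(x, mode="l2")        # [[sqrt(63), sqrt(133)], [sqrt(18), sqrt(26)]]
```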

Due to the aggressive reduction in the size of the representation, there is a recent trend towards using smaller filters or discarding pooling layers altogether.

**"Region of Interest" pooling** (also known as RoI pooling) is a variant of max pooling in which the output size is fixed and the input rectangle is a parameter. Pooling of this kind is an important component of convolutional neural networks for object detection based on the Fast R-CNN architecture.

The figure below shows RoI pooling to size 2x2. In this example the region proposal (an input parameter) has size 7x5.

<img src="https://upload.wikimedia.org/wikipedia/commons/d/dc/RoI_pooling_animated.gif" width="350" height="350">
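The idea can be sketched as follows for a region that is 7 wide and 5 tall, pooled to 2x2. The function name `roi_max_pool`, the `(top, left, height, width)` region format, and the way bin edges are rounded down are assumptions made for this illustration; actual Fast R-CNN implementations may quantize the bins differently.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of interest to a fixed output size.

    roi is (top, left, height, width), measured in feature-map cells.
    Assumes the region is at least as large as the output in each dimension.
    """
    top, left, height, width = roi
    region = feature_map[top:top + height, left:left + width]
    out_h, out_w = output_size
    # Bin edges, rounded down: 5 rows -> [0, 2, 5], 7 columns -> [0, 3, 7]
    row_edges = np.linspace(0, height, out_h + 1).astype(int)
    col_edges = np.linspace(0, width, out_w + 1).astype(int)
    out = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            out[i, j] = cell.max()
    return out

feature_map = np.arange(64).reshape(8, 8)
roi_output = roi_max_pool(feature_map, roi=(1, 0, 5, 7))
print(roi_output)
# [[18 22]
#  [42 46]]
```

Whatever the size of the region, the output is always 2x2, which is what lets a detector feed regions of different shapes into the same fully connected layers.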

<a id="34"></a>
### D. Fully connected layer

Finally, after several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular (non-convolutional) artificial neural networks. Their activations can thus be computed as an affine transformation, with matrix multiplication followed by a bias offset (vector addition of a learned or fixed bias term).
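The affine transformation can be written in a couple of lines of NumPy. The function name `fully_connected` and the toy shapes below are illustrative assumptions; the input stands in for the output of the last pooling layer.

```python
import numpy as np

def fully_connected(activations, weights, bias):
    """Affine transformation y = W x + b applied to the flattened input."""
    x = activations.reshape(-1)  # every activation feeds every neuron
    return weights @ x + bias

pooled = np.arange(8, dtype=float).reshape(2, 2, 2)  # 8 activations in total
weights = np.ones((3, 8))                            # 3 output neurons
bias = np.array([0.0, 1.0, -1.0])

fc_out = fully_connected(pooled, weights, bias)
print(fc_out)  # [28. 29. 27.]
```

Unlike a convolutional layer, the weight matrix has one entry per (input activation, output neuron) pair, so the parameter count grows with the size of the input.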

<a id="6"></a>
## 6. References

[1]. [Convolutional neural network, Wikipedia](https://en.wikipedia.org/wiki/Convolutional_neural_network)<br>
[2]. 