# Deep Learning - Convolutional Neural Network

Deep learning is a class of machine learning algorithms that:[1]
* use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.
* learn in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners.
* learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. [2]

A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, ReLU (activation function) layers, pooling layers, fully connected layers and normalization layers.[3]

Convolutional neural networks have been one of the most influential innovations in the field of computer vision. They have substantially outperformed traditional computer vision techniques and have produced state-of-the-art results.[4] These networks have proven successful in many real-life case studies and applications, such as:
* Image classification, object detection, segmentation and face recognition;
* Self-driving cars that leverage CNN-based vision systems;
* Classification of crystal structures.

CNNs, like other neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function and responds with an output.[5] A convolution multiplies a patch of pixels element-wise with a filter matrix, or ‘kernel’, and sums up the products. The convolution then slides over to the next pixel and repeats the same process until the kernel has covered the whole image. This process is visualized below. [6]
<img src='convo.jpeg'> Image from datacamp.com
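The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes stride 1 and no padding (what Keras' Conv2D does with its default padding='valid'); the function name convolve2d and the small example arrays are hypothetical. As in CNN libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` with stride 1 and no padding, summing the
    element-wise products at every position (no kernel flip, as in CNNs)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1  # number of vertical positions
    out_w = image.shape[1] - kw + 1  # number of horizontal positions
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # pixels under the kernel
            out[i, j] = np.sum(patch * kernel)  # multiply, then sum
    return out

image = np.array([[1, 2, 0, 1],
                  [0, 1, 3, 1],
                  [2, 1, 0, 0],
                  [1, 0, 1, 2]])
kernel = np.array([[1, 0],
                   [0, -1]])
print(convolve2d(image, kernel))
```

The output is smaller than the input because the kernel only fits in (size - kernel size + 1) positions along each axis: a 3 x 3 kernel over a 28 x 28 image gives a 26 x 26 result.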

## Implementation

### Setting up Keras (and Tensorflow)

Since we will be using Keras to implement our deep learning model, we need to set it up first. Keras runs on top of TensorFlow, so we start by installing TensorFlow.

* Install TensorFlow using: pip install tensorflow
  * Note: avoid using conda to install TensorFlow; the official TensorFlow documentation recommends pip.
* Install Keras using: pip install keras (or conda install keras)


### Loading the Dataset

The dataset that we will be using for this tutorial is the MNIST dataset, which ships with Keras. It consists of 70,000 images of handwritten digits from 0 to 9, split into 60,000 training images and 10,000 test images. We shall try to identify the digits using a CNN.

In [4]:
from keras.datasets import mnist
#download mnist data and split into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


### Exploring the Data

In [26]:
y_train[:4]

array([5, 0, 4, 1], dtype=uint8)

This shows that the first 4 digits in the training set are 5, 0, 4 and 1.

In [7]:
#check image shape
X_train[0].shape

(28, 28)

Every image in the MNIST dataset is 28 x 28 pixels, so we do not need to check the shape of each one. With real-world data, however, images often come in different sizes and may need to be resized to a common shape (such as 28 x 28) to fit the model.

### Data Pre-processing

In [31]:
#reshape data to fit model
X_train = X_train.reshape(60000,28,28,1)
X_test = X_test.reshape(10000,28,28,1)
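One pre-processing step this notebook does not perform, but that is common for MNIST, is scaling pixel values from the range 0 to 255 down to 0 to 1, since large unscaled inputs can make training less stable. The sketch below runs on random stand-in data (the array X_demo is hypothetical, not the real MNIST arrays). It also uses reshape(-1, 28, 28, 1), which lets NumPy infer the number of images instead of hard-coding 60000 or 10000.

```python
import numpy as np

# Hypothetical stand-in for the MNIST arrays: uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# -1 tells NumPy to infer the number of images; the trailing 1 is the channel axis.
X_demo = X_demo.reshape(-1, 28, 28, 1)

# Convert to float32 and scale pixel values into [0.0, 1.0].
X_demo = X_demo.astype("float32") / 255.0

print(X_demo.shape, X_demo.dtype, X_demo.min(), X_demo.max())
```

With the real data, the same two lines would be applied to X_train and X_test.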

We need to ‘one-hot-encode’ our target variable. This means that a column is created for each output category, and each sample gets a 1 in the column of its own category and a 0 in every other column. For example, we saw that the first image in the dataset is a 5. This means that the sixth number in its array (index 5, since indexing starts at 0) will be 1 and the rest of the array will be filled with 0.

In [10]:
from keras.utils import to_categorical
#one-hot encode target column
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
y_train[0]

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
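To make the encoding concrete, here is a minimal NumPy sketch of what to_categorical does for integer labels. The helper one_hot is hypothetical and is not part of Keras; it only illustrates the idea, including inferring the number of classes as the largest label plus one when none is given.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical helper: row k gets a 1 in column labels[k], 0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1  # infer the number of categories
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0  # set one cell per row
    return encoded

print(one_hot([5, 0, 4, 1], num_classes=10))
```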

### Building the Model

* The model type that we will be using is Sequential. It allows building a model layer by layer.
* The add() function is used to add layers to the model.
* The first 2 layers are Conv2D layers. These are convolution layers that will deal with our input images, which are seen as 2-dimensional matrices.
* 64 in the first layer and 32 in the second layer are the number of filters (also called kernels or output channels) in each layer. These numbers can be adjusted to be higher or lower, depending on the size of the dataset.
* Kernel size is the size of the filter matrix for our convolution. So a kernel size of 3 means that there will be a 3x3 filter matrix. 
* Activation is the activation function for the layer. The activation function we will be using for our first 2 layers is ReLU, the Rectified Linear Unit. ReLU outputs 0 for any input less than 0 and passes any input greater than 0 through unchanged, i.e. ReLU(x) = max(0, x).
* Our first layer also takes an input shape. This is the shape of each input image, (28, 28, 1), as set in the reshape step earlier, with the 1 signifying that the images are greyscale (a single channel).
* Between the Conv2D layers and the Dense layer there is a ‘Flatten’ layer. Flatten serves as a connection between the convolution and dense layers: it unrolls the 3-D output of the last convolution (height x width x filters) into a 1-D vector, which is the input the Dense layer expects.
* ‘Dense’ is the layer type we will use for our output layer. Dense is a standard, fully connected layer type that is used in many cases for neural networks.
* There will be 10 nodes in our output layer, one for each possible outcome (0–9).
* The activation is ‘softmax’. Softmax makes the output sum up to 1 so the output can be interpreted as probabilities. The model will then make its prediction based on which option has the highest probability.
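The two activation functions described in the list can be written directly in NumPy. This is an illustrative sketch with hypothetical helper names (relu, softmax) and made-up scores; Keras applies the same functions to whole tensors.

```python
import numpy as np

def relu(x):
    # Negative inputs become 0; positive inputs pass through unchanged.
    return np.maximum(0, x)

def softmax(z):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.5])))

scores = np.array([2.0, 1.0, 0.1])  # made-up raw outputs for 3 classes
probs = softmax(scores)
print(probs, probs.sum(), probs.argmax())
```

The probabilities sum to 1, and the predicted class is simply the index of the largest one.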

In [12]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten
#create model
model = Sequential()
#add model layers
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
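As a sanity check on this architecture, the output shapes and trainable parameter counts can be worked out by hand. The sketch below assumes the Conv2D defaults used above (stride 1, padding='valid') and uses hypothetical helper names; the totals should agree with what model.summary() reports for this model.

```python
def conv_output_size(size, kernel_size):
    # 'valid' padding, stride 1: the kernel fits in (size - kernel_size + 1) positions
    return size - kernel_size + 1

def conv_params(kernel_size, in_channels, filters):
    # each filter has kernel_size x kernel_size x in_channels weights plus 1 bias
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input per unit, plus 1 bias per unit
    return (inputs + 1) * units

side1 = conv_output_size(28, 3)     # after Conv2D(64): 26 x 26 x 64
side2 = conv_output_size(side1, 3)  # after Conv2D(32): 24 x 24 x 32
flat = side2 * side2 * 32           # length of the Flatten output

params = [
    conv_params(3, 1, 64),   # first Conv2D (greyscale input has 1 channel)
    conv_params(3, 64, 32),  # second Conv2D (its input has 64 channels)
    dense_params(flat, 10),  # output Dense layer
]
print(side1, side2, flat, params, sum(params))
```

Most of the parameters sit in the final Dense layer, because Flatten produces a long vector of 18,432 values.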

### Compiling the Model

* The optimizer is the algorithm that updates the model's weights during training, and its learning rate controls how large each update is. We will be using ‘adam’ as our optimizer. Adam is a good default choice in many cases, as it adapts the effective learning rate for each weight throughout training.
* The learning rate determines how big a step the optimizer takes towards better weights at each update. A smaller learning rate may lead to more accurate weights (up to a certain point), but training will take longer.
* 'categorical_crossentropy' is the most common choice of loss function for multi-class classification with one-hot encoded labels. A lower score indicates that the model is performing better.
* The metric we use is 'accuracy', the fraction of images whose predicted digit matches the true digit.
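The loss and metric named above can be computed by hand for a tiny batch. This sketch uses made-up probabilities for 3 classes (not model output) and hypothetical helper names; it clips probabilities to avoid log(0), and Keras' own implementation may differ in such details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so log() never sees exactly 0, then average -sum(y * log(p)) over samples.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose most probable class matches the true class.
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

y_true = np.array([[0, 1, 0],
                   [1, 0, 0]], dtype=float)  # one-hot labels
good = np.array([[0.05, 0.90, 0.05],
                 [0.80, 0.10, 0.10]])        # confident and correct
bad = np.array([[0.90, 0.05, 0.05],
                [0.10, 0.80, 0.10]])         # confident and wrong

print(categorical_crossentropy(y_true, good), accuracy(y_true, good))
print(categorical_crossentropy(y_true, bad), accuracy(y_true, bad))
```

The confident, correct predictions get the lower loss and an accuracy of 1.0, while the confident, wrong ones are penalized heavily.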


In [13]:
#compile model using accuracy to measure model performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

### Training the Model

For this tutorial, we will train the model for 5 epochs, evaluating it on the test set after every epoch through validation_data.

In [32]:
#train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x24c367e8f98>
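Each epoch is one full pass over the 60,000 training images, but the weights are updated once per mini-batch. Since model.fit uses a batch size of 32 when batch_size is not given, the number of updates can be worked out with simple arithmetic, as in this sketch.

```python
import math

n_train = 60000   # MNIST training images
batch_size = 32   # model.fit default when batch_size is not passed
epochs = 5        # as in the fit() call above

steps_per_epoch = math.ceil(n_train / batch_size)  # a final partial batch would count as one step
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```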

### Prediction

In [33]:
#predict first 4 images in the test set
model.predict(X_test[:4])

array([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]], dtype=float32)

In [34]:
#actual results for first 4 images in test set
y_test[:4]

array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
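The rows printed above are probability vectors and one-hot labels; taking the index of the largest entry in each row turns them into digit labels. The sketch below copies the two printed arrays into NumPy (the variable names are hypothetical) and compares them.

```python
import numpy as np

# The two arrays printed above, copied by hand.
predictions = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]] * 4, dtype="float32")
actual = np.array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype="float32")

predicted_digits = predictions.argmax(axis=1)  # column of the highest probability
actual_digits = actual.argmax(axis=1)          # column holding the 1
print(predicted_digits, actual_digits)
print((predicted_digits == actual_digits).mean())  # fraction correct on these 4
```

With a trained model the same idea applies directly, e.g. model.predict(X_test[:4]).argmax(axis=1).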

The predicted results show that the first 4 digits are all 8s, but the actual results show that the first four are 7, 2, 1 and 0. The error is evident, as our accuracy is just 0.097 (9.7%), which is no better than randomly guessing one of the 10 digits. As we increase the number of epochs for training the model, the

## References
1. Deng, L.; Yu, D. (2014). "Deep Learning: Methods and Applications". Foundations and Trends in Signal Processing. 7 (3–4): 1–199. URL - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
2. LeCun, Yann. "LeNet-5, convolutional neural networks". URL - http://yann.lecun.com/exdb/lenet/
3. "CS231n Convolutional Neural Networks for Visual Recognition". URL - https://cs231n.github.io/
4. "Convolutional Neural Networks in Python". URL - https://www.datacamp.com/community/tutorials/convolutional-neural-networks-python
5. "The Best Explanation of Convolutional Neural Networks on the Internet". URL - https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-networks-on-the-internet-fbb8b1ad5df8
6. "Building a Convolutional Neural Network in Keras". URL - https://towardsdatascience.com/building-a-convolutional-neural-network-cnn-in-keras-329fbbadc5f5