# Deep Learning Model Training

This is a simple deep learning model training sample using the Keras high-level API.

Keras is a high-level neural networks API, written in Python, that contains a set of helper methods and libraries for defining neural networks.

Keras also provides access to a set of open data sets for training purposes, as well as utility methods for pre-processing training data.

Keras itself is a high-level API, so it needs a backend; it can run on TensorFlow, CNTK and Theano.

In the following example, we will train a model on the famous MNIST (handwritten digits) data set to create a DL model that can predict handwritten digits.

- Good visual illustration of the model we will build:
http://scs.ryerson.ca/~aharley/vis/conv/

First, we import the required libraries.

## MNIST Database - Handwritten digits (0-9)

In this tutorial we will use Python* to implement a Convolutional Neural Network, a simplified version of LeNet, that recognizes handwritten digits. A project like this one, using the MNIST dataset, is considered the "Hello World" of Machine Learning.

We will use Keras*, TensorFlow* and the MNIST database.

According to the description on their website, "Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research."*

We will use TensorFlow as the backend for Keras. TensorFlow is an open source software library for high performance numerical computation.

The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. It is also available as a Keras dataset, with 60,000 28x28 images of the 10 digits along with a test set of 10,000 images, so it is very easy to import and use in our code.

One good visual and interactive reference for what we are developing is the visualization linked above. The basic difference between our code and that interactive sample is the number and size of convolutional and fully-connected layers (LeNet uses two of each, we will use a single one, to reduce training time). We also adjusted the layer sizes to balance accuracy against training time. We are achieving 98.54% accuracy with less than 2 minutes of training time on an Intel® Core™ processor.

This code can be optimized in several ways to increase accuracy, and we invite you to explore this later by changing the number of epochs, filters and fully-connected neurons, and by including additional convolutional and fully connected layers. You can also use flattening, dropout and batch normalization layers. Other optimization techniques can be applied as well, so feel free to use this tutorial code as a base for exploring them.

In a nutshell, the convolutional and pooling layers are responsible for extracting a set of features from the input images, and the fully-connected layers are responsible for classification.

Convolutional layers apply a set of filters to the input image to extract important features from it. The filters are small matrices, also called image kernels, that are applied repeatedly across the input image ("sliding" the filter over the image). You may have already used such filters in traditional image processing applications such as GIMP (e.g. blurring, sharpening or embossing). This article gives a good overview of image kernels with some live experiments. Each filter generates a new image that becomes the input for the next layer, typically a pooling layer.
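To make the "sliding" idea concrete, here is a minimal NumPy sketch of a single filter applied without padding. The helper name `convolve2d_valid`, the 5x5 image and the kernel values are all made up for illustration; they are not part of the tutorial's model, where Keras learns the kernel values during training. Like Keras' `Conv2D`, the sketch does not flip the kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            # Element-wise product of the kernel and the patch under it, summed
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output

# Hypothetical 5x5 image: dark on the left (0), bright on the right (1)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float32)
# Hand-picked kernel that responds to left-to-right increases in intensity
kernel = np.array([[-1, 0, 1]] * 3, dtype=np.float32)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every row is [3. 3. 0.]
```

The output is smaller than the input (5 - 3 + 1 = 3 per side), which is the same effect that turns the 28x28 input into a 26x26 feature map in the model below.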

Pooling layers reduce the spatial size of the image (downsampling), which reduces the computation in the network and also helps control overfitting.
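As an illustration of downsampling, the following NumPy sketch performs 2x2 max pooling on a small, made-up feature map. The helper `max_pool_2x2` is hypothetical and only mimics what a 2x2 max pooling layer computes on a single channel.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop a trailing odd row/column, if any
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Made-up 4x4 feature map
feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]], dtype=np.float32)

pooled = max_pool_2x2(feature_map)
print(pooled.shape)  # (2, 2)
print(pooled)        # [[4. 2.]
                     #  [2. 8.]]
```

Each side is halved, so the number of values passed to the next layer drops by a factor of four.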

Fully connected layers are traditional Neural Network layers.

In [1]:
# Sequential Network Model https://keras.io/models/sequential/
from keras.models import Sequential
# Core Layers https://keras.io/layers/core/
# Dense: densely-connected NN layer, to be used as classification layer
# Flatten: layer to flatten the convolutional layers
from keras.layers import Dense, Flatten
# Convolutional Layers https://keras.io/layers/convolutional/
# Conv2D: 2D convolution Layer
from keras.layers import Conv2D
# Pooling Layer: https://keras.io/layers/pooling/
# MaxPooling2D: Max pooling operation for spatial data
from keras.layers import MaxPooling2D
# Utilities https://keras.io/utils/
from keras.utils import np_utils
# MNIST Dataset https://keras.io/datasets/
# Dataset of 60,000 28x28 handwritten images of the 10 digits, along with a test set of 10,000 images.
from keras.datasets import mnist

Using TensorFlow backend.


At this stage, we first load the dataset using the `mnist` interface. Then we pre-process the data set so that the input layer of the network can accept it.

We also make sure the data type is float, since the network computes with floating point values.

In [2]:
# Load MNIST data set in two sets: Training (60k images) and Testing (10k images)
(train_dataset, train_classes),(test_dataset, test_classes) = mnist.load_data()

# Adjust datasets to TensorFlow
# Add an explicit channel dimension: (samples, 28, 28) -> (samples, 28, 28, 1)
train_dataset = train_dataset.reshape(train_dataset.shape[0], 28, 28, 1)
test_dataset = test_dataset.reshape(test_dataset.shape[0], 28, 28, 1)

# Convert data from uint8 to float32
train_dataset = train_dataset.astype('float32')
test_dataset = test_dataset.astype('float32')

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
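The effect of the reshape and type conversion can be checked without downloading anything. The sketch below uses a small random array, `fake_images`, as a stand-in for the image array returned by `mnist.load_data()`; the name and the sample count of 4 are made up for illustration.

```python
import numpy as np

# Stand-in for the MNIST images: 4 grayscale 28x28 images stored as uint8
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same two steps as in the cell above: add a channel axis, then cast to float32
reshaped = fake_images.reshape(fake_images.shape[0], 28, 28, 1).astype('float32')

print(fake_images.shape, fake_images.dtype)  # (4, 28, 28) uint8
print(reshaped.shape, reshaped.dtype)        # (4, 28, 28, 1) float32
```

The pixel values are unchanged; only the shape and the data type differ.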


Here, we normalize the dataset. DL models generally train faster and more accurately on normalized data; inputs with large, widely varying values tend to make training harder to converge.

In [3]:
# Normalize data: scale pixel values from [0, 255] to [0, 1]
train_dataset = train_dataset / 255
test_dataset = test_dataset / 255
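A quick sketch of what this scaling does to individual pixel values, using three hand-picked intensities rather than real MNIST data:

```python
import numpy as np

# Darkest, mid-gray and brightest pixel values, already cast to float32
pixels = np.array([0, 128, 255], dtype=np.uint8).astype('float32')

scaled = pixels / 255
print(scaled)        # values now lie between 0.0 and 1.0
print(scaled.dtype)  # float32
```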

In this step, we convert the labels to a categorical (one-hot) class representation. During training, each output of the model is pushed toward a value of 0 or 1. However, in this case we have 10 different categories. Therefore, for each label we create a 1D tensor of length 10 in which only the position of the correct class is 1 and all other positions are 0.

- e.g. the label of a training input is 2.
- The corresponding DL output/class is [0., 0., 1., ....]
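The encoding described above can be sketched in plain NumPy. The helper `one_hot` is a hypothetical stand-in written for illustration; the tutorial itself relies on the Keras utility in the next cell.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([2, 0, 9])
encoded = one_hot(labels, 10)

print(encoded[0])              # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
print(encoded.argmax(axis=1))  # [2 0 9]
```

Taking the `argmax` of each row recovers the original label, which is also how a predicted probability vector is turned back into a digit.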

In [4]:
# Convert class data from numerical to categorical
train_classes = np_utils.to_categorical(train_classes, 10)
test_classes = np_utils.to_categorical(test_classes, 10)

Next, we create a very basic Convolutional Neural Network using Keras layers.

In [5]:
# Create the Convolutional Neural Network
cnn = Sequential()

# Add the convolutional layer with 32 filters, 3x3 convolution window,
# 28 x 28 x 1 pixels input array and Rectified Linear Unit activation function
cnn.add(Conv2D(32, (3,3), input_shape = (28, 28, 1), activation = 'relu'))

# Add one Pooling layer with default 2x2 size
cnn.add(MaxPooling2D())

# Add one flattening layer to convert the output matrix to a vector to be the Deep Neural Network input
cnn.add(Flatten())

# Add one hidden layer with 128 neurons and Rectified Linear Unit activation function
cnn.add(Dense(units = 128, activation = 'relu'))

# Add the output layer with 10 neurons (one for each class) with Softmax as the activation function
cnn.add(Dense(units = 10, activation = 'softmax'))

cnn.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 5408)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               692352    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290      
Total params: 693,962
Trainable params: 693,962
Non-trainable params: 0
_________________________________________________________________
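The output shapes and parameter counts in this summary can be reproduced by hand. The following sketch is plain arithmetic, assuming an unpadded 3x3 convolution, 2x2 pooling, and one bias per filter or neuron; the variable names are made up for illustration.

```python
# Output sizes
conv_side = 28 - 3 + 1                   # 3x3 window without padding -> 26
pool_side = conv_side // 2               # 2x2 pooling halves each side -> 13
flat_units = pool_side * pool_side * 32  # 13 * 13 * 32 feature values -> 5408

# Parameter counts: weights plus one bias per filter / neuron
conv_params = (3 * 3 * 1 + 1) * 32       # 320
dense1_params = flat_units * 128 + 128   # 692352
dense2_params = 128 * 10 + 10            # 1290

total_params = conv_params + dense1_params + dense2_params
print(conv_side, pool_side, flat_units)  # 26 13 5408
print(total_params)                      # 693962
```

Almost all of the parameters sit in the first dense layer, which is why changing the number of filters or hidden neurons has a large effect on model size.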


Here is the last part, where the loss function, optimizer and metrics are defined for the training process.

The `.fit` method starts the training process with the provided training data and labels.

When `fit` finishes, your model is ready to predict.

The `.evaluate` method checks the results against the test data set, showing the accuracy of your model on data it was not trained on.

In [6]:
# Compile the CNN with:
#  - Categorical crossentropy as the loss function
#  - Adam optimizer
#  - Accuracy as the results evaluation metric
cnn.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Train for 5 epochs, validating the model against the test dataset after each epoch
cnn.fit(train_dataset, train_classes, batch_size = 128, epochs = 5, validation_data = (test_dataset, test_classes))

# Extract and print the Accuracy results
result = cnn.evaluate(test_dataset, test_classes)
print ('Accuracy = ' + str(result[1] * 100) + "%")

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Accuracy = 98.77%
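To give a feel for what the softmax output and the categorical cross-entropy loss compute, here is a minimal NumPy sketch for a single sample. The logits are made-up numbers, and the helpers `softmax` and `categorical_crossentropy` are simplified illustrations rather than the Keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_target, probabilities, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    clipped = np.clip(probabilities, eps, 1.0)  # avoid log(0)
    return float(-np.sum(one_hot_target * np.log(clipped)))

# Made-up raw scores for the 10 digit classes; class 2 has the highest score
logits = np.array([1.0, 0.5, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
probs = softmax(logits)

target_is_2 = np.zeros(10)
target_is_2[2] = 1.0
target_is_0 = np.zeros(10)
target_is_0[0] = 1.0

print(probs.argmax())                                # 2, the predicted digit
print(categorical_crossentropy(target_is_2, probs))  # small loss: prediction matches
print(categorical_crossentropy(target_is_0, probs))  # larger loss: prediction is wrong
```

The loss is low when the model puts most of its probability on the true class, which is what the optimizer works toward during training.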
