## Introduction

This tutorial introduces some basic methods for modeling data with the Keras library, specifically using convolutional neural networks. We will fit the model on the training set and check its prediction accuracy on the testing set. The content focuses on understanding and using the Keras library as a way into applied deep learning. In a nutshell, deep learning refers to neural networks with multiple layers, which learn increasingly abstract representations of the data and can reach high accuracy on held-out test data. The deeper a hidden layer sits in the network, the more complex the representation it learns. For example, the first hidden layer might learn only simple lines, while the last layer can classify the image correctly as an airplane or a boat.

### Tutorial content

We'll build and train a neural network on the MNIST dataset, which ships with the Keras library.

We will cover the following topics in this tutorial:
- [Installing the libraries](#Installing-the-libraries)
- [Example application: Classification of mnist data](#Example-application:-Classification-of-mnist-data)
- [Summary & References](#Summary-&-References)

## Installing the libraries

Several libraries must be installed before getting started with the tutorial. We install Keras, Theano, TensorFlow and scikit-learn using `conda`:

    $ conda install keras
    
    $ conda install theano
    
    $ conda install tensorflow
    
    $ conda install scikit-learn

Keras requires a back-end engine to run, which is why we install TensorFlow as well as Theano. Only one of them is used as the backend at a time.

We import the following libraries for the tutorial. The Sequential model type provides a simple linear structure for stacking layers. Activation, Dense and Flatten are core layer types. Convolution2D is the convolutional layer, which we use to build the model from the training data.
Note that we set the image dimension ordering through the backend because we are processing an image dataset. The value 'th' places the depth axis before the rows and columns of each image.

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split as tts
import keras
from keras import backend as K
K.set_image_dim_ordering('th')
from keras.datasets import mnist
from keras.layers import Convolution2D
from keras.layers import Dense as de
from keras.layers import Flatten as flat
from keras.models import Sequential
from keras.utils import np_utils as npu

## Example application: Classification of mnist data

In [3]:
(train_x, train_y), (test_x, test_y) = mnist.load_data()

The `.shape` attribute gives the dimensionality of an array. The output below shows that the training dataset consists of 60000 examples and that each example is an image of 28 x 28 pixels.

In [4]:
train_x.shape

(60000, 28, 28)

Similarly, the test dataset consists of 10000 examples, each of size 28 x 28 pixels.

In [5]:
test_x.shape

(10000, 28, 28)

In [6]:
train_x

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ..., 
       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, 

In [7]:
train_y

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [8]:
test_x

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ..., 
       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, 

In [9]:
test_y

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

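We extract the number of data points and the width and height of every data point for the training and testing datasets. Strictly speaking, axis 1 of the array indexes rows and axis 2 indexes columns, but because the MNIST images are square (28 x 28) the naming below does not affect the result.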

In [10]:
train_shape = train_x.shape[0]
test_shape = test_x.shape[0]
train_width = train_x.shape[1]
test_width = test_x.shape[1]
train_height = train_x.shape[2]
test_height = test_x.shape[2]
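
The same quantities can also be read off in one step with tuple unpacking. The sketch below uses a small zero-filled array as a stand-in for `train_x` (an assumption, so that it runs without downloading MNIST); the names `n_samples`, `n_rows` and `n_cols` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Stand-in for train_x: 5 grayscale images of 28 x 28 pixels (hypothetical data).
images = np.zeros((5, 28, 28), dtype=np.uint8)

# Axis 0 counts examples, axis 1 counts rows, axis 2 counts columns.
n_samples, n_rows, n_cols = images.shape
print(n_samples, n_rows, n_cols)
```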

When using the Theano dimension ordering, we have to explicitly specify a "depth" dimension, which is 1 for these grayscale images. To illustrate depth with an example, an RGB image has a depth of 3 because it has three channels: red, green and blue.

In [11]:
train_x = train_x.reshape(train_shape, 1, train_width, train_height)
test_x = test_x.reshape(test_shape, 1, test_width, test_height)
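
To make the effect of this reshape concrete, here is a minimal NumPy sketch on a stand-in array (random hypothetical data, not the real MNIST arrays). It adds a depth axis of size 1 directly after the example axis and confirms that the pixel values themselves are unchanged.

```python
import numpy as np

# Stand-in for the raw images: 5 examples of 28 x 28 pixels (hypothetical data).
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Channels-first layout: (examples, depth, rows, columns).
channels_first = images.reshape(images.shape[0], 1, images.shape[1], images.shape[2])

# np.expand_dims inserts the same size-1 axis without spelling out every dimension.
same_result = np.expand_dims(images, axis=1)

print(channels_first.shape)
print(np.array_equal(channels_first, same_result))
```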

We convert the values to float before normalizing them so that all values lie between 0 and 1. Normalization ensures that every input has the same range of values. If a feature has a large range of values, it tends to have far more influence than features with a narrow range. Normalized inputs also help training converge more stably.

In [12]:
train_x = train_x.astype('float32')
test_x = test_x.astype('float32')

In [13]:
train_x = train_x / 255
test_x = test_x / 255
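
A minimal sketch of the same scaling on a tiny hand-written array shows the resulting type and range. The array `pixels` is made up for illustration.

```python
import numpy as np

# Hypothetical pixel intensities covering the full 8-bit range.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Cast to float32 first, then divide by the maximum possible intensity.
scaled = pixels.astype('float32') / 255

print(scaled.dtype, scaled.min(), scaled.max())
```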

In [14]:
train_x

array([[[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         ..., 
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         ..., 
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         ..., 
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.]]],


       ..., 
       [[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [

In [15]:
test_x

array([[[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         ..., 
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         ..., 
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         ..., 
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.]]],


       ..., 
       [[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [ 0.,  0.,  0., ...,  0.,  0.,  0.],
         [

As a quick check, we can examine the number of dimensions of the training labels.

In [16]:
len(train_y.shape)

1

In [17]:
train_y.shape

(60000,)

The label array is one-dimensional: each entry is a single integer class label. For multiclass classification, the labels should instead form a two-dimensional array with one row per example and one column per class. In their current form, the classes are not encoded as separate categories.

In [18]:
train_y[:5]

array([5, 0, 4, 1, 9], dtype=uint8)

In [19]:
test_y.shape

(10000,)

In [20]:
test_y[:5]

array([7, 2, 1, 0, 4], dtype=uint8)

The outputs above show that the labels need preprocessing. They are stored as a one-dimensional array without explicitly defined categories. Hence, we specify the number of categories explicitly and convert the labels into categorical (one-hot) form.

In [21]:
no_of_classes = 10
train_y = npu.to_categorical(train_y, no_of_classes)
test_y = npu.to_categorical(test_y, no_of_classes)
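
To see what this conversion does, the sketch below reproduces one-hot encoding in plain NumPy for the first five training labels shown earlier. It illustrates the idea and is not the Keras implementation; the helper name `one_hot` is made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Set the column matching each label to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

first_five = one_hot([5, 0, 4, 1, 9], 10)
print(first_five.shape)
print(first_five[0])
```

Each row contains exactly one 1, in the column that matches the original integer label.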

In [22]:
train_y

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.]])

In [23]:
test_y

array([[ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

The following four blocks of code define the neural network. We use a Sequential model because of its simplicity. The alternative is the Model class used with the functional API, which has a different syntax but almost the same methods.

In [24]:
theano_model = Sequential()

We declare the input layer. The input shape corresponds to the depth and the dimensions of each example, matching the training and testing datasets. The first three parameters are the number of convolution filters, the number of rows in each convolution kernel and the number of columns in each convolution kernel.

In [25]:
theano_model.add(Convolution2D(16, 2, 2, init='glorot_uniform', activation=None, border_mode='valid', input_shape = (1, train_width, train_height)))
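
The output size and parameter count of this layer can be worked out by hand. The sketch below assumes a stride of 1 and one bias per filter, with no padding (which is what `border_mode='valid'` means); the function names are illustrative and not part of Keras.

```python
def valid_conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1

def conv_param_count(filters, kernel_rows, kernel_cols, input_depth):
    """Weights per filter times the number of filters, plus one bias per filter."""
    return filters * (kernel_rows * kernel_cols * input_depth) + filters

out_rows = valid_conv_output_size(28, 2)
out_cols = valid_conv_output_size(28, 2)
print((16, out_rows, out_cols))
print(conv_param_count(16, 2, 2, 1))
```

Under these assumptions, the layer maps each 1 x 28 x 28 input to a 16 x 27 x 27 output and has 80 trainable parameters.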

Once the input layer is defined, we can stack further layers on top of it. The type and number of layers to use varies between use cases. There is no hard rule about how many layers to use. In general, adding layers increases the capacity of the model and often improves accuracy, at the cost of a model whose inner workings are harder to understand. Here we add a Flatten layer, which turns the convolution output into a one-dimensional vector per example.

In [26]:
theano_model.add(flat())
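
A NumPy sketch of what flattening does, assuming the convolution output has shape (16, 27, 27) per example (28 - 2 + 1 = 27 along each axis with 'valid' borders). The batch of zeros is hypothetical data.

```python
import numpy as np

# Hypothetical batch of 3 convolution outputs in channels-first layout.
feature_maps = np.zeros((3, 16, 27, 27), dtype='float32')

# Keep the batch axis and collapse everything else into one axis.
flattened = feature_maps.reshape(feature_maps.shape[0], -1)
print(flattened.shape)
```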

The following line of code is the crux of the network. We construct 10 neurons in the output layer, one for each digit class. The intuition is that each neuron is activated when its particular digit pattern is recognized.

The second argument, "activation", depends on the type of problem:
- Regression: a 'linear' activation function. The number of output neurons matches the number of outputs.
- Classification (binary): a 'sigmoid' (logistic) activation function. There is one neuron in the output layer.
- Classification (multiclass): a 'softmax' activation function. There is one output neuron for every class.

In [27]:
theano_model.add(de(10, activation ='softmax'))
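
To illustrate what the softmax activation computes, here is a minimal NumPy sketch. The raw scores are made up; in the network they would come from the weighted sum inside the dense layer.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical raw scores for 2 examples over 10 digit classes.
scores = np.array([[1.0, 2.0, 0.5, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0] * 10])
probs = softmax(scores)

print(probs.sum(axis=1))      # each row sums to 1
print(probs.argmax(axis=1))   # index of the most probable class per row
```

Each row of `probs` is a probability distribution over the classes, and the predicted digit is the index of the largest entry.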

After defining the network, we have to compile it. Compilation transforms the layers defined in the network definition phase into a series of matrix operations that perform efficient computations when building the eventual model. We need to specify the metric(s) for evaluating the model, an optimizer, and the loss function that the optimizer minimizes.

We use the standard accuracy metric for evaluating the network and the stochastic gradient descent optimizer (sgd).
Other available optimizers include:
- RMSprop: divides the gradient by a running average of its recent magnitude
- Adagrad: adapts the learning rate per parameter; used for stochastic optimization and online learning
- Adadelta: an adaptive learning rate method
- TFOptimizer: a wrapper for running native TensorFlow optimizers
- Adam, Adamax, Nadam
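
For comparison with the optimizers listed here, plain gradient descent applies the simplest update rule: move each weight a small step against its gradient. The sketch below runs that rule on a toy one-parameter problem. The loss function, learning rate and step count are assumptions chosen for illustration, and the gradient is exact here, whereas stochastic gradient descent estimates it from a mini-batch.

```python
def sgd_step(weight, gradient, learning_rate):
    """One gradient descent update: step against the gradient."""
    return weight - learning_rate * gradient

# Toy loss (w - 3)^2, whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
w = 0.0
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w = sgd_step(w, grad, learning_rate=0.1)

print(w)
```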

The third argument is "loss". The standard loss functions for predictive models are as follows:
- Regression: 'mse'
- Classification (binary): 'binary_crossentropy'
- Classification (multiclass): 'categorical_crossentropy'
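
As an illustration of the multiclass loss, the sketch below computes categorical cross-entropy in NumPy for one-hot targets. It is a simplified version for intuition, not the Keras implementation; the clipping constant `eps` and the example arrays are assumptions.

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, predicted_probs, eps=1e-7):
    """Mean over examples of -sum(target * log(prediction))."""
    clipped = np.clip(predicted_probs, eps, 1.0)  # avoid log(0)
    per_example = -np.sum(one_hot_targets * np.log(clipped), axis=1)
    return per_example.mean()

# Hypothetical 3-class problem with two examples.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
confident = np.array([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1]])
unsure = np.array([[0.4, 0.3, 0.3],
                   [0.3, 0.4, 0.3]])

loss_confident = categorical_crossentropy(targets, confident)
loss_unsure = categorical_crossentropy(targets, unsure)
print(loss_confident, loss_unsure)
```

The loss is lower when the model assigns more probability to the correct class, which is what the optimizer tries to achieve during training.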

In [28]:
met = ['accuracy']
opt = 'sgd'
ls = 'categorical_crossentropy'
theano_model.compile(metrics = met, optimizer = opt, loss = ls)

In [29]:
theano_model.fit(train_x, train_y, batch_size=16, nb_epoch=4)



Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1c05629a198>
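
With these settings, the number of weight updates follows directly from the dataset size and the batch size. A quick arithmetic sketch (the variable names are illustrative):

```python
import math

num_examples = 60000   # size of the MNIST training set
batch_size = 16
num_epochs = 4

# Each epoch visits every example once, one mini-batch at a time.
batches_per_epoch = math.ceil(num_examples / batch_size)
total_updates = batches_per_epoch * num_epochs

print(batches_per_epoch, total_updates)
```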

In [30]:
acc = theano_model.evaluate(test_x, test_y)



In [31]:
acc

[0.29703568347096443, 0.9123]
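
`evaluate` returns the loss followed by the metrics passed to `compile`, so this list holds the test loss and the test accuracy. The sketch below unpacks the reported values and converts the accuracy into an approximate count of misclassified test images; the variable names are illustrative.

```python
# Values copied from the evaluate output above.
acc = [0.29703568347096443, 0.9123]
test_loss, test_accuracy = acc

num_test_examples = 10000
misclassified = round((1.0 - test_accuracy) * num_test_examples)

print("loss: %.4f, accuracy: %.2f%%" % (test_loss, test_accuracy * 100))
print("misclassified:", misclassified)
```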

## Summary & References

This tutorial covered only a few topics from the Keras library. I found several webpages helpful for a more detailed understanding and explanation of the library and the methods implemented here. The information is available from the following links.
1. https://elitedatascience.com 
2. https://keras.io/ 
3. https://machinelearningmastery.com