# Convolutional Neural Network Models

In the previous notebook, we saw that the simple neural network performed poorly even after we added many hidden nodes. We could add more hidden layers to that network, but it would then take much longer to fit the training data.

Convolutional neural networks perform well on image and character recognition problems because, in an image, proximity is strongly related to similarity, and convolutional networks take advantage of this fact. That is, two pixels that are close to each other are more likely to be related than two pixels that are far apart. In a standard fully connected network, however, every pixel is linked to every neuron. The resulting large number of weights makes such a network slow to train and harder to fit well. Convolution addresses this by dropping many of the less significant connections. In technical terms, convolutional neural networks make image processing computationally manageable by filtering connections by proximity.
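To make the saving concrete, the rough sketch below compares the number of trainable weights in a fully connected layer with the number in a small convolutional layer for a 28x28 grayscale image. The layer sizes (512 dense units, 32 filters of size 3x3) are illustrative assumptions, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases when every input connects to every unit."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases when each filter only sees a small local patch."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


n_pixels = 28 * 28  # a flattened 28x28 grayscale image

dense_count = dense_params(n_pixels, 512)
conv_count = conv2d_params(3, 3, 1, 32)

print("fully connected layer:", dense_count)
print("3x3 convolutional layer:", conv_count)
```

The convolutional layer reuses the same small filters at every position of the image, which is why its weight count does not grow with the image size.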

## Loading Libraries

In [1]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.optimizers import Adam
from keras.layers import BatchNormalization
from keras.utils import to_categorical
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, GlobalAveragePooling2D
from keras.layers import LeakyReLU
import numpy as np
import pandas as pd

Using TensorFlow backend.


## Loading and preparing data

In [2]:
train = pd.read_csv("../data/emnist-balanced-train.csv", header = None)
test = pd.read_csv("../data/emnist-balanced-test.csv", header = None)
print(train.head())
print(test.head())
print(train.shape)
print(test.shape)

   0    1    2    3    4    5    6    7    8    9   ...   775  776  777  778  \
0   45    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
1   36    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
2   43    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
3   15    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
4    4    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   

   779  780  781  782  783  784  
0    0    0    0    0    0    0  
1    0    0    0    0    0    0  
2    0    0    0    0    0    0  
3    0    0    0    0    0    0  
4    0    0    0    0    0    0  

[5 rows x 785 columns]
   0    1    2    3    4    5    6    7    8    9   ...   775  776  777  778  \
0   41    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
1   39    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
2    9    0    0    0    0    0    0    0    0    0

Next, we put the data into the format that the convolutional neural network expects: the first column holds the label, and the remaining 784 columns hold the pixel values of a 28x28 image.

In [3]:
# separate out the train data from the response variable
X_train = train.iloc[:, 1:]
X_test = test.iloc[:, 1:]

# separate out the response variable from the data
Y_train = train[0]
Y_test = test[0]

# converting the pandas dataframe to numpy matrices
X_train = X_train.values
Y_train = Y_train.values
X_test = X_test.values
Y_test = Y_test.values

# reshaping the data into the format which can be passed to the neural network
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# converting the data type to float32
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
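As a small illustration of what the reshape above does, the sketch below applies the same call to a made-up array of two flat 784-value rows. The array contents are placeholders, not EMNIST data.

```python
import numpy as np

# Two fake "images" stored as flat rows of 784 values, like the CSV rows
flat = np.arange(2 * 784).reshape(2, 784)

# Same reshape as in the cell above: (samples, height, width, channels)
images = flat.reshape(flat.shape[0], 28, 28, 1)

print(images.shape)

# NumPy fills row by row, so flat index i lands at pixel (i // 28, i % 28)
assert images[0, 1, 0, 0] == flat[0, 28]
```

The trailing dimension of size 1 is the channel axis, which is a single channel here because the images are grayscale.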

Another change from the previous notebook is that we normalize all the predictors so that they take values in [0, 1].

In [4]:
# Normalizing the predictors
X_train/=255
X_test/=255

# Defining the number of classes in the response variable
number_of_classes = 47

# One hot encoding the response variable
Y_train = to_categorical(Y_train, number_of_classes)
Y_test = to_categorical(Y_test, number_of_classes)
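For intuition, the sketch below reproduces the two transformations on tiny made-up arrays using plain NumPy. The `one_hot` helper is hypothetical and only approximates what `to_categorical` does for integer labels.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


labels = np.array([45, 36, 4])  # a few label values of the kind seen in column 0
encoded = one_hot(labels, 47)

pixels = np.array([0, 128, 255], dtype="float32")
scaled = pixels / 255  # same scaling as X_train /= 255

print(encoded.shape)
print(scaled)
```

Each row of `encoded` has a 1 in the column of its class and 0 elsewhere, which is the target format that the softmax output layer and the categorical crossentropy loss expect.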

## Building the ConvNet

We build a sparse convolutional neural network model by following these steps:

1. Add convolution layers
2. Add activation function
3. Add pooling layers
4. Repeat Steps 1,2,3 for adding more hidden layers
5. Finally, add a fully connected softmax layer giving the CNN the ability to classify the samples

In [5]:
# Defining the number of classes in the response variable
number_of_classes = 47

# Adding the first set of convolutional and pooling layers with ReLU activation
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(28,28,1)))
model.add(BatchNormalization(axis=-1))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(BatchNormalization(axis=-1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

# Adding the second set of convolutional and pooling layers with ReLU activation
model.add(Conv2D(64,(3, 3)))
model.add(BatchNormalization(axis=-1))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(BatchNormalization(axis=-1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())

# Adding a fully connected hidden layer (ReLU, 20% dropout) followed by the softmax output layer
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(number_of_classes))
model.add(Activation('softmax'))

# Compiling the model with categorical crossentropy loss function to handle multiple classes
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
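To see how the image shrinks as it passes through the stack above, the sketch below traces the spatial size layer by layer. It assumes the Keras defaults that the cell relies on (no padding and stride 1 for `Conv2D`, non-overlapping windows for `MaxPooling2D`); the helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool


size = 28
trace = [size]
for layer in ["conv", "conv", "pool", "conv", "conv", "pool"]:
    size = conv_output(size) if layer == "conv" else pool_output(size)
    trace.append(size)

flattened = size * size * 64  # 64 filters in the last convolutional layer

print(trace)      # [28, 26, 24, 12, 10, 8, 4]
print(flattened)  # 1024
```

Under these assumptions, `Flatten()` hands a vector of 1024 values to the 512-unit fully connected layer.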

## Fitting the ConvNet

Fit the convolutional neural network model on the training data and evaluate on the test data.

In [6]:
# Setting the batch size and number of epochs
batch_size=256
epochs=10

# Training the model on the train data
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(X_test, Y_test))

# Evaluating the model on the test data
score = model.evaluate(X_test, Y_test, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Train on 112800 samples, validate on 18800 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test score: 0.34364881123634095
Test accuracy: 0.8838829787234043
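The two numbers above are the categorical crossentropy loss (`score[0]`) and the accuracy (`score[1]`). As a minimal sketch of how such values come out of predicted class probabilities, the NumPy code below computes both for three made-up samples. The function names and data are illustrative and are not Keras internals.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


# Three made-up samples over three classes (the real model has 47)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]], dtype="float32")

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)

print("loss:", loss)
print("accuracy:", acc)
```

The third sample is misclassified with low probability on the true class, so it lowers the accuracy and contributes the largest share of the loss.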


As we can see, the convolutional neural network performs much better than the simple neural nets we built in the previous notebook, for the reasons discussed at the beginning of this notebook. Now that we are done with the exploratory part of the model building process, we will build more complex convolutional neural network models on the balanced data and also on the byclass data. The code for these models can be found in the directory [develop/src/models](../src/models).

We tried the following different models.

1. Sparse convolutional neural network on balanced data (exactly the one trained above)
2. Dense convolutional neural network on the balanced data which has 64 filters in the first two convolutional layers (instead of 32 here) and 128 filters in the last two convolutional layers (instead of 64 here) and finally 1024 hidden nodes in the last fully connected layer.
3. Sparse convolutional neural network on the byclass data (same as the one described in 1 but for the byclass data).
4. Dense convolutional neural network on the byclass data (same as the one described in 2 but for the byclass data).
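As a rough sense of scale, the sketch below counts the convolutional and fully connected weights of the sparse and dense variants on the balanced data (47 classes). It assumes that both variants share the layer layout used in this notebook (3x3 unpadded convolutions, two 2x2 poolings) and differ only in width, and it ignores the `BatchNormalization` parameters. The actual model code in the linked directory may differ, and `count_weights` is a hypothetical helper.

```python
def count_weights(conv_filters, dense_units, num_classes=47, in_channels=1):
    """Count conv and dense weights (plus biases), ignoring BatchNormalization."""
    total = 0
    channels = in_channels
    for filters in conv_filters:
        total += 3 * 3 * channels * filters + filters
        channels = filters
    # Assumption: a 28x28 input shrinks to 4x4 after the conv/pool stack
    flattened = 4 * 4 * channels
    total += flattened * dense_units + dense_units
    total += dense_units * num_classes + num_classes
    return total


sparse_total = count_weights([32, 32, 64, 64], dense_units=512)
dense_total = count_weights([64, 64, 128, 128], dense_units=1024)

print("sparse:", sparse_total)
print("dense: ", dense_total)
```

Under these assumptions the dense variant has roughly four times as many weights as the sparse one, most of them in the first fully connected layer.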