
# **Not an ideal network**

In [0]:
# https://keras.io/
# Install the Keras library, which runs on top of the TensorFlow framework
!pip install -q keras
import keras

In [0]:
# Import NumPy and the Keras classes and helper modules used in the rest of the notebook.
import numpy as np

from keras.models import Sequential
from keras.layers import Flatten
from keras.layers import Convolution2D
from keras.utils import np_utils

from keras.datasets import mnist

In [0]:
# Download the MNIST dataset; load_data() returns it already split into training and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()


In [0]:
# Print the shape of the training set; it tells us the input dimensions to give the network's first layer.
print(X_train.shape)
# The shape shows 60,000 training samples, each a 28 x 28 pixel image.
# Plot the first training image with matplotlib to get a feel for the data.
from matplotlib import pyplot as plt
%matplotlib inline
plt.imshow(X_train[0], cmap='gray')

In [0]:
# Grayscale images have a single channel, and Keras convolution layers expect that channel axis to be explicit.
# Reshape every training and test image from 28x28 to 28x28x1.
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
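A minimal NumPy sketch of what the reshape does. It uses a small random `uint8` array as a stand-in for the MNIST images (the array and its size are illustrative assumptions, not the real data). Appending a trailing axis of length 1 changes only the shape. The pixel values stay the same, and `x[..., np.newaxis]` is an equivalent way to write the reshape that does not hard-code the image size.

```python
import numpy as np

# Stand-in for MNIST: 5 random 28x28 "images" of uint8 pixels (illustrative only)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Same call as in the notebook: append an explicit channel axis of length 1
with_channel = images.reshape(images.shape[0], 28, 28, 1)

# Equivalent formulation that does not hard-code the 28x28 size
with_channel_alt = images[..., np.newaxis]

print(images.shape, "->", with_channel.shape)
print("identical:", np.array_equal(with_channel, with_channel_alt))
# Only the shape changes; dropping the channel axis gives back the original pixels
print("same pixels:", np.array_equal(with_channel[..., 0], images))
```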

In [0]:
# MNIST pixels are stored as 8-bit integers in the range 0 to 255. Convert them to float32 and divide by 255 to scale them to the range 0 to 1.
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
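The `astype('float32')` call must come before the division. The short sketch below runs on a tiny made-up `uint8` array and shows why. NumPy rejects an in-place true division on an integer array, because the floating-point result cannot be cast back to `uint8`. Converting first gives values in [0, 1].

```python
import numpy as np

# Tiny stand-in for MNIST pixel data: 8-bit grayscale values (illustrative only)
pixels = np.array([[0, 128], [200, 255]], dtype=np.uint8)

# In-place true division on the raw uint8 array is rejected, because the
# float result cannot be stored back into an integer array
failed = False
try:
    raw = pixels.copy()
    raw /= 255
except TypeError as err:  # NumPy raises a TypeError subclass for this cast
    failed = True
    print("uint8 in-place divide failed:", type(err).__name__)

# Converting first, as the notebook does, works and scales values into [0, 1]
scaled = pixels.astype('float32')
scaled /= 255
print(scaled.dtype, scaled.min(), scaled.max())
```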

In [31]:
# Check the labels of the first ten training images
y_train[:10]

array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=uint8)

In [0]:
# One-hot encode the labels with to_categorical; the categorical_crossentropy loss used below expects targets in this form.
# This converts the 1-D array of class indices into a matrix with one column per class (10 columns).
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
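Conceptually, one-hot encoding maps each integer label to one row of a 10x10 identity matrix. The sketch below reproduces this with plain NumPy (`np.eye`) on the first ten training labels printed above. It illustrates the idea only and makes no claim about how `np_utils.to_categorical` is implemented internally.

```python
import numpy as np

# First ten training labels, as printed earlier in the notebook
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=np.uint8)
num_classes = 10

# Row i of the identity matrix has a single 1 in column i,
# so indexing with the labels picks out one one-hot row per label
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)   # (10, 10)
print(one_hot[0])      # label 5 -> 1.0 in position 5, zeros elsewhere
# Going back: argmax along each row recovers the original integer labels
print(np.argmax(one_hot, axis=1))
```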

In [0]:
# Look at the first ten one-hot encoded training labels
Y_train[:10]


In [0]:
# Build the network layer by layer: input shape, number of kernels per convolution, max pooling, etc.
# Every convolution (including the last one) uses ReLU; a softmax after Flatten turns the 10 outputs into class probabilities.
# Comments give input size, receptive field (RF) and output size for each layer (no padding, stride-1 convolutions).
from keras.layers import Activation, MaxPooling2D

model = Sequential()
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # i/p: 28*28*1   RF: 3*3   o/p: 26*26*32
model.add(Convolution2D(64, (3, 3), activation='relu'))    # i/p: 26*26*32  RF: 5*5   o/p: 24*24*64
model.add(Convolution2D(128, (3, 3), activation='relu'))   # i/p: 24*24*64  RF: 7*7   o/p: 22*22*128

model.add(MaxPooling2D(pool_size=(2, 2)))                  # i/p: 22*22*128 RF: 8*8   o/p: 11*11*128

model.add(Convolution2D(256, (3, 3), activation='relu'))   # i/p: 11*11*128 RF: 12*12 o/p: 9*9*256
model.add(Convolution2D(512, (3, 3), activation='relu'))   # i/p: 9*9*256   RF: 16*16 o/p: 7*7*512
model.add(Convolution2D(1024, (3, 3), activation='relu'))  # i/p: 7*7*512   RF: 20*20 o/p: 5*5*1024
model.add(Convolution2D(2048, (3, 3), activation='relu'))  # i/p: 5*5*1024  RF: 24*24 o/p: 3*3*2048
model.add(Convolution2D(10, (3, 3), activation='relu'))    # i/p: 3*3*2048  RF: 28*28 o/p: 1*1*10

model.add(Flatten())
model.add(Activation('softmax'))

model.summary()
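The shape and receptive-field comments in the cell above can be checked with the standard recurrences for unpadded ("valid") layers. For kernel size k and stride s, output size = (n - k) // s + 1. The receptive field grows as r_out = r_in + (k - 1) * j_in, and the jump (the distance in input pixels between neighbouring outputs) updates as j_out = j_in * s. The sketch below encodes the layer list by hand as kernel size and stride pairs. It is an assumption-level mirror of the architecture and does not inspect the Keras model.

```python
# Each entry: (layer name, kernel size, stride); no padding anywhere,
# mirroring the layer stack above (hand-written, not read from the Keras model)
layers = [
    ("conv 32", 3, 1), ("conv 64", 3, 1), ("conv 128", 3, 1),
    ("maxpool", 2, 2),
    ("conv 256", 3, 1), ("conv 512", 3, 1), ("conv 1024", 3, 1),
    ("conv 2048", 3, 1), ("conv 10", 3, 1),
]

def trace(input_size, layers):
    """Return (name, output size, receptive field) for every layer."""
    size, rf, jump = input_size, 1, 1
    rows = []
    for name, k, s in layers:
        size = (size - k) // s + 1   # spatial size after an unpadded layer
        rf = rf + (k - 1) * jump     # RF grows by (k - 1) steps of the current jump
        jump = jump * s              # input-pixel distance between neighbouring outputs
        rows.append((name, size, rf))
    return rows

for name, size, rf in trace(28, layers):
    print(f"{name:>9}: {size}x{size}, RF {rf}x{rf}")
```

Under these recurrences the 2x2 max pooling adds only one pixel to the receptive field, while doubling the jump for every later layer. The final 1x1 output ends up seeing the full 28x28 image.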

In [0]:
# Compile the model by specifying the loss function, optimizer and metrics.
# Adam is a common default optimizer for this kind of problem.
model.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

In [36]:
# Train the network on the training set for 10 epochs with a batch size of 32
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f38a05565f8>

In [0]:
# Evaluate the trained model on the test set; score holds the test loss followed by the test accuracy
score = model.evaluate(X_test, Y_test, verbose=0)

In [38]:
# Print the score: [test loss, test accuracy]
print(score)

[2.3025851249694824, 0.098]
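Both numbers in `score` are what a network that has learned nothing would produce. `score` holds `[loss, accuracy]`. If the model outputs 0.1 for every class, the categorical cross-entropy for a one-hot target $y$ is the same for every sample, whatever the true class:

$$
\mathcal{L} = -\sum_{c=1}^{10} y_c \log \hat{y}_c = -\log 0.1 = \ln 10 \approx 2.302585
$$

This is consistent with the reported loss of 2.3025851 (Keras uses the natural logarithm). The accuracy of 0.098 is likewise at the level of random guessing among 10 classes.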


In [0]:
# Predict class probabilities for the test set and store them in y_pred
y_pred = model.predict(X_test)

In [40]:
print(y_pred[:9])
print(y_test[:9])

[[0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
 [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]]
[7 2 1 0 4 1 4 9 5]
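Every prediction is exactly 0.1, which means the softmax received identical inputs for all ten classes. One plausible explanation is the `relu` on the last convolution; this is an assumption that has not been checked against the trained weights. If all ten pre-activations are negative, ReLU maps them to 0, and the softmax of an all-zero vector is uniform. The sketch below uses made-up negative pre-activations to walk through that chain. It also shows what happens when the class is read off with `argmax`: NumPy returns the first index on ties, so every image is labelled 0.

```python
import numpy as np

def softmax(z):
    # Subtracting the row maximum keeps exp() stable; softmax is unchanged by the shift
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical, made-up pre-activations of the final 10-channel layer for 9 images,
# all negative as they would be if every unit in that layer were inactive
rng = np.random.default_rng(0)
pre_activations = rng.uniform(-5.0, -0.1, size=(9, 10))

logits = np.maximum(pre_activations, 0.0)  # ReLU maps every negative value to 0
probs = softmax(logits)                    # equal logits give a uniform distribution

print(probs[0])                     # ten values of 0.1
preds = np.argmax(probs, axis=1)    # NumPy breaks ties by taking the first index
print(preds)                        # class 0 for every image

# Labels of the first nine test images, as printed above
y_true = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5])
accuracy = np.mean(preds == y_true)
print("accuracy:", accuracy)        # only the single 0 in y_true counts as correct
```

The Keras accuracy metric may not break ties the same way, since tie handling in its argmax is not guaranteed. If it does, the reported 0.098 would equal the share of zeros in the test set: the standard MNIST test split has 980 zeros out of 10,000 images.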


## What, in my view, is wrong with the network


1. In my view, the later layers use far more channels (kernels) than are needed to classify grayscale digits 0 to 9; the task could be handled with considerably fewer. The extra channels also increase the number of parameters significantly.
2. The penultimate layer has 2048 channels, followed by an abrupt drop to 10 channels in the last layer. This can discard many important features; the channel count should have been reduced gradually instead.
3. The last layer applies a 3*3 kernel to a 3*3*2048 input to produce a 1*1 output. This is usually not a good idea, because a lot of critical information can be lost. Instead, max pooling could be applied to a 20*20 feature map to get a 10*10 layer, which could then be reduced to 2*2, with a final 2*2 filter producing the 1*1 output. This way crucial information would not be lost.
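Points 1 and 2 can be quantified. A convolution with a k*k kernel, C_in input channels and C_out filters has (k*k*C_in + 1)*C_out trainable parameters: the weights plus one bias per filter. Max pooling, flatten and softmax add none. The sketch below applies that formula to the layer list of this network, typed in by hand. `model.summary()` should report the same total, although its output is not shown above.

```python
# (input channels, filters) for each 3x3 convolution in the network above;
# pooling, flatten and softmax layers have no trainable parameters
conv_layers = [(1, 32), (32, 64), (64, 128), (128, 256),
               (256, 512), (512, 1024), (1024, 2048), (2048, 10)]
kernel = 3

def conv_params(c_in, c_out, k):
    # k*k*c_in weights per filter, plus one bias per filter
    return (k * k * c_in + 1) * c_out

counts = [conv_params(c_in, c_out, kernel) for c_in, c_out in conv_layers]
total = sum(counts)

for (c_in, c_out), n in zip(conv_layers, counts):
    print(f"{c_in:>5} -> {c_out:<5} {n:>12,}")
print(f"total: {total:,}")                           # total: 25,348,362
print(f"share of the 1024 -> 2048 layer: {counts[6] / total:.1%}")
```

In this count, the single 1024 -> 2048 layer holds roughly three quarters of all parameters. The layer after it then maps those 2048 channels directly to the 10 outputs.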

