We are going to create a convolutional neural network capable of recognizing handwritten digits! 

The principle is simple:
using a dataset of handwritten-digit images annotated with the corresponding digit labels (MNIST, provided by Keras, a deep learning framework), we will train three different models (three different architectures) and compare them.

In [21]:
#Imports
 
import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D

In [22]:
K.set_image_data_format('channels_first')
seed = 7
np.random.seed(seed)

Setting the seed matters for reproducibility. Machine learning models involve randomness in several places, such as initializing weights, shuffling datasets, or splitting data into training and validation sets. Fixing the seed makes that randomness repeatable, so successive runs can produce the same results. Note that np.random.seed only seeds NumPy's generator; depending on the backend, the framework's own random generator may also need to be seeded for runs to match exactly.
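As a minimal sketch of this idea, the snippet below uses Python's standard-library random module as a stand-in for NumPy's generator. The helper name draw_numbers is hypothetical and only serves the illustration: the same seed yields the same sequence, a different seed yields a different one.

```python
import random


def draw_numbers(seed, count=5):
    """Return `count` pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, global state is left untouched
    return [rng.random() for _ in range(count)]


first_run = draw_numbers(seed=7)
second_run = draw_numbers(seed=7)
other_seed = draw_numbers(seed=8)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```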

### Load the data

In [23]:
# Load the data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train)
print(X_train.shape)
print(y_train)
print(y_train.shape)
print(y_test)

[[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]
(60000, 28, 28)
[5 0 4 ... 5 6 8]
(60000,)
[7 2 1 ... 4 5 6]


In [24]:
# Normalization: scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255

In [25]:
# Reshape to [samples][channels][height][width] (channels_first layout)
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')


In [26]:
# One-hot encode the outputs: each integer label becomes a binary vector
# with a 1 at the index of its class and 0 everywhere else
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# The number of classes is the width of the one-hot matrix
num_classes = y_test.shape[1]
print(num_classes)
print(y_test)
print(y_test.shape)

10
[[0. 0. 0. ... 1. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
(10000, 10)


One-hot encoding: in machine learning, especially in classification tasks, categorical labels are commonly represented with one-hot encoding. It transforms the labels into a binary matrix in which each class is represented by a binary vector: all zeros except for a single 1 at the index of that class.
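To make the encoding concrete, here is a small pure-Python sketch. It is not how Keras implements it, and the helper names one_hot and decode are hypothetical. It encodes the first three test labels printed earlier (7, 2, 1) and recovers them again.

```python
def one_hot(label, num_classes):
    """Return a list of length `num_classes` with 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def decode(vector):
    """Recover the integer label: the index of the largest entry."""
    return max(range(len(vector)), key=vector.__getitem__)


labels = [7, 2, 1]  # first three test labels shown in the output above
encoded = [one_hot(label, 10) for label in labels]
for label, row in zip(labels, encoded):
    print(label, row)
```

Decoding with the index of the largest entry is also how a predicted digit is read off the network's output vector.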

## Build the CNN

### 1. Small Model

Our goal is to stack, in order:

 A convolution with 64 filters of size 3×3 (for detecting patterns and features), followed by a ReLU activation (which introduces non-linearity by replacing all negative values in the feature map with zero).

 A convolution with 32 filters of size 3×3, followed by a ReLU activation.

 A flatten layer that transforms the 2D feature maps into a 1D vector that can be fed into a fully connected layer.

 A dense (fully connected) layer with 10 neurons, followed by a softmax, to make predictions based on the features learned by the convolutional layers. The 10 neurons in the output layer correspond to the possible classes (digits 0 through 9). The softmax activation function converts the network's raw outputs into probabilities.
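The two activation functions named in this list can be sketched in a few lines of plain Python. This is an illustration of the math only, with made-up input values, not the Keras implementation.

```python
import math


def relu(values):
    """Replace every negative value with zero, leave the rest unchanged."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_map_row = [-1.5, 0.0, 2.0, -0.3]  # made-up activations
print(relu(feature_map_row))  # [0.0, 0.0, 2.0, 0.0]

# Made-up raw outputs of a 10-neuron layer, one score per digit 0..9
raw_scores = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 1.1, 4.0, 0.2, -0.5]
probabilities = softmax(raw_scores)
predicted_digit = probabilities.index(max(probabilities))
print(predicted_digit)  # 7, the index of the largest score
```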

In [27]:
# create model
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(1, 28, 28), activation='relu'))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=200, validation_data=(X_test, y_test))
print(history)



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
<keras.callbacks.History object at 0x0000022109F79D00>


In [29]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Model score : %.2f%%" % (scores[1]*100))
print("Model error rate : %.2f%%" % (100-scores[1]*100))


Model score : 98.71%
Model error rate : 1.29%
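The score printed here is the accuracy metric requested in model.compile: the fraction of test images whose most probable predicted class matches the label. A minimal sketch of that computation on made-up data follows; the helper names and the toy values are hypothetical.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(one_hot_labels, predicted_probs):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(
        argmax(truth) == argmax(probs)
        for truth, probs in zip(one_hot_labels, predicted_probs)
    )
    return correct / len(one_hot_labels)


# Toy example with 3 classes and 4 samples; the last prediction is wrong
toy_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
toy_probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]

score = accuracy(toy_labels, toy_probs)
print("Model score : %.2f%%" % (score * 100))             # Model score : 75.00%
print("Model error rate : %.2f%%" % (100 - score * 100))  # Model error rate : 25.00%
```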


### 2. Medium Model

Defined as follows:

A convolution with 64 filters of size 5×5 with a ReLU activation.

A 2×2 max-pooling.

A dropout of 0.2.

A flatten layer.

A dense layer with 128 outputs and ReLU activation.

A final dense layer with 10 outputs and softmax activation.
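Max-pooling and dropout are the two new ingredients in this list. The sketch below shows what each does on small made-up inputs. It is a plain-Python illustration with hypothetical helper names, not the Keras code. The dropout variant shown is the "inverted" form used at training time, where surviving values are scaled by 1/(1 - rate); at inference time dropout does nothing.

```python
import random


def max_pool_2x2(grid):
    """Keep the maximum of each non-overlapping 2x2 block (even height and width)."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            block = (grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            row.append(max(block))
        pooled.append(row)
    return pooled


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 7]]

rng = random.Random(7)
dropped = dropout([1.0] * 1000, 0.2, rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)  # close to the dropout rate of 0.2
```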

In [30]:
# Create the model
model = Sequential()
model.add(Conv2D(64, (5, 5), input_shape=(1, 28, 28), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=200, validation_data=(X_test, y_test))
print(history)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
<keras.callbacks.History object at 0x000002210D783F10>


In [31]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Model score : %.2f%%" % (scores[1]*100))
print("Model error rate : %.2f%%" % (100-scores[1]*100))

Model score : 98.87%
Model error rate : 1.13%


### 3. Large Model

Defined as follows:

A convolution with 30 filters of size 5×5 with ReLU activation.

A max-pooling layer of size 2×2.

A convolution with 15 filters of size 3×3 with ReLU activation.

A dropout of 0.2.

A flatten layer.

A dense layer with 128 outputs and ReLU activation.

A dense layer with 50 outputs and ReLU activation.

A dense layer with 10 outputs and softmax activation.
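It can help to trace how the tensor shape and the number of trainable parameters evolve through this stack. The sketch below does the arithmetic by hand. It assumes the defaults used in this notebook (stride 1 and no padding for convolutions, non-overlapping pooling, channels_first layout), and the helper names are hypothetical.

```python
def conv2d(shape, filters, kernel):
    """Output shape and parameter count of a stride-1 convolution without padding."""
    channels, height, width = shape
    out_shape = (filters, height - kernel + 1, width - kernel + 1)
    params = (kernel * kernel * channels + 1) * filters  # weights plus one bias per filter
    return out_shape, params


def max_pool(shape, size):
    """Non-overlapping pooling divides height and width by `size`; no parameters."""
    channels, height, width = shape
    return (channels, height // size, width // size)


def dense(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return units, (inputs + 1) * units


shape = (1, 28, 28)  # (channels, height, width)
after_conv1, p_conv1 = conv2d(shape, 30, 5)
after_pool = max_pool(after_conv1, 2)
after_conv2, p_conv2 = conv2d(after_pool, 15, 3)
flat = after_conv2[0] * after_conv2[1] * after_conv2[2]  # dropout and flatten add no parameters
units, p_dense1 = dense(flat, 128)
units, p_dense2 = dense(units, 50)
units, p_dense3 = dense(units, 10)
total_params = p_conv1 + p_conv2 + p_dense1 + p_dense2 + p_dense3

print(after_conv1, after_pool, after_conv2, flat)  # (30, 24, 24) (30, 12, 12) (15, 10, 10) 1500
print(total_params)                                # 203933
```

Most of the parameters sit in the first dense layer (1500 inputs times 128 units), which is typical when a flatten layer feeds a wide fully connected layer.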

In [32]:
# Create the model
model = Sequential()
model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# input_shape is only needed on the first layer; later layers infer it
model.add(Conv2D(15, (3, 3), activation='relu'))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=200, validation_data=(X_test, y_test))
print(history)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
<keras.callbacks.History object at 0x000002210DB0B8E0>


In [33]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Model score : %.2f%%" % (scores[1]*100))
print("Model error rate : %.2f%%" % (100-scores[1]*100))

Model score : 99.07%
Model error rate : 0.93%
