# **Deep Neural Network for MNIST Classification**

The goal of this work is to write a program that recognizes which digit is written in an image, using the MNIST dataset. MNIST is a handwritten digit recognition dataset that provides 70,000 images (28x28 pixels), each containing a single handwritten digit. This is a classification problem with 10 classes, one per digit (0 through 9). 
We aim to build a neural network with 2 hidden layers.  

Note: this code was written on Google Colab.

**Import the relevant packages**

In [None]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds # this package provides access to the MNIST dataset

**Data preprocessing**

In [None]:
# Load the MNIST dataset
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

# the argument name specifies which dataset to load
# the argument with_info=True makes tfds.load return a tuple (dataset, info), where info stores information about the dataset
# the argument as_supervised, when True, returns each example as a 2-tuple (input, target); when False, it returns a dictionary containing all the features
# the argument split is not passed here, so tfds.load returns a dictionary with all the available splits ('train' and 'test')

# now let's extract the train and test set separately
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

Downloading and preparing dataset mnist/3.0.1 (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...

Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.


In [None]:
# print the mnist_train set to see what it looks like
mnist_train

<PrefetchDataset shapes: ((28, 28, 1), ()), types: (tf.uint8, tf.int64)>

In [None]:
# tensorflow_datasets provides a training set and a test set by default
# however, it doesn't provide a validation set, so we need to do the split ourselves
# first, let's see what the variable mnist_info looks like
mnist_info

tfds.core.DatasetInfo(
    name='mnist',
    version=3.0.1,
    description='The MNIST database of handwritten digits.',
    homepage='http://yann.lecun.com/exdb/mnist/',
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)

In [None]:
# mnist_info is a variable that stores multiple items
# the items (some of them are dictionaries) needed to define a validation set are "splits" and "total_num_examples"

# we define the number of validation samples as a percentage of the training samples
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples # 10% of the training samples
# the result is a float, so we convert it to an integer
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# we also define a variable that stores the number of test samples
# this avoids going through mnist_info.splits every time we need the number of test samples
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64) # convert the variable to an integer tensor

# next step is to scale our data in order to have inputs between 0 and 1
# we write a function that accepts two inputs, an input image and a label
# the goal is to make the results more numerically stable as the input images will take values between 0 and 1
def scale(image, label):
  # first we convert the image to a float
  image = tf.cast(image, tf.float32)
  # pixel intensities range from 0 to 255 (256 shades of grey)
  # so we divide each value by 255 to map it to the range [0, 1]
  image /= 255.
  return image, label

# use the method map to apply the function scale on the train and test sets
train_validation_scaled = mnist_train.map(scale)
test_scaled = mnist_test.map(scale)

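# once the data is scaled, we define a buffer_size to shuffle it
buffer_size = 10000
# buffer_size matters when dealing with very large datasets
# such a dataset cannot be shuffled all at once because it does not fit in memory
# so instead, tensorflow keeps only buffer_size samples in memory at a time and shuffles within them

# use the shuffle method
# reshuffle_each_iteration=False keeps the shuffled order fixed across epochs; by default the data is
# reshuffled on every pass, so .take() and .skip() below would select different samples each epoch
# and validation samples could leak into the training data
train_validation_scaled_shuffled = train_validation_scaled.shuffle(buffer_size, reshuffle_each_iteration=False)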

# now that the train data is scaled and shuffled, we need to extract the validation data from it
# the number of validation samples is defined previously with num_validation_samples variable
# use the .take() method to extract that many samples
validation_data = train_validation_scaled_shuffled.take(num_validation_samples)

# now we use the .skip() method to extract the rest of the data as train data
train_data = train_validation_scaled_shuffled.skip(num_validation_samples)

# let's now define a batch size and batch the train data
batch_size = 100
train_data = train_data.batch(batch_size)

# batch the validation data
validation_data = validation_data.batch(num_validation_samples)

# batch the test data
test_data = test_scaled.batch(num_test_samples)

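# because the dataset was loaded with as_supervised=True, each element is an (input, target) 2-tuple
# the validation data is a single batch, so we extract it with next(iter(...))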
validation_inputs, validation_targets = next(iter(validation_data))
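
The shuffle, take, skip and batch steps above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the `tf.data` implementation: `buffered_shuffle` is a hypothetical helper that mimics how a fixed-size shuffle buffer works, and the list of 20 integers is a stand-in for the scaled (image, label) pairs.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a partially random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a random element and replace it with the incoming one
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit what is left in random order
    rng.shuffle(buffer)
    yield from buffer

samples = list(range(20))  # stand-in for 20 scaled samples
shuffled = list(buffered_shuffle(samples, buffer_size=5))

num_validation = 4
validation = shuffled[:num_validation]  # plays the role of .take(num_validation)
train = shuffled[num_validation:]       # plays the role of .skip(num_validation)

batch_size = 4
batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]

print("validation:", validation)
print("number of train batches:", len(batches))
```

Note that with a buffer smaller than the dataset the shuffle is only local: the first emitted sample always comes from the first `buffer_size` inputs.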

**Outline the model**

In [None]:
# our goal is to build a neural network with an input layer, output layer and 2 hidden layers
input_size = 784 # image size is 28 x 28
output_size = 10 # 10 classes (digits)
hidden_layer_size = 50 # I chose the same size for both hidden layers

# define the model
model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(28,28,1)),
                             tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # first hidden layer
                             tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # second hidden layer
                             tf.keras.layers.Dense(output_size, activation='softmax')]) # output layer
                             # the Flatten layer reshapes each input image (28,28,1) into a (784,) vector
                             # the Dense layer implements: output = activation(dot(input, weight) + bias)
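
To make the Dense formula and the size of this network concrete, here is a small pure-Python sketch. The helper names `dense_params` and `dense` are hypothetical and written only for illustration; they are not Keras functions. The parameter counts follow from the layer sizes defined above (one weight per input-unit pair plus one bias per unit).

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

input_size = 28 * 28 * 1   # length of the vector produced by Flatten
hidden_layer_size = 50
output_size = 10

layer_sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]
params_per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(params_per_layer)
print(params_per_layer, total_params)  # [39250, 2550, 510] 42310

def dense(inputs, weights, bias, activation):
    # weights[i][j] connects input i to unit j
    return [
        activation(sum(x * w[j] for x, w in zip(inputs, weights)) + bias[j])
        for j in range(len(bias))
    ]

def relu(z):
    return max(0.0, z)

# Toy layer with 2 inputs and 2 units
toy_output = dense([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu)
print(toy_output)  # [2.0, 0.0]
```

The total should agree with the trainable parameter count reported by `model.summary()`.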

In [None]:
# define the optimizer, the loss function and the metric 
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# since the labels are integers (not one-hot encoded), we use sparse_categorical_crossentropy as the loss function
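
The difference between the sparse and the one-hot form of the loss is only in how the target is encoded. The sketch below, in plain Python, shows that both give the same value for a single sample. The two functions are simplified illustrations (Keras additionally clips probabilities for numerical stability), and the probabilities are made-up values.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    # label is an integer class index
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    # one_hot is a vector with a 1 at the true class
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

probs = [0.1, 0.7, 0.2]    # hypothetical softmax output for 3 classes
label = 1                  # integer target
one_hot = [0.0, 1.0, 0.0]  # same target, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(round(sparse_loss, 4), round(dense_loss, 4))  # 0.3567 0.3567
```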

**Training**

In [None]:
# next step is to train our model
# we specify the train_data, the number of epochs and the validation data we created
model.fit(train_data, epochs=10, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
540/540 - 9s - loss: 0.4159 - accuracy: 0.8819 - val_loss: 0.2370 - val_accuracy: 0.9347 - 9s/epoch - 16ms/step
Epoch 2/10
540/540 - 4s - loss: 0.1900 - accuracy: 0.9455 - val_loss: 0.1732 - val_accuracy: 0.9488 - 4s/epoch - 8ms/step
Epoch 3/10
540/540 - 4s - loss: 0.1434 - accuracy: 0.9584 - val_loss: 0.1315 - val_accuracy: 0.9620 - 4s/epoch - 8ms/step
Epoch 4/10
540/540 - 4s - loss: 0.1180 - accuracy: 0.9658 - val_loss: 0.1164 - val_accuracy: 0.9647 - 4s/epoch - 8ms/step
Epoch 5/10
540/540 - 4s - loss: 0.1001 - accuracy: 0.9712 - val_loss: 0.1012 - val_accuracy: 0.9720 - 4s/epoch - 8ms/step
Epoch 6/10
540/540 - 4s - loss: 0.0862 - accuracy: 0.9745 - val_loss: 0.1027 - val_accuracy: 0.9695 - 4s/epoch - 8ms/step
Epoch 7/10
540/540 - 4s - loss: 0.0770 - accuracy: 0.9765 - val_loss: 0.0883 - val_accuracy: 0.9750 - 4s/epoch - 8ms/step
Epoch 8/10
540/540 - 4s - loss: 0.0675 - accuracy: 0.9791 - val_loss: 0.0732 - val_accuracy: 0.9790 - 4s/epoch - 8ms/step
Epoch 9/10
540/540 - 4s

<keras.callbacks.History at 0x7f735a075590>
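
The `540/540` counter in the log above is the number of batches processed per epoch. As a quick sanity check, it can be recomputed in plain Python from the train split size reported by `mnist_info`, the 10% validation share, and the `batch_size` chosen earlier:

```python
import math

num_train_total = 60000                   # size of the 'train' split
num_validation = num_train_total // 10    # 10% held out for validation
num_train = num_train_total - num_validation
batch_size = 100

steps_per_epoch = math.ceil(num_train / batch_size)
print(num_validation, num_train, steps_per_epoch)  # 6000 54000 540
```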

**Testing**

In [None]:
# after training our model, we need to test it on the test data 
test_loss, test_accuracy = model.evaluate(test_data) 



In [None]:
# the loss is a cross-entropy value, not a percentage, so we print it as is
print('The test loss of our model is', round(test_loss, 4))
print('The test accuracy of our model is', round(test_accuracy*100,2),'%')

The test loss of our model is 0.1039
The test accuracy of our model is 96.92 %
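
For intuition about what the accuracy figure measures: the predicted class is the index of the largest softmax probability, and accuracy is the fraction of samples where that index equals the label. The sketch below uses plain Python with made-up probabilities; `argmax` and `accuracy` are hypothetical helpers, not the Keras metric itself.

```python
def argmax(values):
    # Index of the largest value
    return max(range(len(values)), key=values.__getitem__)

def accuracy(prob_rows, labels):
    correct = sum(argmax(p) == y for p, y in zip(prob_rows, labels))
    return correct / len(labels)

# Hypothetical softmax outputs for 4 samples and 3 classes
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]  # the last sample is misclassified

acc = accuracy(probs, labels)
print(acc)  # 0.75
```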


**Hyperparameter optimization**

Many adjustments were applied to improve the accuracy of the model. 
Please refer to the "Optimized Deep Neural Network for MNIST Classification_Sofiane_Ikkour" notebook to see the adjustments made.