# Deep Neural Network for MNIST Classification

The problem I have chosen is often referred to as the "Hello World" of deep learning, because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf
# TensorFlow includes a data provider for MNIST that we'll use.
# It comes with the tensorflow-datasets module
import tensorflow_datasets as tfds


## Data

This is where we load and preprocess our data.
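
One step in the cell below deserves a closer look: the shuffle uses a bounded buffer (`BUFFER_SIZE`) rather than the whole dataset. The following is a minimal pure-Python sketch of how such a buffer behaves. The function `buffered_shuffle` is a hypothetical helper written for illustration only; it is not part of TensorFlow and is not the code TensorFlow actually runs.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order produced by a fixed-size shuffle buffer.

    Illustrative stand-in for buffer-based shuffling, not a TensorFlow API.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and replace it with the new item.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


# A buffer of size 1 cannot reorder anything.
print(list(buffered_shuffle(range(10), buffer_size=1)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A small buffer only mixes elements that are close together in the input.
print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))

# A buffer at least as large as the input gives a full shuffle.
print(list(buffered_shuffle(range(10), buffer_size=10, seed=0)))
```

In this sketch, the element emitted at output position `k` always comes from the first `k + buffer_size` inputs, which is why a small buffer only approximates a uniform shuffle.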

In [2]:
# in our case, we are interested in MNIST; the name of the dataset is the only mandatory argument
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# with_info=True makes tfds.load return a tuple (dataset, info), where info describes the version, features, number of samples
# we will use this information a bit below, so we store it in mnist_info
# as_supervised=True will load the dataset in a 2-tuple structure (input, target)

# once we have loaded the dataset, we can easily extract the training and testing datasets with the built-in references
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
# let's also store the number of test samples in a dedicated variable (instead of using the mnist_info one)
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    # we make sure the value is a float
    image = tf.cast(image, tf.float32)
    # since the possible values for the inputs are 0 to 255 (256 different shades of grey)
    # if we divide each element by 255, we would get the desired result -> all elements will be between 0 and 1 
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)

# let's also shuffle the data
BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# then we can't shuffle the whole dataset in one go because we can't fit it all in memory
# so instead TF only stores BUFFER_SIZE samples in memory at a time and shuffles them
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
# BUFFER_SIZE in between - a computational optimization to approximate uniform shuffling
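# reshuffle_each_iteration=False keeps the order fixed across epochs, so the take/skip split below stays disjoint
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)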

# once we have scaled and shuffled the data, we can proceed to actually extracting the train and validation
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

# similarly, the train_data is everything else, so we skip as many samples as there are in the validation dataset
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# determine the batch size
BATCH_SIZE = 100

# we can also take advantage of the occasion to batch the train data
train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)

# batch the test data
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))

Downloading and preparing dataset mnist/3.0.1 (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...

local data directory. If you'd instead prefer to read directly from our public
GCS bucket (recommended if you're running on GCP), you can instead pass
`try_gcs=True` to `tfds.load` or set `data_dir=gs://tfds-data/datasets`.

Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.


## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

In [6]:
input_size = 784
output_size = 10
hidden_layer_size = 200
model = tf.keras.Sequential([

    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
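
As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand. This is a small sketch; `dense_params` is a hypothetical helper introduced here for the arithmetic, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# 784 flattened inputs -> 200 -> 200 -> 10, as in the model above.
# The Flatten layer only reshapes its input and has no parameters.
layer_sizes = [784, 200, 200, 10]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [157000, 40200, 2010]
print(total_params)  # 199210
```

Calling `model.summary()` on the model should report the same total.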

### Choose the optimizer and the loss function

In [7]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
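
To make the loss less of a black box, here is a minimal pure-Python sketch of what a softmax output combined with sparse categorical cross-entropy computes for a single sample. "Sparse" means that the target is an integer class index (0 to 9) rather than a one-hot vector. The function names below are illustrative and only mirror the Keras names. The sketch is not the Keras implementation, which works on whole batches and averages the per-sample losses.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Negative log of the probability assigned to the true class index."""
    return -math.log(probs[label])


# Hypothetical raw scores for one image: the network favours digit 3.
logits = [0.0] * 10
logits[3] = 2.0
probs = softmax(logits)

loss_if_3 = sparse_categorical_crossentropy(3, probs)  # true digit is 3
loss_if_7 = sparse_categorical_crossentropy(7, probs)  # true digit is 7

print(round(sum(probs), 6))  # 1.0
print(loss_if_3 < loss_if_7)  # True
```

The loss is small when the network puts high probability on the correct digit and grows as that probability shrinks.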

### Training
This is where we train the model we have built.
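
Before training, it can help to check how many batches one epoch should contain. The sketch below repeats the split arithmetic from the Data section in plain Python, assuming the `train` split holds 60,000 images (the notebook reads this value from `mnist_info` rather than hard-coding it).

```python
import math

num_train_examples = 60_000  # assumption: size of the 'train' split
validation_fraction = 0.1    # same 10% used for num_validation_samples
batch_size = 100             # same value as BATCH_SIZE

num_validation = int(validation_fraction * num_train_examples)
num_training = num_train_examples - num_validation
steps_per_epoch = math.ceil(num_training / batch_size)

print(num_validation)   # 6000
print(num_training)     # 54000
print(steps_per_epoch)  # 540
```

This matches the `540/540` step counter that Keras prints for each epoch.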

In [8]:
# determine the maximum number of epochs
NUM_EPOCHS = 10
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
540/540 - 5s - loss: 0.2727 - accuracy: 0.9211 - val_loss: 0.1374 - val_accuracy: 0.9582
Epoch 2/10
540/540 - 5s - loss: 0.1045 - accuracy: 0.9680 - val_loss: 0.0919 - val_accuracy: 0.9722
Epoch 3/10
540/540 - 5s - loss: 0.0728 - accuracy: 0.9777 - val_loss: 0.0666 - val_accuracy: 0.9808
Epoch 4/10
540/540 - 5s - loss: 0.0514 - accuracy: 0.9835 - val_loss: 0.0542 - val_accuracy: 0.9830
Epoch 5/10
540/540 - 5s - loss: 0.0394 - accuracy: 0.9876 - val_loss: 0.0514 - val_accuracy: 0.9838
Epoch 6/10
540/540 - 5s - loss: 0.0316 - accuracy: 0.9894 - val_loss: 0.0467 - val_accuracy: 0.9843
Epoch 7/10
540/540 - 5s - loss: 0.0268 - accuracy: 0.9915 - val_loss: 0.0356 - val_accuracy: 0.9895
Epoch 8/10
540/540 - 5s - loss: 0.0220 - accuracy: 0.9932 - val_loss: 0.0207 - val_accuracy: 0.9933
Epoch 9/10
540/540 - 5s - loss: 0.0178 - accuracy: 0.9943 - val_loss: 0.0150 - val_accuracy: 0.9955
Epoch 10/10
540/540 - 5s - loss: 0.0158 - accuracy: 0.9949 - val_loss: 0.0180 - val_accuracy: 0.9947

<tensorflow.python.keras.callbacks.History at 0x7f7fee7dc590>

## Test the model

We test the final predictive power of our model by running it on the test dataset, which the algorithm has NEVER seen before.


In [9]:
test_loss, test_accuracy = model.evaluate(test_data)



In [10]:
# We can apply some nice formatting if we want to
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.08. Test accuracy: 97.79%


Using the initial model and hyperparameters given in this notebook, the final test accuracy should be around 97%.

Each time the code is rerun, we get a different accuracy as the batches are shuffled, the weights are initialized in a different way, etc.
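
The run-to-run variation comes from random number generation, so fixing a seed makes these random choices repeatable. The sketch below shows the idea in plain Python. `init_and_shuffle` is a hypothetical stand-in for weight initialization and batch ordering, not code from this notebook. In TensorFlow the global seed is set with `tf.random.set_seed`, although fully identical results may also depend on other factors such as the hardware and the dataset shuffle settings.

```python
import random


def init_and_shuffle(seed=None):
    """Stand-in for the two random steps mentioned above."""
    rng = random.Random(seed)
    # Pretend weight initialization: small random values.
    weights = [rng.gauss(0.0, 0.05) for _ in range(4)]
    # Pretend batch ordering: a shuffled list of sample indices.
    order = list(range(8))
    rng.shuffle(order)
    return weights, order


# Same seed: identical "weights" and identical sample order.
print(init_and_shuffle(seed=42) == init_and_shuffle(seed=42))  # True

# No seed: each call draws fresh random values.
print(init_and_shuffle())
print(init_and_shuffle())
```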

Finally, we have intentionally reached a suboptimal solution.
