## MNIST 

### Introduction

In this project, I implement a deep neural network with TensorFlow 2.0 on the MNIST dataset of handwritten digits. This is a classification problem: each image must be assigned to one of ten digit classes.

The dataset I am going to use is the one provided by TensorFlow and consists of 70,000 images of handwritten digits, each 28x28 pixels. Each image corresponds to a single digit between 0 and 9.

My goal is to write an algorithm that identifies the handwritten digit in a given image.

### Packages 

In [1]:
# import the relevant packages
import numpy as np
import tensorflow as tf

# the dataset is downloaded to my local machine the first
# time the code is run; on later runs the computer
# will just load the data from the stored folder
import tensorflow_datasets as tfds

### Data

In [2]:
# load the MNIST dataset from tensorflow_datasets
# with_info=True also returns metadata (version, size, etc.)
# as_supervised=True returns each element as a 2-tuple (input, target)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# the train and test sets are the splits provided by default
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

In [3]:
# there is no validation set, so we take 10% of the train set
num_validation_samples = 0.1*mnist_info.splits['train'].num_examples
# the product above is a float, so we
# cast it to an integer
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

In [4]:
# we also create a variable with the number of test
# samples and cast it to an int64 tensor
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

The pixel values are rescaled to lie between 0 and 1 instead of between 0 and 255. This is common practice in ML, since it is desirable for all input variables to share the same range. The scaling is done with a function that divides every pixel by the maximum value, 255, and is applied to each image.

In [5]:
# the data is greyscale, with values from 0 to 255
# I define a function called scale to
# map it to numbers from 0 to 1
def scale(image, label):
    # the value has to be a float to scale it
    image = tf.cast(image, tf.float32)
    image = image/255.
    
    return image, label

# apply the function to the datasets,
# first to the train set, which will later be split
# into train and validation
scaled_train_and_validation_data = mnist_train.map(scale)
# scale function applied to test dataset
test_data = mnist_test.map(scale)

Another common practice in ML is to shuffle the dataset. If the data is ordered (for example, sorted by digit), each batch contains very similar examples, which can bias the gradient updates and lead the neural network to wrong assumptions. Shuffling makes each batch more representative of the whole dataset.

In [6]:
# shuffling a big dataset all at once can make the computer
# run out of memory, therefore a buffer_size is defined
# and the data is shuffled through a buffer of that size
# buffer_size = 1: no shuffle; buffer_size >= num samples: uniform shuffle
buffer_size = 1000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(buffer_size)
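To make the role of `buffer_size` more concrete, here is a minimal pure-Python sketch of a buffered shuffle. It only illustrates the idea and is not TensorFlow's implementation; the function name `buffered_shuffle` and its `seed` parameter are hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # fill the buffer before emitting anything
            buffer.append(item)
            continue
        # buffer is full: emit a random element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # input exhausted: drain what is left in random order
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

data = list(range(20))
unshuffled = list(buffered_shuffle(data, buffer_size=1))
locally_shuffled = list(buffered_shuffle(data, buffer_size=5))
fully_shuffled = list(buffered_shuffle(data, buffer_size=len(data)))
```

With `buffer_size=1` the order is unchanged, while with a buffer of 5 an element can only move forward by a limited number of positions. A small buffer on sorted data therefore still produces fairly ordered output, which is why a larger buffer gives a better shuffle at the cost of more memory.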

Now it is time to extract the validation data from the train dataset.

In [7]:
# the validation data is taken from the train dataset
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
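# the train data is the rest
# note: shuffle() reshuffles on every pass by default, so the samples
# skipped here can differ between epochs; passing
# reshuffle_each_iteration=False to shuffle() keeps the split fixed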
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

In [8]:
# choosing the batch size
batch_size = 100

# the train data is batched so that training
# can iterate over the different batches
train_data = train_data.batch(batch_size)

# the validation data goes into a single batch, since
# no gradient updates are computed on it
validation_data = validation_data.batch(num_validation_samples)

# the test data also goes into a single batch
test_data = test_data.batch(num_test_samples)

# as_supervised=True means each element is a
# 2-tuple (input and target)
validation_inputs, validation_targets = next(iter(validation_data))
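The sizes that result from the split and the batching can be checked with a little arithmetic. The sketch below assumes the standard MNIST split of 60,000 train and 10,000 test images (together the 70,000 images mentioned in the introduction) and uses integer division in place of the `0.1*...` and cast above.

```python
import math

# assumed sizes of the default MNIST splits (60,000 + 10,000 = 70,000)
num_train_examples = 60_000
num_test_examples = 10_000

# 10% of the train split is held out for validation
num_validation_samples = num_train_examples // 10
num_train_samples = num_train_examples - num_validation_samples

# number of batches (weight updates) in one epoch
batch_size = 100
steps_per_epoch = math.ceil(num_train_samples / batch_size)

print(num_validation_samples, num_train_samples, steps_per_epoch)
# prints: 6000 54000 540
```

Under these assumptions, one epoch consists of 540 batches of 100 images each.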

### Model

The data is now clean and prepared, so it is time to work on the model (the deep learning algorithm).

In [17]:
input_size = 28*28 # 28 by 28 pixels give 784 inputs in total
output_size = 10 # ten possible digits, from 0 to 9

hidden_layer_size = 500 # units per hidden layer, also called the width

# the model used is sequential
model = tf.keras.Sequential([
    # the first layer is the input layer; each image is a
    # 28x28x1 tensor, which 'Flatten' turns into a
    # vector of 784 values
    tf.keras.layers.Flatten(input_shape=(28,28,1)), # input layer
    
    # each hidden layer needs an activation function; here 'relu' is used
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 6th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 7th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 8th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 9th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 10th hidden layer
    
    # the last layer is the output; 'softmax' turns its ten values into probabilities
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
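To get a feeling for the size of this network, the number of trainable parameters can be counted by hand. The sketch below assumes the layer sizes defined above (784 inputs, ten hidden layers of 500 units, 10 outputs); the helper `dense_params` is a hypothetical name introduced only for this illustration.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: a weight matrix plus one bias per unit."""
    return n_in * n_out + n_out

# flattened input, ten hidden layers, output layer
layer_sizes = [28 * 28] + [500] * 10 + [10]

# one Dense layer sits between each pair of consecutive sizes
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer[0], per_layer[1], per_layer[-1], total_params)
# prints: 392500 250500 5010 2652010
```

The 'Flatten' layer has no parameters of its own, so the total is about 2.65 million. Calling `model.summary()` on the Keras model is the usual way to see such a count.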

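The optimizer, the loss function and the metric to be used need to be defined. The loss is sparse categorical cross-entropy because the targets are integer labels (0 to 9) rather than one-hot vectors.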

In [18]:
# this is done through the arguments of 'compile'
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
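As a rough illustration of what the softmax output and the loss chosen above compute for a single image, here is a minimal pure-Python sketch. It is not Keras' implementation, which also handles batches and numerical edge cases; the function names and the example values are made up for this illustration.

```python
import math

def softmax(logits):
    """Turn raw output values into probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability assigned to the true class (an integer label)."""
    return -math.log(probs[label])

# hypothetical raw outputs for one image; class 3 has the largest value
logits = [0.1, 0.0, 0.3, 2.5, 0.2, 0.0, 0.1, 0.4, 0.0, 0.2]
probs = softmax(logits)

loss_if_label_is_3 = sparse_categorical_crossentropy(3, probs)
loss_if_label_is_7 = sparse_categorical_crossentropy(7, probs)

# a network that knows nothing assigns 0.1 to every class
uniform_loss = sparse_categorical_crossentropy(0, softmax([0.0] * 10))
```

The loss is small when the network gives a high probability to the true digit and large otherwise. The uniform case gives a loss of ln(10), about 2.30, which is a useful reference point when reading the loss values during training.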

### Training

Now it is time to train the model.

In [19]:
# number of epochs
num_epochs = 8

# fit the model on the train data and check it on the validation data prepared above
model.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/8
540/540 - 25s - loss: 0.3215 - accuracy: 0.9065 - val_loss: 0.1882 - val_accuracy: 0.9530
Epoch 2/8
540/540 - 28s - loss: 0.1458 - accuracy: 0.9641 - val_loss: 0.1535 - val_accuracy: 0.9635
Epoch 3/8
540/540 - 25s - loss: 0.1116 - accuracy: 0.9721 - val_loss: 0.1483 - val_accuracy: 0.9687
Epoch 4/8
540/540 - 25s - loss: 0.0943 - accuracy: 0.9773 - val_loss: 0.1593 - val_accuracy: 0.9635
Epoch 5/8
540/540 - 26s - loss: 0.0850 - accuracy: 0.9799 - val_loss: 0.1336 - val_accuracy: 0.9703
Epoch 6/8
540/540 - 27s - loss: 0.0695 - accuracy: 0.9834 - val_loss: 0.1561 - val_accuracy: 0.9695
Epoch 7/8
540/540 - 33s - loss: 0.0627 - accuracy: 0.9854 - val_loss: 0.1194 - val_accuracy: 0.9738
Epoch 8/8
540/540 - 23s - loss: 0.0673 - accuracy: 0.9847 - val_loss: 0.1240 - val_accuracy: 0.9730



The hyperparameters (for example the batch size, the hidden layer size, the number of layers and the number of epochs) need to be tuned to find the best fit to the validation data.

### Test

The model has to be tested on data that it has never seen before to check its accuracy.

In [20]:
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 97.53%
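The accuracy metric is simply the fraction of images whose predicted digit (the class with the highest probability) matches the target. A minimal sketch with made-up probabilities for four images and only three classes, using hypothetical helper names:

```python
def predict_class(probs):
    """Index of the largest probability, i.e. the predicted class."""
    return max(range(len(probs)), key=probs.__getitem__)

def accuracy(all_probs, targets):
    """Fraction of samples whose predicted class equals the target."""
    correct = sum(predict_class(p) == t for p, t in zip(all_probs, targets))
    return correct / len(targets)

# made-up outputs for four images (three classes to keep it short)
all_probs = [
    [0.8, 0.1, 0.1],  # predicted 0
    [0.2, 0.7, 0.1],  # predicted 1
    [0.3, 0.3, 0.4],  # predicted 2
    [0.6, 0.3, 0.1],  # predicted 0, but the target is 1
]
targets = [0, 1, 2, 1]

toy_accuracy = accuracy(all_probs, targets)
print(toy_accuracy)
# prints: 0.75
```

Assuming the standard test split of 10,000 images, a test accuracy of 97.53% corresponds to 9,753 correctly classified digits.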
