# Deep Neural Network for MNIST Classification

The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and refers to handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

# TensorFlow includes a data provider for MNIST that we'll use.
# It comes with the tensorflow-datasets module.

import tensorflow_datasets as tfds

# by default, datasets are stored in ~/tensorflow_datasets (on Windows: C:\Users\*USERNAME*\tensorflow_datasets\...)
# the first time you load a dataset, it is downloaded and stored in that folder
# every subsequent time, the local copy is loaded automatically

## Data

Here, we load and preprocess our data.

In [2]:
# tfds.load loads a dataset (or downloads and then loads it if that's the first time you use it)
# in our case, we are interested in MNIST; the name of the dataset is the only mandatory argument
# there are other arguments we can specify
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

# with_info=True will also return a tfds.core.DatasetInfo object with information about the version, features, number of samples
# we store it in mnist_info and will use it a bit below

# as_supervised=True will load the dataset in a 2-tuple structure (input, target) 
# alternatively, as_supervised=False, would return a dictionary

Downloading and preparing dataset mnist/3.0.1 (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...

local data directory. If you'd instead prefer to read directly from our public
GCS bucket (recommended if you're running on GCP), you can instead pass
`try_gcs=True` to `tfds.load` or set `data_dir=gs://tfds-data/datasets`.

Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.


We prefer to have our inputs and targets separated, which is why we set as_supervised=True.
Once the dataset is loaded, we can extract the training and test datasets using the built-in 'train' and 'test' keys.

In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

By default, TFDS provides training and test splits for MNIST, but no validation split, so we will carve out the validation set ourselves.

In [4]:
# we start by defining the number of validation samples as a % of the train samples
# we can make use of mnist_info for getting the numbers
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
# number is cast to an integer, as a float may cause an error along the way
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# the number of test samples is stored in a dedicated variable
num_test_samples = mnist_info.splits['test'].num_examples
# number is cast to an integer
num_test_samples = tf.cast(num_test_samples, tf.int64)
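
As a quick sanity check on the split sizes, the arithmetic can be sketched in plain Python. The figure of 60,000 training examples is an assumption here (it is what `mnist_info.splits['train'].num_examples` should report for standard MNIST), and the variable names are only illustrative.

```python
# Assumed size of the MNIST training split (the value mnist_info should report).
num_train_examples = 60_000
validation_fraction = 0.1

# int() plays the role of tf.cast(..., tf.int64): we need a whole number of samples.
num_validation_samples = int(validation_fraction * num_train_examples)
num_training_samples = num_train_examples - num_validation_samples

print(num_validation_samples, num_training_samples)
```

Under that assumption, 6,000 samples go to validation and 54,000 remain for training.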

Normally, we would like to scale our data in some way to make the result more numerically stable. In this case, we simply prefer to have inputs between 0 and 1.

In [5]:
# we define a function called scale that takes an MNIST image and its label
def scale(image, label):
    image = tf.cast(image, tf.float32)
    # since the possible values for the inputs are 0 to 255 (256 different shades of grey)
    # each element is divided by 255
    image /= 255.

    return image, label
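
To see what this scaling does to the pixel values, here is a minimal NumPy sketch of the same two steps (cast, then divide by 255). The 2x2 array is a made-up stand-in for a real 28x28 image, and NumPy is used instead of TensorFlow only to keep the example small.

```python
import numpy as np

# A made-up 2x2 "image" with 8-bit grayscale values, standing in for a 28x28 MNIST digit.
image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Cast first, then divide: mirrors tf.cast(image, tf.float32) followed by image /= 255.
scaled = image.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())
```

The smallest possible input (0) maps to 0 and the largest (255) maps to 1, so every pixel ends up in the [0, 1] range.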


In [6]:
# the method .map() allows us to apply a custom transformation to a given dataset 
scaled_train_and_validation_data = mnist_train.map(scale)

# finally, we scale the test data
# so it has the same magnitude as the train and validation data
# there is no need to shuffle it
# it will be batched later, as a single batch equal to the size of the test data
test_data = mnist_test.map(scale)


Shuffle the data

In [7]:

BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# then we can't shuffle the whole dataset in one go because we can't fit it all in memory
# so instead TF only stores BUFFER_SIZE samples in memory at a time and shuffles them
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
# BUFFER_SIZE in between - a computational optimization to approximate uniform shuffling

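# shuffling the dataset
# reshuffle_each_iteration=False keeps the shuffled order fixed across epochs, so the
# .take()/.skip() split below always separates the same samples into validation and train
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)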
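
The effect of the buffer size can be illustrated with a simplified pure-Python model of a shuffle buffer. This is a sketch of the idea described in the comments above, not TensorFlow's actual implementation; `buffered_shuffle` is a hypothetical helper written for this example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order produced by a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item
    # The stream is exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

data = list(range(10))
print(list(buffered_shuffle(data, buffer_size=1)))   # buffer of 1: order is unchanged
print(list(buffered_shuffle(data, buffer_size=10)))  # buffer >= len(data): full shuffle
```

With a buffer smaller than the dataset, an element can only move a limited distance from its original position, which is why intermediate buffer sizes only approximate a uniform shuffle.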


Once we have scaled and shuffled the data, we can proceed to extracting the training and validation sets.

In [8]:

# our validation data would be equal to 10% of the training set, which we've already calculated
# we use the .take() method to take that many samples
# the validation data is batched later, as a single batch containing all validation samples
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

# similarly, the train_data is everything else, so we skip as many samples as there are in the validation dataset
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
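
The take/skip pattern can be mimicked on an ordinary Python list to show that the two parts do not overlap. This is only an analogy using `itertools.islice`; the list and the split size are made up for illustration.

```python
from itertools import islice

samples = list(range(10))  # stand-in for an already shuffled dataset
num_validation = 3         # hypothetical validation size

# .take(n) keeps the first n elements; .skip(n) keeps everything after them.
validation = list(islice(samples, num_validation))
train = list(islice(samples, num_validation, None))

print(validation, train)
```

The two parts are disjoint and together cover the whole dataset, as long as both calls read the samples in the same order.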


In [9]:
# determine the batch size
BATCH_SIZE = 100

# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)

# validation doesn't need mini-batches, since we only forward-propagate over it,
# but the model expects batched input, so we put all validation samples in a single batch
validation_data = validation_data.batch(num_validation_samples)

# batch the test data
test_data = test_data.batch(num_test_samples)


# takes the next batch (it is the only batch)
# because as_supervised=True, we've got a 2-tuple structure
validation_inputs, validation_targets = next(iter(validation_data))
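
The number of training batches per epoch follows directly from the batch size. The sketch below assumes 54,000 training samples (60,000 MNIST training images minus the 10% held out for validation); if your split differs, the result will too.

```python
import math

num_training_samples = 54_000  # assumed: 60,000 training images minus 6,000 for validation
batch_size = 100

# Dataset.batch keeps a smaller final batch by default, hence the ceiling.
steps_per_epoch = math.ceil(num_training_samples / batch_size)
print(steps_per_epoch)
```

This is the "540/540" that appears next to each epoch in the training output.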

## Model

### Outline the model


In [10]:
input_size = 784
output_size = 10
# We use same size for both hidden layers
hidden_layer_size = 50
    
# define the model
model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # we must flatten the images since we are not using a CNN
    # the 'Flatten' layer takes each 28x28x1 tensor and reorders it into a (28*28*1,) = (784,) vector,
    # so a batch of images has shape (None, 784)
    # this allows us to create a feed-forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer 
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
    # the activation is 'softmax' because we want a probability for each of the 10 classes
])
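
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes defined above (784, 50, 50, 10) and counts one weight per connection plus one bias per unit in each dense layer.

```python
layer_sizes = [784, 50, 50, 10]  # input, two hidden layers, output

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    params = n_in * n_out + n_out  # weight matrix plus one bias per unit
    print(f"{n_in} -> {n_out}: {params} parameters")
    total += params

print("total:", total)
```

Calling `model.summary()` on the model above should report the same total; the Flatten layer itself has no parameters.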

### Choose the optimizer and the loss function

In [11]:
# we define the optimizer we'd like to use, 
# the loss function, 
# and the metrics we are interested in obtaining at each iteration
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
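
The "sparse" in the loss name means that targets are integer class indices (0 to 9) rather than one-hot vectors. A minimal pure-Python sketch of what softmax and this loss compute for a single sample is shown below; the three-class scores are made up, and this illustrates the formulas rather than the Keras implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract the max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(target, probabilities):
    """Loss for one sample whose target is an integer class index."""
    return -math.log(probabilities[target])

logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-class toy problem
probabilities = softmax(logits)

print(sparse_categorical_crossentropy(0, probabilities))  # small loss: class 0 is favoured
print(sparse_categorical_crossentropy(2, probabilities))  # larger loss: class 2 is unlikely
```

The loss is low when the model assigns high probability to the correct class and grows as that probability shrinks.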

### Training
This is where we train the model we have built.

In [12]:
# determine the maximum number of epochs
NUM_EPOCHS = 5

# we fit the model, specifying the training data, the total number of epochs and the validation data
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 10s - loss: 0.4212 - accuracy: 0.8804 - val_loss: 0.2068 - val_accuracy: 0.9400
Epoch 2/5
540/540 - 4s - loss: 0.1826 - accuracy: 0.9472 - val_loss: 0.1697 - val_accuracy: 0.9492
Epoch 3/5
540/540 - 4s - loss: 0.1374 - accuracy: 0.9598 - val_loss: 0.1274 - val_accuracy: 0.9642
Epoch 4/5
540/540 - 4s - loss: 0.1118 - accuracy: 0.9661 - val_loss: 0.1071 - val_accuracy: 0.9680
Epoch 5/5
540/540 - 4s - loss: 0.0946 - accuracy: 0.9720 - val_loss: 0.1018 - val_accuracy: 0.9688


<keras.callbacks.History at 0x7fbd1c3c9550>

## Test the model

After training on the training data and validating on the validation data, we test the final prediction power of our model by running it on the test dataset that the algorithm has never seen before.


In [13]:
test_loss, test_accuracy = model.evaluate(test_data)



In [14]:
# print the test loss and the test accuracy
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 96.63%
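
The accuracy figure is simply the fraction of samples for which the class with the highest predicted probability matches the target. A small sketch of that computation, using made-up softmax outputs for a 3-class toy problem, could look like this:

```python
def accuracy(probabilities, targets):
    """Fraction of samples whose highest-probability class equals the target."""
    correct = 0
    for probs, target in zip(probabilities, targets):
        predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
        correct += int(predicted == target)
    return correct / len(targets)

# Made-up softmax outputs for four samples of a 3-class toy problem.
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]
targets = [0, 1, 2, 1]

print(accuracy(probabilities, targets))
```

Three of the four toy predictions are correct here, so the function returns 0.75.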


Using the initial model and hyperparameters given in this notebook, the final test accuracy should be roughly 96%.

Each time the code is rerun, we get a different accuracy as the batches are shuffled, the weights are initialized in a different way, etc.
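
The role of randomness can be illustrated with a toy example: drawing "initial weights" from a seeded generator gives the same values every time, while unseeded draws differ from run to run. The helper `init_weights` is hypothetical and uses Python's `random` module rather than TensorFlow. In TensorFlow, the analogous controls would be `tf.random.set_seed` and the `seed` argument of `Dataset.shuffle`, which this notebook does not set.

```python
import random

def init_weights(n, seed=None):
    """Draw n small random 'weights'; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

print(init_weights(3, seed=42) == init_weights(3, seed=42))  # same seed, same weights
print(init_weights(3) == init_weights(3))                    # unseeded draws almost surely differ
```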

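Finally, note that we have intentionally settled for a suboptimal solution: the hyperparameters above (hidden layer size, number of hidden layers, batch size, number of epochs) were not tuned, so the accuracy can likely be improved.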