# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and is used for handwritten digit recognition.

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.
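To get a feel for the shapes involved, here is a minimal NumPy sketch. The image is random noise standing in for a real MNIST sample, and the label `7` is a made-up value chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# a stand-in for one MNIST sample: 28x28 grayscale pixels with values from 0 to 255
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
label = 7  # hypothetical target digit for this fake image

# a dense network expects a vector, so the 28x28 grid is flattened into 784 values
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# with 10 classes, the target can be written as a vector with a single 1
num_classes = 10
one_hot = np.zeros(num_classes)
one_hot[label] = 1.0
print(one_hot)
```

So each sample gives the network 784 inputs, and the network has to produce 10 outputs, one per digit.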

## Import the relevant packages

In [54]:
#pip install tensorflow

In [55]:
#pip install tensorflow-datasets

In [56]:
import numpy as np
import tensorflow as tf

# TensorFlow includes a data provider for MNIST that we'll use.
# It comes with the tensorflow-datasets module, so if you haven't installed it yet, please install the package using
# pip install tensorflow-datasets
# or
# conda install tensorflow-datasets

import tensorflow_datasets as tfds

# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder
# after that, the copy on your computer is loaded automatically

## Data

This is where we load and preprocess our data.

In [57]:
# remember the comment from above
# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder
# after that, the copy on your computer is loaded automatically

# tfds.load loads a dataset (or downloads and then loads it, if that's the first time you use it)
# in our case, we are interested in MNIST; the name of the dataset is the only mandatory argument
# there are other arguments we can specify, which we will find useful
# mnist_dataset = tfds.load(name='mnist', as_supervised=True)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# with_info=True makes tfds.load return a tuple (dataset, info)
# the info object contains the version, features, and number of samples
# we store it in mnist_info and use it a bit below

# as_supervised=True loads the dataset in a 2-tuple structure (input, target)
# alternatively, as_supervised=False would return a dictionary
# we prefer to have our inputs and targets separated

In [58]:
mnist_train,mnist_test = mnist_dataset["train"], mnist_dataset["test"]
# there is no validation dataset within MNIST: it has 60,000 training samples and 10,000 test samples
# we take the validation data from the training data, 10% of it
# the validation dataset is used to check that our parameters (the weights and biases) do not overfit

num_validation_samples = 0.1 * mnist_info.splits["train"].num_examples # size of the validation dataset: 10% of the training samples
num_validation_samples = tf.cast(num_validation_samples, tf.int64) # cast the number of samples to an integer to prevent potential issues

num_test_samples = mnist_info.splits["test"].num_examples # store the number of test samples in a variable
num_test_samples = tf.cast(num_test_samples, tf.int64)

# create a function to scale the inputs

def scale(image,label):
    image = tf.cast(image,tf.float32)
    # MNIST images contain grayscale levels from 0 to 255, so dividing by 255 maps them to the range [0, 1]
    image /=255. # the dot at the end is because we want a float value
    return image, label

# the method .map() allows us to apply a custom transformation to a given dataset
# we have already decided that we will get the validation data from mnist_train
scaled_train_and_validation_data = mnist_train.map(scale)

# scale the test data
test_data = mnist_test.map(scale)

# shuffling the data: the same information but in a different order
# it is better to shuffle the data before batching, so that each batch is randomly distributed

buffer_size = 10000 # shuffle() fills a buffer of 10,000 samples and draws randomly from it, so the whole dataset never has to be in memory at once

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(buffer_size)

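# create the validation dataset: take the first 10% of the shuffled data, and skip those samples to get the training data
# note: shuffle() reshuffles on every iteration by default, so the two subsets are not guaranteed to stay disjoint across epochs
# passing reshuffle_each_iteration=False to shuffle() would keep the split fixed
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)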

batch_size = 100

train_data = train_data.batch(batch_size)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

# the validation data must have the same shape and properties as the train and test data
# the MNIST datasets are iterable
# iter() creates an object which can be iterated one element at a time, like in a for loop
# next() loads the next element of an iterable object; here it is the single batch holding the whole validation set
validation_inputs, validation_targets = next(iter(validation_data))
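To make the preprocessing above concrete, here is a minimal pure-Python sketch of the same steps on a toy dataset of 60 items instead of 60,000. The helper `buffered_shuffle` and the toy sizes are our own illustration and not part of TensorFlow. The helper only mimics the idea behind `.shuffle(buffer_size)`: keep a fixed-size buffer, emit a random element from it, and refill that slot with the next incoming item.

```python
import random

def scale(pixels):
    """Map integer pixel values in [0, 255] to floats in [0, 1]."""
    return [p / 255. for p in pixels]

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in pseudo-random order using a fixed-size buffer (illustrative helper)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)       # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]             # emit a random buffered element
        buffer[idx] = item            # and put the incoming item in its place
    rng.shuffle(buffer)               # input exhausted: drain what is left
    yield from buffer

samples = list(range(60))             # toy stand-in for the 60,000 training samples
buffer_size = 10                      # toy stand-in for buffer_size = 10000
shuffled = list(buffered_shuffle(samples, buffer_size))

num_validation_samples = len(samples) // 10        # 10% of the data
validation = shuffled[:num_validation_samples]     # plays the role of .take()
train = shuffled[num_validation_samples:]          # plays the role of .skip()

batch_size = 10
train_batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]

print(len(validation), len(train), len(train_batches))  # 6 54 6
print(scale([0, 255]))                                   # [0.0, 1.0]
```

With the real sizes, the same arithmetic gives 6,000 validation samples, 54,000 training samples, and 540 training batches of 100. The sketch also shows why the buffer size matters: an element can only be emitted once it has entered the buffer, so a small buffer shuffles only locally.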



## Outline the model

In [59]:
input_size = 784
output_size = 10 #one for every digit
hidden_layer_size = 200 # we assume both hidden layers have the same size

# tf.keras.layers.Flatten(input_shape) transforms each (28,28,1) image tensor into a vector of 784 values

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    # each Dense layer calculates the dot product of its inputs and weights, adds the bias, and applies the activation
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"),
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"), # the second hidden layer; we could stack more layers if we wanted
    tf.keras.layers.Dense(output_size, activation="softmax") # softmax turns the 10 outputs into class probabilities
])
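To see what this stack of layers computes, here is a rough NumPy sketch of the same forward pass. The weights are random values drawn for illustration only (Keras uses its own initializers and learns the weights during training), and the helper names `dense_params`, `relu`, `softmax` and `forward` are ours.

```python
import numpy as np

input_size, hidden_layer_size, output_size = 784, 200, 10
rng = np.random.default_rng(0)

def dense_params(n_in, n_out):
    # illustrative random weights and zero biases (an assumption, not the Keras default initializer)
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)

layers = [
    dense_params(input_size, hidden_layer_size),
    dense_params(hidden_layer_size, hidden_layer_size),
    dense_params(hidden_layer_size, output_size),
]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(images):
    x = images.reshape(len(images), -1)     # what Flatten does: (n, 28, 28, 1) -> (n, 784)
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(x @ w1 + b1)                  # first hidden layer
    h2 = relu(h1 @ w2 + b2)                 # second hidden layer
    return softmax(h2 @ w3 + b3)            # one probability per digit

batch = rng.random((5, 28, 28, 1))          # 5 fake images with values in [0, 1)
probs = forward(batch)
num_params = sum(w.size + b.size for w, b in layers)

print(probs.shape)   # (5, 10)
print(num_params)    # 199210
```

The parameter count follows directly from the layer sizes: 784*200 + 200 for the first hidden layer, 200*200 + 200 for the second, and 200*10 + 10 for the output layer.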

## Choosing the optimizer and the loss function

In [60]:
# specify the optimizer, the loss, and the metrics with model.compile()

model.compile(optimizer="adam",loss="sparse_categorical_crossentropy", metrics=["accuracy"])
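The loss name `sparse_categorical_crossentropy` means that the targets are plain integer labels rather than one-hot vectors. As a simplified NumPy sketch (the function names are ours, and details of the real Keras implementation such as clipping of probabilities are left out), the loss is the average of minus the log of the probability assigned to the correct class:

```python
import numpy as np

def sparse_categorical_crossentropy(probs, targets):
    """Mean of -log(probability of the correct class), with integer targets."""
    picked = probs[np.arange(len(targets)), targets]
    return float(np.mean(-np.log(picked)))

def categorical_crossentropy(probs, one_hot_targets):
    """The same loss, but with one-hot encoded targets."""
    return float(np.mean(-np.sum(one_hot_targets * np.log(probs), axis=1)))

# toy example with 3 classes and 2 samples (made-up probabilities)
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
targets = np.array([1, 0])            # integer labels
one_hot_targets = np.eye(3)[targets]  # the same labels, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(probs, targets)
dense_loss = categorical_crossentropy(probs, one_hot_targets)
print(np.isclose(sparse_loss, dense_loss))  # True
```

Both versions give the same value; the sparse one simply saves us from one-hot encoding the MNIST labels ourselves.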

## Training the model

In [61]:
num_epochs = 5

model.fit(train_data,epochs=num_epochs,validation_data=(validation_inputs,validation_targets), verbose=2)
# verbose=2 prints only the most important information for each epoch

# 1-at the beginning of each epoch, the training loss is reset to 0
# 2-the algorithm iterates over all the batches in train_data
# 3-the weights and biases are updated once per batch, i.e. 540 times per epoch here (54,000 training samples / batch size of 100)
# 4-we get a value for the loss function, which indicates how the training is going
# 5-we also see the training accuracy
# 6-at the end of each epoch, the algorithm forward propagates the whole validation set
# 7-when we reach the maximum number of epochs, the training is over



Epoch 1/5
540/540 - 3s - loss: 0.2748 - accuracy: 0.9193 - val_loss: 0.1352 - val_accuracy: 0.9590 - 3s/epoch - 5ms/step
Epoch 2/5
540/540 - 2s - loss: 0.1057 - accuracy: 0.9687 - val_loss: 0.0812 - val_accuracy: 0.9757 - 2s/epoch - 4ms/step
Epoch 3/5
540/540 - 2s - loss: 0.0718 - accuracy: 0.9784 - val_loss: 0.0618 - val_accuracy: 0.9822 - 2s/epoch - 4ms/step
Epoch 4/5
540/540 - 3s - loss: 0.0527 - accuracy: 0.9834 - val_loss: 0.0471 - val_accuracy: 0.9860 - 3s/epoch - 5ms/step
Epoch 5/5
540/540 - 2s - loss: 0.0388 - accuracy: 0.9876 - val_loss: 0.0532 - val_accuracy: 0.9818 - 2s/epoch - 4ms/step


<keras.callbacks.History at 0x1530859c5b0>
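The numbers in the log above can be checked with a little arithmetic. The sketch below uses only values already shown in this notebook (the split sizes, the batch size, and the logged `val_loss` values); the variable names are ours.

```python
import math

num_train_samples = 54_000   # 60,000 training samples minus the 6,000 used for validation
batch_size = 100
num_epochs = 5

# one weight update per batch, which explains the "540/540" in the log
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_updates = steps_per_epoch * num_epochs
print(steps_per_epoch, total_updates)  # 540 2700

# validation loss per epoch, copied from the log above
val_loss = [0.1352, 0.0812, 0.0618, 0.0471, 0.0532]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch)  # 4
```

The validation loss is lowest in epoch 4 and rises slightly in epoch 5 while the training loss keeps falling, which can be an early sign of overfitting. One possible follow-up, not used in this notebook, would be to stop training at that point, for example with the `tf.keras.callbacks.EarlyStopping` callback.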

## Test the model

In [67]:
# test the model on the test dataset to get the final accuracy; the accuracy we saw above was measured on the validation dataset

test_loss,test_accuracy = model.evaluate(test_data)
print("Test loss:{0:.2f} and Test accuracy: {1:.2f}%".format(test_loss,test_accuracy*100.))

Test loss:0.08 and Test accuracy: 97.61%
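As a reminder of what the accuracy figure measures, here is a small NumPy sketch with made-up probabilities for 3 classes. The function name `accuracy` is ours; the idea is simply that the predicted class is the one with the highest probability, and accuracy is the fraction of predictions that match the targets.

```python
import numpy as np

def accuracy(probs, targets):
    """Fraction of samples whose most probable class equals the target."""
    predictions = np.argmax(probs, axis=1)
    return float(np.mean(predictions == targets))

# toy outputs for 4 samples and 3 classes (illustrative values only)
probs = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])
targets = np.array([1, 0, 2, 1])

acc = accuracy(probs, targets)
print(acc)  # 0.75, because the last sample is misclassified

# on the 10,000 MNIST test images, 97.61% corresponds to roughly this many correct predictions
approx_correct = round(0.9761 * 10_000)
print(approx_correct)  # 9761
```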


## Final comments

In [None]:
# After we test the model, conceptually, we are no longer allowed to change it.
# If you keep changing the model after this point, the test data will no longer be a dataset the model has never seen:
# you would already have feedback from it (here, that the model reaches around 97.5% accuracy).
# The main point of the test dataset is to simulate model deployment.
# If we got 50% or 60% test accuracy, we would know that our model has overfit and would fail miserably in real life.
# However, getting a value very close to the validation accuracy shows that we have not overfit.
# Finally, the test accuracy is the accuracy we expect to observe if we deploy the model in the real world.