## Import relevant packages

In [1]:
import numpy as np
import tensorflow as tf

# import data provider module
import tensorflow_datasets as tfds

## Import MNIST dataset
The dataset is called MNIST and is a standard benchmark for handwritten digit recognition.

*   The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image).
*   The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

In [2]:
# with_info=True also returns a DatasetInfo object describing the version, features and number of samples
# we store it in mnist_info and use it a bit below
# as_supervised=True loads the dataset in a 2-tuple structure (input, target)
# alternatively, as_supervised=False would return a dictionary
# we prefer to have our inputs and targets separated

mnist_dataset, mnist_info = tfds.load(name="mnist", with_info=True, as_supervised=True)

In [3]:
def scale(image, label):
  """Function to normalize pixel values (scale between 0 and 1)"""
  image = tf.cast(image, tf.float32)
  image /= 255.
  return image, label
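
To make the effect of the scaling step concrete, here is a minimal NumPy sketch of the same operation applied to a tiny made-up array. The function name `scale_pixels` and the 2x2 "image" are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def scale_pixels(image):
    """Cast uint8 pixel values to float32 and rescale them to [0, 1]."""
    return image.astype(np.float32) / 255.0

# Hypothetical 2x2 "image" covering the extremes of the uint8 range
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)
scaled = scale_pixels(raw)

print(scaled.dtype, scaled.min(), scaled.max())
```

Casting before dividing matters: the result is a floating-point array, which is what the dense layers expect as input.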

In [4]:
# define train and test sets
mnist_train, mnist_test = mnist_dataset["train"], mnist_dataset["test"]

# define size of validation set (10% of train set)
num_validation_samples = 0.1 * mnist_info.splits["train"].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# batch size for the test set (10% of the test set, so evaluation runs over 10 batches)
num_test_samples = 0.1 * mnist_info.splits["test"].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

# normalize data
scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

# set a buffer size to allow shuffling on a large dataset
BUFFER_SIZE = 10000

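# shuffle train data once; reshuffle_each_iteration=False keeps the take/skip split below
# identical across epochs, so validation samples do not leak into the training set
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)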

# define train and validation sets
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

BATCH_SIZE = 150

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
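
The sizes implied by this cell can be checked with a little arithmetic. The sketch below assumes the standard MNIST split of 60,000 training and 10,000 test images (together the 70,000 images mentioned earlier); the variable names are illustrative and not taken from the notebook.

```python
# Sizes of the standard MNIST splits (assumed here, not read from mnist_info)
num_train_examples = 60_000
num_test_examples = 10_000

batch_size = 150

# 10% of the training split is held out for validation
num_validation = round(0.1 * num_train_examples)
num_train = num_train_examples - num_validation

# Number of training batches per epoch (ceiling division)
steps_per_epoch = -(-num_train // batch_size)

# The test set is batched in chunks of 10% of its size
test_batch_size = round(0.1 * num_test_examples)
num_test_batches = -(-num_test_examples // test_batch_size)

print(num_validation, num_train, steps_per_epoch, num_test_batches)
```

The 360 training batches per epoch agree with the `360/360` shown in the training log further down.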

## Model
### Outline the model

In [5]:
# define layer size
input_size = 784
output_size = 10
hidden_layer_size = 200

# define model
model = tf.keras.Sequential([
    # convert image to vector
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    # define hidden layers
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"),
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"),
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"),
    # define output layer
    tf.keras.layers.Dense(output_size, activation="softmax")
])
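
As a sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-output pair plus one bias per output unit. The helper `dense_params` below is a hypothetical name used only for this sketch.

```python
# Layer widths: flattened 28x28 input, three hidden layers, 10-class output
layer_sizes = [784, 200, 200, 200, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)
print(total_params)
```

Most of the parameters sit in the first dense layer, because it connects all 784 input pixels to the 200 hidden units.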

### Choose the optimizer and the loss function

In [6]:
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
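
The "sparse" in the loss name means that the targets are plain integer class labels rather than one-hot vectors. As a rough illustration of what this loss measures on top of a softmax output, here is a NumPy sketch; it is a simplified stand-in under that assumption, not the Keras implementation, and the function names and toy values are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the correct integer label."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()

# Two toy samples with three classes; labels are integers, not one-hot vectors
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, labels)
print(probs)
print(loss)
```

The second sample has identical scores for all classes, so it contributes `log(3)` to the average, which is the loss of a uniform guess over three classes.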

## Training

1.   At the beginning of each epoch, the training loss is reset to 0
2.   The algorithm iterates over all the batches in train_data
3.   The weights and biases are updated once per batch
4.   We get a value for the loss function, indicating how the training is going
5.   We also see the training accuracy
6.   At the end of the epoch, the algorithm forward propagates the whole validation set

Training is over when we reach the maximum number of epochs.
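
The steps listed above can be sketched as an explicit loop. To keep it short and runnable without TensorFlow, the sketch below makes several simplifying assumptions: it trains a plain softmax regression with mini-batch gradient descent instead of the three-hidden-layer network with Adam, and it uses small synthetic 2-D clusters instead of MNIST. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 3 well-separated clusters in 2-D
num_classes, per_class = 3, 100
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
inputs = np.concatenate([c + rng.normal(size=(per_class, 2)) for c in centers])
targets = np.repeat(np.arange(num_classes), per_class)

# Shuffle once, then hold out 10% for validation
order = rng.permutation(len(inputs))
inputs, targets = inputs[order], targets[order]
num_val = len(inputs) // 10
val_x, val_y = inputs[:num_val], targets[:num_val]
train_x, train_y = inputs[num_val:], targets[num_val:]

def forward(x, weights, biases):
    """Softmax regression forward pass: class probabilities per row of x."""
    logits = x @ weights + biases
    logits -= logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_loss(probs, y):
    """Sparse categorical cross-entropy averaged over the batch."""
    return -np.log(probs[np.arange(len(y)), y]).mean()

weights = np.zeros((2, num_classes))
biases = np.zeros(num_classes)
batch_size, num_epochs, learning_rate = 30, 5, 0.05
num_updates = 0
history = []  # (train_loss, train_accuracy, val_loss, val_accuracy) per epoch

for epoch in range(num_epochs):
    epoch_loss, correct, num_batches = 0.0, 0, 0       # step 1: reset the loss
    for start in range(0, len(train_x), batch_size):   # step 2: iterate over batches
        x = train_x[start:start + batch_size]
        y = train_y[start:start + batch_size]
        probs = forward(x, weights, biases)
        epoch_loss += mean_loss(probs, y)              # step 4: track the loss
        correct += (probs.argmax(axis=1) == y).sum()   # step 5: track the accuracy
        # Gradient of the mean cross-entropy with respect to the logits
        grad = probs.copy()
        grad[np.arange(len(y)), y] -= 1.0
        grad /= len(y)
        weights -= learning_rate * (x.T @ grad)        # step 3: one update per batch
        biases -= learning_rate * grad.sum(axis=0)
        num_updates += 1
        num_batches += 1
    val_probs = forward(val_x, weights, biases)        # step 6: validation forward pass
    val_accuracy = (val_probs.argmax(axis=1) == val_y).mean()
    history.append((epoch_loss / num_batches, correct / len(train_x),
                    mean_loss(val_probs, val_y), val_accuracy))
    print("Epoch {0}/{1} - loss: {2:.4f} - accuracy: {3:.4f} - "
          "val_loss: {4:.4f} - val_accuracy: {5:.4f}".format(epoch + 1, num_epochs, *history[-1]))
```

In the notebook itself, `model.fit` takes care of all of this; the loop only makes the order of operations within an epoch visible.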

In [7]:
NUM_EPOCHS = 10

model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
360/360 - 8s - loss: 0.2939 - accuracy: 0.9144 - val_loss: 0.1466 - val_accuracy: 0.9548
Epoch 2/10
360/360 - 4s - loss: 0.1106 - accuracy: 0.9668 - val_loss: 0.0922 - val_accuracy: 0.9727
Epoch 3/10
360/360 - 4s - loss: 0.0742 - accuracy: 0.9768 - val_loss: 0.0660 - val_accuracy: 0.9797
Epoch 4/10
360/360 - 4s - loss: 0.0526 - accuracy: 0.9834 - val_loss: 0.0627 - val_accuracy: 0.9823
Epoch 5/10
360/360 - 4s - loss: 0.0402 - accuracy: 0.9877 - val_loss: 0.0496 - val_accuracy: 0.9862
Epoch 6/10
360/360 - 4s - loss: 0.0311 - accuracy: 0.9902 - val_loss: 0.0531 - val_accuracy: 0.9853
Epoch 7/10
360/360 - 4s - loss: 0.0301 - accuracy: 0.9902 - val_loss: 0.0519 - val_accuracy: 0.9867
Epoch 8/10
360/360 - 4s - loss: 0.0227 - accuracy: 0.9926 - val_loss: 0.0293 - val_accuracy: 0.9925
Epoch 9/10
360/360 - 4s - loss: 0.0218 - accuracy: 0.9931 - val_loss: 0.0743 - val_accuracy: 0.9790
Epoch 10/10
360/360 - 4s - loss: 0.0185 - accuracy: 0.9938 - val_loss: 0.0331 - val_accuracy: 0.9917

<tensorflow.python.keras.callbacks.History at 0x7f0276b54510>

## Test the model
After training on the training data and validating on the validation data, we test the final predictive power of our model by running it on the test dataset, which the model has NEVER seen before.

In [8]:
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.09. Test accuracy: 97.94%
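
For reference, the accuracy reported here is the share of samples whose most probable class matches the true label. A minimal NumPy sketch with made-up probabilities and labels (not the model's actual outputs):

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 1, 1])

# The predicted class is the index of the largest probability in each row
predicted = probs.argmax(axis=1)
accuracy = (predicted == labels).mean()

print('Accuracy: {0:.2f}%'.format(accuracy * 100.))
```

Here three of the four predictions are correct, so the sketch prints `Accuracy: 75.00%`.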


We achieved an accuracy of 97.94% on the test set. We could try varying the number of hidden layers and other hyperparameters to see if the model can perform even better.