## Importing lybraries

In [1]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [2]:
"""
Here I'll just put all the hyperparameters so when I'm tweeking the model, it is easier.
"""

BUFFER_SIZE = 10000
BATCH_SIZE = 200
hidden_layer_size = 600
NUM_EPOCHS = 10

In [3]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True, shuffle_files=True)

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

As the mnist dataset does not have a specifically separated validation dataset, we have to do by ourselves. Let's start with selecting the number of samples for validation and test.

In [4]:
# I separate 10% od the training dataset to be validation samples
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

We want to scale the data so all input values are between 0 and 1, so we will define a function

In [5]:
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)

We want to shuffle the data so when we batch it, there is no pattern that will harm our model

In [6]:
"""This buffer size parameter is used in cases when we are dealing with enormous datasets.
In such cases we can't shuffle the whole data set in one go because we can't possibly fit it all in the memory of the computer.
So instead we must instruct tensor flow to take samples ten thousand at a time shuffle them and then 
take the next ten thousand. Note that if the buffer size is equal to 1, no shuffling will actually happen.
So if the buffer size is equal or bigger than the total number of samples shuffling will take place at
once and shuffle them uniformly."""

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

In [7]:
"""Now we can extract the train and validation data"""

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

In [8]:
"""Now we want to set the batches. We must set a batch size and prepare our data for batching.
Just a quick memory refresh, a batch size of one is the stochastic gradient descent, while a batch size equal 
to the number of samples equals the gradient descent we've seen until now.

So we want a number relatively small with regard to the data set but reasonably high.
So it would allow us to preserve the underlying dependencies"""

train_data = train_data.batch(BATCH_SIZE)

"""Well since we won't be backpropagating on the validation data but only forward propagating we don't really need
to batch it. Remember that batching was useful in updating weights. However the model expects our validation set in the batch form 
too. That's why we should overwrite validation data with:"""

validation_data = validation_data.batch(num_validation_samples)

"""As the test data we also just want to forward propagate, we only need to make into a batch with size num_test_samples"""
test_data = test_data.batch(num_test_samples)


validation_inputs, validation_targets = next(iter(validation_data))

### Outlining the model

In [9]:
input_size = 784 # 28x28 pixels
output_size = 10 # due to the 10 digits

model = tf.keras.Sequential([
    
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])

Now we have to choose the optimizer and the loss function

In [10]:
""" 
There are several different optimizers, loss functions and metrics. One should be VERY careful when choosing them.
"""
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])

### Training

In [19]:
VALIDATION_STEPS = num_validation_samples

"""
early_stopping will check when the validation loss will increase and stop it. patience is a argument that will allow to skip the 
first 'n' epochs that has an increase in val_loss.
"""
early_stop = tf.keras.callbacks.EarlyStopping(patience=1)

model.fit(train_data,
          epochs = NUM_EPOCHS,
          callbacks = [early_stop],
          validation_data=(validation_inputs, validation_targets), 
          validation_steps=VALIDATION_STEPS,
          verbose =2)

Epoch 1/10
270/270 - 22s - loss: 0.0163 - accuracy: 0.9948 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
270/270 - 21s - loss: 0.0095 - accuracy: 0.9969 - val_loss: 0.0089 - val_accuracy: 0.9977
Epoch 3/10
270/270 - 20s - loss: 0.0102 - accuracy: 0.9964 - val_loss: 0.0098 - val_accuracy: 0.9962


<tensorflow.python.keras.callbacks.History at 0x19b896cf908>

### Testing

In [20]:
test_loss, test_accuracy = model.evaluate(test_data)

print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.08. Test accuracy: 98.11%
