#### MNIST dataset

Creator of dataset - http://yann.lecun.com/

http://yann.lecun.com/exdb/mnist/ <br>
https://en.wikipedia.org/wiki/MNIST_database

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that recognizes which digit each image shows. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

I will build a neural network with 2 hidden layers.

In [1]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

#### Data Handling

In [2]:
#stored in C:\Users\*USERNAME*\tensorflow_datasets\...
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

# tfds.load(name, as_supervised=True): loads the dataset as 2-tuples (input, target)
# as_supervised=False would return a dictionary instead



In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

In [4]:
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples

6000.0

In [5]:
# 0.1 * num_examples is a float (6000.0), so let me cast it to an integer before using it as a sample count
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

In [6]:
num_test_samples = mnist_info.splits['test'].num_examples
print(num_test_samples)
num_test_samples = tf.cast(num_test_samples, tf.int64)
num_test_samples

10000


<tf.Tensor: shape=(), dtype=int64, numpy=10000>

In [7]:
# scale our data in some way to make the result more numerically stable
# in this case we will simply prefer to have inputs between 0 and 1

In [8]:
def scale(image, label):
    image = tf.cast(image, tf.float32)
    # since the possible values for the inputs are 0 to 255 (256 shades of grey)
    # if we divide each element by 255, we would get the desired result -> all elements will be between 0 and 1 
    image /= 255.
    return image, label
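
To see what the scaling does without loading the dataset, here is a minimal NumPy sketch on a made-up 2x2 "image". The pixel values are invented for illustration; the cast-then-divide order mirrors the `scale` function.

```python
import numpy as np

# A hypothetical 2x2 "image" with uint8 intensities in the 0..255 range.
image = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Cast to float first, then divide, mirroring tf.cast(...) followed by image /= 255.
scaled = image.astype(np.float32) / 255.0

print(scaled.dtype)
print(scaled)
```

Casting before dividing matters: the result has to be stored as floats, because integer pixels cannot hold values between 0 and 1.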

In [9]:
# dataset.map(): applies a custom transformation to a given dataset.
# It takes as an input a function which determines a transformation.
scaled_train_and_validation_data = mnist_train.map(scale)

In [10]:
test_data = mnist_test.map(scale)

In [11]:
# When we are dealing with enormous datasets, we can't shuffle all the data in memory at once.
# Hence, the data is shuffled through a fixed-size buffer (this is unrelated to the batch size set later)
BUFFER_SIZE = 10000
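# luckily for us, there is a shuffle method readily available and we just need to specify the buffer size
# reshuffle_each_iteration=False keeps the order identical on every pass over the data,
# so the take()/skip() split into validation and train stays disjoint across epochs
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)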
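
The idea behind a shuffle buffer can be sketched in plain Python. This is a simplified model of the concept, not TensorFlow's implementation; `buffered_shuffle` and its parameters are names made up for this illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield every item exactly once, shuffled through a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)           # fill phase: nothing is emitted yet
            continue
        idx = rng.randrange(buffer_size)  # pick a random slot in the full buffer
        yield buffer[idx]                 # emit its content ...
        buffer[idx] = item                # ... and refill the slot with the new item
    rng.shuffle(buffer)                   # input exhausted: drain what is left
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
unshuffled = list(buffered_shuffle(range(20), buffer_size=1, seed=0))
print(shuffled)
print(unshuffled)
```

With a buffer of size 1 the order does not change at all, and with a buffer at least as large as the dataset the shuffle is uniform. Anything in between only mixes samples that are close to each other in the input.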

In [12]:
# once we have scaled and shuffled the data, we can proceed to actually extracting the train and validation
# we use the .take() method to keep the first num_validation_samples samples as the validation set
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

In [13]:
# similarly, the train_data is everything else, so we skip as many samples as there are in the validation dataset
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
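
The take/skip split can be pictured with `itertools.islice` on a plain list. The list of ten integers is a stand-in for already shuffled samples, and the variable names are made up for this sketch.

```python
from itertools import islice

samples = list(range(10))   # stand-in for 10 already shuffled samples
num_validation = 2

# like .take(num_validation): keep only the first two samples
validation = list(islice(samples, num_validation))
# like .skip(num_validation): drop the first two samples, keep the rest
train = list(islice(samples, num_validation, None))

print(validation)  # [0, 1]
print(train)       # [2, 3, 4, 5, 6, 7, 8, 9]
```

The two subsets are disjoint only because both passes see the samples in the same order. If the order changed between the two passes, some samples could end up in both subsets.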

In [14]:
BATCH_SIZE = 100
# dataset.batch(batch_size): combines consecutive elements of a dataset into batches.
train_data = train_data.batch(BATCH_SIZE)
# Since we are not backpropagating on validation data, but only forward propagating, we don't need mini-batches there.
# The model still expects the validation and test datasets in batch form,
# so I'm overriding validation_data and test_data with one single batch each
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
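
A quick sketch of what batching does to the number of steps per epoch. The `batches` helper is a hypothetical plain-Python stand-in for `dataset.batch()`; the sample counts are the ones from this notebook (60,000 training images, 6,000 of them held out for validation).

```python
import math

def batches(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Toy example: 7 items in batches of 3 leave a smaller final batch.
toy_batches = list(batches(list(range(7)), 3))
print(toy_batches)  # [[0, 1, 2], [3, 4, 5], [6]]

# Sizes used in this notebook.
num_train_samples = 60000 - 6000
steps_per_epoch = math.ceil(num_train_samples / 100)
print(steps_per_epoch)  # 540
```

The 540 steps are the `540/540` shown in the training log, and they are also the number of weight updates per epoch.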

In [15]:
# because as_supervised=True, each element is a 2-tuple (input, target)
validation_inputs, validation_targets = next(iter(validation_data))
# iter(): creates an object which can be iterated one element at a time (e.g. in a for loop)
# next(): fetches the next element from it; here that is the single validation batch
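
The `next(iter(...))` pattern works the same way on any Python iterable. In this small sketch a one-element list plays the role of a dataset holding one single batch; the values are made up.

```python
# Stand-in for a dataset that contains one batch of (inputs, targets).
single_batch_dataset = [([0.1, 0.5, 0.9], [7, 2, 4])]

iterator = iter(single_batch_dataset)  # object that hands out one element at a time
inputs, targets = next(iterator)       # first (and only) element, unpacked as a 2-tuple

# Asking again finds nothing left; the default value avoids a StopIteration error.
exhausted = next(iterator, None) is None

print(inputs, targets, exhausted)
```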

In [16]:
# Revise TF2 (previous ipynb file)

#### Model

In [17]:
# Don't know about the hyperparameters (height & width) of the problem yet

In [18]:
input_size = 784
output_size = 10
hidden_layer_size = 50
model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # since we don't know CNNs yet, we don't know how to feed such input into our net, so we must flatten the images
    # there is a convenient layer, 'Flatten', that simply takes each 28x28x1 tensor and unrolls it into a
    # (28*28*1,) = (784,) vector
    # this allows us to actually create a feed forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
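
To make the formula `output = activation(dot(input, weight) + bias)` concrete, here is a NumPy sketch of the same 784-50-50-10 architecture. It assumes random weights purely for illustration (nothing is trained here), and the helper names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 50, 50, 10]  # input, two hidden layers, output

# One weight matrix and one bias vector per Dense layer (random stand-in values).
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def forward(images):
    x = images.reshape(len(images), -1)            # Flatten: (n, 28, 28, 1) -> (n, 784)
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ w + b)                        # hidden layers
    return softmax(x @ weights[-1] + biases[-1])   # output layer

batch = rng.random((3, 28, 28, 1))                 # 3 fake images
probs = forward(batch)
num_params = sum(w.size + b.size for w, b in zip(weights, biases))

print(probs.shape)   # (3, 10)
print(num_params)    # 42310
```

Each row of `probs` sums to 1, which is what softmax guarantees. The parameter count is (784*50 + 50) + (50*50 + 50) + (50*10 + 10) = 42,310.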

#### Optimization and loss function

In [19]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
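
The loss and the metric named in `compile` can be worked out by hand for a tiny example. This is a NumPy sketch of the underlying formulas, not the Keras implementation; the probabilities and labels are made up. "Sparse" means the targets are integer class ids (as in this dataset) rather than one-hot vectors.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class) over the batch."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

def accuracy(probs, labels):
    """Share of samples whose most probable class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Two samples, three classes; labels are integer class ids.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(round(loss, 4), acc)
```

Here the loss is (-ln 0.7 - ln 0.8) / 2, roughly 0.29, and both predictions are correct, so the accuracy is 1.0.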

#### Training

In [20]:
NUM_EPOCHS = 5
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), validation_steps=1, verbose=2)

Epoch 1/5
540/540 - 4s - loss: 0.4172 - accuracy: 0.8791 - val_loss: 0.2188 - val_accuracy: 0.9367
Epoch 2/5
540/540 - 3s - loss: 0.1869 - accuracy: 0.9447 - val_loss: 0.1762 - val_accuracy: 0.9455
Epoch 3/5
540/540 - 3s - loss: 0.1419 - accuracy: 0.9570 - val_loss: 0.1369 - val_accuracy: 0.9595
Epoch 4/5
540/540 - 3s - loss: 0.1167 - accuracy: 0.9650 - val_loss: 0.1226 - val_accuracy: 0.9653
Epoch 5/5
540/540 - 3s - loss: 0.1013 - accuracy: 0.9689 - val_loss: 0.1031 - val_accuracy: 0.9710


<tensorflow.python.keras.callbacks.History at 0x198748bc4c8>

What happens inside an epoch?
1. At the beginning of each epoch, the training loss will be set to zero
2. The algorithm will iterate over a preset number of batches, all from train_data
3. The weights and biases are updated as many times as there are batches
4. We will get a value from a loss function, indicating how training is going.
5. We will see training accuracy
6. At the end of the epoch, the algorithm will forward propagate the whole validation set
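
The six steps above can be sketched as a hand-written training loop. This is a toy illustration under loud assumptions: synthetic 2-feature data, a single softmax layer, and plain gradient descent instead of Adam. It is not what `model.fit` does internally, only the same sequence of steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic stand-in data: 2 features, 2 linearly separable classes."""
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)
    return x, y

def predict(x, weights, biases):
    """Forward propagation: softmax(x @ weights + biases)."""
    z = x @ weights + biases
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_accuracy(probs, y):
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    accuracy = (probs.argmax(axis=1) == y).mean()
    return float(loss), float(accuracy)

x_train, y_train = make_data(400)
x_val, y_val = make_data(100)
weights, biases = np.zeros((2, 2)), np.zeros(2)
BATCH_SIZE, NUM_EPOCHS, LEARNING_RATE = 100, 5, 0.5

num_updates, history = 0, []
for epoch in range(NUM_EPOCHS):
    loss_sum, acc_sum, num_batches = 0.0, 0.0, 0            # 1. reset the running loss
    for start in range(0, len(x_train), BATCH_SIZE):        # 2. iterate over the batches
        xb = x_train[start:start + BATCH_SIZE]
        yb = y_train[start:start + BATCH_SIZE]
        probs = predict(xb, weights, biases)
        batch_loss, batch_acc = loss_and_accuracy(probs, yb)
        # Gradient of the mean cross-entropy w.r.t. the logits: (probs - one_hot) / batch size
        grad = probs.copy()
        grad[np.arange(len(yb)), yb] -= 1.0
        grad /= len(yb)
        weights -= LEARNING_RATE * (xb.T @ grad)            # 3. one update per batch
        biases -= LEARNING_RATE * grad.sum(axis=0)
        num_updates += 1
        loss_sum, acc_sum, num_batches = loss_sum + batch_loss, acc_sum + batch_acc, num_batches + 1
    train_loss = loss_sum / num_batches                     # 4. training loss for the epoch
    train_acc = acc_sum / num_batches                       # 5. training accuracy
    val_loss, val_acc = loss_and_accuracy(predict(x_val, weights, biases), y_val)  # 6. validation
    history.append((train_loss, train_acc, val_loss, val_acc))
    print(f"Epoch {epoch + 1}/{NUM_EPOCHS} - loss: {train_loss:.4f} - accuracy: {train_acc:.4f}"
          f" - val_loss: {val_loss:.4f} - val_accuracy: {val_acc:.4f}")
```

With 400 training samples and a batch size of 100, the weights are updated 4 times per epoch, so 20 times in total. The validation pass at the end of each epoch only runs `predict` and never touches the weights.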

#### Testing

The test is the absolute final instance. You should not test before you are completely done with adjusting your model.
If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [21]:
test_loss, test_accuracy = model.evaluate(test_data)

      1/Unknown - 1s 819ms/step - loss: 0.1175 - accuracy: 0.9645

In [22]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.12. Test accuracy: 96.45%


Each time the code is rerun, we get a different accuracy as the batches are shuffled, the weights are initialized in a different way, etc.
Finally, we have intentionally reached a suboptimal solution, so you can have space to build on it.