# Deep Neural Network for MNIST Classification

The dataset is called MNIST and is used for handwritten digit recognition. It provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0,1,2,3,4,5,6,7,8,9), this is a classification problem with 10 classes.

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds



## Data

In [2]:
mnist_dataset,mnist_info=tfds.load(name='mnist', with_info=True, as_supervised=True)



In [3]:
mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test'] # this dataset doesn't have a validation split, so we will
# take some samples from the training set

In [4]:
num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples,tf.int64)  # cast the number of validation samples to an integer,
# so it can be used with take(), skip() and batch()

num_test_samples=mnist_info.splits['test'].num_examples # store the number of test samples in a dedicated variable
num_test_samples=tf.cast(num_test_samples, tf.int64)
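With MNIST's standard split of 60,000 training and 10,000 test images, the counts above work out as follows. This plain-Python sketch hard-codes those split sizes as assumed stand-ins for `mnist_info.splits[...].num_examples`, and also shows why the training log further down reports 540 steps per epoch with the batch size of 100 used later.

```python
# Plain-Python stand-ins for mnist_info.splits['train'/'test'].num_examples
# (assumed values: the standard MNIST split sizes)
NUM_TRAIN_EXAMPLES = 60_000
NUM_TEST_EXAMPLES = 10_000
BATCH_SIZE = 100  # the batch size used later in the notebook

# 10% of the training split becomes the validation set.
# Casting a float to an integer truncates, so integer division is a safer way
# to get the same count without relying on floating-point rounding.
num_validation_samples = NUM_TRAIN_EXAMPLES // 10
num_train_samples = NUM_TRAIN_EXAMPLES - num_validation_samples

# Number of mini-batches per epoch (a final partial batch counts as one more step)
steps_per_epoch = -(-num_train_samples // BATCH_SIZE)  # ceiling division

print(num_validation_samples, num_train_samples, steps_per_epoch)  # 6000 54000 540
```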

In [5]:
# We'd like to scale our data to make the result more numerically stable (e.g. inputs between 0 and 1).
# The function takes an MNIST image and its label.
def scale(image,label):
    image=tf.cast(image, tf.float32)
    image/=255.  # images contain integers between 0 and 255; dividing by 255 gives floats between 0 and 1
    return image,label
# TensorFlow datasets have a method called 'map', which applies a custom transformation to every element.

In [6]:
scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)
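Conceptually, `map` applies `scale` to every `(image, label)` pair and leaves the label untouched. A plain-Python sketch, with a tiny made-up 2x2 "image" of integer pixel values standing in for a real 28x28 MNIST image:

```python
def scale(image, label):
    # Convert integer pixels in [0, 255] to floats in [0, 1]; the label is unchanged
    return [[pixel / 255.0 for pixel in row] for row in image], label

# A hypothetical 2x2 dataset element (real MNIST images are 28x28x1)
dataset = [([[0, 255], [51, 102]], 7)]

scaled = [scale(image, label) for image, label in dataset]  # plays the role of .map(scale)
print(scaled)  # [([[0.0, 1.0], [0.2, 0.4]], 7)]
```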

Shuffle the data and then create a validation dataset.

Shuffling = keeping the same information, but in a different order.

If buffer_size=1, no shuffling will actually happen.

If buffer_size >= num_samples, shuffling will happen at once (uniformly).

If 1 < buffer_size < num_samples, we trade some randomness for memory: only buffer_size samples are held in memory at a time, so the data is shuffled only locally.
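The buffer behaviour described above can be sketched in plain Python. This is a simplified model of how a shuffle buffer works (fill the buffer, emit a random element from it, refill its slot with the next input), not TensorFlow's actual implementation; the function name `buffered_shuffle` is hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a random element and free its slot
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Input exhausted: emit whatever is left in the buffer in random order
    rng.shuffle(buffer)
    yield from buffer

data = list(range(10))
print(list(buffered_shuffle(data, buffer_size=1)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(buffered_shuffle(data, buffer_size=4, seed=0)) == data)  # True
```

With `buffer_size=4`, the first element emitted is always one of the first four inputs, which is why a small buffer only shuffles locally; with `buffer_size >= len(items)` nothing is emitted until the whole input is buffered, and the final `rng.shuffle` makes the order uniform.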

In [7]:
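BUFFER_SIZE=10000  # used when dealing with enormous datasets, because you can't shuffle all the data at once
# reshuffle_each_iteration=False keeps the order fixed, so take() and skip() below always produce the same,
# disjoint validation and training sets (the default reshuffles on every pass and would mix the two)
shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)
validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)
train_data=train_data.shuffle(BUFFER_SIZE)  # reshuffle only the training part on every epoch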
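`take(n)` keeps the first `n` elements of a dataset and `skip(n)` drops them, so as long as both see the same element order, the two parts are disjoint and together cover the whole dataset. A plain-Python analogue using `itertools.islice` on one fixed, already shuffled list (the ten integers are a made-up stand-in for the shuffled images):

```python
import itertools
import random

shuffled = list(range(10))
random.Random(42).shuffle(shuffled)  # one fixed shuffle, shared by both splits

num_validation = 3
validation = list(itertools.islice(shuffled, num_validation))   # like .take(3)
train = list(itertools.islice(shuffled, num_validation, None))  # like .skip(3)

print(set(validation).isdisjoint(train))              # True: no sample in both splits
print(sorted(validation + train) == list(range(10)))  # True: nothing is lost
```

If the two splits were drawn from two different shuffles of the data, the first three elements of one order would generally overlap with the last seven of the other, which is exactly the leakage to avoid.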

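We will use mini-batch gradient descent (GD) to train our model. It is the standard choice for training deep networks: each epoch contains many weight updates, and each update is much cheaper than one computed on the whole dataset. For that, we must choose a batch size:

- batch size = 1: stochastic gradient descent (SGD)
- batch size = number of samples: (single-batch) GD
- 1 < batch size < number of samples: mini-batch GD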
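The three regimes differ only in how the data is cut into batches, and therefore in how many weight updates happen per epoch. A plain-Python sketch of batching; the helper `batch` is hypothetical and, like `tf.data`'s `batch` with its default `drop_remainder=False`, keeps a final smaller batch:

```python
def batch(items, batch_size):
    """Split a list into consecutive batches; the last one may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

samples = list(range(10))
print(batch(samples, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# Weight updates per epoch = number of batches
for size, name in [(1, "SGD"), (len(samples), "GD"), (4, "mini-batch GD")]:
    print(name, len(batch(samples, size)))  # SGD 10, GD 1, mini-batch GD 3
```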

In [8]:
BATCH_SIZE=100
train_data=train_data.batch(BATCH_SIZE)  # we overwrite it, as there is no need to keep an unbatched version of this data
# Since we won't be backpropagating on the validation data, only forward propagating, we don't need small batches.
# When batching, we usually compute the average loss and accuracy per batch. During validation and testing we want
# the values over the whole set, so we take all the data at once.

# The model still expects the validation set in batch form, which is why we batch it into a single batch:
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs,validation_targets=next(iter(validation_data))
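Why taking the whole set at once gives the exact value: averaging per-batch averages reproduces the overall average only when all batches have the same size. A small plain-Python check with made-up per-sample losses:

```python
losses = [0.1, 0.2, 0.3, 0.4, 0.9]                # hypothetical per-sample losses
batches = [losses[0:2], losses[2:4], losses[4:]]  # batch size 2, last batch has 1 sample

exact_mean = sum(losses) / len(losses)
mean_of_batch_means = sum(sum(b) / len(b) for b in batches) / len(batches)

print(round(exact_mean, 4), round(mean_of_batch_means, 4))  # 0.38 0.4667
```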

## Model

### Outline the model

There are 784 inputs (28 x 28 pixels), which form the input layer.
There are 10 output nodes (one per digit), which form the output layer.
We will work with 2 hidden layers of 100 nodes each.
I don't know the optimal width and depth for this problem, but these values are almost certainly suboptimal.

In [31]:
input_size=784
output_size=10
hidden_layer_size=100

# Our data (from tfds) is such that each input is 28x28x1. We need to flatten the images into vectors.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size,activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size,activation='relu'),
    tf.keras.layers.Dense(output_size,activation='softmax')])
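Each `Dense` layer has one weight per input/unit pair plus one bias per unit, so the number of trainable parameters can be checked by hand (`Flatten` adds none). The helper `dense_params` below is hypothetical; the total should agree with what `model.summary()` reports for this architecture.

```python
def dense_params(inputs, units):
    # weights (inputs x units) plus one bias per unit
    return inputs * units + units

input_size, hidden_layer_size, output_size = 784, 100, 10

layers = [
    dense_params(input_size, hidden_layer_size),         # first hidden layer
    dense_params(hidden_layer_size, hidden_layer_size),  # second hidden layer
    dense_params(hidden_layer_size, output_size),        # output layer
]
print(layers, sum(layers))  # [78500, 10100, 1010] 89610
```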

### Choose the optimizer and the loss function

In [32]:
from tensorflow.keras.optimizers import Adam
# The optimizer must be passed explicitly; otherwise Keras falls back to its default ('rmsprop')
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
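`sparse_categorical_crossentropy` takes the integer label directly (e.g. `7`) instead of a one-hot vector, and for one sample it equals minus the log of the probability that the softmax output assigns to the correct class. A plain-Python sketch with made-up logits:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    # label is an integer class index, not a one-hot vector
    return -math.log(probs[label])

probs = softmax([0.0] * 10)  # equal logits -> uniform over the 10 digits
print(round(sparse_categorical_crossentropy(3, probs), 4))  # 2.3026, i.e. log(10)
```

This is why a freshly initialised 10-class network, whose outputs are close to uniform, typically starts with a loss around 2.3.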

### Training

What happens inside an epoch:
1. At the beginning of each epoch, the training loss will be set to 0
2. The algorithm will iterate over a preset number of batches, all from train_data
3. The weights and biases will be updated as many times as there are batches
4. We will get a value for the loss function, indicating how the training is going.
5. We will also see a training accuracy
6. At the end of the epoch, the algorithm will forward propagate the whole validation set.

When we reach the maximum number of epochs the training will be over. 
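The steps above can be sketched as a plain-Python mini-batch gradient descent loop. To stay self-contained, it fits a hypothetical one-variable linear model y = w*x + b with mean squared error instead of the MNIST network, so step 5 (accuracy) is skipped; only the structure of the loop (reset the loss, update once per batch, report the loss, then forward propagate the validation set) mirrors what `model.fit` does.

```python
import random

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(120)]
data = [(x, 2 * x + 1) for x in xs]  # hypothetical targets: y = 2x + 1
train, validation = data[:100], data[100:]

w, b = 0.0, 0.0
learning_rate, batch_size, num_epochs = 0.1, 10, 20

for epoch in range(num_epochs):
    rng.shuffle(train)
    epoch_loss = 0.0                             # 1. loss reset at the start of the epoch
    for i in range(0, len(train), batch_size):   # 2. iterate over the batches
        batch = train[i:i + batch_size]
        # gradients of the mean squared error over this batch
        errors = [(w * x + b) - y for x, y in batch]
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
        grad_b = 2 * sum(errors) / len(batch)
        w -= learning_rate * grad_w              # 3. one update per batch
        b -= learning_rate * grad_b
        epoch_loss += sum(e * e for e in errors) / len(batch)
    train_loss = epoch_loss / (len(train) // batch_size)  # 4. average training loss
    # 6. forward pass over the whole validation set at the end of the epoch
    val_loss = sum(((w * x + b) - y) ** 2 for x, y in validation) / len(validation)
    print(f"Epoch {epoch + 1}/{num_epochs} - loss: {train_loss:.4f} - val_loss: {val_loss:.4f}")
```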

In [33]:
NUM_EPOCHS = 5
model.fit(train_data,epochs=NUM_EPOCHS, validation_data=(validation_inputs,validation_targets),verbose=2)

Epoch 1/5
540/540 - 7s - loss: 0.3127 - accuracy: 0.9104 - val_loss: 0.1796 - val_accuracy: 0.9475 - 7s/epoch - 13ms/step
Epoch 2/5
540/540 - 7s - loss: 0.1344 - accuracy: 0.9599 - val_loss: 0.1180 - val_accuracy: 0.9663 - 7s/epoch - 13ms/step
Epoch 3/5
540/540 - 8s - loss: 0.0961 - accuracy: 0.9710 - val_loss: 0.1065 - val_accuracy: 0.9705 - 8s/epoch - 15ms/step
Epoch 4/5
540/540 - 7s - loss: 0.0751 - accuracy: 0.9767 - val_loss: 0.0890 - val_accuracy: 0.9742 - 7s/epoch - 14ms/step
Epoch 5/5
540/540 - 7s - loss: 0.0626 - accuracy: 0.9809 - val_loss: 0.0748 - val_accuracy: 0.9802 - 7s/epoch - 13ms/step


<keras.callbacks.History at 0x11171ac8>

The loss should decrease from one epoch to the next.

The accuracy shows in what percentage of cases our outputs matched the targets.
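For a classifier with a softmax output, "matching the target" means that the class with the highest predicted probability (the argmax) equals the label. A plain-Python sketch with made-up predictions for three samples:

```python
def accuracy(probabilities, targets):
    # Predicted class = index of the largest probability in each row
    predictions = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    correct = sum(pred == t for pred, t in zip(predictions, targets))
    return correct / len(targets)

# Hypothetical outputs for 3 samples over 3 classes (MNIST would have 10)
probabilities = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
]
targets = [1, 0, 1]
print(accuracy(probabilities, targets))  # 0.6666666666666666 (2 of 3 correct)
```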

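We usually keep an eye on the validation loss (or set up early stopping) to determine whether the model is overfitting.
The val_accuracy of the last epoch is our estimate of the model's accuracy during training; because we use the validation set to make decisions about the model, the final estimate comes from the test set.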
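Keras provides early stopping through the `tf.keras.callbacks.EarlyStopping` callback (for example with `monitor='val_loss'` and a `patience` value), passed to `model.fit` via `callbacks=[...]`. The underlying rule can be sketched in plain Python; `should_stop` is a hypothetical helper that stops once the validation loss has not improved for `patience` consecutive epochs:

```python
def should_stop(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# val_loss values from the training log above: still improving, so no early stop
print(should_stop([0.1796, 0.1180, 0.1065, 0.0890, 0.0748]))  # None
# A hypothetical run where the validation loss starts rising (a sign of overfitting)
print(should_stop([0.18, 0.12, 0.10, 0.11, 0.13, 0.09]))      # 5
```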

## Test the model

In [34]:
test_loss,test_accuracy=model.evaluate(test_data)



In [35]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss,test_accuracy*100))

Test loss: 0.09. Test accuracy: 97.33%


After we test the model, conceptually we are no longer allowed to change it.

Getting a test accuracy close to the final validation accuracy (97.33% vs. 98.02% here) suggests that we have not overfit.