# Deep Neural Network for MNIST Classification


The dataset is called MNIST, a standard benchmark for handwritten digit recognition.

The dataset provides 70,000 grayscale images (28x28 pixels) of handwritten digits, one digit per image: 60,000 for training and 10,000 for testing.

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 
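To make the 10-class framing concrete: a classifier for this task outputs one probability per digit, and the predicted digit is the index of the largest one. The sketch below uses made-up probabilities (hypothetical values, not output from the model built later) and NumPy's `argmax`.

```python
import numpy as np

# Hypothetical probabilities for 2 images over the 10 classes (digits 0-9).
# Each row sums to 1, as a softmax output would.
probs = np.array([
    [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.05, 0.03, 0.02],
    [0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03],
])

# The predicted digit is the class index with the highest probability.
predicted_digits = np.argmax(probs, axis=1)
print(predicted_digits)  # [3 0]
```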


## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

# TensorFlow Datasets (tfds) provides a ready-made loader for MNIST.

import tensorflow_datasets as tfds

## Data

In [2]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

# with_info=True also returns a DatasetInfo object (here mnist_info) describing the version, features, and number of samples per split

# as_supervised=True loads each sample as a 2-tuple (input, target)
# with as_supervised=False, each sample would be a dictionary instead

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# TFDS provides only 'train' and 'test' splits for MNIST, so we carve a validation set out of the training data
# number of validation samples: 10% of the training samples

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples

# cast this number to an integer: take() and skip() below expect an integer sample count, not a float

num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples

# num_examples is already an integer; casting it to tf.int64 keeps its type consistent with num_validation_samples

num_test_samples = tf.cast(num_test_samples, tf.int64)


# scale pixel values from integers in [0, 255] to floats in [0, 1] for numerical stability

def scale(image, label):
    # cast the uint8 pixels to float32 before dividing
    image = tf.cast(image, tf.float32)
    image /= 255.

    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)


# shuffle the data

BUFFER_SIZE = 10000

# tf.data keeps a buffer of BUFFER_SIZE samples in memory and draws each next sample at random from it
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
# BUFFER_SIZE in between => a computational optimization that approximates uniform shuffling

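# reshuffle_each_iteration=False keeps the shuffled order fixed across iterations, so take() and skip()
# below see the same order and the validation and training sets do not overlap
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)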

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)

test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))

Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /Users/hridayashinde/tensorflow_datasets/mnist/3.0.1...



Dataset mnist downloaded and prepared to /Users/hridayashinde/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.
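The `scale` function above only changes dtype and range. As a rough NumPy analogue (the name `scale_np` and the synthetic image are hypothetical, used only for illustration), the effect on one 28x28x1 sample looks like this:

```python
import numpy as np

# A synthetic 28x28x1 "image" of uint8 pixels covering every value 0-255
# (hypothetical stand-in for one MNIST sample).
image = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28, 1)

def scale_np(image, label):
    """NumPy analogue of `scale`: cast to float32, then map [0, 255] onto [0, 1]."""
    image = image.astype(np.float32)
    image /= 255.0
    return image, label

scaled, label = scale_np(image, 7)
print(scaled.dtype, scaled.min(), scaled.max())
```

Casting before dividing matters: it keeps the result in floating point, which is what the network's layers operate on.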

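The `BUFFER_SIZE` comments in the data cell describe a buffered shuffle. Below is a simplified pure-Python sketch of that idea (the function `buffered_shuffle` is hypothetical and is not TensorFlow's implementation): fill a buffer, then repeatedly emit a randomly chosen buffered element and replace it with the next incoming one.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Simplified sketch of a fixed-size-buffer shuffle.

    Holds at most `buffer_size` elements in memory at once.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Buffer is full: emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer

data = list(range(10))
print(list(buffered_shuffle(data, buffer_size=1)))             # same order as the input: no shuffling
print(sorted(buffered_shuffle(data, buffer_size=3, seed=0)) == data)  # True: still a permutation
```

With a buffer of size 1 the output order equals the input order, and with a buffer at least as large as the data the final drain is a full uniform shuffle, matching the two edge cases listed in the comments.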

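The split and batching arithmetic can be checked by hand. This sketch assumes the standard TFDS MNIST split sizes (60,000 train, 10,000 test) and reuses the 10% validation fraction and `BATCH_SIZE = 100` from the cell above.

```python
import math

# Split sizes for MNIST in TFDS (train: 60,000, test: 10,000).
num_train_examples = 60_000
num_test_examples = 10_000

num_validation_samples = int(0.1 * num_train_examples)           # held out for validation
num_train_samples = num_train_examples - num_validation_samples  # remain for training

BATCH_SIZE = 100
steps_per_epoch = math.ceil(num_train_samples / BATCH_SIZE)      # batches per training epoch

print(num_validation_samples, num_train_samples, steps_per_epoch)  # 6000 54000 540
```

The 540 steps per epoch are consistent with the `540/540` shown in the training log.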
## Model

### Outline the model

In [3]:
input_size = 784
output_size = 10
hidden_layer_size = 50
    

model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is a 28x28x1 tensor of pixels (rank 3)
    # Flatten reshapes it into a vector of 28*28*1 = 784 values; the batch dimension
    # is kept, so a batch of shape (None, 28, 28, 1) becomes (None, 784)
    
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # tf.keras.layers.Dense implements: output = activation(dot(input, weight) + bias)
    # its most important arguments are the number of units (here hidden_layer_size) and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    # the output layer is also Dense; softmax turns its 10 outputs into class probabilities that sum to 1
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
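To see what this `Sequential` model computes, here is a NumPy sketch of one forward pass through the same 784-50-50-10 architecture. The weights are random placeholders (hypothetical, not the trained Keras weights) and the images are random noise; only the shapes and the layer formula `activation(dot(input, weight) + bias)` mirror the model.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_layer_size, output_size = 784, 50, 10

def dense(x, weight, bias, activation):
    """output = activation(dot(input, weight) + bias)"""
    return activation(x @ weight + bias)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Placeholder weights whose shapes match the three Dense layers.
layer_shapes = [(input_size, hidden_layer_size),
                (hidden_layer_size, hidden_layer_size),
                (hidden_layer_size, output_size)]
params = [(rng.normal(0, 0.05, size=s), np.zeros(s[1])) for s in layer_shapes]

# A batch of 4 random 28x28x1 "images", flattened like Flatten does: (4, 28, 28, 1) -> (4, 784).
images = rng.random((4, 28, 28, 1))
x = images.reshape(images.shape[0], -1)

for (w, b), act in zip(params, [relu, relu, softmax]):
    x = dense(x, w, b, act)

print(x.shape)                          # (4, 10)
print(np.allclose(x.sum(axis=1), 1.0))  # True: each row is a probability distribution

# Trainable parameters per Dense layer: weights (in * out) plus biases (out).
num_params = sum(w.size + b.size for w, b in params)
print(num_params)                       # 42310
```

The parameter total (39,250 + 2,550 + 510) should agree with what `model.summary()` reports for this architecture.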

### Choose the optimizer and the loss function

In [4]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
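Because `as_supervised=True` yields integer labels (0 to 9) rather than one-hot vectors, the model uses the *sparse* variant of categorical cross-entropy: the loss is the mean negative log-probability assigned to each sample's true class. The NumPy sketch below computes it for made-up probabilities; the function name is hypothetical, and it omits the small epsilon clipping Keras applies, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability of the true class).

    labels: integer class ids, shape (n,); probs: softmax outputs, shape (n, num_classes).
    """
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

labels = np.array([0, 1])              # integer targets, no one-hot encoding needed
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))  # 0.2899
```

With one-hot targets the same number would come from the non-sparse categorical cross-entropy; the sparse form simply indexes the true class directly.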

### Training

In [5]:
# determine the maximum number of epochs
NUM_EPOCHS = 5

model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 4s - loss: 0.4076 - accuracy: 0.8840 - val_loss: 0.1958 - val_accuracy: 0.9418
Epoch 2/5
540/540 - 2s - loss: 0.1767 - accuracy: 0.9493 - val_loss: 0.1421 - val_accuracy: 0.9563
Epoch 3/5
540/540 - 2s - loss: 0.1356 - accuracy: 0.9610 - val_loss: 0.1108 - val_accuracy: 0.9673
Epoch 4/5
540/540 - 2s - loss: 0.1131 - accuracy: 0.9669 - val_loss: 0.1104 - val_accuracy: 0.9695
Epoch 5/5
540/540 - 2s - loss: 0.0971 - accuracy: 0.9717 - val_loss: 0.0943 - val_accuracy: 0.9723


<keras.callbacks.History at 0x7f94badead00>

## Test the model

In [6]:
test_loss, test_accuracy = model.evaluate(test_data)



In [7]:
# format the loss to two decimals and the accuracy as a percentage
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 96.54%
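The reported accuracy is the fraction of samples whose highest-probability class matches the integer label. A minimal NumPy sketch of that computation, using hypothetical predictions rather than real model output, with the same formatting as above:

```python
import numpy as np

def accuracy(labels, probs):
    """Fraction of samples whose highest-probability class matches the integer label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Hypothetical predictions for 4 samples over 3 classes (made-up numbers).
labels = np.array([0, 1, 2, 1])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.6, 0.3],   # wrong: predicts class 1, true class is 2
                  [0.3, 0.4, 0.3]])
acc = accuracy(labels, probs)
print('Test accuracy: {0:.2f}%'.format(acc * 100.))  # Test accuracy: 75.00%
```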
