# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets

## Data

This is where we load and preprocess our data.

In [2]:
mnist_dataset, mnist_info = tensorflow_datasets.load(name='mnist', with_info=True, as_supervised=True)

## Preprocessing

In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

n_validation_samples = tf.cast(0.1 * mnist_info.splits['train'].num_examples, tf.int64)
n_test_samples = tf.cast(mnist_info.splits['test'].num_examples, tf.int64)

# Scale the image values from 0-255 to 0-1 taking advantage of the map function
def scale(img, label):
    img = tf.cast(img, tf.float32)
    img /= 255.
    return img, label
scaled_train_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)

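# Shuffle the data once, in a fixed order, so the validation and training subsets stay disjoint
# (with the default reshuffle_each_iteration=True, validation samples can leak into training)
BUFFER_SZ = 10000
shuffled_train_data = scaled_train_data.shuffle(BUFFER_SZ, reshuffle_each_iteration=False)
# Hold out the first n_validation_samples for validation and train on the rest
validation_data = shuffled_train_data.take(n_validation_samples)
train_data = shuffled_train_data.skip(n_validation_samples)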

# Separate into batches to use Mini-batch Gradient Descent
BATCH_SIZE = 100
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(n_validation_samples)
test_data = scaled_test_data.batch(n_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
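
The split sizes produced by the cell above can be checked with plain arithmetic. This is a minimal sketch that assumes the standard TFDS MNIST split of 60,000 training and 10,000 test images (consistent with the 70,000 total mentioned earlier); the variable names are illustrative and not part of the notebook.

```python
import math

# Assumed split sizes for the TFDS 'mnist' dataset (60,000 train / 10,000 test).
n_train_total = 60_000
n_test = 10_000
batch_size = 100

# 10% of the training split is held out for validation, as in the cell above.
n_validation = int(0.1 * n_train_total)
n_train = n_train_total - n_validation

# Number of mini-batches (and therefore gradient updates) per epoch.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_validation, n_train, steps_per_epoch)
```

Under these assumptions the number of steps per epoch works out to 540, which matches the `540/540` counter in the training log.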

## Model

### Outline

In [4]:
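input_sz = 28*28        # number of pixels per image once flattened
output_sz = 10          # one output per digit class (0-9)
hidden_layer_sz = 100   # units in each hidden layer

model = tf.keras.Sequential([
    # Flatten each 28x28x1 image into a vector of 784 values
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    # Two hidden layers with ReLU activations
    tf.keras.layers.Dense(hidden_layer_sz, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_sz, activation='relu'),
    # Output layer: softmax turns the 10 scores into class probabilities
    tf.keras.layers.Dense(output_sz, activation='softmax')
])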
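
To make the layer stack more concrete, here is a rough NumPy sketch of the same forward pass: flatten, two ReLU layers, then softmax. It is not how Keras implements these layers internally; the random weights only stand in for trained ones, and the helper names (`dense`, `relu`, `softmax`) are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # Affine transformation performed by a fully connected layer.
    return x @ w + b

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Layer sizes mirroring the model above: 784 -> 100 -> 100 -> 10.
sizes = [28 * 28, 100, 100, 10]
# Random weights and zero biases as placeholders for trained parameters.
params = [(rng.normal(0.0, 0.05, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

# A fake batch of 4 "images" with pixel values already scaled to [0, 1).
images = rng.random((4, 28, 28, 1))
x = images.reshape(len(images), -1)  # same role as the Flatten layer

h = x
for w, b in params[:-1]:
    h = relu(dense(h, w, b))         # hidden layers
w_out, b_out = params[-1]
probs = softmax(dense(h, w_out, b_out))  # output layer

# Total number of weights and biases across the three dense layers.
n_params = sum(w.size + b.size for w, b in params)
print(probs.shape, probs.sum(axis=1), n_params)
```

Each row of `probs` sums to 1, so it can be read as a probability distribution over the 10 digits. The parameter count is 784·100 + 100, plus 100·100 + 100, plus 100·10 + 10, for a total of 89,610.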

### Optimizer and loss function

In [5]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
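
The `'sparse_categorical_crossentropy'` loss works with integer labels (such as `3`) rather than one-hot vectors: for each sample it takes the negative log of the probability the model assigned to the correct class. The NumPy sketch below illustrates that idea on made-up numbers; it is a simplified stand-in and omits numerical safeguards, such as clipping probabilities away from zero, that a library implementation would typically include.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(p[true class]) over a batch (illustrative helper)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick, for each row, the probability assigned to its true class.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

# Two made-up predictions over three classes, with integer targets.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
labels = np.array([1, 0])

loss = sparse_categorical_crossentropy(probs, labels)
print(round(loss, 4))
```

The more probability the model places on the correct class, the closer the loss gets to zero.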

### Training

In [6]:
N_EPOCHS = 5

model.fit(train_data, epochs=N_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 15s - loss: 0.3273 - accuracy: 0.9067 - val_loss: 0.1571 - val_accuracy: 0.9550 - 15s/epoch - 28ms/step
Epoch 2/5
540/540 - 7s - loss: 0.1365 - accuracy: 0.9598 - val_loss: 0.1041 - val_accuracy: 0.9718 - 7s/epoch - 13ms/step
Epoch 3/5
540/540 - 7s - loss: 0.0962 - accuracy: 0.9703 - val_loss: 0.0883 - val_accuracy: 0.9752 - 7s/epoch - 13ms/step
Epoch 4/5
540/540 - 8s - loss: 0.0745 - accuracy: 0.9777 - val_loss: 0.0669 - val_accuracy: 0.9812 - 8s/epoch - 14ms/step
Epoch 5/5
540/540 - 7s - loss: 0.0591 - accuracy: 0.9816 - val_loss: 0.0636 - val_accuracy: 0.9817 - 7s/epoch - 13ms/step



### Testing

In [7]:
test_loss, test_accuracy = model.evaluate(test_data)

print(f'Test loss: {test_loss:.2f}\nTest accuracy: {test_accuracy*100:.2f}%')

Test loss: 0.09
Test accuracy: 97.40%
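
The accuracy figure is the share of images whose most probable class matches the true label. The NumPy sketch below shows that calculation on made-up probabilities; the `accuracy` helper is hypothetical and only mirrors the idea behind the `'accuracy'` metric, not its library implementation.

```python
import numpy as np

def accuracy(probs, labels):
    # Predicted class = index of the largest probability in each row.
    predictions = np.argmax(probs, axis=1)
    return float((predictions == labels).mean())

# Four made-up predictions over three classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([1, 0, 2, 1])

acc = accuracy(probs, labels)  # the last sample is misclassified
print(f'{acc * 100:.2f}%')
```

Here three of the four predictions are correct, so the sketch reports 75.00%.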
