# Deep Neural Network for MNIST Classification 

MNIST classification is often called the "Hello World" of deep learning. MNIST is a dataset of handwritten digits, and the task is to recognize which digit each image shows.

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (from 0 to 9), this is a classification problem with 10 classes.

We'll build a neural network with 2 hidden layers.

## Import Python Libraries

In [1]:
import numpy as np
import tensorflow as tf

# tensorflow-datasets provides lots of datasets ready for modeling
import tensorflow_datasets as tfds

## Data Preprocessing 

### Load data

1. The first call to the load function downloads the dataset
2. Subsequent calls reuse the downloaded data

In [2]:
# with_info=True returns a tuple containing the dataset and the info associated with it
# as_supervised=True makes each example in the returned dataset a 2-tuple (input, target)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

In [3]:
mnist_dataset

{'test': <PrefetchDataset shapes: ((28, 28, 1), ()), types: (tf.uint8, tf.int64)>,
 'train': <PrefetchDataset shapes: ((28, 28, 1), ()), types: (tf.uint8, tf.int64)>}

In [4]:
mnist_info

tfds.core.DatasetInfo(
    name='mnist',
    version=3.0.1,
    description='The MNIST database of handwritten digits.',
    homepage='http://yann.lecun.com/exdb/mnist/',
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)

### Split the dataset into training and test datasets

In [5]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']


# use 10% of the training data as validation data
# tf.cast casts a tensor to a new type
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
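
The split sizes can be checked by hand. The sketch below is plain Python rather than TensorFlow, and it hard-codes the example counts reported by `mnist_info`; the name `validation_percent` is hypothetical and only used for illustration.

```python
# Sizes reported by mnist_info: 60,000 training and 10,000 test examples.
num_train_examples = 60_000
validation_percent = 10  # hypothetical name; the notebook multiplies by 0.1 instead

# Integer arithmetic gives an exact sample count without any float-to-int cast.
num_validation = num_train_examples * validation_percent // 100
num_remaining_train = num_train_examples - num_validation

print(num_validation, num_remaining_train)  # 6000 54000
```

So 6,000 samples are held out for validation and 54,000 remain for training.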

### Scale the data 

In [7]:
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

# dataset.map applies a custom function to every element of a dataset
scaled_train_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)
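
To see what the scaling step does to a single image, here is a small NumPy sketch. It assumes a randomly generated 28x28x1 `uint8` array as a stand-in for an MNIST digit, and `scale_pixels` is a hypothetical helper that mirrors the image part of `scale`.

```python
import numpy as np

# A made-up 28x28x1 image with random uint8 pixel values, standing in for an MNIST digit.
rng = np.random.default_rng(seed=0)
image = rng.integers(low=0, high=256, size=(28, 28, 1), dtype=np.uint8)

def scale_pixels(image):
    """Cast to float32 and map pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0

scaled = scale_pixels(image)
print(scaled.dtype, scaled.shape)
```

Keeping inputs in the range [0, 1] is a common choice because it tends to make gradient-based training better behaved than raw 0 to 255 pixel values.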

### Shuffle the training data and split validation data from it

In [8]:
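# buffer_size is the number of samples held in the shuffle buffer at a time;
# a buffer smaller than the dataset gives only an approximate shuffle
buffer_size = 10000

# reshuffle_each_iteration=False keeps the shuffled order fixed across epochs, so the
# validation samples taken below do not reappear in the training data in later epochs
shuffled_train_validation_data = scaled_train_validation_data.shuffle(buffer_size, reshuffle_each_iteration=False)

# dataset.take(N) keeps the first N samples of the dataset
# dataset.skip(N) keeps all samples except the first N
validation_data = shuffled_train_validation_data.take(num_validation_samples)
train_data = shuffled_train_validation_data.skip(num_validation_samples)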
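
The effect of the buffer size is easier to see in a toy version. The following is a simplified pure-Python model of a buffered shuffle followed by take and skip; it is not TensorFlow's implementation, and `buffered_shuffle` is a hypothetical helper written for this illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in pseudo-random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)        # still filling the buffer
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]            # emit a random buffered element...
        buffer[index] = item           # ...and store the new item in its slot
    rng.shuffle(buffer)                # drain whatever is left at the end
    yield from buffer

samples = list(range(20))
shuffled = list(buffered_shuffle(samples, buffer_size=5, seed=0))

num_validation = 4
validation = shuffled[:num_validation]   # plays the role of dataset.take(4)
train = shuffled[num_validation:]        # plays the role of dataset.skip(4)

print(validation, train)
```

In this model the k-th emitted sample can only come from the first `buffer_size + k` inputs, so a small buffer keeps samples close to their original positions, and a buffer of size 1 does not shuffle at all. Taking and skipping from the same shuffled order yields two disjoint sets.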

### Batching 

1. Batching is used in SGD: in each epoch the weights are updated once per batch, that is, as many times as there are batches
2. Validation data does not need to be split into batches, as it is only used in forward propagation

In [9]:
# batch_size is the number of samples the model takes in each batch
batch_size = 100

# validation and test data are each put into one single batch, because the model expects them in batch form too
train_data = train_data.batch(batch_size)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
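
The number of weight updates per epoch follows directly from the batch size. This plain-Python sketch assumes the 54,000 training samples left after the validation split; `make_batches` is a hypothetical helper that only illustrates how a sequence is cut into batches.

```python
import math

num_train_samples = 54_000   # 60,000 training images minus 6,000 held out for validation
batch_size = 100

# One weight update per batch, so this is also the number of updates per epoch.
batches_per_epoch = math.ceil(num_train_samples / batch_size)
print(batches_per_epoch)  # 540

def make_batches(samples, batch_size):
    """Cut a list into consecutive batches; the last one may be smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

batch_lengths = [len(batch) for batch in make_batches(list(range(250)), 100)]
print(batch_lengths)  # [100, 100, 50]
```

Batching the validation and test data with a batch size equal to their full length produces exactly one batch each.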

### Create inputs and targets for validation

In [10]:
# split the validation data into inputs and targets, as they are needed when fitting the model
validation_inputs, validation_targets = next(iter(validation_data))

## Model 

### Outline the model 

In [16]:
# define hyperparameters
input_size = 784
output_size = 10
hidden_layer_size = 50

model = tf.keras.Sequential([
    # input layer: tf.keras.layers.Flatten transforms each (28, 28, 1) image tensor into a vector
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    # hidden layers: tf.keras.layers.Dense(units) takes inputs and calculates (xw + b),
    # and can also apply an activation function
    tf.keras.layers.Dense(units=hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(units=hidden_layer_size, activation='relu'),
    # output layer: softmax turns the 10 outputs into class probabilities
    tf.keras.layers.Dense(units=output_size, activation='softmax')
])
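
To make the layer arithmetic concrete, here is a NumPy sketch of the same forward pass. It is an illustration under assumptions, not how Keras is implemented: the weights are random rather than trained, and the helper names `relu`, `softmax` and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
layer_sizes = [784, 50, 50, 10]   # input, two hidden layers, output

# Random weights and zero biases; a trained model would hold learned values instead.
weights = [rng.normal(scale=0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(images):
    x = images.reshape(len(images), -1)          # Flatten: (batch, 28, 28, 1) -> (batch, 784)
    x = relu(x @ weights[0] + biases[0])         # hidden layer 1: xw + b, then ReLU
    x = relu(x @ weights[1] + biases[1])         # hidden layer 2
    return softmax(x @ weights[2] + biases[2])   # output layer: class probabilities

batch = rng.random(size=(4, 28, 28, 1))          # four made-up "images" with values in [0, 1)
probabilities = forward(batch)
num_parameters = sum(w.size + b.size for w, b in zip(weights, biases))
print(probabilities.shape, num_parameters)       # (4, 10) 42310
```

Each output row sums to 1, and the parameter count is (784 x 50 + 50) + (50 x 50 + 50) + (50 x 10 + 10) = 42,310.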

### Choose objective function and optimizer

In [17]:
# Adam combines momentum with adaptive per-parameter learning rates (as in RMSProp)
# sparse_categorical_crossentropy takes integer class labels as targets; it is equivalent to
# categorical cross-entropy on one-hot encoded targets, without building the one-hot vectors
# metrics lists what we want to compute throughout the training

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
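
The relationship between the sparse loss and one-hot targets can be checked numerically. This NumPy sketch uses made-up probabilities and targets and computes per-sample losses by hand; it does not call Keras.

```python
import numpy as np

# Made-up softmax outputs for 3 samples over 10 classes, plus integer targets.
rng = np.random.default_rng(seed=0)
logits = rng.normal(size=(3, 10))
probabilities = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
targets = np.array([3, 0, 9])

# Sparse form: pick the predicted probability of the true class directly.
sparse_loss = -np.log(probabilities[np.arange(len(targets)), targets])

# Dense form: one-hot encode the targets, then sum over the classes.
one_hot = np.eye(10)[targets]
dense_loss = -(one_hot * np.log(probabilities)).sum(axis=1)

print(np.allclose(sparse_loss, dense_loss))  # True
```

Both forms give the same per-sample loss; the sparse form simply avoids materializing the one-hot vectors.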

### Train the model 

In [18]:
num_epochs = 10

# fit accepts a dataset that yields (inputs, targets) batches, such as train_data,
# or separate input and target arrays; the validation data is passed as a 2-tuple
model.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), validation_steps=1, verbose=2)

Epoch 1/10
540/540 - 7s - loss: 0.4250 - accuracy: 0.8794 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
540/540 - 5s - loss: 0.1744 - accuracy: 0.9490 - val_loss: 0.1548 - val_accuracy: 0.9572
Epoch 3/10
540/540 - 6s - loss: 0.1343 - accuracy: 0.9600 - val_loss: 0.1232 - val_accuracy: 0.9643
Epoch 4/10
540/540 - 6s - loss: 0.1071 - accuracy: 0.9686 - val_loss: 0.1095 - val_accuracy: 0.9688
Epoch 5/10
540/540 - 6s - loss: 0.0919 - accuracy: 0.9728 - val_loss: 0.0970 - val_accuracy: 0.9717
Epoch 6/10
540/540 - 6s - loss: 0.0801 - accuracy: 0.9754 - val_loss: 0.0862 - val_accuracy: 0.9763
Epoch 7/10
540/540 - 6s - loss: 0.0711 - accuracy: 0.9787 - val_loss: 0.0877 - val_accuracy: 0.9740
Epoch 8/10
540/540 - 6s - loss: 0.0635 - accuracy: 0.9809 - val_loss: 0.0659 - val_accuracy: 0.9798
Epoch 9/10
540/540 - 6s - loss: 0.0575 - accuracy: 0.9826 - val_loss: 0.0652 - val_accuracy: 0.9808
Epoch 10/10
540/540 - 5s - loss: 0.0506 - accuracy: 0.9840 - val_loss: 0.0623 - val_accuracy

<tensorflow.python.keras.callbacks.History at 0x15b4d19bcc8>

Key information:
1. 540 is the number of batches per epoch (54,000 training samples / batch size of 100)
2. Accuracy is the percentage of outputs that match the targets
3. Training accuracy is the average accuracy over all batches in an epoch, while validation accuracy is computed on the whole validation dataset at the end of each epoch
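
The accuracy figure itself is straightforward to compute by hand. The NumPy sketch below uses made-up probabilities for 4 samples and 3 classes (MNIST would have 10 columns) and is only meant to illustrate the definition.

```python
import numpy as np

# Made-up class probabilities: one row per sample, one column per class.
outputs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
targets = np.array([0, 1, 2, 1])

predicted_classes = outputs.argmax(axis=1)         # most probable class per sample
accuracy = (predicted_classes == targets).mean()   # fraction of correct predictions
print(accuracy)  # 0.75
```

Three of the four predictions match their targets, which gives an accuracy of 75%.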

### Test the model 

In [19]:
test_loss, test_accuracy = model.evaluate(test_data)



In [20]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.10. Test accuracy: 97.08%
