# Deep Neural Network for MNIST Classification

This simple example demonstrates how to plug TensorFlow Datasets (TFDS) into a Keras model.

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of the methods we've been discussing and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

This tutorial is largely based on https://www.tensorflow.org/datasets/keras_example. 

## Import the relevant packages

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

## Step 1: Data

### Load a dataset

Load the MNIST dataset with the following arguments:

* `shuffle_files=True`: The MNIST data is only stored in a single file, but for larger datasets with multiple files on disk, it's good practice to shuffle them when training.
* `as_supervised=True`: Returns a tuple `(img, label)` instead of a dictionary `{'image': img, 'label': label}`.

In [None]:
# see https://www.tensorflow.org/datasets/catalog/mnist for details
ds, ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

In [None]:
(ds_train, ds_test) = ds

### Build a training pipeline

Apply the following transformations:

* `tf.data.Dataset.map`: TFDS provides images of type `tf.uint8`, while the model expects `tf.float32`. Therefore, you need to cast the images to `tf.float32` and scale the pixel values to the range [0, 1].
* `tf.data.Dataset.cache`: Since the dataset fits in memory, cache it before shuffling for better performance.<br/>
__Note:__ Random transformations should be applied after caching.
* `tf.data.Dataset.shuffle`: For true randomness, set the shuffle buffer to the full dataset size.<br/>
__Note:__ For large datasets that can't fit in memory, use `buffer_size=1000` if your system allows it.
* `tf.data.Dataset.batch`: Batch elements of the dataset after shuffling to get unique batches at each epoch.
* `tf.data.Dataset.prefetch`: It is good practice to end the pipeline by prefetching [for performance](https://www.tensorflow.org/guide/data_performance#prefetching).
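
To make the normalization in the first bullet concrete, here is a minimal NumPy sketch of the same cast-and-scale step, independent of TensorFlow. The sample array and the helper name `normalize_pixels` are illustrative assumptions, not part of the tutorial code.

```python
import numpy as np

def normalize_pixels(image):
    """Cast uint8 pixel values to float32 and scale them into [0, 1]."""
    return image.astype(np.float32) / 255.0

# A made-up 2x2 "image" covering the full uint8 range.
sample = np.array([[0, 51], [204, 255]], dtype=np.uint8)
scaled = normalize_pixels(sample)

print(scaled.dtype)                 # float32
print(scaled.min(), scaled.max())   # 0.0 1.0
```

Casting before dividing matters: the division has to happen in floating point so that the result lands in [0, 1] rather than being truncated to integers.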

In [None]:
BATCH_SIZE = 100

In [None]:
def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(BATCH_SIZE)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)
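
The pipeline above shuffles individual elements before batching. The following pure-Python sketch, which does not use `tf.data`, illustrates why the order matters. The helper names and the toy 12-element dataset are assumptions made for illustration only.

```python
import random

def batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def shuffle_then_batch(items, batch_size, rng):
    """Shuffle elements first, then batch: batch contents change every epoch."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    return batches(shuffled, batch_size)

def batch_then_shuffle(items, batch_size, rng):
    """Batch first, then shuffle: only the order of fixed batches changes."""
    fixed = batches(items, batch_size)
    rng.shuffle(fixed)
    return fixed

rng = random.Random(0)
data = list(range(12))  # stand-in for 12 training examples
for epoch in range(2):
    print("epoch", epoch, "shuffle -> batch:", shuffle_then_batch(data, 4, rng))
    print("epoch", epoch, "batch -> shuffle:", batch_then_shuffle(data, 4, rng))
```

With batching first, the same groups of examples always appear together, so the model sees identical batches in every epoch, only in a different order.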

### Build an evaluation pipeline

Your testing pipeline is similar to the training pipeline with small differences:

 * You don't need to call `tf.data.Dataset.shuffle`.
 * Caching is done after batching because batches can be the same between epochs.

In [None]:
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(BATCH_SIZE)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

## Step 2: Create and train the model

Plug the TFDS input pipeline into a simple Keras model, compile the model, and train it.

In [None]:
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50

output_size = 10

model = tf.keras.models.Sequential([
  # the first layer (the input layer)
  # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
  # since we don't know CNNs yet, we must flatten the images
  # there is a convenient method 'Flatten' 
  # it takes our 28x28x1 tensor and orders it into a (28x28x1,) = (784,) vector
  # this allows us to actually create a feed forward neural network
  tf.keras.layers.Flatten(input_shape=(28, 28, 1)),

  # tf.keras.layers.Dense is basically implementing: 
  # output = activation(dot(input, weight) + bias)
  # most important arguments are the hidden_layer_size and the activation function
  tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
  tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

  # the final layer is no different, we just make sure to activate it with softmax
  tf.keras.layers.Dense(output_size, activation='softmax')
])

# print a summary of the model to review how the network configuration
# and the shape of the training data affect the number of trainable parameters
model.summary()
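
As a rough cross-check of what the network computes, the same architecture can be sketched in plain NumPy. This is not how Keras implements these layers internally. The random placeholder weights, the helper names, and the random input batch are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [28 * 28 * 1, 50, 50, 10]  # input, two hidden layers, output

# One (weights, bias) pair per Dense layer, filled with placeholder values.
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract max for stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

def forward(images):
    """Flatten a batch of images, then apply Dense layers: relu, relu, softmax."""
    x = images.reshape(len(images), -1)  # (batch, 28, 28, 1) -> (batch, 784)
    for i, (w, b) in enumerate(params):
        z = x @ w + b                    # dot(input, weight) + bias
        x = softmax(z) if i == len(params) - 1 else relu(z)
    return x

batch = rng.random((4, 28, 28, 1))       # 4 fake images
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in params)

print(probs.shape)   # (4, 10)
print(n_params)      # 42310
```

The parameter count follows directly from the layer sizes: (784·50 + 50) + (50·50 + 50) + (50·10 + 10) = 42,310, which is the number of trainable parameters to expect in the summary for this configuration.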

In [None]:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
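
The `sparse_categorical_crossentropy` loss takes integer class labels (such as `3`) rather than one-hot vectors. The pure-Python sketch below shows, for a single example, that both forms give the same value. The function names and the made-up logits are illustrative assumptions, and this is not the Keras implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_crossentropy(label, probs):
    """Cross-entropy for an integer label: -log of the true class probability."""
    return -math.log(probs[label])

def onehot_crossentropy(one_hot, probs):
    """Cross-entropy for a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

logits = [1.0, 0.5, 3.0, -1.0]            # made-up scores for 4 classes
probs = softmax(logits)
label = 2
one_hot = [1.0 if i == label else 0.0 for i in range(len(probs))]

loss_sparse = sparse_crossentropy(label, probs)
loss_onehot = onehot_crossentropy(one_hot, probs)
print(loss_sparse, loss_onehot)
```

Since TFDS returns MNIST labels as integers, the sparse variant avoids an extra one-hot encoding step in the input pipeline.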

In [None]:
NUM_EPOCHS = 5

In [None]:
model.fit(
    ds_train,
    epochs=NUM_EPOCHS,
    validation_data=ds_test,
)

## Step 3: Test the model

In [None]:
test_loss, test_accuracy = model.evaluate(ds_test)

In [None]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))
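
The reported accuracy is the fraction of test images whose highest-probability class matches the true label. A minimal NumPy sketch of that computation follows. The `accuracy` helper and the made-up probabilities are assumptions for illustration, not output from the model above.

```python
import numpy as np

def accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == labels))

# Made-up softmax outputs for 4 images and 3 classes.
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])

print(accuracy(probabilities, labels))  # 0.75
```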

With the initial model and hyperparameters given in this notebook, the final test accuracy should be around 97%. The exact value changes each time the code is rerun, because the batches are shuffled differently, the weights are initialized randomly, and so on.

Finally, we have intentionally left this solution suboptimal. As an exercise:
- Try to improve the network with different hyperparameters (width, depth, etc.).