## Deep Neural Network for MNIST Classification
We'll apply what we've covered in the lectures in this section to build a deep neural network. The problem we've chosen is often called the "Hello World" of deep learning because, for most students, it is the first deep learning algorithm they see.
The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of the methods we've been discussing and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 
The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

<img src="mnist_example1.png">

<img src="mnist_example2.png">

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

from tensorflow import keras
from tensorflow.keras import layers

### Data
See the Keras training and evaluation guide: https://www.tensorflow.org/guide/keras/train_and_evaluate

In [2]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
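
To make the preprocessing steps above concrete without downloading anything, here is a minimal sketch on a synthetic array. The random images, the sample count of 100, and the names `images`, `flat`, `x_train_demo` and `x_val_demo` are assumptions made for illustration; only the reshape, the division by 255 and the negative-index slicing mirror the cell above.

```python
import numpy as np

# Synthetic stand-in for MNIST: 100 grayscale images of 28x28 pixels, values 0..255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)

# Flatten each 28x28 image into a 784-vector and rescale the pixels to [0, 1].
flat = images.reshape(len(images), 28 * 28).astype("float32") / 255

# Hold out the last 20 rows for validation, using the same slicing pattern as above.
holdout = 20
x_val_demo = flat[-holdout:]
x_train_demo = flat[:-holdout]

print(flat.shape, flat.dtype)
print(x_train_demo.shape, x_val_demo.shape)
```

Because the held-out rows are simply the last ones, this kind of split relies on the data not being ordered by class.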


## Model

### Outline the model

<img src="mnist_example3.png">

In [4]:
# We need to choose the hyperparameters: width and depth
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
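
To get a feel for how width and depth drive model size, the trainable parameters of the network above can be counted by hand. The sketch below is plain Python and assumes only the layer sizes from the cell above (784 inputs, two hidden layers of 64 units, 10 outputs); `dense_param_count` is a hypothetical helper, not a Keras function.

```python
# Layer widths taken from the model above: 784 inputs, two hidden layers of 64, 10 outputs.
layer_sizes = [784, 64, 64, 10]


def dense_param_count(n_in, n_out):
    """A dense layer has n_in * n_out weights plus one bias per output unit."""
    return n_in * n_out + n_out


per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer, total_params)  # [50240, 4160, 650] 55050
```

Most of the parameters sit in the first layer, so the input size and the width of the first hidden layer dominate the total.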

### Specify the optimizer, the loss function, metrics

In [5]:
model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])
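
Since the last layer has no activation, the loss is configured with `from_logits=True` and applies the softmax itself. As a rough NumPy sketch of that computation (not the Keras implementation; the function name `sparse_ce_from_logits` and the toy inputs are assumptions), the loss is the mean negative log-probability assigned to the true class:

```python
import numpy as np


def sparse_ce_from_logits(logits, labels):
    """Mean of -log(softmax(logits)[true_class]) over the batch."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels).astype(int)
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# All-zero logits give a uniform guess over 10 classes, so the loss is log(10).
uniform_loss = sparse_ce_from_logits(np.zeros((4, 10)), [0, 3, 7, 9])

# A large logit on the correct class gives a loss close to zero.
confident_loss = sparse_ce_from_logits([[0.0, 10.0, 0.0]], [1])

print(uniform_loss, confident_loss)
```

The labels are integer class indices rather than one-hot vectors, which is what "sparse" refers to in the loss and metric names.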

In [None]:
# For later reuse, let's put our model definition and compile step in functions; we will
# call them several times across different examples in this guide.
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['sparse_categorical_accuracy'])
    return model

### Training
Train the model by slicing the data into batches of size `batch_size` and repeatedly iterating over the entire dataset for a given number of epochs.
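
As a rough illustration of what this slicing means for the 50,000 training samples, the sketch below counts the batches per epoch. It is a simplification under stated assumptions: `iterate_minibatches` is a hypothetical helper, the arrays are zero-filled placeholders, and the shuffling that `fit` applies by default is left out.

```python
import math

import numpy as np


def iterate_minibatches(x, y, batch_size):
    """Yield consecutive (x, y) slices of at most batch_size rows."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


n_samples, batch_size, num_epochs = 50000, 64, 5
steps_per_epoch = math.ceil(n_samples / batch_size)

# Placeholder data with the same number of rows as the training split.
x_demo = np.zeros((n_samples, 1), dtype="float32")
y_demo = np.zeros(n_samples, dtype="float32")

batch_sizes = [len(xb) for xb, _ in iterate_minibatches(x_demo, y_demo, batch_size)]

print(steps_per_epoch)               # 782 batches per epoch
print(batch_sizes[-1])               # the last batch holds the remaining 16 samples
print(steps_per_epoch * num_epochs)  # 3910 weight updates over 5 epochs
```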

In [7]:
NUM_EPOCHS = 5
BATCH_SIZE = 64

print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=NUM_EPOCHS,
                    # We pass some validation data for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))

print('\nhistory dict:', history.history)


# Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

history dict: {'loss': [0.06996830299705267, 0.06113493979990482, 0.053630697073340415, 0.04642009671598673, 0.04130049764379859], 'sparse_categorical_accuracy': [0.97892, 0.9813, 0.9838, 0.98606, 0.98752], 'val_loss': [0.10111498665576801, 0.10168680063411593, 0.1211118485538289, 0.11101543993377126, 0.10816750014146091], 'val_sparse_categorical_accuracy': [0.9715, 0.9721, 0.9684, 0.9713, 0.9719]}
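
One way to read the history dict is to compare the training and validation curves epoch by epoch. The sketch below copies the values printed above, rounded to four decimals; the variable names are assumptions made for illustration.

```python
# Values copied from the history dict printed above, rounded to 4 decimals.
history_dict = {
    "loss": [0.0700, 0.0611, 0.0536, 0.0464, 0.0413],
    "val_loss": [0.1011, 0.1017, 0.1211, 0.1110, 0.1082],
    "val_sparse_categorical_accuracy": [0.9715, 0.9721, 0.9684, 0.9713, 0.9719],
}

loss = history_dict["loss"]
val_loss = history_dict["val_loss"]
val_acc = history_dict["val_sparse_categorical_accuracy"]

# Epochs are reported 1-based, as in the training log.
best_val_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_val_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
train_loss_decreasing = all(b < a for a, b in zip(loss, loss[1:]))

print(best_val_loss_epoch, best_val_acc_epoch, train_loss_decreasing)  # 1 2 True
```

In this run the training loss falls at every epoch while the validation loss never improves on its first value, which may be an early sign of overfitting.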


In [11]:
# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
test_loss, test_accuracy = model.evaluate(x_test, y_test, batch_size=128)


# Evaluate on test data


In [12]:
print('test loss: {0:.2f}. test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100))

test loss: 0.09. test accuracy: 97.42%


In [13]:
# Generate predictions (logits -- the raw output of the last layer,
# which has no softmax) on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)


# Generate predictions for 3 samples
predictions shape: (3, 10)


In [15]:
predictions

array([[-13.214791  , -13.512929  ,  -6.2735662 ,  -0.5937292 ,
        -21.578306  , -11.733843  , -40.197746  ,   8.087853  ,
        -10.386228  ,  -6.7207737 ],
       [-29.960686  ,  -0.47871104,   7.378034  , -14.287243  ,
        -52.775364  ,  -7.1917696 , -15.551835  , -20.008337  ,
        -15.764638  , -39.912136  ],
       [-12.649894  ,   3.7511764 ,  -7.483942  , -12.724268  ,
         -9.072611  ,  -9.280073  ,  -8.545737  ,  -4.4279494 ,
         -6.9926987 , -12.505471  ]], dtype=float32)

In [17]:
y_test[:3]

array([7., 2., 1.], dtype=float32)
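
The values returned by `predict` are logits, so they have to be converted before they can be read as probabilities or compared with the labels. A minimal NumPy sketch, using the three rows printed above (the variable names are assumptions):

```python
import numpy as np

# Logits copied from the `predictions` output above.
logits = np.array(
    [[-13.214791, -13.512929, -6.2735662, -0.5937292, -21.578306,
      -11.733843, -40.197746, 8.087853, -10.386228, -6.7207737],
     [-29.960686, -0.47871104, 7.378034, -14.287243, -52.775364,
      -7.1917696, -15.551835, -20.008337, -15.764638, -39.912136],
     [-12.649894, 3.7511764, -7.483942, -12.724268, -9.072611,
      -9.280073, -8.545737, -4.4279494, -6.9926987, -12.505471]],
    dtype=np.float32,
)

# Softmax turns each row of logits into probabilities that sum to 1.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# The predicted digit is the index of the largest logit (or probability).
predicted = probs.argmax(axis=1)
print(predicted)  # [7 2 1]
```

The predicted digits match the first three test labels shown above.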