
# E11-2 Deep Learning with TensorFlow 2 and Keras

This notebook solves the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) classification task with [Keras](https://www.tensorflow.org/guide/keras/overview) in the following steps:

1. Build a neural network that classifies images.
2. Train this neural network.
3. Evaluate the accuracy of the model.

The solution requires TensorFlow 2, which bundles Keras as `tf.keras`, to be installed in your environment.
See the [install guide](https://www.tensorflow.org/install) for details.

In [1]:
import tensorflow as tf

## Data Sets

In [3]:
# Load and prepare the datasets with Keras
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [4]:
# Scale the integer grey values (0 to 255) to floating-point numbers between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0
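
As a quick illustration of the scaling step, the following plain-Python sketch applies the same division to a few grey values. The sample values are made up for illustration; the only assumption carried over from the notebook is that pixels are integers from 0 to 255.

```python
# Hypothetical grey values; MNIST pixels are integers in the range 0..255
sample_pixels = [0, 64, 128, 255]

# Same operation as x_train / 255.0, applied element by element
scaled = [p / 255.0 for p in sample_pixels]

print(scaled)
```

Black (0) stays at 0.0, white (255) becomes 1.0, and every other value lands in between.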

## Build ANN

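Build the `tf.keras.Sequential` model by stacking layers. <br>
The model flattens each 28x28 image into a 784-element input vector, passes it through one hidden dense layer of 128 ReLU units followed by dropout (rate 0.2), and ends in a dense output layer of 10 units, one per digit class.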

In [5]:
# Create a model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])
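
To get a feeling for the size of this network, the trainable parameters can be counted by hand. The sketch below assumes that each `Dense` layer has one weight per input-output pair plus one bias per unit, and that `Flatten` and `Dropout` add no parameters. The helper `dense_params` is a hypothetical name used only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 28 * 28                      # Flatten: 28x28 image -> 784 values
hidden = dense_params(flattened, 128)    # Dense(128, activation='relu')
output = dense_params(128, 10)           # Dense(10)
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```

Calling `model.summary()` on the model above should report the same total.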

For each example the model returns a vector of "[logits](https://developers.google.com/machine-learning/glossary#logits)" or "[log-odds](https://developers.google.com/machine-learning/glossary#log-odds)" scores, one for each class.<br>

__Logits__ are the raw (non-normalized) predictions that a classification model generates, which are ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become the input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class. __Log-odds__ are the logarithm of the odds of an event, `ln(p / (1 - p))`. In this notebook the raw scores are passed directly to the loss function.

In [7]:
predictions = model(x_train[:1]).numpy()
predictions

array([[-0.54169554,  0.22656852,  0.41457993, -0.20984833, -0.24008794,
         0.4404894 , -0.10653532,  0.48102096, -0.8223822 ,  0.1864722 ]],
      dtype=float32)

The array `predictions` holds one logit for each of the ten classes. The `tf.nn.softmax` function converts these logits to probabilities for each class.

In [8]:
# Calculate probability
tf.nn.softmax(predictions).numpy()

array([[0.05456621, 0.11764586, 0.1419806 , 0.07604019, 0.07377519,
        0.14570731, 0.08431629, 0.15173437, 0.04121195, 0.11302201]],
      dtype=float32)
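
The conversion can be checked by hand. The following is a minimal plain-Python sketch of the softmax formula, not TensorFlow's implementation; it reuses the logits printed by cell `In [7]` and should reproduce the probabilities above up to rounding.

```python
import math

def softmax(logits):
    """Normalize a list of logits into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output of cell In [7]
logits = [-0.54169554, 0.22656852, 0.41457993, -0.20984833, -0.24008794,
          0.4404894, -0.10653532, 0.48102096, -0.8223822, 0.1864722]

probs = softmax(logits)
print([round(p, 4) for p in probs])
print(sum(probs))
```

Larger logits map to larger probabilities, and the ordering of the classes is preserved.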

Note: It is possible to bake `tf.nn.softmax` in as the activation function for the last layer of the network. While this makes the model output more directly interpretable, the approach is discouraged because it is impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.
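
The stability concern can be illustrated without TensorFlow. The sketch below compares a naive "softmax, then log" loss with a version that works directly on the logits using the log-sum-exp trick. Both function names and the extreme logits are made up for illustration, and this is only the general idea, not necessarily how Keras computes the loss internally.

```python
import math

def naive_cross_entropy(logits, true_index):
    """Softmax first, then -log(p_true); exp() can overflow for large logits."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[true_index] / sum(exps))

def cross_entropy_from_logits(logits, true_index):
    """Same loss via the log-sum-exp trick, computed directly from the logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

big_logits = [1000.0, 0.0]    # made-up extreme scores

try:
    naive_cross_entropy(big_logits, 0)
    naive_failed = False
except OverflowError:
    naive_failed = True       # math.exp(1000.0) exceeds the float range

stable_loss = cross_entropy_from_logits(big_logits, 0)
print(naive_failed, stable_loss)
```

Working from logits lets the loss avoid ever forming the huge intermediate exponentials, which is the motivation for `from_logits=True`.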

The `tf.keras.losses.SparseCategoricalCrossentropy` loss takes a vector of logits and the index of the true class and returns a scalar loss for each example.

In [9]:
# Loss function
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class: it is zero if the model is sure of the correct class.
This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [10]:
loss_fn(y_train[:1], predictions).numpy()

1.9261553
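
The printed value can be reproduced from the softmax probabilities shown earlier. The sketch below assumes that the label of the first training image is 5; the notebook does not print `y_train[0]`, but this value is consistent with the loss above.

```python
import math

# Softmax probabilities copied from the output of cell In [8]
probs = [0.05456621, 0.11764586, 0.1419806, 0.07604019, 0.07377519,
         0.14570731, 0.08431629, 0.15173437, 0.04121195, 0.11302201]

true_class = 5  # assumption: label of the first training image

loss = -math.log(probs[true_class])   # negative log probability of the true class
uniform_loss = -math.log(1 / 10)      # loss of a model that guesses uniformly

print(round(loss, 4), round(uniform_loss, 4))  # 1.9262 2.3026
```

The untrained model happens to assign slightly more than 1/10 to the true class, so its loss is a little below the uniform baseline of about 2.3.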

In [11]:
# Compile the model: set the optimizer, the loss and the metric to report
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])

The `Model.fit` method adjusts the model parameters to minimize the loss: 

In [12]:
# Train the model
model.fit(x_train, y_train, epochs=20)

Train on 60000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fce2ae2f3d0>
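
To put the training run in perspective, the number of weight updates can be estimated. This sketch assumes the Keras default `batch_size` of 32, since the `model.fit` call above does not set it.

```python
import math

n_samples = 60000      # from the training log: "Train on 60000 samples"
batch_size = 32        # assumption: Keras default, not set in the notebook
epochs = 20

steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 1875 37500
```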

The `Model.evaluate` method checks the model's performance on the test set.

In [13]:
# Evaluate loss and accuracy on the test set
model.evaluate(x_test,  y_test, verbose=2)

10000/1 - 1s - loss: 0.0431 - accuracy: 0.9796


[0.08614668145755931, 0.9796]

The image classifier is now trained to ~98% accuracy on this dataset.

If you want your model to return probabilities as well, you can wrap the trained model and attach a softmax layer to it.

In [14]:
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

In [15]:
probability_model(x_test[:5])

<tf.Tensor: id=114433, shape=(5, 10), dtype=float32, numpy=
array([[1.4965229e-16, 2.3063259e-14, 1.1908144e-10, 3.6451417e-09,
        1.7619488e-17, 5.7447577e-14, 8.1771446e-19, 1.0000000e+00,
        9.1487245e-13, 1.3524924e-10],
       [2.2020433e-17, 4.7369336e-10, 1.0000000e+00, 8.6937998e-13,
        5.5708088e-31, 8.6283731e-15, 7.2754353e-17, 2.4672047e-26,
        2.1685915e-16, 5.4294788e-26],
       [1.6744691e-11, 9.9999118e-01, 8.5327963e-07, 1.7860832e-12,
        6.4013460e-07, 3.3877232e-08, 1.2655883e-08, 1.0709130e-06,
        6.1745973e-06, 2.4548085e-12],
       [1.0000000e+00, 1.1052904e-17, 2.5127034e-09, 3.0129411e-15,
        2.1920003e-12, 2.8606840e-13, 2.7656666e-09, 5.2499971e-12,
        4.4298791e-16, 1.5736802e-11],
       [3.2348982e-12, 9.0754761e-15, 6.3672450e-09, 6.0184185e-15,
        9.9924684e-01, 5.3711229e-13, 3.2400558e-11, 1.2167212e-07,
        5.0461866e-12, 7.5307209e-04]], dtype=float32)>
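
Each row of this output is a probability distribution over the ten digits, so the predicted digit is the index of the largest entry. The following plain-Python sketch reads the predictions off the rows printed above; the helper `argmax` is a hypothetical name used only here.

```python
# Probabilities copied from the output of cell In [15] (one row per test image)
probabilities = [
    [1.4965229e-16, 2.3063259e-14, 1.1908144e-10, 3.6451417e-09, 1.7619488e-17,
     5.7447577e-14, 8.1771446e-19, 1.0000000e+00, 9.1487245e-13, 1.3524924e-10],
    [2.2020433e-17, 4.7369336e-10, 1.0000000e+00, 8.6937998e-13, 5.5708088e-31,
     8.6283731e-15, 7.2754353e-17, 2.4672047e-26, 2.1685915e-16, 5.4294788e-26],
    [1.6744691e-11, 9.9999118e-01, 8.5327963e-07, 1.7860832e-12, 6.4013460e-07,
     3.3877232e-08, 1.2655883e-08, 1.0709130e-06, 6.1745973e-06, 2.4548085e-12],
    [1.0000000e+00, 1.1052904e-17, 2.5127034e-09, 3.0129411e-15, 2.1920003e-12,
     2.8606840e-13, 2.7656666e-09, 5.2499971e-12, 4.4298791e-16, 1.5736802e-11],
    [3.2348982e-12, 9.0754761e-15, 6.3672450e-09, 6.0184185e-15, 9.9924684e-01,
     5.3711229e-13, 3.2400558e-11, 1.2167212e-07, 5.0461866e-12, 7.5307209e-04],
]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

predicted = [argmax(row) for row in probabilities]
print(predicted)  # [7, 2, 1, 0, 4]
```

Inside the notebook, `tf.argmax(probability_model(x_test[:5]), axis=1)` should give the same result in one call.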

## Reference
The TensorFlow Authors