# Tutorial III: Handwritten digit recognition

<p>
Bern Winter School on Machine Learning, 2021<br>
Prepared by Mykhailo Vladymyrov.
</p>

This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

This supplementary material describes a fully connected neural network for handwritten digit recognition using TensorFlow 2.

## 0. Load necessary libraries

In [None]:
colab = True  # set to True if using Google Colab

In [None]:
if colab:
    %tensorflow_version 2.x

In [None]:
import sys

import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipyd
import tensorflow as tf
import tensorflow.keras.datasets.mnist as mnist

# We'll tell matplotlib to inline any drawn figures like so:
#%matplotlib inline
#plt.style.use('ggplot')

from IPython.core.display import HTML
HTML("""<style> .rendered_html code { 
    padding: 2px 5px;
    color: #0000aa;
    background-color: #cccccc;
} </style>""")

%load_ext tensorboard

## 1. Load the data

First we load the data: 60000 training images and 10000 images for validation. We keep the images 2D and flatten them directly in the model.

We also keep the labels as class IDs instead of using one-hot encoding.

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train/255.0
x_test = x_test/255.0


print('train: data shape', x_train.shape, 'label shape', y_train.shape)
print('test: data shape', x_test.shape, 'label shape', y_test.shape)
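
Keeping the labels as class IDs works because the sparse form of the cross-entropy loss only needs the predicted probability of the true class. The following is a minimal sketch in plain Python, not the Keras implementation; the helper names and the example probabilities are hypothetical and chosen only for illustration.

```python
import math

def sparse_categorical_crossentropy(class_id, probs):
    # Loss computed directly from the integer class ID.
    return -math.log(probs[class_id])

def one_hot(class_id, num_classes):
    # Convert an integer class ID to a one-hot vector.
    return [1.0 if k == class_id else 0.0 for k in range(num_classes)]

def categorical_crossentropy(target, probs):
    # Loss computed from a one-hot (or soft) target vector.
    return -sum(t * math.log(p) for t, p in zip(target, probs))

probs = [0.05, 0.05, 0.7, 0.2]  # hypothetical model output for 4 classes
label = 2                       # true class ID

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot(label, len(probs)), probs)
print(sparse_loss, dense_loss)  # the two values agree up to rounding
```

Both forms give the same loss value, so the class-ID form simply avoids building the one-hot vectors.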

## 2. Building a neural network

The following creates a `model`. It is an object containing the ML model itself (a simple 3-layer fully connected neural network), the optimization parameters, and the interface for model training.

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(1500, activation='relu'),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.005),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

The model summary lists the model's layers and the number of trainable parameters in each:

In [None]:
model.summary()
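
The parameter counts reported by the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The sketch below redoes this arithmetic for the layer sizes used above; the helper `dense_param_count` is a hypothetical name introduced here.

```python
# Layer widths of the network above: flattened 28x28 input, then three dense layers.
layer_sizes = [28 * 28, 1500, 128, 10]

def dense_param_count(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output unit.
    return n_in * n_out + n_out

counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(counts)

for (n_in, n_out), c in zip(zip(layer_sizes[:-1], layer_sizes[1:]), counts):
    print(f'Dense {n_in} -> {n_out}: {c} parameters')
print('total:', total_params)
```

The `Flatten` layer has no trainable parameters, so it does not contribute to the total.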

## 3. Model training

The `fit` function is the interface for model training. 
Here one can specify training and validation datasets, minibatch size, and the number of training epochs.

**Warning**: a call to `model.fit` does NOT reinitialize the trainable variables. Each call continues training from the previous state.

We will also save the state of the trainable variables after each epoch and log the training progress for TensorBoard:

In [None]:
%tensorboard --logdir logs

In [None]:
save_path = 'save/mnist_{epoch}.ckpt'
save_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_path, save_weights_only=True)

logdir="logs/"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

hist = model.fit(x=x_train, y=y_train,
                 epochs=25, batch_size=128, 
                 validation_data=(x_test, y_test),
                 callbacks=[save_callback, tensorboard_callback])
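
To get a feeling for what the call above does, the following plain-Python sketch works out how many minibatch updates are performed and which checkpoint names the `{epoch}` placeholder produces. It assumes that the last, incomplete minibatch of each epoch is still used and that the checkpoint callback numbers epochs starting from 1, which is consistent with the file names loaded later in this tutorial.

```python
import math

num_train = 60000   # number of training images
batch_size = 128
epochs = 25

# One optimizer update per minibatch; the last minibatch may be smaller.
steps_per_epoch = math.ceil(num_train / batch_size)
total_steps = steps_per_epoch * epochs
print('steps per epoch:', steps_per_epoch, 'total steps:', total_steps)

# The '{epoch}' placeholder is filled in with the epoch number (assumed 1-based).
save_path = 'save/mnist_{epoch}.ckpt'
checkpoint_names = [save_path.format(epoch=e) for e in range(1, epochs + 1)]
print(checkpoint_names[0], '...', checkpoint_names[-1])
```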

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(10,5))
axs[0].plot(hist.epoch, hist.history['loss'])
axs[0].plot(hist.epoch, hist.history['val_loss'])
axs[0].legend(('training loss', 'validation loss'), loc='lower right')
axs[1].plot(hist.epoch, hist.history['accuracy'])
axs[1].plot(hist.epoch, hist.history['val_accuracy'])

axs[1].legend(('training accuracy', 'validation accuracy'), loc='lower right')
plt.show()

Current model performance can be evaluated on a dataset:

In [None]:
model.evaluate(x_test,  y_test, verbose=2)

We can test the trained model on an image:

In [None]:
im_id = 0
y_pred = model(x_test[im_id:im_id+1])
print('true label: ', y_test[im_id], 'predicted: ', np.argmax(y_pred[0]))
plt.imshow(x_test[im_id])
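
Note that the image is selected with the slice `im_id:im_id+1` rather than with the plain index `im_id`. The model expects a batch of images, and the slice keeps the leading batch dimension. A small NumPy sketch of the difference, using a hypothetical all-zero array in place of the real data:

```python
import numpy as np

images = np.zeros((5, 28, 28))  # stand-in for a few 28x28 images

single = images[0]              # plain index drops the batch dimension
batch_of_one = images[0:1]      # slice keeps it: a batch with one image
same_batch = images[0][np.newaxis, ...]  # equivalent: re-add the dimension

print(single.shape)        # (28, 28)
print(batch_of_one.shape)  # (1, 28, 28)
print(same_batch.shape)    # (1, 28, 28)
```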

## 4. Loading trained model

In [None]:
model.load_weights('save/mnist_1.ckpt')
model.evaluate(x_test,  y_test, verbose=2)

model.load_weights('save/mnist_12.ckpt')
model.evaluate(x_test,  y_test, verbose=2)

model.load_weights('save/mnist_18.ckpt')
model.evaluate(x_test,  y_test, verbose=2)

## 5. Inspecting trained variables

We can obtain the trained variables from model layers:

In [None]:
l = model.get_layer(index=1)
w, b = l.weights

w = w.numpy()
b = b.numpy()
print(w.shape, b.shape)
w = w.reshape((28,28,-1)).transpose((2, 0, 1))
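
The reshape above turns each column of the weight matrix back into a 28x28 image. This relies on the flattening step unrolling an image row by row, so that pixel `(r, c)` ends up at position `r * 28 + c`. The sketch below checks this index mapping on a small random matrix that stands in for the trained weights; the array contents and the number of neurons are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 3
w_flat = rng.normal(size=(28 * 28, n_neurons))  # stand-in for the (784, 1500) weights

# Same operation as in the cell above: one 28x28 image per neuron.
w_img = w_flat.reshape((28, 28, -1)).transpose((2, 0, 1))
print(w_img.shape)  # (3, 28, 28)

# Weight connecting pixel (r, c) to neuron k sits at row r*28 + c, column k.
r, c, k = 5, 17, 2
print(w_img[k, r, c] == w_flat[r * 28 + c, k])  # True
```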

Let's visualize the weights of the first 5 neurons as images:

In [None]:
n = 5
fig, axs = plt.subplots(1, n, figsize=(4.1*n,4))
for i, wi in enumerate(w[:n]):
  axs[i].imshow(wi, cmap='gray')

## 6. Inspecting gradients

We can also evaluate the gradients of each output with respect to an input:

In [None]:
idx = 111
inp_v = x_train[idx:idx+1]  # use some image to compute gradients with respect to

inp = tf.constant(inp_v)  # create tf constant tensor
with tf.GradientTape() as tape:  # gradient tape for gradient evaluation
  tape.watch(inp)  # track inp explicitly: constant tensors are not watched by default
  preds = model(inp) # evaluate model output

grads = tape.jacobian(preds, inp)  # evaluate d preds[i] / d inp[j]
print(grads.shape, '<- (Batch_preds, preds[i], Batch_inp, inp[y], inp[x])')
grads = grads.numpy()[0,:,0]
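
To make clear what the Jacobian contains, here is a minimal NumPy sketch on a toy model: a single softmax layer with random weights, used as a stand-in for the trained network. It compares the analytic Jacobian of the class probabilities with respect to the input against central finite differences. All names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_out = 6, 3                   # toy sizes instead of 784 and 10
W = rng.normal(size=(n_out, n_in))
b = rng.normal(size=n_out)
x = rng.normal(size=n_in)

def predict(x):
    return softmax(W @ x + b)

# Analytic Jacobian: d p[i] / d x[j] = sum_k (delta_ik p_i - p_i p_k) W[k, j]
p = predict(x)
jac_analytic = (np.diag(p) - np.outer(p, p)) @ W

# Numerical Jacobian: perturb one input component at a time.
eps = 1e-6
jac_numeric = np.zeros((n_out, n_in))
for j in range(n_in):
    step = np.zeros(n_in)
    step[j] = eps
    jac_numeric[:, j] = (predict(x + step) - predict(x - step)) / (2 * eps)

print(np.allclose(jac_analytic, jac_numeric, atol=1e-6))
print(jac_analytic.sum(axis=0))  # close to zero for every input component
```

Since the softmax outputs always sum to 1, the gradients of all class probabilities with respect to any single input component sum to zero: increasing the probability of one digit necessarily decreases the probability of others.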

In [None]:
print('prediction:', np.argmax(preds[0]))
fig, axs = plt.subplots(1, 11, figsize=(4.1*11,4))
axs[0].imshow(inp_v[0])
axs[0].set_title('raw')
vmin,vmax = grads.min(), grads.max()
for i, g in enumerate(grads):
  axs[i+1].imshow(g, cmap='gray', vmin=vmin, vmax=vmax)
  axs[i+1].set_title(r'$\frac{\partial\;P(digit\,%d)}{\partial\;input}$' % i, fontdict={'size':16})