# HOML Chapter 12 Exercise 13

## Exercise: Train a model using a custom training loop to tackle the Fashion MNIST dataset.



*a. Display the epoch, iteration, mean training loss, and mean accuracy over each epoch (updated at each iteration), as well as the validation loss and accuracy at the end of each epoch.*

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

In [None]:
# Random seeds from both Numpy and Tensorflow
from numpy.random import seed
seed(999)
tf.random.set_seed(999)   

Let's begin with importing the Fashion MNIST dataset and dividing it into training, validation, and test sets. We'll also normalize them by dividing by 255.

In [None]:
# Import the Fashion MNIST dataset and split it. 
(X_train_all, y_train_all), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

In [None]:
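# Normalize images while splitting.
# Note: as written, the first 5,000 images form the training set and the
# remaining 55,000 form the validation set (the book uses the reverse split).
(X_val, X_train) = X_train_all[5000:]/255., X_train_all[:5000]/255.
(y_val, y_train) = y_train_all[5000:], y_train_all[:5000]
X_test = X_test/255.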
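
As a quick sanity check on the slicing above, here is a minimal sketch that uses a stand-in array with the same number of images as the Fashion MNIST training data. The tiny 2×2 image size and the `fake_images` name are assumptions made only to keep the example light; the slicing itself is copied from the cell above.

```python
import numpy as np

# Stand-in for the 60,000 training images (hypothetical 2x2 "images" of value 255).
fake_images = np.full((60000, 2, 2), 255, dtype=np.uint8)

# Same slicing and scaling as in the cell above.
X_val_demo, X_train_demo = fake_images[5000:] / 255., fake_images[:5000] / 255.

print(X_train_demo.shape)       # (5000, 2, 2)
print(X_val_demo.shape)         # (55000, 2, 2)
print(X_train_demo.max())       # 1.0 after dividing by 255
print(len(X_train_demo) // 32)  # 156 full batches per epoch at batch size 32
```

With this split, each epoch runs 156 steps when the batch size is 32.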

We'll build a basic model with a flattening layer, two dense ReLU layers, and a softmax output layer.

In [None]:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation="softmax"))
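
To get a feel for the size of this network, the parameter count can be worked out by hand: each dense layer has one weight per input-unit pair plus one bias per unit, and the flattening layer has none. A small sketch of that arithmetic (the `dense_shapes` list is just an illustrative name):

```python
# (inputs, units) for each Dense layer; Flatten contributes no parameters.
dense_shapes = [(28 * 28, 300), (300, 100), (100, 10)]

# weights (n_in * n_out) plus biases (n_out) for each layer
params_per_layer = [n_in * n_out + n_out for n_in, n_out in dense_shapes]

print(params_per_layer)       # [235500, 30100, 1010]
print(sum(params_per_layer))  # 266610
```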

We're asked to develop models with two different optimizers and learning rates. For the first model, we'll use plain SGD with a learning rate of 0.005. There's no real reason for this other than to vary a bit from the author's choice of the Nadam optimizer with a 0.01 learning rate.

In [None]:
# Define epoch, batch size, optimizer, and loss.
n_epochs = 10
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.SGD(learning_rate=0.005)  # `lr` is a deprecated alias
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]
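
`keras.metrics.Mean` keeps a running mean across calls, which is what lets the displayed loss be updated at each iteration. A rough pure-Python analogue is sketched below; the `RunningMean` class is hypothetical and only mimics the call / result / reset pattern, not Keras's actual implementation.

```python
class RunningMean:
    """Toy streaming mean: call it with a value, read the mean so far with result()."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def __call__(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total, self.count = 0.0, 0


running = RunningMean()
for batch_loss in [1.0, 0.5, 0.25]:
    running(batch_loss)
    print(running.result())  # mean of all batch losses seen so far
```

Resetting at the end of each epoch is what makes the reported value a per-epoch mean rather than a mean over the whole run.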

In order to "display the epoch, iteration, mean training loss, and mean accuracy over each epoch (updated at each iteration), as well as the validation loss and accuracy at the end of each epoch", we need to create a custom training loop.

The tqdm library lets us create the progress bars.

In [None]:
from tqdm.notebook import trange
from collections import OrderedDict

We'll create a function that samples batches of instances from the dataset. 

In [None]:
# Randomly sample batches of a select size from the dataset.
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]
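
Because `random_batch` draws indices with replacement, one "epoch" of 156 batches does not visit every training instance. The sketch below estimates how much of a 5,000-instance training set is actually seen; the fixed seed and the `seen` bookkeeping array are assumptions added for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable
n_instances, batch_size = 5000, 32
n_steps = n_instances // batch_size  # 156 steps per epoch

# Mark every index that gets drawn at least once during the epoch.
seen = np.zeros(n_instances, dtype=bool)
for _ in range(n_steps):
    idx = np.random.randint(n_instances, size=batch_size)
    seen[idx] = True

coverage = seen.mean()
print(f"fraction of instances seen in one epoch: {coverage:.3f}")
```

Since the number of draws is close to the number of instances, the fraction seen should land near 1 - 1/e, or roughly 63%.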

We'll also set up a status bar function that lets us view the progress of each epoch.

In [None]:
# Progress bar function for the format of the status bar 
def progress_bar(iteration, total, size=30):
    running = iteration < total
    c = ">" if running else "="
    p = (size - 1) * iteration // total
    fmt = "{{:-{}d}}/{{}} [{{}}]".format(len(str(total)))
    params = [iteration, total, "=" * p + c + "." * (size - p - 1)]
    return fmt.format(*params)

In [None]:
# Function to print the status bar
def print_status_bar(iteration, total, loss, metrics=None, size=30):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                         for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{} - {}".format(progress_bar(iteration, total), metrics), end=end)

In [None]:
# Demo the status bar with dummy values. Separate metric objects are used here
# so that the mean_loss metric defined for training is not overwritten or polluted.
import time

demo_loss = keras.metrics.Mean(name="loss")
demo_square = keras.metrics.Mean(name="mean_square")
for i in range(1, 50 + 1):
    loss = 1 / i
    demo_loss(loss)
    demo_square(i ** 2)
    print_status_bar(i, 50, demo_loss, [demo_square])
    time.sleep(0.05)




We'll now create the custom training loop taking some cues from the author, reusing his code.

From the book:

"• We create two nested loops: one for the epochs, the other for the batches within an epoch.

• Then we sample a random batch from the training set.

• Inside the tf.GradientTape() block, we make a prediction for one batch (using the model as a function), and we compute the loss: it is equal to the main loss plus the other losses (in this model, there is one regularization loss per layer). Since the mean_squared_error() function returns one loss per instance, we compute the mean over the batch using tf.reduce_mean() (if you wanted to apply different weights to each instance, this is where you would do it). The regularization losses are already reduced to a single scalar each, so we just need to sum them (using tf.add_n(), which sums multiple tensors of the same shape and data type).

• Next, we ask the tape to compute the gradient of the loss with regard to each trainable variable (not all variables!), and we apply them to the optimizer to perform a Gradient Descent step.

• Then we update the mean loss and the metrics (over the current epoch), and we display the status bar.

[Footnote in the book: The truth is we did not process every single instance in the training set, because we sampled instances randomly: some were processed more than once, while others were not processed at all. Likewise, if the training set size is not a multiple of the batch size, we will miss a few instances. In practice that’s fine.]

• At the end of each epoch, we display the status bar again to make it look complete and to print a line feed, and we reset the states of the mean loss and the
metrics." (pages 404-405)
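
The quoted passage refers to mean_squared_error(), but the loss used in this notebook, sparse_categorical_crossentropy, likewise returns one value per instance, which is why the loop wraps it in tf.reduce_mean(). A NumPy sketch of that computation, under the assumption that predictions are already probabilities (the helper name `sparse_cce_per_instance` is made up for this example, and no clipping is applied):

```python
import numpy as np

def sparse_cce_per_instance(y_true, y_prob):
    """Return -log(probability assigned to the true class) for each row."""
    rows = np.arange(len(y_true))
    return -np.log(y_prob[rows, y_true])

y_true = np.array([0, 1])                # integer class labels
y_prob = np.array([[0.7, 0.2, 0.1],      # predicted class probabilities
                   [0.1, 0.8, 0.1]])

per_instance = sparse_cce_per_instance(y_true, y_prob)
batch_loss = per_instance.mean()         # what reduce_mean does over the batch

print(per_instance.shape)  # (2,): one loss per instance
print(batch_loss)          # a single scalar for the whole batch
```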


In [None]:
# Custom loop
with trange(1, n_epochs + 1, desc="All epochs") as epochs:
    for epoch in epochs:
        with trange(1, n_steps + 1, desc="Epoch {}/{}".format(epoch, n_epochs)) as steps:
            for step in steps:
                X_batch, y_batch = random_batch(X_train, y_train)
                with tf.GradientTape() as tape:
                    y_pred = model(X_batch)
                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                    loss = tf.add_n([main_loss] + model.losses)
                gradients = tape.gradient(loss, model.trainable_variables)
                optimizer.apply_gradients(zip(gradients, model.trainable_variables))
                for variable in model.variables:
                    if variable.constraint is not None:
                        variable.assign(variable.constraint(variable))                    
                status = OrderedDict()
                mean_loss(loss)
                status["loss"] = mean_loss.result().numpy()
                for metric in metrics:
                    metric(y_batch, y_pred)
                    status[metric.name] = metric.result().numpy()
                steps.set_postfix(status)
            y_pred = model(X_val)
            status["val_loss"] = np.mean(loss_fn(y_val, y_pred))
            status["val_accuracy"] = np.mean(keras.metrics.sparse_categorical_accuracy(
                tf.constant(y_val, dtype=np.float32), y_pred))
            steps.set_postfix(status)
        for metric in [mean_loss] + metrics:
            metric.reset_states()

Over the ten epochs, both the training and validation losses decreased, and the accuracy on both sets continued to increase.

*b. Try using a different optimizer with a different learning rate for the upper layers and the lower layers.*

We'll now split the model into upper and lower layers, keeping the SGD optimizer (learning rate 0.005) for the upper (output) layer and switching the lower layers to Nadam with a 0.0005 learning rate.

In [None]:
keras.backend.clear_session()
np.random.seed(99)
tf.random.set_seed(99)

In [None]:
lower = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation="relu")])

upper = keras.models.Sequential([
    keras.layers.Dense(10, activation="softmax"),])

model = keras.models.Sequential([
    lower, upper])

In [None]:
lower_optimizer = keras.optimizers.Nadam(learning_rate=5e-4)
upper_optimizer = keras.optimizers.SGD(learning_rate=5e-3)

In [None]:
n_epochs = 10
batch_size = 32
n_steps = len(X_train) // batch_size
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

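Again, we'll use the custom training loop discussed above, with one change: a for loop applies each optimizer to its own group of layers. Because tape.gradient() is now called once per group, the tape is created with persistent=True and deleted explicitly once both updates are done.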
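
One possible alternative to a persistent tape, sketched here on a toy model rather than on Fashion MNIST: ask the tape for all gradients in a single call, then split the resulting list between the two optimizers. The random data, layer sizes, and names such as `toy_model`, `lower_toy`, and `upper_toy` are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

tf.random.set_seed(0)
rng = np.random.default_rng(0)
X_toy = rng.random((16, 8)).astype("float32")   # 16 fake instances, 8 features
y_toy = rng.integers(0, 3, size=16)             # 3 fake classes

lower_toy = keras.Sequential([keras.layers.Dense(5, activation="relu")])
upper_toy = keras.Sequential([keras.layers.Dense(3, activation="softmax")])
toy_model = keras.Sequential([lower_toy, upper_toy])
toy_model(X_toy)  # call once so the layers create their weights

lower_opt = keras.optimizers.Nadam(learning_rate=5e-4)
upper_opt = keras.optimizers.SGD(learning_rate=5e-3)

lower_vars = lower_toy.trainable_variables
upper_vars = upper_toy.trainable_variables
before = [v.numpy().copy() for v in lower_vars + upper_vars]

with tf.GradientTape() as tape:  # non-persistent: gradient() is called only once
    y_pred = toy_model(X_toy, training=True)
    loss = tf.reduce_mean(
        keras.losses.sparse_categorical_crossentropy(y_toy, y_pred))

# One gradient call over both groups, then split by group size.
grads = tape.gradient(loss, lower_vars + upper_vars)
lower_grads, upper_grads = grads[:len(lower_vars)], grads[len(lower_vars):]

lower_opt.apply_gradients(zip(lower_grads, lower_vars))
upper_opt.apply_gradients(zip(upper_grads, upper_vars))
```

Either approach should yield the same gradients; the single-call version simply avoids having to keep the tape alive and delete it by hand.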

In [None]:
with trange(1, n_epochs + 1, desc="All epochs") as epochs:
    for epoch in epochs:
        with trange(1, n_steps + 1, desc="Epoch {}/{}".format(epoch, n_epochs)) as steps:
            for step in steps:
                X_batch, y_batch = random_batch(X_train, y_train)
                with tf.GradientTape(persistent=True) as tape:
                    y_pred = model(X_batch)
                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                    loss = tf.add_n([main_loss] + model.losses)
                # Named sub_model rather than layers to avoid shadowing the imported keras layers module.
                for sub_model, optimizer in ((lower, lower_optimizer),
                                             (upper, upper_optimizer)):
                    gradients = tape.gradient(loss, sub_model.trainable_variables)
                    optimizer.apply_gradients(zip(gradients, sub_model.trainable_variables))
                del tape
                for variable in model.variables:
                    if variable.constraint is not None:
                        variable.assign(variable.constraint(variable))                    
                status = OrderedDict()
                mean_loss(loss)
                status["loss"] = mean_loss.result().numpy()
                for metric in metrics:
                    metric(y_batch, y_pred)
                    status[metric.name] = metric.result().numpy()
                steps.set_postfix(status)
            y_pred = model(X_val)
            status["val_loss"] = np.mean(loss_fn(y_val, y_pred))
            status["val_accuracy"] = np.mean(keras.metrics.sparse_categorical_accuracy(
                tf.constant(y_val, dtype=np.float32), y_pred))
            steps.set_postfix(status)
        for metric in [mean_loss] + metrics:
            metric.reset_states()

Though training time doesn't increase significantly, this second model converges much sooner and to a better solution than the original model. Originally, our training and validation accuracies were in the low 80s and high 70s, respectively. With the newer model, those accuracies rose to the low 90s and high 80s, respectively.

If the purpose of the exercise was to show that loop customization may allow for a potentially stronger model and greater model versatility, then that seems to have been effectively demonstrated. 