# Exercise 1.5.3 - Learning Rate Schedules
#### By Jonathan L. Moran (jonathan.moran107@gmail.com)
From the Self-Driving Car Engineer Nanodegree programme offered at Udacity.

## Objectives

* Implement two different [learning rate schedules](https://en.wikipedia.org/wiki/Learning_rate#Learning_rate_schedule): [exponential decay](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay) and step-wise annealing.

In [2]:
import argparse
import logging
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.preprocessing import image_dataset_from_directory
from typing import List, Tuple

## Details

To do so, you will have to leverage Keras `callbacks`. Callbacks perform various actions
at different stages of training. For example, Keras uses a callback to save the model's weights at 
the end of each training epoch.

In [None]:
### From Udacity's `utils.py`

In [None]:
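class LrLogger(tf.keras.callbacks.Callback):
    """Records the optimiser's learning rate at the end of every epoch."""
    def __init__(self):
        super().__init__()
        
    def on_train_begin(self, logs=None):
        # Create an empty 'lr' list in the model's training history
        history = self.model.history.history
        history['lr'] = []

    def on_epoch_end(self, epoch, logs=None):
        history = self.model.history.history
        optimizer = self.model.optimizer
        # NOTE: `_decayed_lr` is a private optimiser method, so it may not be
        # available in every TensorFlow release
        decayed_lr = optimizer._decayed_lr('float32').numpy()
        history['lr'].append(decayed_lr)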

You can either use pre-implemented schedulers (see Tips) or implement a scheduler yourself 
using your own custom decay function, as shown below:

```
def decay(model, callbacks, lr=0.001):
    """ create custom decay that does not do anything """
    def scheduler(epoch, lr):
        return lr 

    callbacks.append(tf.keras.callbacks.LearningRateScheduler(scheduler))

    # compile model
    model.compile()
    
    return model, callbacks 
```

In [3]:
### From Udacity's `training.py`

In [5]:
def exponential_decay(
        model: tf.keras.Model, callbacks: List[tf.keras.callbacks.Callback]=[], 
        initial_lr: float=0.001
) -> Tuple[tf.keras.Model, List[tf.keras.callbacks.Callback]]:
    """Compiles and returns Model instance with exponential decay LR schedule.
    
    :param model: the tf.keras.Model instance to compile.
    :param callbacks: the list of tf.keras.callbacks to pass alongside model.
    :param initial_lr: the value to fix the learning rate at before annealing.
    :returns: tuple, the compiled Model instance and its callbacks.
    """
    # IMPLEMENT THIS FUNCTION
    
    # Instantiate the learning rate schedule
    lr_scheduler = tf.keras.optimizers.schedules.ExponentialDecay(
                        initial_learning_rate=initial_lr,
                        decay_steps=100,
                        decay_rate=0.95,
                        staircase=False
    )
    # Instantiate the optimiser
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_scheduler)
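    # Compile the model; `from_logits=True` because the network built by
    # `create_network` ends in Dense(43) without a softmax, i.e. it outputs logits
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),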
                  metrics=['accuracy']
    )
    # Return the model and any specified callbacks
    return model, callbacks
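
For intuition about what the `ExponentialDecay` schedule configured above computes, the following is a minimal pure-Python sketch of the formula given in the TensorFlow documentation, `initial_learning_rate * decay_rate ** (step / decay_steps)`. The helper name `exponential_decay_lr` and the figure of 500 batches per epoch are assumptions made only for this illustration. Note that `step` counts optimiser updates (batches), not epochs.

```python
def exponential_decay_lr(step, initial_lr=0.001, decay_steps=100,
                         decay_rate=0.95, staircase=False):
    """Hypothetical helper mirroring the documented ExponentialDecay formula."""
    exponent = step / decay_steps
    if staircase:
        # With staircase=True the exponent is floored, so the rate
        # drops in discrete jumps every `decay_steps` updates
        exponent = step // decay_steps
    return initial_lr * decay_rate ** exponent


# Assumed number of batches per epoch (illustrative only)
steps_per_epoch = 500
for epoch in range(4):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch}: lr = {exponential_decay_lr(step):.3e}")
```

Under these assumptions the rate falls by a factor of `0.95 ** 5` (about 0.77) per epoch, so a dataset with more batches per epoch decays faster for the same `decay_steps`.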

In [7]:
model = tf.keras.Model()


In [8]:
model, callbacks = exponential_decay(model=model, callbacks=[], initial_lr=0.001)

In [22]:
def step_decay(
        model: tf.keras.Model, callbacks: List[tf.keras.callbacks.Callback]=[], 
        initial_lr: float=0.001
) -> Tuple[tf.keras.Model, List[tf.keras.callbacks.Callback]]:
    """Compiles and returns Model instance with step-wise decay LR schedule.
    
    :param model: the tf.keras.Model instance to compile.
    :param callbacks: the list of tf.keras.callbacks to pass alongside model.
    :param initial_lr: the value to fix the learning rate at before annealing.
    :returns: tuple, the compiled Model instance and its callbacks.
    """
    #  IMPLEMENT THIS FUNCTION
    
    def scheduler(epoch: int, lr: float):
        """Simple custom constant step-wise annealing schedule."""
        return lr / 2 if epoch % 10 == 0 and epoch > 0 else lr
    
    # Instantiate a custom Keras callback to perform LR annealing
    lr_schedule = tf.keras.callbacks.LearningRateScheduler(
                            schedule=scheduler, verbose=1
    )
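    # Build a new list instead of appending in place, so that the shared
    # default argument `[]` is never mutated across calls
    callbacks = callbacks + [lr_schedule]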
    # Instantiate the optimiser with the initial learning rate value
    optimizer = tf.keras.optimizers.Adam(learning_rate=initial_lr)
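    # Compile the model; `from_logits=True` because the network built by
    # `create_network` ends in Dense(43) without a softmax, i.e. it outputs logits
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),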
                  metrics=['accuracy']
    )
    # Return the compiled model and the custom LR scheduler and any other callback
    return model, callbacks
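
To see what the step-wise schedule above produces, here is a minimal sketch that replays the same rule outside of Keras. It assumes, as the `LearningRateScheduler` documentation describes, that the schedule function is called at the start of every epoch with the zero-based epoch index and the current learning rate. The `trace_schedule` helper is hypothetical and exists only for this illustration.

```python
def scheduler(epoch, lr):
    """Same rule as in step_decay: halve the rate at every 10th epoch."""
    return lr / 2 if epoch % 10 == 0 and epoch > 0 else lr


def trace_schedule(schedule, initial_lr, epochs):
    """Hypothetical helper: apply the schedule epoch by epoch, as the callback would."""
    lr, trace = initial_lr, []
    for epoch in range(epochs):
        lr = schedule(epoch, lr)
        trace.append(lr)
    return trace


trace = trace_schedule(scheduler, initial_lr=0.001, epochs=30)
for epoch in (0, 9, 10, 19, 20, 29):
    print(f"epoch {epoch:2d}: lr = {trace[epoch]:.2e}")
```

The trace stays at 1.00e-03 for epochs 0 to 9, drops to 5.00e-04 for epochs 10 to 19, and to 2.50e-04 for epochs 20 to 29. One consequence worth keeping in mind: with the `--epochs` default of 10 used by `training.py`, the epoch index never reaches 10, so this schedule would not change the learning rate at all unless more epochs are requested or a smaller step size is chosen.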

In [23]:
model = tf.keras.Model()

In [24]:
model, callbacks = step_decay(model=model, callbacks=[], initial_lr=0.001)

Feel free to use any decay rates as well as a step size of your choice for the stepwise scheduler.

You can run `python training.py` to see the effect of different annealing strategies on training and model performance. Make sure to feed in the GTSRB dataset as the image directory, and use the Desktop to view the visualization of the final training metrics.

In [None]:
### From Udacity's `utils.py`

In [None]:
def get_module_logger(mod_name):
    logger = logging.getLogger(mod_name)
    handler = logging.StreamHandler()
    formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger

In [None]:
### From Udacity's `training.py`

In [None]:
logger = get_module_logger(__name__)

In [None]:
parser = argparse.ArgumentParser(description='Download and process tf files')
parser.add_argument('-d', '--imdir', required=True, type=str,
                    help='data directory')
parser.add_argument('-e', '--epochs', default=10, type=int,
                    help='Number of epochs')
args = parser.parse_args()    

logger.info(f'Training for {args.epochs} epochs using {args.imdir} data')

In [None]:
### From Udacity's `utils.py`

In [None]:
def process(image, label):
    """ small function to normalize input images """
    image = tf.cast(image / 255., tf.float32)
    return image, label

In [None]:
def get_datasets(imdir):
    """ extract GTSRB dataset from directory """
    train_dataset = image_dataset_from_directory(imdir, 
                                       image_size=(32, 32),
                                       batch_size=32,
                                       validation_split=0.2,
                                       subset='training',
                                       seed=123,
                                       label_mode='int')

    val_dataset = image_dataset_from_directory(imdir, 
                                        image_size=(32, 32),
                                        batch_size=32,
                                        validation_split=0.2,
                                        subset='validation',
                                        seed=123,
                                        label_mode='int')
    train_dataset = train_dataset.map(process)
    val_dataset = val_dataset.map(process)
    return train_dataset, val_dataset

In [None]:
### From Udacity's `training.py`

In [None]:
# get the datasets
train_dataset, val_dataset = get_datasets(args.imdir)

In [None]:
# Use a distinct name so the module logger defined earlier is not overwritten
lr_logger = LrLogger()
callbacks = [lr_logger]

In [None]:
### From Udacity's `utils.py`

In [None]:
def create_network():
    net = tf.keras.models.Sequential()
    input_shape = [32, 32, 3]
    net.add(Conv2D(6, kernel_size=(3, 3), strides=(1, 1), activation='relu', 
                   input_shape=input_shape))
    net.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    net.add(Conv2D(16, kernel_size=(3, 3), strides=(1, 1), activation='relu'))   
    net.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    net.add(Flatten())
    net.add(Dense(120, activation='relu'))
    net.add(Dense(84, activation='relu'))
    net.add(Dense(43))
    return net

In [None]:
### From Udacity's `training.py`

In [None]:
model = create_network()

In [None]:
# model, callbacks = exponential_decay(model, callbacks)
model, callbacks = step_decay(model, callbacks)

In [None]:
history = model.fit(x=train_dataset, 
                    epochs=args.epochs, 
                    validation_data=val_dataset,
                    callbacks=callbacks)

In [None]:
### From Udacity's `utils.py`

In [None]:
def display_metrics(history):
    """ plot loss and accuracy from keras history object """
    f, ax = plt.subplots(1, 3, figsize=(15, 5))
    ax[0].plot(history.history['loss'], linewidth=3)
    ax[0].plot(history.history['val_loss'], linewidth=3)
    ax[0].set_title('Loss', fontsize=16)
    ax[0].set_ylabel('Loss', fontsize=16)
    ax[0].set_xlabel('Epoch', fontsize=16)
    ax[0].legend(['train loss', 'val loss'], loc='upper right')
    ax[1].plot(history.history['accuracy'], linewidth=3)
    ax[1].plot(history.history['val_accuracy'], linewidth=3)
    ax[1].set_title('Accuracy', fontsize=16)
    ax[1].set_ylabel('Accuracy', fontsize=16)
    ax[1].set_xlabel('Epoch', fontsize=16)
    ax[1].legend(['train acc', 'val acc'], loc='upper left')
    ax[2].plot(history.history['lr'], linewidth=3)
    ax[2].set_title('Learning rate', fontsize=16)
    ax[2].set_ylabel('Learning Rate', fontsize=16)
    ax[2].set_xlabel('Epoch', fontsize=16)
    ax[2].legend(['learning rate'], loc='upper right')
    # ax[2].ticklabel_format(axis='y', style='sci')
    ax[2].yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'))
    plt.tight_layout()
    plt.show()

In [None]:
display_metrics(history)

## Tips

Keras refers to learning rate annealing strategies as "schedules". You can find the pre-implemented ones 
[here](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules).

## Credits

This assignment was prepared by Thomas Hossler et al., Winter 2021 (link [here](https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd0013)).

Helpful resources:
* [Learning Rate Schedule in Practice: an example with Keras and TensorFlow 2.0 by B. Chen | Medium](https://towardsdatascience.com/learning-rate-schedule-in-practice-an-example-with-keras-and-tensorflow-2-0-2f48b2888a0c)