# Using Callbacks

Callbacks are an integral part of Keras. They are used to report performance information, log progress, halt in the event of errors, tune parameters, save model state (in case of a crash, for example), and stop training once the loss stops improving. The list goes on.
Callbacks can be passed to the fit, evaluate, and predict methods of keras.Model.

## Goals

The overarching goal is to learn to use callbacks for some typical tasks. These include:
- Reporting about training progress.
- Stopping once training no longer reduces loss.
- Tuning hyperparameters.
- Implementing adaptive learning rate decay.
- Finding an optimal batch-size for training.
- Putting some of this into ```my_keras_utils.py``` so that it can be easily called and reused.

## What's Here?

I continue working with MNIST data, which I began working with in [my first Keras models](first_model.ipynb). 

My **concrete objective** is to tune a model that does well on Kaggle: 97th percentile? That's tough, but I think I can make it work.

In [4]:
import os  ## needed for os.path.normpath in the tuner cells below
import numpy as np
from datetime import datetime, time, timedelta

import pandas as pd
import tensorflow as tf
import kerastuner as kt
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt


import my_keras_utils as my_utils

In [2]:
tf.__version__
tf.config.experimental.list_physical_devices('GPU')

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [3]:
## Load our data.
## Since the load process is a little slow, the try-except allows us to re-run all 
## cells without having to wait. 
try:
    ## Raises NameError and loads data if X_train is not defined.
    X_train.shape
except NameError: 
    ((X_train, y_train), (X_dev, y_dev), (X_test, y_test)) = my_utils.load_kaggle_mnist()
X_train.shape




(38000, 784)

In [34]:
X_dev.shape

(2000, 784)

In [None]:
## Let's use the dropout model from my first_model notebook.

inputs = keras.Input(shape=(784,))  ## one-element tuple; (784) is just the int 784
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
x = layers.Dropout(rate = .05)(x)
x = layers.Dense(100, activation='relu',)(x)
x = layers.Dropout(rate = .15)(x)
x = layers.Dense(100, activation='relu',)(x)
x = layers.Dropout(rate = .15)(x)
outputs = layers.Dense(10, activation='softmax')(x)

model_dropout = keras.Model(inputs=inputs, outputs=outputs, name='Dropout')
model_dropout.summary()
optimizer = keras.optimizers.Adam(.001)

model_dropout.compile(optimizer=optimizer, 
                loss="sparse_categorical_crossentropy",
                metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
            )
init_weights = model_dropout.get_weights()


In [None]:
early_stopping = keras.callbacks.EarlyStopping(
                    monitor='loss', 
                    patience = 10, 
                    restore_best_weights = True)

adaptive_lr = keras.callbacks.ReduceLROnPlateau(
                    monitor='loss',
                    patience= 4,
                    cooldown= 8, 
                    factor=0.3,)
progress_update = my_utils.TimedProgressUpdate(1)
callbacks = [progress_update, early_stopping]
epochs = 0
initial_epoch = 0
batch_size = 128

In [None]:
if True: ## reinitialize
    model_dropout.set_weights(init_weights)    
increment_epochs = 50
epochs += increment_epochs

history1 = model_dropout.fit(X_train, y_train, 
                        epochs=epochs,
                        initial_epoch = initial_epoch,
                        batch_size=batch_size, 
                        validation_data=(X_dev, y_dev),
                        callbacks = [progress_update],
                        verbose = 0)
initial_epoch += increment_epochs

In [None]:
if True: ## reinitialize
    model_dropout.set_weights(init_weights)    
increment_epochs = 50
epochs += increment_epochs

history2 = model_dropout.fit(X_train, y_train, 
                        epochs=epochs,
                        initial_epoch = initial_epoch,
                        batch_size=batch_size, 
                        validation_data=(X_dev, y_dev),
                        callbacks = [adaptive_lr],
                        verbose =0)
initial_epoch += increment_epochs

In [None]:

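def overlay_histories(histories, metric):
    """Plot one metric from several History objects on shared axes."""
    fig, ax = plt.subplots()
    for n, h in enumerate(histories):
        y = np.array(h.history[metric])
        ax.plot(range(len(y)), y, label='history_' + str(n))
    ## Label the axes so the plot can be read on its own.
    ax.set_xlabel('epoch')
    ax.set_ylabel(metric)
    ax.legend()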

In [None]:
overlay_histories([history1, history2], 'val_loss')

## Tuning

Time to work with Keras Tuner.

#### Read the Docs

Reading the documentation was really helpful. Note that the tuner's ```search``` method _will_ pass its callbacks on to each trial's training run. So, instead of worrying about the ```max_epochs```, you can (if you have the resources and time, which _is_ a resource) just add a realistic ```EarlyStopping``` callback. Below, I use this to stop a trial once training loss is no longer decreasing.
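
To make the role of ```patience``` concrete, here is a minimal plain-Python sketch of the stopping rule: stop once the monitored loss has failed to improve for ```patience``` epochs in a row. It is a simplification for illustration, not Keras's actual implementation; the function name and the loss values are made up.

```python
def early_stop_epoch(losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified model of an early-stopping rule: an epoch counts as an
    improvement only if the loss drops below the best value seen so far
    by more than min_delta.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never stalled for `patience` epochs


# Hypothetical loss curve: improves, stalls briefly, improves once more, then plateaus.
example_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70]

stop_patient = early_stop_epoch(example_losses, patience=3)
stop_impatient = early_stop_epoch(example_losses, patience=2)
print("patience=3 stops at epoch", stop_patient)
print("patience=2 stops at epoch", stop_impatient)
```

With a small patience the run is cut off during the first brief stall and never sees the later improvement, which is the usual argument for choosing a generous value like the 10 used in this notebook.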

In [5]:
def model_builder(hp):
    ## Define the parameter search space.

    hp_dropout_x1 = hp.Float('rate1', min_value = .05, max_value = .5, step=.01)
    hp_dropout_w1 = hp.Float('rate2', min_value = .05, max_value = .5, step=.01)
    hp_dropout_w2 = hp.Float('rate3', min_value = .05, max_value = .5, step=.01)
    hp_l1_units = hp.Int('l1_units', min_value = 20, max_value = 200, step = 20)
    hp_l2_units = hp.Int('l2_units', min_value = 20, max_value = 200, step = 20)

    ## ### I need to learn about the options here. 'Choice' means "here are your choices"
    ## ### 'Int' is a different option that searches an integer range by steps.

    inputs = keras.Input(shape=(784,))
    x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
    x = layers.Dropout(rate = hp_dropout_x1)(x)
    x = layers.Dense(hp_l1_units, activation='relu',)(x)
    x = layers.Dropout(rate = hp_dropout_w1)(x)
    x = layers.Dense(hp_l2_units, activation='relu',)(x)
    x = layers.Dropout(rate = hp_dropout_w2)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)

    model.compile(optimizer = keras.optimizers.Adam(.003),
                    loss = "sparse_categorical_crossentropy",
                    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
    )

    return model

	


In [6]:
tuner = kt.Hyperband(model_builder,
                     objective = 'val_loss', 
                     max_epochs = 120,
                     hyperband_iterations = 2,
                     factor = 3,
                     directory = 'ignored/kt_trials',
                     project_name = 'dropout_mnist',
                     )	

tuner.search_space_summary()

INFO:tensorflow:Reloading Oracle from existing project ignored/kt_trials\dropout_mnist\oracle.json
INFO:tensorflow:Reloading Tuner from ignored/kt_trials\dropout_mnist\tuner0.json


In [7]:
## prevent bloated ipython output during training.
clear_output = my_utils.ClearTrainingOutput()
timed_update = my_utils.TimedProgressUpdate(3)
## stop when the training loss stops improving.
train_loss_stopping = keras.callbacks.EarlyStopping(monitor='loss', 
                            patience = 10, 
                            restore_best_weights = False
                            )

adaptive_lr = keras.callbacks.ReduceLROnPlateau(
                    monitor='loss',
                    patience= 4,
                    cooldown= 8,
                    min_lr=.0003, 
                    factor=0.334,)

tuner_callbacks = [adaptive_lr, train_loss_stopping, timed_update]
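
With the values used in this notebook (Adam starting at .003, factor=0.334, min_lr=.0003), the learning rate can only be reduced a few times before it hits the floor. The sketch below works through that arithmetic in plain Python. It assumes each reduction multiplies the rate by the factor and clips at min_lr; the helper name is hypothetical, and it ignores when the reductions happen (that depends on patience and cooldown).

```python
def lr_schedule(initial_lr, factor, min_lr, max_reductions=10):
    """List the learning rates reached by repeated plateau reductions.

    Assumes each reduction is new_lr = max(old_lr * factor, min_lr)
    and that no further reduction happens once min_lr is reached.
    """
    rates = [initial_lr]
    lr = initial_lr
    for _ in range(max_reductions):
        if lr <= min_lr:
            break  # already at the floor
        lr = max(lr * factor, min_lr)
        rates.append(lr)
    return rates


rates = lr_schedule(initial_lr=0.003, factor=0.334, min_lr=0.0003)
for step, lr in enumerate(rates):
    print(f"after {step} reductions: lr = {lr:.6f}")
```

So under these settings the third reduction is already clipped to min_lr, and any plateau after that leaves the rate unchanged.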


In [8]:
start = datetime.now()
tuner.search(X_train, y_train, 
            epochs=120,
            batch_size=128, 
            validation_data=(X_dev, y_dev),
            callbacks = tuner_callbacks,
            verbose = 0
            )


INFO:tensorflow:Oracle triggered exit


In [9]:
tuner.results_summary()

## Ran the Hyperband

I first ran it with the following params:
```
tuner = kt.Hyperband(model_builder,
                     objective = 'val_loss', 
                     max_epochs = 200,
                     hyperband_iterations = 10,
                     factor = 3,
                     directory = 'ignored/kt_trials',
                     project_name = 'dropout_mnist')	
###
tuner.search(X_train, y_train, 
            epochs=50,
            batch_size=128, 
            validation_data=(X_dev, y_dev),
            callbacks = tuner_callbacks,
            )
```
This involved way too many iterations: I hadn't noticed the hyperband_iterations param (10!).
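
For a rough sense of scale: the Keras Tuner documentation describes one Hyperband iteration as running approximately `max_epochs * (math.log(max_epochs, factor) ** 2)` cumulative epochs across all trials. The sketch below plugs both configurations from this notebook into that estimate. The helper name is mine, and the numbers are upper-level estimates rather than measurements, since callbacks such as ```EarlyStopping``` can end trials early.

```python
import math


def approx_hyperband_epochs(max_epochs, factor, hyperband_iterations):
    """Rough cumulative-epoch budget for a Hyperband search.

    Uses the per-iteration approximation quoted in the Keras Tuner docs,
    multiplied by the number of full Hyperband iterations.
    """
    per_iteration = max_epochs * (math.log(max_epochs, factor) ** 2)
    return per_iteration * hyperband_iterations


# The oversized run described above.
big_run = approx_hyperband_epochs(max_epochs=200, factor=3, hyperband_iterations=10)
# The smaller configuration used in the tuner cell earlier in this notebook.
small_run = approx_hyperband_epochs(max_epochs=120, factor=3, hyperband_iterations=2)

print(f"max_epochs=200, iterations=10: ~{big_run:,.0f} epochs")
print(f"max_epochs=120, iterations=2:  ~{small_run:,.0f} epochs")
print(f"ratio: ~{big_run / small_run:.1f}x")
```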

### os.path.normpath

Per [Issue #198](https://github.com/keras-team/keras-tuner/issues/198), on Windows you may need to wrap the directory keyword arg in os.path.normpath(), and the path to the logs needs to be short. For example, my_trials/this_type_of_trial/this_trial is too long; try mt/t/this instead.
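
The mixed separators in the log line above (```ignored/kt_trials\dropout_mnist\oracle.json```) show what happens without normalization. Here is a small sketch that reproduces this on any operating system by using the standard library's ```ntpath``` module, which applies Windows path rules regardless of platform. It only illustrates the path handling; whether it resolves a given Keras Tuner error is something to verify on your own machine.

```python
import ntpath  # Windows path rules, importable on any OS

raw_directory = 'ignored/kt_trials'
project_name = 'dropout_mnist'

# Joining a forward-slash directory under Windows rules mixes separators.
mixed = ntpath.join(raw_directory, project_name)

# Normalizing first converts every separator to a backslash.
clean = ntpath.join(ntpath.normpath(raw_directory), project_name)

print(mixed)
print(clean)
```

On Windows, ```os.path``` is ```ntpath```, so ```os.path.normpath('ignored/kt_trials')``` gives the same normalized directory as in the sketch.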

In [13]:
def rand_search_model_builder(hp):
    ## Define the hyperparameter search space.

    hp_dropout_x1 = hp.Float('rate1', min_value = .1, max_value = .4, step=.01)
    hp_dropout_w1 = hp.Float('rate2', min_value = .05, max_value = .4, step=.01)
    hp_dropout_w2 = hp.Float('rate3', min_value = .05, max_value = .4, step=.01)

    # Define the hypermodel
    inputs = keras.Input(shape=(784,))
    x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
    x = layers.Dropout(rate = hp_dropout_x1)(x)
    x = layers.Dense(100, activation='relu',)(x)
    x = layers.Dropout(rate = hp_dropout_w1)(x)
    x = layers.Dense(100, activation='relu',)(x)
    x = layers.Dropout(rate = hp_dropout_w2)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)

    model.compile(optimizer = keras.optimizers.Adam(.003),
                    loss = "sparse_categorical_crossentropy",
                    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
    )

    return model

#kt.tuners.RandomSearch?
rand_tuner = kt.tuners.RandomSearch(rand_search_model_builder,
                     objective = 'val_loss', 
                     max_trials = 25,
                     executions_per_trial=1,
                     directory = os.path.normpath('ignored/mnist'),
                     project_name = 'rs')

rand_tuner.search_space_summary()

adaptive_lr = keras.callbacks.ReduceLROnPlateau(
                    monitor='loss',
                    patience= 4,
                    cooldown= 8,
                    min_lr=.0003, 
                    factor=0.334,)

progress_update = my_utils.TimedProgressUpdate(2)


rand_tuner.search(X_train, y_train, 
            epochs=120,
            batch_size=128, 
            validation_data=(X_dev, y_dev),
            callbacks = [adaptive_lr, 
                        progress_update, 
                        #clear_output
                        ],
            verbose = 0
            )

INFO:tensorflow:Reloading Oracle from existing project ignored\mnist\rs\oracle.json
INFO:tensorflow:Reloading Tuner from ignored\mnist\rs\tuner0.json


INFO:tensorflow:Oracle triggered exit


In [11]:
def rand_search_model_builder_2(hp):
    ## Define the hyperparameter search space.

    hp_dropout_x1 = hp.Float('rate1', min_value = .27, max_value = .38, step=.001)
    hp_dropout_w1 = hp.Float('rate2', min_value = .05, max_value = .15, step=.001)
    hp_dropout_w2 = hp.Float('rate3', min_value = .11, max_value = .21, step=.001)

    # Define the hypermodel
    inputs = keras.Input(shape=(784,))
    x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
    x = layers.Dropout(rate = hp_dropout_x1)(x)
    x = layers.Dense(100, activation='relu',)(x)
    x = layers.Dropout(rate = hp_dropout_w1)(x)
    x = layers.Dense(100, activation='relu',)(x)
    x = layers.Dropout(rate = hp_dropout_w2)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)

    model.compile(optimizer = keras.optimizers.Adam(.003),
                    loss = "sparse_categorical_crossentropy",
                    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
    )

    return model

#kt.tuners.RandomSearch?
new_tuner = kt.tuners.RandomSearch(rand_search_model_builder_2,
                     objective = 'val_loss', 
                     max_trials = 25,
                     executions_per_trial=1,
                     directory = os.path.normpath('ignored/mnist'),
                     project_name = 'rs_2')

new_tuner.search_space_summary()

adaptive_lr = keras.callbacks.ReduceLROnPlateau(
                    monitor='loss',
                    patience= 4,
                    cooldown= 8,
                    min_lr=.0003, 
                    factor=0.334,)

progress_update = my_utils.TimedProgressUpdate(.5)

new_tuner.search(X_train, y_train, 
            epochs=1,
            batch_size=128, 
            validation_data=(X_dev, y_dev),
            callbacks = [adaptive_lr, 
                        progress_update, 
                        #clear_output
                        ],
            verbose = 0
            )

INFO:tensorflow:Reloading Oracle from existing project ignored\mnist\rs_2\oracle.json
INFO:tensorflow:Reloading Tuner from ignored\mnist\rs_2\tuner0.json


INFO:tensorflow:Oracle triggered exit


### A dubious first run

I ran a random search with lr = .001 and a 0-.5 range on the params; 50 epochs, 2 executions per trial. The results were okay, but the number of epochs seemed too low. Also, max_trials was too big: at least on the Surface, each trial took three minutes. Ouch. 5 hours later...

### I ran a second time

I ran it a second time, this time with more epochs (120), but 1 execution per trial and a slightly different band of parameters. There was a clear standout:

Trial ID: 39f9768e653c7179ddb0917bd84a4b9d
|-Score: 0.05155862867832184
|-Best step: 0
Hyperparameters:
|-rate1: 0.32999999999999985
|-rate2: 0.1
|-rate3: 0.16000000000000003

In [12]:
new_tuner.results_summary()

## Bayesian Tuning

Let's see how well the Bayesian optimizer does with rand_search_model_builder.

In [11]:
bayesian_tuner = kt.tuners.BayesianOptimization(
            rand_search_model_builder,
            objective='val_loss',
            max_trials = 25,
            directory = os.path.normpath('ignored/mnist'),
            project_name = 'bayes'
)

bayesian_tuner.search(X_train, y_train, 
            epochs=120,
            batch_size=128, 
            validation_data=(X_dev, y_dev),
            callbacks = [adaptive_lr, 
                        progress_update, 
                        clear_output
                        ],
            verbose = 0
            )


INFO:tensorflow:Oracle triggered exit


In [12]:
bayesian_tuner.results_summary()

## Let's put it all together. 

The Bayesian tuner got pretty bad results compared with the other two. I'm not really sure what happened, except that it didn't seem exploratory enough. I should probably figure out how to get better Bayesian results.

But for now, we have several high-scoring models. Let's extract three and cross-validate.
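
Cross-validation here means holding out a different contiguous block of rows as the validation set in each fold and training on the rest. The index bookkeeping can be sketched in plain Python as follows; the helper name is hypothetical, and the 38000-row and 2000-row sizes are taken from the shapes printed earlier in this notebook.

```python
def fold_indices(n_samples, val_size, fold):
    """Return (train_idx, val_idx) for one contiguous-block fold.

    The validation block is rows [fold*val_size, (fold+1)*val_size);
    everything before and after it is used for training.
    """
    start = fold * val_size
    stop = min(start + val_size, n_samples)
    val_idx = list(range(start, stop))
    train_idx = list(range(0, start)) + list(range(stop, n_samples))
    return train_idx, val_idx


# Tiny example so the split is easy to inspect by eye.
small_train, small_val = fold_indices(n_samples=10, val_size=3, fold=1)
print("val:  ", small_val)
print("train:", small_train)

# Same sizes as this notebook: 38000 training rows, 2000 held out per fold.
for fold in range(3):
    train_idx, val_idx = fold_indices(38000, 2000, fold)
    print(f"fold {fold}: {len(train_idx)} train rows, {len(val_idx)} val rows")
```

The index lists can be used directly with NumPy indexing, for example ```X[train_idx]```. Note that with three folds of 2000 rows, only the first 6000 rows ever serve as validation data.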

In [60]:
best = rand_tuner.get_best_models()[0]
config = best.get_config()
model_1 = best.from_config(config)
config



{'name': 'functional_1',
 'layers': [{'class_name': 'InputLayer',
   'config': {'batch_input_shape': (None, 784),
    'dtype': 'float32',
    'sparse': False,
    'ragged': False,
    'name': 'input_1'},
   'name': 'input_1',
   'inbound_nodes': []},
  {'class_name': 'Rescaling',
   'config': {'name': 'rescaling',
    'trainable': True,
    'dtype': 'float32',
    'scale': 0.00392156862745098,
    'offset': 0.0},
   'name': 'rescaling',
   'inbound_nodes': [[['input_1', 0, 0, {}]]]},
  {'class_name': 'Dropout',
   'config': {'name': 'dropout',
    'trainable': True,
    'dtype': 'float32',
    'rate': 0.32999999999999985,
    'noise_shape': None,
    'seed': None},
   'name': 'dropout',
   'inbound_nodes': [[['rescaling', 0, 0, {}]]]},
  {'class_name': 'Dense',
   'config': {'name': 'dense',
    'trainable': True,
    'dtype': 'float32',
    'units': 100,
    'activation': 'relu',
    'use_bias': True,
    'kernel_initializer': {'class_name': 'GlorotUniform',
     'config': {'seed': No

In [61]:
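## Note: temp[0] is the same top-ranked trial that get_best_models()[0] returned above,
## so model_2 is rebuilt from the same configuration as model_1; only model_3 is the runner-up.
temp = rand_tuner.get_best_models(num_models=2)
model_2 = temp[0].from_config(temp[0].get_config())
model_3 = temp[1].from_config(temp[1].get_config())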




In [63]:
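def cross_validate(model, X, y, val_size = 2000, folds = 3):
    """Fit `model` once per fold, holding out a contiguous block of `val_size` rows each time.

    Returns one History object per fold. Layer weights are restored after each fold,
    but the optimizer (including any learning rate lowered by adaptive_lr) carries over.
    """
    ## Remember the starting weights so every fold begins from the same layer weights.
    init_weights = model.get_weights()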
    results = []
    for fold in range(folds):
        X_val = X[fold*val_size:(fold+1)*val_size,:]
        X_train = np.concatenate((X[0:fold*val_size,:],X[(fold+1)*val_size:,:]))
        y_val = y[fold*val_size:(fold+1)*val_size]
        y_train = np.concatenate((y[0:fold*val_size],y[(fold+1)*val_size:]))
        history = model.fit(X_train, y_train,
            epochs=200,
            batch_size=128, 
            validation_data=(X_val, y_val),
            callbacks = [adaptive_lr, 
                        progress_update, 
                        clear_output
                        ],
            verbose = 0)
        model.set_weights(init_weights)
        results.append(history)
    return results

In [64]:
results = []
for model in [model_1, model_2, model_3]:
    model.compile(optimizer = keras.optimizers.Adam(.003),
                    loss = "sparse_categorical_crossentropy",
                    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
    )
    histories = cross_validate(model, X_train, y_train)
    results.append(histories)

Begin training of functional_1 at 08:07:17. Progress updates every 120.0 seconds.
Finished fitting at 08:08:56. Elapsed time 0:01:39.265299.


In [69]:
## Print the final-epoch validation loss for every fold of every model.
for model_histories in results:
    for h in model_histories:
        print(h.history['val_loss'][-1])
    

0.039871301501989365
0.010694405995309353
0.012702114880084991
0.03751494362950325
0.08814937621355057
0.08254300802946091
0.04536708816885948
0.06536883115768433
0.0808723047375679
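
The nine numbers above are the final-epoch validation losses for three folds of each of the three models, in order. A short sketch of averaging them per model, using only the printed values (the variable names are mine):

```python
from statistics import mean

# Final-epoch val_loss values copied from the output above:
# three folds for model_1, then model_2, then model_3.
final_val_losses = [
    0.039871301501989365, 0.010694405995309353, 0.012702114880084991,
    0.03751494362950325, 0.08814937621355057, 0.08254300802946091,
    0.04536708816885948, 0.06536883115768433, 0.0808723047375679,
]
folds = 3

# Group the flat list into one sub-list per model, then average each group.
per_model = [final_val_losses[i:i + folds]
             for i in range(0, len(final_val_losses), folds)]
means = [mean(group) for group in per_model]

for n, m in enumerate(means, start=1):
    print(f"model_{n}: mean final val_loss = {m:.4f}")

best_model_number = means.index(min(means)) + 1
print("lowest mean:", f"model_{best_model_number}")
```

Three folds and a single final-epoch value per fold make this a noisy comparison, so the ranking is suggestive rather than conclusive.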


In [79]:
## Recompile model_1. This gives it a fresh optimizer; it does not reinitialize the layer weights.
model_1.compile(optimizer = keras.optimizers.Adam(.003),
                    loss = "sparse_categorical_crossentropy",
                    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")]
    )
## Save so I can move to another computer
model_1.save('MNIST_model.hdf5')
model_2.save("mnist_2.hdf5")
model_3.save("mnist_3.hdf5")

In [77]:
model_1.fit(X_train, y_train,
            epochs=200,
            batch_size=128, 
            validation_data=(X_dev, y_dev),
            callbacks = [adaptive_lr, 
                        progress_update, 
                        clear_output
                        ],
            verbose = 2)
model_1.save('MNIST_model.hdf5')

Begin training of functional_1 at 08:45:50. Progress updates every 120.0 seconds.
Epoch 1/200
297/297 - 1s - loss: 0.0311 - acc: 0.9896 - val_loss: 0.0906 - val_acc: 0.9785
Epoch 2/200
297/297 - 1s - loss: 0.0333 - acc: 0.9886 - val_loss: 0.0886 - val_acc: 0.9805
Epoch 3/200
297/297 - 1s - loss: 0.0343 - acc: 0.9886 - val_loss: 0.0873 - val_acc: 0.9815
Epoch 4/200
297/297 - 1s - loss: 0.0322 - acc: 0.9896 - val_loss: 0.0871 - val_acc: 0.9785
Epoch 5/200
297/297 - 1s - loss: 0.0342 - acc: 0.9884 - val_loss: 0.0875 - val_acc: 0.9800
Epoch 6/200
297/297 - 1s - loss: 0.0372 - acc: 0.9881 - val_loss: 0.0890 - val_acc: 0.9790
Epoch 7/200
297/297 - 1s - loss: 0.0342 - acc: 0.9890 - val_loss: 0.0885 - val_acc: 0.9795
Epoch 8/200
297/297 - 1s - loss: 0.0313 - acc: 0.9896 - val_loss: 0.0891 - val_acc: 0.9800
Epoch 9/200
297/297 - 1s - loss: 0.0318 - acc: 0.9886 - val_loss: 0.0875 - val_acc: 0.9800
Epoch 10/200
297/297 - 1s - loss: 0.0335 - acc: 0.9891 - val_loss: 0.0853 - val_acc: 0.9810
Epoch 1

KeyboardInterrupt: 