## Saving and Restoring a Model

When using the Sequential API or the Functional API, saving a trained Keras model is as simple as it gets:

In [1]:
import tensorflow as tf
from tensorflow import keras


import numpy as np
import pandas as pd
from datetime import datetime

In [2]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

In [3]:
X_valid, X_train = X_train_full[:5000]/255.0, X_train_full[5000:]/255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

In [4]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape = X_train.shape[1:]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

In [5]:
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))

Train on 55000 samples, validate on 5000 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x154f9a748>

In [6]:
model.save('my_keras_model.h5')

Keras will use the HDF5 format to save both the model's architecture (including every layer's hyperparameters) and the values of all the model parameters for every layer (e.g., connection weights and biases). It also saves the optimizer (including its hyperparameters and any state it may have). You will typically have a script that trains a model and saves it, and one or more scripts (or web services) that load the model and use it to make predictions. Loading the model is just as easy:

In [7]:
model = keras.models.load_model('my_keras_model.h5')

This will work when using the Sequential API or the Functional API, but unfortunately not when using model subclassing. You can use save_weights() and load_weights() to at least save and restore the model parameters, but you will need to save and restore everything else yourself.
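
As a minimal sketch of the save_weights() / load_weights() route, consider the following. The subclassed model `TinyRegressor`, the random input data, and the temporary file path are all hypothetical and serve only as illustration. The file name ends in `.weights.h5` because recent Keras releases expect that suffix for weight files.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


class TinyRegressor(keras.Model):
    """Hypothetical subclassed model, used only for illustration."""

    def __init__(self, units=8, **kwargs):
        super().__init__(**kwargs)
        self.hidden = keras.layers.Dense(units, activation="relu")
        self.out = keras.layers.Dense(1)

    def call(self, inputs):
        return self.out(self.hidden(inputs))


X = np.random.rand(16, 4).astype("float32")

model_a = TinyRegressor()
pred_a = model_a.predict(X, verbose=0)  # the first call creates the variables

weights_path = os.path.join(tempfile.mkdtemp(), "tiny.weights.h5")
model_a.save_weights(weights_path)  # parameters only, no architecture

# The architecture is not in the file, so it has to be rebuilt in code
model_b = TinyRegressor()
model_b.predict(X, verbose=0)  # create the variables before loading into them
model_b.load_weights(weights_path)
pred_b = model_b.predict(X, verbose=0)

weights_match = np.allclose(pred_a, pred_b)
```

Anything beyond the parameters (for example the constructor arguments or the optimizer configuration) would still have to be stored separately.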

But what if training lasts several hours? This is quite common, especially when training on large datasets. In this case, you should not only save your model at the end of training, but also save checkpoints at regular intervals during training, to avoid losing everything if your computer crashes. But how can you tell the fit() method to save checkpoints? Use callbacks.

## Using Callbacks

The fit() method accepts a callbacks argument that lets you specify a list of objects that Keras will call at the start and end of training, at the start and end of each epoch, and even before and after processing each batch. For example, the ModelCheckpoint callback saves checkpoints of your model at regular intervals during training, by default at the end of each epoch:

In [8]:
model_0 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = X_train.shape[1:]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model_0.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
checkpoint_cb = keras.callbacks.ModelCheckpoint("my_keras_model_0.h5")
history_0 = model_0.fit(X_train, y_train, epochs=10, callbacks=[checkpoint_cb])

Train on 55000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Moreover, if you use a validation set during training, you can set save_best_only=True when creating the ModelCheckpoint. In this case, it will only save your model when its performance on the validation set is the best so far. This way, you do not need to worry about training for too long and overfitting the training set: simply restore the last model saved after training, and this will be the best model on the validation set. The following code is a simple way to implement early stopping:


In [9]:
model_1 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = X_train.shape[1:]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model_1.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
checkpoint_cb = keras.callbacks.ModelCheckpoint('my_keras_model_1.h5', save_best_only=True)
history_1 = model_1.fit(X_train, y_train, epochs=10,
                        validation_data=(X_valid, y_valid),
                        callbacks=[checkpoint_cb])
model = keras.models.load_model('my_keras_model_1.h5')  # roll back to best model

Train on 55000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
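
To make the effect of save_best_only=True concrete, here is a plain-Python sketch of the decision rule. It is not Keras's implementation, and the validation losses are made-up numbers: a checkpoint is written only on epochs where the monitored validation loss beats every earlier value.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint is written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved


# Hypothetical validation losses for five epochs
val_losses = [0.52, 0.45, 0.47, 0.41, 0.43]
saved_epochs = epochs_that_save(val_losses)
print(saved_epochs)  # [1, 2, 4]
```

The file on disk after training therefore corresponds to epoch 4 in this example, which is why loading it afterward rolls back to the best model.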


Another way to implement early stopping is to simply use the EarlyStopping callback. It will interrupt training when it measures no progress on the validation set for a number of epochs (defined by the patience argument), and it will optionally roll back to the best model. You can combine both callbacks to save checkpoints of your model (in case your computer crashes) and interrupt training early when there is no more progress (to avoid wasting time and resources):

In [10]:
model_2 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = X_train.shape[1:]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model_2.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
checkpoint_cb_2 = keras.callbacks.ModelCheckpoint('my_keras_model_2.h5', save_best_only=True)
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history_2 = model_2.fit(X_train, y_train, epochs=100, validation_data=(X_valid, y_valid),
                        callbacks=[checkpoint_cb_2, early_stopping_cb])

Train on 55000 samples, validate on 5000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100


The number of epochs can be set to a large value since training will stop automatically when there is no more progress. In this case, there is no need to restore the best model saved because the EarlyStopping callback will keep track of the best weights and restore them for you at the end of training.
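
The patience rule can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions (it monitors the validation loss and has no minimum-improvement threshold), not the actual Keras code, and the losses are invented: training stops once `patience` consecutive epochs bring no improvement, and the best epoch is the one whose weights would be restored.

```python
def early_stopping(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best loss).

    Epochs are 0-based indices into val_losses.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: the best value occurs at index 2
losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.63]
stopped_at, best_at = early_stopping(losses, patience=3)
print(stopped_at, best_at)  # 5 2
```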

There are many other callbacks available in the keras.callbacks package (see https://keras.io/callbacks/).

If you need extra control, you can easily write your own custom callbacks. As an example of how to do that, the following custom callback will display the ratio between the validation loss and the training loss during training (e.g., to detect overfitting):

In [11]:
class PrintValTrainRatioCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        print('\nval/train: {:.2f}'.format(logs['val_loss']/logs['loss']))

As you might expect, you can implement on_train_begin(), on_train_end(), on_epoch_begin(), on_epoch_end(), on_batch_begin(), and on_batch_end(). Callbacks can also be used during evaluation and predictions, should you ever need them. For evaluation, you should implement on_test_begin(), on_test_end(), on_test_batch_begin(), or on_test_batch_end() (called by evaluate()), and for prediction you should implement on_predict_begin(), on_predict_end(), on_predict_batch_begin(), or on_predict_batch_end() (called by predict()).
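
The order in which the training hooks fire can be illustrated with a toy loop. The `RecordingCallback` class and the `toy_fit()` function below are hypothetical stand-ins written in plain Python. They only mimic the calling order described above and do not reproduce what fit() does internally.

```python
class RecordingCallback:
    """Stand-in for a Keras callback that records every hook call."""

    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_train_end(self, logs=None):
        self.events.append("train_end")

    def on_epoch_begin(self, epoch, logs=None):
        self.events.append("epoch_begin:{}".format(epoch))

    def on_epoch_end(self, epoch, logs=None):
        self.events.append("epoch_end:{}".format(epoch))

    def on_batch_begin(self, batch, logs=None):
        self.events.append("batch_begin:{}".format(batch))

    def on_batch_end(self, batch, logs=None):
        self.events.append("batch_end:{}".format(batch))


def toy_fit(callback, epochs, batches_per_epoch):
    """Toy training loop that only calls the hooks in order."""
    callback.on_train_begin()
    for epoch in range(epochs):
        callback.on_epoch_begin(epoch)
        for batch in range(batches_per_epoch):
            callback.on_batch_begin(batch)
            # ... one training step would run here ...
            callback.on_batch_end(batch)
        callback.on_epoch_end(epoch)
    callback.on_train_end()


recorder = RecordingCallback()
toy_fit(recorder, epochs=2, batches_per_epoch=1)
print(recorder.events)
```

With two epochs of one batch each, the recorded list starts with `train_begin`, then repeats `epoch_begin`, `batch_begin`, `batch_end`, `epoch_end` for each epoch, and ends with `train_end`.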

Now let's take a look at one more tool from tf.keras: TensorBoard.

## Using TensorBoard for Visualization

TensorBoard is a great interactive visualization tool that you can use to view the learning curves during training, compare learning curves between multiple runs, visualize the computation graph, analyze training statistics, view images generated by your model, visualize complex multidimensional data projected down to 3D and automatically clustered for you, and more! This tool is installed automatically when you install TensorFlow, so you already have it.

To use it, you must modify your program so that it outputs the data you want to visualize to special binary log files called event files. Each binary data record is called a summary. The TensorBoard server will monitor the log directory, and it will automatically pick up the changes and update the visualizations: this allows you to visualize live data (with a short delay), such as the learning curves during training. In general, you want to point the TensorBoard server to a root log directory and configure your program so that it writes to a different subdirectory every time it runs. This way, the same TensorBoard server instance will allow you to visualize and compare data from multiple runs of your program, without getting everything mixed up.

Let's start by defining the root log directory we will use for our TensorBoard logs, plus a small function that will generate a subdirectory path based on the current date and time so that it's different at every run. You may want to include extra information in the log directory name, such as the hyperparameter values you are testing, to make it easier to know what you are looking at in TensorBoard:


In [12]:
# import os
# root_logdir = os.path.join(os.curdir, 'my_logs')

# def get_run_logdir():
#     import time
#     run_id = "logs/scalars/" + time.strftime("%Y%m%d-%H%M%S")
#     return os.path.join(root_logdir, run_id)

# run_logdir = get_run_logdir() # e.g., './my_logs/logs/scalars/20190607-151522'
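
Including hyperparameter values in the directory name could look like the sketch below. The keyword arguments `learning_rate` and `batch_size`, and the `run_` prefix, are arbitrary choices made for this illustration.

```python
import os
import time


def get_run_logdir(root_logdir, **hparams):
    """Build a per-run log path from a timestamp plus hyperparameter values."""
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    # Sort the names so the same settings always produce the same suffix
    suffix = "".join(
        "-{}={}".format(name, value) for name, value in sorted(hparams.items())
    )
    return os.path.join(root_logdir, "run_" + timestamp + suffix)


root_logdir = os.path.join(os.curdir, "my_logs")
run_logdir = get_run_logdir(root_logdir, learning_rate=0.05, batch_size=32)
print(run_logdir)
```

The timestamp keeps every run in its own directory, and the suffix makes the run easy to recognize in TensorBoard's run selector.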

The good news is that Keras provides a nice TensorBoard() callback:

In [13]:
# run_logdir = get_run_logdir() from the cell above could be used here instead;
# the code below follows TensorFlow's own documentation.

logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

model_3 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = X_train.shape[1:]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

model_3.compile(
    loss='sparse_categorical_crossentropy', # keras.losses.sparse_categorical_crossentropy
    optimizer=keras.optimizers.SGD(learning_rate=0.001),
    metrics=['accuracy']
)

training_history = model_3.fit(
    X_train, # input
    y_train, # output
    verbose=0, # Suppress chatty output; use Tensorboard instead
    epochs=30,
    validation_data=(X_valid, y_valid),
    callbacks=[tensorboard_callback],
)

print("Average training loss: ", np.average(training_history.history['loss']))

Average training loss:  0.5213567116247524


In [14]:
logdir0 = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback_4 = keras.callbacks.TensorBoard(log_dir=logdir0)
model_4 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = X_train.shape[1:]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

model_4.compile(
    loss='sparse_categorical_crossentropy', # keras.losses.sparse_categorical_crossentropy
    optimizer=keras.optimizers.SGD(learning_rate=0.05),
    metrics=['accuracy'],
)

training_history_4 = model_4.fit(
    X_train, # input
    y_train, # output
    verbose=0, # Suppress chatty output; use Tensorboard instead
    epochs=30,
    validation_data=(X_valid, y_valid),
    callbacks=[tensorboard_callback_4],
)

print("Average training loss: ", np.average(training_history_4.history['loss']))


And that's all there is to it! It could hardly be easier to use. If you run this code, the TensorBoard() callback will take care of creating the log directory for you (along with its parent directories if needed), and during training it will create event files and write summaries to them. After running the program a second time (perhaps changing some hyperparameter value), you will end up with a directory structure that has one directory per run, each containing one subdirectory for training logs and one for validation logs. Both contain event files, but the training logs also include profiling traces: this allows TensorBoard to show you exactly how much time the model spent on each part of your model, across all your devices, which is great for locating performance bottlenecks.
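
The layout just described can be checked with a small standard-library helper. The sketch below is a hypothetical utility, not part of TensorBoard. It assumes that event file names start with `events.out.tfevents`, and it builds a fake log tree with empty placeholder files so that it runs without any training.

```python
import os
import tempfile


def count_event_files(root_logdir):
    """Map each subdirectory (relative to the root) to its event file count."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root_logdir):
        n = sum(1 for name in filenames if name.startswith("events.out.tfevents"))
        if n:
            rel = os.path.relpath(dirpath, root_logdir)
            counts[rel.replace(os.sep, "/")] = n
    return counts


# Fake log tree with empty placeholder files, only to demonstrate the helper
root = tempfile.mkdtemp()
for run in ("20190607-151522", "20190607-152030"):
    for split in ("train", "validation"):
        split_dir = os.path.join(root, run, split)
        os.makedirs(split_dir)
        placeholder = os.path.join(split_dir, "events.out.tfevents.0.placeholder")
        open(placeholder, "w").close()

event_counts = count_event_files(root)
for path in sorted(event_counts):
    print(path, event_counts[path])
```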

Next, you need to start the TensorBoard server. One way to do this is by running a command in a terminal. If you installed TensorFlow within a virtualenv, you should activate it first. Then run the following command at the root of the project (or from anywhere else, as long as you point to the appropriate log directory):


$ tensorboard --logdir=./logs/scalars --port=6006

If your shell cannot find the tensorboard script, then you must update your PATH environment variable so that it contains the directory in which the script was installed (alternatively, you can just replace tensorboard in the command line with python3 -m tensorboard.main). Once the server is up, you can open a web browser and go to http://localhost:6006.
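
The same fallback can be scripted. The helper below is a hypothetical convenience function and the log directory is only an example value. It uses the tensorboard script when it is on the PATH, and otherwise falls back to running the tensorboard.main module with the current interpreter.

```python
import shutil
import sys


def tensorboard_command(logdir, port=6006):
    """Return an argument list that starts TensorBoard (hypothetical helper)."""
    args = ["--logdir", logdir, "--port", str(port)]
    executable = shutil.which("tensorboard")  # None if not found on the PATH
    if executable is not None:
        return [executable] + args
    # Fall back to running the module with the current interpreter
    return [sys.executable, "-m", "tensorboard.main"] + args


command = tensorboard_command("./logs/scalars")
print(" ".join(command))
```

The resulting list can be passed to subprocess.Popen() to launch the server from a script.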

Alternatively, you can use TensorBoard directly within Jupyter by running the following commands. The first line loads the TensorBoard extension, and the second line starts a TensorBoard server on port 6006 (unless it is already started) and connects to it:

In [15]:
%load_ext tensorboard
%tensorboard --logdir=./logs/scalars

Reusing TensorBoard on port 6007 (pid 30456), started 0:35:19 ago. (Use '!kill 30456' to kill it.)

Either way, you should see TensorBoard's web interface. Click the SCALARS tab to view the learning curves. At the bottom left, select the logs you want to visualize (e.g., the training logs from the first and second run), and click the epoch_loss scalar. Notice that the training loss went down nicely during both runs, but it went down much faster in the second run. Indeed, the second run used a learning rate of 0.05 instead of 0.001.

Note that the .profile-empty file created by the Keras callback can sometimes block TensorBoard from loading data. The same problem occurs in tf-nightly (non-2.0-preview) but manifests differently: because there is only one run (named .) instead of separate train/validation runs, all data stops being displayed after the epoch in which TensorBoard is opened.

As a special case of this, if TensorBoard is running before training starts, then the training data may not appear at all.

Killing and reloading TensorBoard should fix this; otherwise, see https://github.com/tensorflow/tensorboard/issues/2084 for more information.

In [22]:
# First kill the running server with !kill <pid>, using the pid reported above
%reload_ext tensorboard
%tensorboard --logdir=./logs/scalars

Reusing TensorBoard on port 6007 (pid 30782), started 0:00:13 ago. (Use '!kill 30782' to kill it.)

You can also visualize the whole graph, the learned weights (projected to 3D), or the profiling traces. The TensorBoard() callback has options to log extra data too, such as embeddings (see Chapter 13).

Additionally, TensorFlow offers a lower-level API in the tf.summary package. The following code creates a SummaryWriter using the create_file_writer() function, and it uses this writer as a context to log scalars, histograms, images, audio, and text, all of which can then be visualized using TensorBoard (give it a try!):

In [17]:
test_logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
writer = tf.summary.create_file_writer(test_logdir)
with writer.as_default():
    for step in range(1,1000 + 1):
        tf.summary.scalar('my_scalar', np.sin(step/10), step=step)
        data = (np.random.rand(100) + 2)*step/100 #some random data
        tf.summary.histogram('my_list', data, buckets=50, step=step)
        images = np.random.rand(2,32,32,3) #random 32*32 RGB images
        tf.summary.image('my_images',images*step/1000, step=step)
        texts = ['The step is ' + str(step), "Its square is " + str(step**2)]
        tf.summary.text('my_text', texts, step=step)
        sine_wave = tf.math.sin(tf.range(12000)/48000*2*np.pi*step)
        audio = tf.reshape(tf.cast(sine_wave, tf.float32), [1,-1,1])
        tf.summary.audio('my_audio', audio, sample_rate=48000, step=step)

In [21]:
!kill 30456
%reload_ext tensorboard
%tensorboard --logdir=./logs/scalars

This is actually a useful visualization tool to have, even beyond TensorFlow or deep learning!