You can save a model during or after training. A saved model lets you resume training from the point where it was interrupted, or share the model so that others can reproduce it.

In practice, you prepare the model's documentation or source code together with the trained weights, whether you are sharing them with others or publishing them alongside a journal paper.

In [0]:
!pip install -q --upgrade grpcio
!pip install -q tf-nightly
!pip install -q pyyaml h5py

In [1]:
import tensorflow as tf
import os

print("Tensorflow Version: {}".format(tf.__version__))
print("Eager Mode: {}".format(tf.executing_eagerly()))
print("GPU {} available".format("is" if tf.config.experimental.list_physical_devices("GPU") else "not"))

Tensorflow Version: 2.1.0-dev20200101
Eager Mode: True
GPU is available


# Building a Model

Here you build a model on the Fashion MNIST dataset.

## Data Preparation

In [0]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

In [0]:
x_train, x_test = x_train / 255.0, x_test / 255.0

In [0]:
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

In [5]:
x_train.shape, y_train.shape

((60000, 28, 28, 1), (60000,))
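
The two preprocessing steps above can be checked on a small synthetic batch without downloading the dataset. This is only a sketch: it assumes NumPy is installed, and `fake_images` is a hypothetical random array standing in for `x_train`.

```python
import numpy as np

# Hypothetical stand-in for x_train: 4 grayscale images with uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0.0, 1.0].
scaled = fake_images / 255.0

# Append a channel axis so each image has shape (28, 28, 1),
# which is the input shape the convolution layers expect.
batch = scaled[..., np.newaxis]

print(batch.shape, batch.dtype)
```

`tf.newaxis` and `np.newaxis` are both aliases for `None`, so the indexing expression is the same in either library.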

## Building a Model

In [0]:
def build_cnn_model(inputs):
  x = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), 
                             kernel_regularizer=tf.keras.regularizers.l2(0.001),
                             input_shape=(28,28,1), activation='elu', 
                             strides=(1,1), padding='same')(inputs)
  x = tf.keras.layers.Conv2D(filters=64, kernel_size=(5,5), activation='elu', 
                             kernel_regularizer=tf.keras.regularizers.l2(0.001),
                             strides=(2,2), padding='same')(x)
  x = tf.keras.layers.Dropout(0.5)(x)
  x = tf.keras.layers.Flatten()(x)
  y = tf.keras.layers.Dense(10, activation='softmax')(x)
  return y

In [0]:
def cnn_model():
  inputs = tf.keras.Input(shape=(28,28,1))
  outputs = build_cnn_model(inputs)
  model = tf.keras.Model(inputs, outputs)

  model.summary()

  model.compile(loss='sparse_categorical_crossentropy', 
                optimizer='adam', 
                metrics=['accuracy'])
  return model

In [8]:
model = cnn_model()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_______________________________________________________
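
The parameter counts in the summary can be reproduced by hand. The sketch below uses two hypothetical helper functions, `conv2d_params` and `dense_params`, which are not part of Keras; they only spell out the arithmetic for layers with a bias term.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel of size kernel_h x kernel_w x in_channels per filter,
    # plus one bias value per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    # One weight per (input feature, unit) pair, plus one bias per unit.
    return in_features * units + units


conv1 = conv2d_params(3, 3, 1, 64)    # first Conv2D: 1 input channel
conv2 = conv2d_params(5, 5, 64, 64)   # second Conv2D: 64 input channels

# 'same' padding with stride 2 halves 28 x 28 to 14 x 14; Flatten keeps all values.
flat_features = 14 * 14 * 64
dense = dense_params(flat_features, 10)

total = conv1 + conv2 + dense
print(conv1, conv2, flat_features, dense, total)
```

Dropout and Flatten have no weights, which is why they contribute 0 parameters in the summary.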

# Save Checkpoints during Training

With the `tf.keras.callbacks.ModelCheckpoint` callback, you can save the model continuously during training and at its end, so that training can later resume where it left off.

## Creating a Checkpoint Callback

Create a callback to save the model weights only.

In [9]:
ckpt_path = "train_1/cp.ckpt"
ckpt_dir = os.path.dirname(ckpt_path)
ckpt_path, ckpt_dir

('train_1/cp.ckpt', 'train_1')

In [0]:
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_path, 
                                                 save_weights_only=True, 
                                                 verbose=1)

In [0]:
model.fit(x_train, y_train, 
          epochs=10, validation_data=(x_test, y_test), 
          callbacks=[cp_callback])

In [13]:
!ls {ckpt_dir}

checkpoint		     cp.ckpt.data-00001-of-00002
cp.ckpt.data-00000-of-00002  cp.ckpt.index


After training a model with a checkpoint callback, you can create a new model and load the trained weights into it.

Because only the weights were saved, you have to instantiate a new model object with the same architecture before loading them, for example when you restore the model in a different runtime.

In [14]:
restored_model = cnn_model()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout_1 (Dropout)          (None, 14, 14, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 12544)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_____________________________________________________

First, evaluate the new model on the test dataset before loading the weights.

The accuracy is close to chance level, which shows that the model has not been trained.

In [15]:
loss, acc = restored_model.evaluate(x_test, y_test, verbose=2)
print("Loss: {}, Acc: {}".format(loss, acc))

10000/1 - 1s - loss: 2.3457 - accuracy: 0.1281
Loss: 2.3499669174194335, Acc: 0.12809999287128448
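
The untrained figures can be compared with what uniform guessing would give. This is a minimal sketch, assuming 10 roughly balanced classes; the reported loss also includes the L2 regularization terms, so it need not match exactly.

```python
import math

num_classes = 10

# A model that assigns equal probability to every class is right
# about once in num_classes attempts.
chance_accuracy = 1.0 / num_classes

# Cross-entropy of a uniform prediction: -log(1 / num_classes).
chance_loss = -math.log(1.0 / num_classes)

print("chance accuracy: {:.2f}, chance loss: {:.4f}".format(chance_accuracy, chance_loss))
```

An accuracy of about 0.13 and a loss of about 2.35 are both close to these baseline values.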


Loading the weights into the model takes a single call to `load_weights()`.

In [16]:
# load the saved weights first, then evaluate again
restored_model.load_weights(ckpt_path)

loss, acc = restored_model.evaluate(x_test, y_test, verbose=2)
print("Loss: {}, Acc: {}".format(loss, acc))

10000/1 - 1s - loss: 0.4612 - accuracy: 0.8391
Loss: 0.48030954031944273, Acc: 0.8391000032424927


## Advanced Checkpoint Callbacks

You can also save more than one checkpoint, each with a customized name, so that you can restore a specific checkpoint or use the best model.

In [0]:
ckpt_path = "train_2/cp-{epoch:04d}.ckpt"
ckpt_dir = os.path.dirname(ckpt_path)

In [0]:
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=ckpt_path, save_weights_only=True, period=5, verbose=1)

In [19]:
model = cnn_model()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout_2 (Dropout)          (None, 14, 14, 64)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 12544)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_____________________________________________________

Before training, save the untrained weights of the model as the checkpoint for epoch 0.

In [0]:
model.save_weights(ckpt_path.format(epoch=0))
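
The `{epoch:04d}` placeholder in the path is an ordinary Python format field, so the resulting file names can be previewed without training. The sketch below assumes that the callback is configured to save every fifth epoch of a 20-epoch run; the list `expected` is illustrative, not output captured from the training run.

```python
import os

ckpt_path = "train_2/cp-{epoch:04d}.ckpt"
ckpt_dir = os.path.dirname(ckpt_path)

# The manual save for epoch 0 uses the same template.
manual = ckpt_path.format(epoch=0)

# Assumption: a checkpoint is written after epochs 5, 10, 15 and 20.
expected = [ckpt_path.format(epoch=e) for e in range(5, 21, 5)]

print(ckpt_dir)
print(manual)
print(expected)
```

Zero-padding the epoch number to four digits keeps the checkpoint names in chronological order when they are sorted alphabetically.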

In [0]:
model.fit(x_train, y_train, epochs=20, 
          callbacks=[cp_callback], 
          validation_data=(x_test, y_test), 
          verbose=0)

In [0]:
!ls {ckpt_dir}

In [24]:
latest = tf.train.latest_checkpoint(ckpt_dir)
latest

'train_2/cp-0020.ckpt'

Restore the weights into a new model.

In [25]:
restored_model = cnn_model()
restored_model.load_weights(latest)

Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout_3 (Dropout)          (None, 14, 14, 64)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 12544)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_____________________________________________________

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f6a40599828>

In [27]:
loss, acc = restored_model.evaluate(x_test, y_test, verbose=2)
print("Acc: {:5.2f}%".format(acc*100))

10000/1 - 1s - loss: 0.4401 - accuracy: 0.8447
Acc: 84.47%


# Manually Save the Weights

A checkpoint consists mainly of two kinds of files: one or more data files (shards) that hold the weight values, and an index file that maps each variable to the shard where its weights are stored.

By default, `save_weights` in `tf.keras` uses the TensorFlow checkpoint format (`.ckpt` extension).

In [0]:
model.save_weights('./checkpoints/msave_ckpt')

In [34]:
!ls ./checkpoints/

checkpoint			msave_ckpt.data-00001-of-00002
msave_ckpt.data-00000-of-00002	msave_ckpt.index
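
The listing above can be sorted into the file kinds just described. The helper `classify_checkpoint_files` below is hypothetical and relies only on the naming pattern visible in the listing; it is a sketch, not a TensorFlow API.

```python
def classify_checkpoint_files(names):
    """Group file names by the role they appear to play in a checkpoint."""
    groups = {"state": [], "index": [], "data": []}
    for name in names:
        if name == "checkpoint":
            # Small bookkeeping file that records the most recent checkpoint.
            groups["state"].append(name)
        elif name.endswith(".index"):
            groups["index"].append(name)
        elif ".data-" in name:
            groups["data"].append(name)
        # Any other file is ignored.
    return groups


listing = [
    "checkpoint",
    "msave_ckpt.data-00000-of-00002",
    "msave_ckpt.data-00001-of-00002",
    "msave_ckpt.index",
]
groups = classify_checkpoint_files(listing)

# The prefix shared by the index and data files is the path given to load_weights().
prefixes = [name[: -len(".index")] for name in groups["index"]]
print(groups)
print(prefixes)
```

Note that `load_weights` is called with the prefix (`./checkpoints/msave_ckpt`), not with the name of any single file.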


In [35]:
restored_model = cnn_model()

Model: "model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_6 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout_5 (Dropout)          (None, 14, 14, 64)        0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 12544)             0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_____________________________________________________

In [36]:
restored_model.load_weights('./checkpoints/msave_ckpt')

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f6a40596278>

In [37]:
loss, acc = restored_model.evaluate(x_test, y_test, verbose=2)
print("Loss: {}, Acc: {}".format(loss, acc))

10000/1 - 1s - loss: 0.4401 - accuracy: 0.8447
Loss: 0.46790695462226867, Acc: 0.8446999788284302


# Save the Entire Model

Saving the entire model makes it much easier to retrain or transfer the model. No additional model definition is required, and the training state, including the optimizer and the loss function, is saved as well. An exported model can also be deployed to the web (TensorFlow.js) or to mobile platforms (TensorFlow Lite). The two most commonly used formats are `HDF5` (`.h5`) and `SavedModel`.

Be careful with custom objects, that is, objects you define yourself: their definitions are required when you load the saved model.

## HDF5 format

You can save the HDF5 format model via the `save()` API.

In [0]:
model.save('model_hdf5.h5')

In [39]:
restored_h5_model = tf.keras.models.load_model('model_hdf5.h5')
restored_h5_model.summary()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout_2 (Dropout)          (None, 14, 14, 64)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 12544)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_____________________________________________________

In [40]:
loss, acc = restored_h5_model.evaluate(x_test, y_test, verbose=2)
print("Loss: {}, Acc: {}".format(loss, acc))

10000/1 - 1s - loss: 0.4401 - accuracy: 0.8447
Loss: 0.46790695462226867, Acc: 0.8446999788284302


In short, the saved model contains the weight values, the model architecture, and the optimizer. However, the model has to be compiled before you can retrain or fine-tune it; `load_model` restores the compile settings by default (`compile=True`).

## SavedModel format

The SavedModel format is another way to save the entire model. It is compatible with the TensorFlow runtime and the rest of its ecosystem, including TensorFlow Serving.

In [0]:
!mkdir -p saved_model

In [13]:
model.save('saved_model/SavedModel')

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: saved_model/SavedModel/assets


In [14]:
!ls saved_model

savedmodel  SavedModel


In [15]:
!ls saved_model/SavedModel

assets	saved_model.pb	variables


In [0]:
restored_tf_model = tf.keras.models.load_model('saved_model/SavedModel')

In [19]:
restored_tf_model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        102464    
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                125450    
Total params: 228,554
Trainable params: 228,554
Non-trainable params: 0
_______________________________________________________

In [20]:
loss, acc = restored_tf_model.evaluate(x_test, y_test, verbose=2)
print("Loss: {}, Acc: {}".format(loss, acc))

10000/10000 - 1s - loss: 0.4715 - accuracy: 0.8412
Loss: 0.4714866124153137, Acc: 0.8411999940872192


In [22]:
tf.argmax(restored_tf_model.predict(x_test[:10]), axis=1)

<tf.Tensor: shape=(10,), dtype=int64, numpy=array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])>

In [23]:
y_test[:10]

array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7], dtype=uint8)
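
The predicted classes and the true labels printed above can be compared element by element. The sketch below assumes NumPy and copies the two arrays from the outputs above.

```python
import numpy as np

# Values copied from the two cell outputs above.
predicted = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])
labels = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7], dtype=np.uint8)

# Element-wise comparison gives a boolean array; its mean is the accuracy.
matches = predicted == labels
accuracy = matches.mean()

print("correct: {} of {}, accuracy: {:.2f}".format(matches.sum(), matches.size, accuracy))
```

All ten samples match here, although the accuracy on the full test set is lower, as the evaluation above shows.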

## Save Custom Objects

The main difference between the two formats is that HDF5 stores the model through object configs, whereas SavedModel stores the whole execution graph. For this reason you don't need to supply custom objects when exporting a model in the SavedModel format.

To save custom objects in the HDF5 format, define a `get_config` method on your object and, optionally, a `from_config` classmethod.

To load an HDF5 model that contains custom objects, pass them via the `custom_objects` argument, e.g. `tf.keras.models.load_model(path, custom_objects={'CustomLayer': CustomLayer})`.
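
As an illustration of the `get_config` / `from_config` pair, here is a minimal sketch of a custom layer. `ScaleLayer` and its `factor` argument are hypothetical names invented for this example, and the sketch only demonstrates the config round trip, not a full save and load cycle.

```python
import tensorflow as tf


class ScaleLayer(tf.keras.layers.Layer):
    """Hypothetical custom layer that multiplies its input by a constant."""

    def __init__(self, factor=1.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Start from the base config (name, dtype, ...) and add the
        # constructor arguments needed to rebuild this layer.
        config = super().get_config()
        config.update({"factor": self.factor})
        return config

    @classmethod
    def from_config(cls, config):
        # Rebuild the layer from the dictionary produced by get_config().
        return cls(**config)


layer = ScaleLayer(factor=2.0, name="scale")
config = layer.get_config()
clone = ScaleLayer.from_config(config)

doubled = clone(tf.constant([1.0, 2.0]))
print(config["factor"], clone.name, doubled.numpy())
```

When a model containing such a layer is loaded from HDF5, the class would be passed as `custom_objects={'ScaleLayer': ScaleLayer}` so that Keras can map the stored class name back to the Python class.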