In some rare cases, the ```fit()``` method may not be flexible enough for what you need
to do. For example, **the Wide and Deep paper we discussed earlier** actually uses two different optimizers: one for the wide path and the other for the deep path.
Since the ```fit()``` method only uses one optimizer (the one that we specify when compiling the model), implementing this paper requires writing your own custom
training loop.
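The sketch below shows only the mechanics. It uses NumPy instead of TensorFlow, and everything in it is made up for illustration. The model is a toy with a linear "wide" path and a one-hidden-layer tanh "deep" path, and its gradients are derived by hand. `SGD` and `Adam` are hypothetical optimizer classes written for this example. They imitate the `apply_gradients(zip(grads, vars))` call pattern but are not the Keras classes, and the sketch does not reproduce the paper's actual optimizers or architecture. The point is that a single loss yields gradients for every variable, and each group of variables is then handed to its own optimizer.

```python
import numpy as np

class SGD:
    """Hypothetical minimal optimizer: plain gradient descent, updates arrays in place."""
    def __init__(self, lr):
        self.lr = lr

    def apply_gradients(self, grads_and_vars):
        for grad, var in grads_and_vars:
            var -= self.lr * grad

class Adam:
    """Hypothetical minimal Adam with bias correction; state is kept per variable."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = {}, {}, 0

    def apply_gradients(self, grads_and_vars):
        self.t += 1
        for grad, var in grads_and_vars:
            key = id(var)
            self.m[key] = self.beta1 * self.m.get(key, 0.0) + (1 - self.beta1) * grad
            self.v[key] = self.beta2 * self.v.get(key, 0.0) + (1 - self.beta2) * grad ** 2
            m_hat = self.m[key] / (1 - self.beta1 ** self.t)
            v_hat = self.v[key] / (1 - self.beta2 ** self.t)
            var -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Synthetic regression data (made up for this sketch)
rng = np.random.default_rng(42)
X = rng.normal(size=(256, 8))
y = X[:, 0] - 2 * X[:, 1] + np.tanh(X[:, 2] * X[:, 3])

# Wide path: linear in the raw features (plus the output bias)
w_wide = np.zeros(8)
b = np.zeros(1)
# Deep path: one tanh hidden layer
W1 = rng.normal(scale=0.3, size=(8, 16))
w2 = np.zeros(16)

def forward(X):
    h = np.tanh(X @ W1)
    return X @ w_wide + h @ w2 + b, h

def mse(X, y):
    return np.mean((forward(X)[0] - y) ** 2)

wide_optimizer = SGD(lr=0.05)
deep_optimizer = Adam(lr=0.01)

initial_loss = mse(X, y)
for step in range(300):
    y_hat, h = forward(X)
    d = 2 * (y_hat - y) / len(X)                      # dLoss/dy_hat for the MSE
    grad_w_wide = X.T @ d
    grad_b = np.array([d.sum()])
    grad_w2 = h.T @ d
    grad_W1 = X.T @ (np.outer(d, w2) * (1 - h ** 2))  # backprop through tanh
    # One loss, two optimizers: each one only updates its own path's variables
    wide_optimizer.apply_gradients([(grad_w_wide, w_wide), (grad_b, b)])
    deep_optimizer.apply_gradients([(grad_W1, W1), (grad_w2, w2)])
final_loss = mse(X, y)
print("MSE before: {:.3f}, after: {:.3f}".format(initial_loss, final_loss))
```

A TensorFlow loop would follow the same pattern. You would open one `tf.GradientTape` block and make one `tape.gradient()` call over both variable lists. You would then make two `apply_gradients()` calls, one per optimizer, each receiving only its own slice of the gradients.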

You may also want to write your own custom training loops simply to feel more
confident that they do precisely what you intend them to do (perhaps you are unsure about
some details of the ```fit()``` method). It can sometimes feel safer to make everything
explicit. However, remember that writing a custom training loop will make your code
longer, more error-prone, and harder to maintain.


Unless you really need the extra flexibility, you should prefer using
the ```fit()``` method rather than implementing your own training loop.

```python
import tensorflow as tf
from sklearn import model_selection, preprocessing
from sklearn import datasets
import numpy as np
import time
print(tf.__version__)
```

```
2.0.0-beta1
```


```python
# Using the scikit-learn California housing data
housing = datasets.fetch_california_housing()
# Reshape the targets into a column (n, 1) so they match the model's (batch_size, 1) predictions
X_train_full, X_test, y_train_full, y_test = model_selection.train_test_split(
    housing.data, housing.target.reshape(-1, 1))
X_train, X_valid, y_train, y_valid = model_selection.train_test_split(
    X_train_full, y_train_full)

# Standard scaling: fit the scaler on the training set only, then reuse its statistics
scaler = preprocessing.StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)
```
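The training loop below uses `tf.keras.losses.mean_squared_error`, which averages over the last axis. The model's final `Dense(1)` layer returns predictions of shape `(batch_size, 1)`, so the targets should be a column of shape `(n, 1)` rather than a 1-D array of shape `(n,)`. With 1-D targets, the subtraction broadcasts to a `(batch_size, batch_size)` matrix and the loss is silently wrong. NumPy follows the same broadcasting rules, so the pitfall is easy to reproduce. `mse_last_axis()` is a hypothetical stand-in that mimics a mean over the last axis, and the arrays are random.

```python
import numpy as np

def mse_last_axis(y_true, y_pred):
    """Mean squared error over the last axis: one value per instance (hypothetical helper)."""
    return np.mean(np.square(y_pred - y_true), axis=-1)

rng = np.random.default_rng(0)
y_pred = rng.normal(size=(4, 1))   # model output: (batch_size, 1)
y_flat = rng.normal(size=(4,))     # 1-D targets: (batch_size,)
y_col = y_flat.reshape(-1, 1)      # column targets: (batch_size, 1)

wrong = mse_last_axis(y_flat, y_pred)  # y_pred - y_flat broadcasts to (4, 4) first
right = mse_last_axis(y_col, y_pred)   # true per-instance squared errors
```

Both results have shape `(4,)`, which is why the mistake does not raise an error. Only the values differ.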

#### First, build a simple model. There is no need to compile it, since we are going to handle the training loop manually.

```python
l2_reg = tf.keras.regularizers.l2(0.05)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="elu",
                          kernel_initializer="he_normal",
                          kernel_regularizer=l2_reg),
    tf.keras.layers.Dense(1, kernel_regularizer=l2_reg)])
```
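Each `Dense` layer gets `kernel_regularizer=l2_reg`, so the model exposes one penalty term per layer in `model.losses`. `tf.keras.regularizers.l2(0.05)` computes 0.05 times the sum of the squared kernel weights. The biases are not penalized, because no `bias_regularizer` is set. The NumPy sketch below computes those two terms for random stand-in kernels. Their shapes assume the 8 input features of the California housing data, then 30 units, then 1 output.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins for the two Dense kernels: 8 input features -> 30 units -> 1 output
kernels = [rng.normal(size=(8, 30)), rng.normal(size=(30, 1))]

l2_factor = 0.05  # same factor as tf.keras.regularizers.l2(0.05)
# One scalar penalty per regularized layer, like the entries of model.losses
reg_losses = [l2_factor * np.sum(np.square(kernel)) for kernel in kernels]
total_reg_loss = sum(reg_losses)  # the part that tf.add_n() adds on top of the main loss
```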

```python
# Random batch generation (indices are sampled with replacement)
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]
```
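`random_batch()` samples indices with replacement. Within one "epoch" of `n_steps` batches, some instances are therefore seen several times and others not at all. A common alternative, which this notebook does not use, is to shuffle the indices once per epoch and slice them into batches so that every instance is visited exactly once. `shuffled_batches()` below is a hypothetical helper written to illustrate that.

```python
import numpy as np

def shuffled_batches(X, y, batch_size=32, rng=None):
    """Yield (X_batch, y_batch) pairs that cover every instance exactly once, in random order."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch_idx = idx[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

# Toy data: row i of X is [2i, 2i + 1] and y[i] == i, so pairing is easy to verify
X = np.arange(100).reshape(50, 2)
y = np.arange(50)
batches = list(shuffled_batches(X, y, batch_size=16, rng=np.random.default_rng(0)))
seen = np.concatenate([y_batch for _, y_batch in batches])
```

With 50 instances and `batch_size=16`, this yields three full batches and a final batch of 2.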

```python
# Function to display the training status
def print_status_bar(iteration, total, loss, metrics=None):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                          for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{}/{} - ".format(iteration, total) + metrics, end=end)
```

```python
mean_loss = tf.keras.metrics.Mean(name="loss")
mean_square = tf.keras.metrics.Mean(name="mean_square")
for i in range(1, 50 + 1):
    loss = 1 / i
    mean_loss(loss)
    mean_square(i ** 2)
    print_status_bar(i, 50, mean_loss, [mean_square])
    time.sleep(0.05)
```

```
50/50 - loss: 0.0900 - mean_square: 858.5000
```
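`tf.keras.metrics.Mean` is stateful. Each call adds a value to a running total and count, and `result()` returns total / count until the state is reset. `RunningMean` below is a hypothetical plain-Python stand-in, not the Keras class. It has just the `name`, `__call__`, `result()` and `reset_states()` members that `print_status_bar()` and the training loop rely on, and it reproduces the numbers printed above.

```python
class RunningMean:
    """Plain-Python stand-in for a Keras Mean metric (illustration only)."""
    def __init__(self, name):
        self.name = name
        self.reset_states()

    def __call__(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count

    def reset_states(self):
        self.total = 0.0
        self.count = 0

mean_loss = RunningMean("loss")
mean_square = RunningMean("mean_square")
for i in range(1, 50 + 1):
    mean_loss(1 / i)
    mean_square(i ** 2)

print("{:.4f} {:.4f}".format(mean_loss.result(), mean_square.result()))
# prints: 0.0900 858.5000
```

The second value is exact, because the squares sum to 50 · 51 · 101 / 6 = 42,925 and 42,925 / 50 = 858.5.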


```python
# A fancier version: a text progress bar
def progress_bar(iteration, total, size=30):
    running = iteration < total
    c = ">" if running else "="
    p = (size - 1) * iteration // total
    fmt = "{{:-{}d}}/{{}} [{{}}]".format(len(str(total)))
    params = [iteration, total, "=" * p + c + "." * (size - p - 1)]
    return fmt.format(*params)
```

```python
progress_bar(3500, 10000, size=6)
```

```
' 3500/10000 [=>....]'
```
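The arithmetic is easy to check by hand. With `size=6` there are 5 cells before the marker, and p = (6 - 1) * 3500 // 10000 = 17500 // 10000 = 1 of them is filled. The counter is right-aligned to `len(str(total))` = 5 characters. Below, `progress_bar()` is copied verbatim to show the start and end states. Note that the marker changes from ">" to "=" once `iteration` reaches `total`.

```python
def progress_bar(iteration, total, size=30):
    running = iteration < total
    c = ">" if running else "="                 # ">" while running, "=" when done
    p = (size - 1) * iteration // total         # number of filled cells before the marker
    fmt = "{{:-{}d}}/{{}} [{{}}]".format(len(str(total)))  # right-align the counter
    params = [iteration, total, "=" * p + c + "." * (size - p - 1)]
    return fmt.format(*params)

examples = [progress_bar(i, 10000, size=6) for i in (0, 3500, 10000)]
# examples == ['    0/10000 [>.....]', ' 3500/10000 [=>....]', '10000/10000 [======]']
```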

```python
def print_status_bar(iteration, total, loss, metrics=None, size=30):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                         for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    # Pass size through so the caller controls the width of the progress bar
    print("\r{} - {}".format(progress_bar(iteration, total, size), metrics), end=end)
```

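```python
# Reset the metrics so this run does not average in the previous run's values
mean_loss.reset_states()
mean_square.reset_states()
for i in range(1, 50 + 1):
    loss = 1 / i
    mean_loss(loss)
    mean_square(i ** 2)
    print_status_bar(i, 50, mean_loss, [mean_square])
    time.sleep(0.05)
```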



This code is self-explanatory, unless you are unfamiliar with Python string formatting:
{:.4f} formats a float with 4 digits after the decimal point. Moreover, using
\r (carriage return) along with end="" ensures that the status bar always gets printed
on the same line. The second version of print_status_bar() also includes a
progress bar (built by progress_bar()), but you could use the handy tqdm library instead.
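The sketch below shows what \r and end="" actually emit by sending a few status updates to an in-memory stream instead of the terminal. `render_status()` is a hypothetical helper, not part of the notebook. Each update starts with \r, so a terminal overwrites the previous update, and only the final one ends with a newline.

```python
import io

def render_status(values, stream):
    """Write one status line per value; the leading \\r makes a terminal overwrite the previous line."""
    total = len(values)
    for iteration, value in enumerate(values, start=1):
        end = "" if iteration < total else "\n"
        print("\r{}/{} - loss: {:.4f}".format(iteration, total, value), end=end, file=stream)

buf = io.StringIO()
render_status([0.5, 0.25, 0.123456], buf)
output = buf.getvalue()
# output == '\r1/3 - loss: 0.5000\r2/3 - loss: 0.2500\r3/3 - loss: 0.1235\n'
```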

```python
n_epochs = 5
batch_size = 32
n_steps = len(X_train_scaled) // batch_size
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
loss_fn = tf.keras.losses.mean_squared_error
mean_loss = tf.keras.metrics.Mean()
metrics = [tf.keras.metrics.MeanAbsoluteError()]
```
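`n_steps` is the number of full batches per epoch. The numbers below assume the full California housing dataset (20,640 instances) and `train_test_split()`'s default `test_size` of 0.25, for which scikit-learn rounds the test share up. Under those assumptions, the two splits and the resulting step count work out as follows.

```python
import math

n_total = 20640                            # instances in the California housing dataset
n_test = math.ceil(0.25 * n_total)         # default test_size=0.25 -> 5160
n_train_full = n_total - n_test            # 15480
n_valid = math.ceil(0.25 * n_train_full)   # 3870
n_train = n_train_full - n_valid           # 11610

batch_size = 32
n_steps = n_train // batch_size            # 362 steps per epoch
samples_per_epoch = n_steps * batch_size   # 11584 sampled instances per "epoch"
```

During an epoch, the training loop calls `print_status_bar()` with `step * batch_size`, so the bar stops at 11,584/11,610. The extra call after the inner loop then shows it as complete.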

```python
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            y_pred = model(X_batch)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)  # add the regularization losses
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        # Apply weight constraints (if any) right after the Gradient Descent step
        for variable in model.variables:
            if variable.constraint is not None:
                variable.assign(variable.constraint(variable))
        mean_loss(loss)
        for metric in metrics:
            metric(y_batch, y_pred)
        print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    for metric in [mean_loss] + metrics:
        metric.reset_states()
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```


• We create two nested loops: one for the epochs, the other for the batches within
an epoch.

• Then we sample a random batch from the training set.

• Inside the tf.GradientTape() block, we make a prediction for one batch (using
the model as a function), and we compute the loss: it is equal to the main loss
plus the other losses (in this model, there is one regularization loss per layer).
Since the mean_squared_error() function returns one loss per instance, we
compute the mean over the batch using tf.reduce_mean() (if you wanted to
apply different weights to each instance, this is where you would do it). The
regularization losses are already reduced to a single scalar each, so we just need to
sum them (using tf.add_n(), which sums multiple tensors of the same shape
and data type).

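• Next, we ask the tape to compute the gradient of the loss with regard to each
trainable variable (not all variables!), and we pass these gradients to the optimizer to
perform a Gradient Descent step.

• Then, if any variable has a constraint (for example, one set through a layer's
kernel_constraint argument), we apply it to that variable right after the Gradient
Descent step.

• Next, we update the mean loss and the metrics (over the current epoch), and we
display the status bar.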

• At the end of each epoch, we display the status bar again to make it look
complete and to print a line feed, and we reset the states of the mean loss and the
metrics.
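The loss computation described in the bullets can be checked by hand. The NumPy sketch below is an illustration, not the Keras code path. It uses a linear model and a made-up batch, and averages the per-instance squared errors with optional per-instance weights, which is where the bullets say weighting would go. The weighted mean here is normalized by the sum of the weights, which is one convention among several. The sketch then adds an L2 penalty as a separate scalar, checks the hand-derived gradient against finite differences, and takes one plain gradient descent step. The weights, data, penalty factor and learning rate are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                    # one made-up batch
y = rng.normal(size=32)
sample_weight = rng.uniform(0.5, 1.5, size=32)  # made-up per-instance weights
l2_factor, lr = 0.05, 0.1

def total_loss(w):
    per_instance = (X @ w - y) ** 2             # one loss per instance
    main_loss = np.sum(sample_weight * per_instance) / np.sum(sample_weight)
    reg_loss = l2_factor * np.sum(w ** 2)       # already a single scalar
    return main_loss + reg_loss                 # the "add_n" step

def gradient(w):
    residual = X @ w - y
    grad_main = 2 * X.T @ (sample_weight * residual) / np.sum(sample_weight)
    return grad_main + 2 * l2_factor * w

w = rng.normal(size=3)

# Independent check: central finite differences along each coordinate
eps = 1e-6
numeric_grad = np.array([
    (total_loss(w + eps * e) - total_loss(w - eps * e)) / (2 * eps) for e in np.eye(3)
])

loss_before = total_loss(w)
w_next = w - lr * gradient(w)                   # one plain gradient descent step
loss_after = total_loss(w_next)
```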

Most importantly, this training loop does not handle layers that behave differently
during training and testing (e.g., BatchNormalization or Dropout). To handle these,
you need to call the model with training=True and make sure it propagates this to
every layer that needs it.
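Dropout is the classic example of such a layer. The NumPy sketch below illustrates the usual "inverted dropout" formulation and is not the Keras source. During training, each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), so the expected value is unchanged. At inference, the input passes through untouched. That difference is why a custom loop has to tell the model which mode it is in. `dropout()` is a hypothetical helper.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: random masking plus rescaling in training, identity at inference."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate           # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
train_out = dropout(x, rate=0.2, training=True, rng=rng)
infer_out = dropout(x, rate=0.2, training=False, rng=rng)
```

In training mode, about 20% of the entries come out as 0 and the rest as 1.25, so the mean stays close to 1. In inference mode, the output equals the input.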