This guide covers two situations:
1. Training and validating with the built-in APIs (such as model.fit(), model.evaluate(), model.predict())
2. Writing custom loops with eager execution and GradientTape

In [1]:
import tensorflow as tf
import numpy as np

### Part I: Using built-in training & evaluation loops

When passing data to the built-in training loops, use either Numpy arrays (if the data is small and fits in memory) or tf.data.Dataset objects. The examples below use the MNIST dataset as Numpy arrays.

In [2]:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

A typical end-to-end workflow covers training, validation, and testing:

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

In [4]:
model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])

In [5]:
print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=3,
                    # We pass some validation for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))

print('\nhistory dict:', history.history)

# Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

history dict: {'loss': [0.3551591438007355, 0.17198494044065477, 0.12303630582809448], 'sparse_categorical_accuracy': [0.8999, 0.94826, 0.96318], 'val_loss': [0.22055140339136123, 0.14517438811659814, 0.12091187521070242], 'val_sparse_categorical_accuracy': [0.9365, 0.958, 0.9633]}


In [6]:
# Evaluate the model on the first 10 test samples using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test[:10], y_test[:10], batch_size=128)
print('test loss, test acc: ', results)

# Generate predictions (logits -- the output of the last layer,
# which has no softmax activation) on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)


# Evaluate on test data
test loss, test acc:  [0.11069478839635849, 0.9]

# Generate predictions for 3 samples
predictions shape: (3, 10)


**Specifying a loss, metrics, and an optimizer**

When training a model with fit, you need to specify a loss function, an optimizer, and optionally some metrics to monitor. These are passed to the compile() method:

In [7]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.sparse_categorical_accuracy])
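
As a rough illustration of what a sparse categorical cross-entropy computed from logits means for a single sample, here is a plain-Python sketch. It does not use TensorFlow; the function name and the sample values are made up for this note, and the real Keras loss additionally handles batching and reduction.

```python
import math

def sparse_categorical_crossentropy_from_logits(label, logits):
    """Cross-entropy for one sample: an integer label and raw, unnormalized scores."""
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    # Negative log-probability that softmax(logits) assigns to the true class.
    return log_sum_exp - logits[label]

# Two equal logits: the model is undecided between two classes.
uniform_loss = sparse_categorical_crossentropy_from_logits(0, [0.0, 0.0])
# A large logit on the correct class: the loss is close to zero.
confident_loss = sparse_categorical_crossentropy_from_logits(1, [0.0, 10.0])
print(uniform_loss, confident_loss)
```

The label is an integer class index ("sparse"), and the scores are passed without a softmax, which is why the model's last Dense layer has no activation.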

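metrics is a list, so you can specify several metrics. If the model has multiple outputs, you can specify different losses and metrics for each output, and you can also assign weights to them. The optimizer, loss, and metrics can also be given as strings: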

In [8]:
model.compile(optimizer='rmsprop',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])

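For later convenience, the model definition and the compile step are split into separate functions, which are called repeatedly below: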

In [9]:
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['sparse_categorical_accuracy'])
    return model

**Many built-in optimizers, losses, and metrics are available**

Optimizers:
* SGD() (with or without momentum)
* RMSprop()
* Adam()
* etc.

Losses:
* MeanSquaredError()
* KLDivergence()
* CosineSimilarity()
* etc.

Metrics:
* AUC()
* Precision()
* Recall()
* etc.

**Custom losses**

You can also define custom losses. The two examples below show two ways of doing it:

In [10]:
def basic_loss_function(y_true, y_pred):
    return tf.math.reduce_mean(tf.abs(y_true - y_pred))

model.compile(optimizer=keras.optimizers.Adam(),
              loss=basic_loss_function)

model.fit(x_train, y_train, batch_size=64, epochs=3)

Train on 50000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1ead2d6bf60>

If the loss needs parameters besides y_true and y_pred, subclass tf.keras.losses.Loss and implement __init__(self) and call(self, y_true, y_pred), for example:

In [11]:
class WeightedBinaryCrossEntropy(keras.losses.Loss):
    def __init__(self, pos_weight, weight, from_logits=False,
                 reduction=keras.losses.Reduction.AUTO,
                 name='weighted_binary_crossentropy'):
        super().__init__(reduction=reduction, name=name)
        self.pos_weight = pos_weight
        self.weight = weight
        self.from_logits = from_logits
    
    def call(self, y_true, y_pred):
        ce = tf.losses.binary_crossentropy(
            y_true, y_pred, from_logits=self.from_logits)[:,None]
        ce = self.weight * (ce*(1-y_true) + self.pos_weight*ce*(y_true))
        return ce

In [12]:
one_hot_y_train = tf.one_hot(y_train.astype(np.int32), depth=10)  # y_train must be converted to one-hot format

model = get_uncompiled_model()
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=WeightedBinaryCrossEntropy(
        pos_weight=0.5, weight=2, from_logits=True)
)
model.fit(x_train, one_hot_y_train, batch_size=64, epochs=5)

Train on 50000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x1ead2bbd5f8>

**Custom metrics**

You can also implement your own metrics by subclassing tf.keras.metrics.Metric and implementing four methods:
* __init__(self), in which you will create state variables for your metric.
* update_state(self, y_true, y_pred, sample_weight=None), which uses the targets y_true and the model predictions y_pred to update the state variables.
* result(self), which uses the state variables to compute the final results.
* reset_states(self), which reinitializes the state of the metric.

The example below implements a CategoricalTruePositives metric, which counts how many samples were classified correctly:

In [15]:
class CategoricalTruePositives(keras.metrics.Metric):
    def __init__(self, name='categorical_true_positives', **kwargs):
        super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name='tp', initializer='zeros')
        
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, 'int32') == tf.cast(y_pred, 'int32')
        values = tf.cast(values, 'float32')
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, 'float32')
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))
        
    def result(self):
        return self.true_positives
    
    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0)

In [16]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[CategoricalTruePositives()])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)

Train on 50000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1ead2c08d30>

**Handling losses and metrics that don't fit the standard signature**

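The overwhelming majority of losses and metrics can be computed from y_true and y_pred, but some cannot. A regularization loss, for instance, only needs information from the layer itself. In such cases, call self.add_loss(loss_value) from inside the layer's call method: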

In [17]:
class ActivityRegularizationLayer(layers.Layer):
    def call(self, inputs):
        self.add_loss(tf.reduce_sum(inputs) * 0.1)
        return inputs  # Pass-through layer.
    
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The displayed loss will be much higher than before
# due to the regularization component.
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)

Train on 50000 samples


<tensorflow.python.keras.callbacks.History at 0x1ead6c4d160>

The same can be done for metrics:

In [18]:
class MetricLoggingLayer(layers.Layer):
    def call(self, inputs):
        # The `aggregation` argument defines
        # how to aggregate the per-batch values
        # over each epoch:
        # in this case we simply average them.
        self.add_metric(keras.backend.std(inputs),
                        name='std_of_activation',
                        aggregation='mean')
        return inputs  # Pass-through layer.

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert std logging as a layer.
x = MetricLoggingLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)

Train on 50000 samples


<tensorflow.python.keras.callbacks.History at 0x1ead7251ba8>

With the Functional API, you can also call model.add_loss(loss_tensor) or model.add_metric(metric_tensor, name, aggregation):

In [19]:
inputs = keras.Input(shape=(784,), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)

model.add_metric(keras.backend.std(x1),
                 name='std_of_activation',
                 aggregation='mean')

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)

Train on 50000 samples


<tensorflow.python.keras.callbacks.History at 0x1ead88bec88>

**Automatically setting apart a validation holdout set**

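Besides passing validation_data explicitly, you can use validation_split to automatically reserve part of the training data for validation. Its value is a fraction: 0.2, for example, means that 20% of the training data is held out for validation.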

In [20]:
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1, steps_per_epoch=1)

Train on 40000 samples, validate on 10000 samples
   64/40000 [..............................] - ETA: 5:43 - loss: 2.3050 - sparse_categorical_accuracy: 0.1406 - val_loss: 0.0000e+00 - val_sparse_categorical_accuracy: 0.0000e+00

<tensorflow.python.keras.callbacks.History at 0x1ead8f96940>
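
The holdout can be pictured with a plain-Python sketch, assuming the documented Keras behavior that the validation samples are taken from the end of the arrays, before any shuffling. The helper name split_for_validation is hypothetical and not part of Keras.

```python
import math

def split_for_validation(samples, validation_split):
    """Hold out the trailing fraction of `samples` for validation."""
    # Index of the first validation sample; everything before it is training data.
    split_at = int(math.floor(len(samples) * (1.0 - validation_split)))
    return samples[:split_at], samples[split_at:]

train_part, val_part = split_for_validation(list(range(10)), validation_split=0.2)
print(train_part, val_part)
```

With 50000 samples and validation_split=0.2 the same arithmetic gives 40000 training samples and 10000 validation samples, which matches the log line above. Because the split takes the last samples, data sorted by class should be shuffled before it is passed to fit.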

**Training & evaluation from tf.data Datasets**

The tf.data API is used to load and preprocess data in a fast and scalable way. A Dataset instance can be passed directly to fit(), evaluate(), and predict():

In [22]:
model = get_compiled_model()

# First, let's create a training Dataset instance.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Now we get a test dataset.
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(8)

# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model.fit(train_dataset, epochs=3)

# You can also evaluate
print('\n# Evaluate')
result = model.evaluate(test_dataset)
dict(zip(model.metrics_names, result))

Epoch 1/3
Epoch 2/3
Epoch 3/3

# Evaluate


{'loss': 0.14213139407945272, 'sparse_categorical_accuracy': 0.9586}

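Note that the Dataset is reset at the end of each epoch, so it can be reused in the next epoch.

To train on only part of the batches in the Dataset, pass the steps_per_epoch argument. In that case, the dataset is not reset at the end of each epoch; the next epoch continues with the remaining batches: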

In [24]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
print(len(x_train))
train_dataset = train_dataset.shuffle(buffer_size=2014).batch(64).repeat()

# Only use the 100 batches per epoch (that's 64 * 100 samples)
model.fit(train_dataset, steps_per_epoch=100, epochs=3)

50000
Train for 100 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1eaddc5bda0>
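
The "not reset between epochs" behavior can be mimicked with an ordinary Python iterator. This is only an analogy: itertools.cycle stands in for `.repeat()`, the batch values are placeholders, shuffling is ignored, and run_epochs is a hypothetical helper.

```python
from itertools import cycle, islice

def run_epochs(batches, steps_per_epoch, epochs):
    """Return the batches consumed in each epoch from one shared, repeating stream."""
    stream = cycle(batches)  # plays the role of `.repeat()`
    # Each epoch draws the next `steps_per_epoch` batches from the same stream,
    # so it starts where the previous epoch stopped.
    return [list(islice(stream, steps_per_epoch)) for _ in range(epochs)]

seen = run_epochs(batches=[0, 1, 2, 3, 4], steps_per_epoch=3, epochs=2)
print(seen)
```

The second epoch starts at batch 3 rather than at batch 0, and wraps around once the underlying data is exhausted.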

A Dataset instance can also be passed to fit as the validation_data argument:

In [25]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3, validation_data=val_dataset)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1ead6c19ac8>

At the end of each epoch, the model computes the validation loss and metrics over the validation Dataset. To run validation on only some of the batches, pass validation_steps:

In [27]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3,
          # Only run validation using the first 10 batches of the dataset
          # using the `validation_steps` argument
          validation_data=val_dataset, validation_steps=10)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1eada9a0710>

validation_split is not available when training from Dataset objects.

**Other input formats supported**

Besides Numpy arrays and TensorFlow Datasets, you can also train from Pandas dataframes or from Python generators that yield batches. In general, though, Numpy input is recommended when the data is small, and Datasets otherwise.

**Using sample weighting and class weighting**

Besides input data and target data, you can also pass sample weights or class weights to fit:
* When training from Numpy data: via the sample_weight and class_weight arguments.
* When training from Datasets: by having the Dataset return a tuple (input_batch, target_batch, sample_weight_batch)

In [28]:
import numpy as np

class_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
                # Set weight "2" for class "5",
                # making this class 2x more important
                5: 2.,
                6: 1., 7: 1., 8: 1., 9: 1.}
print('Fit with class weight')
model.fit(x_train, y_train,
          class_weight=class_weight,
          batch_size=64,
          epochs=4)

Fit with class weight
Train on 50000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0x1eade46dbe0>

In [31]:
# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
print('\nFit with sample weight')

model = get_compiled_model()
model.fit(x_train, y_train,
          sample_weight=sample_weight,
          batch_size=64,
          epochs=4)


Fit with sample weight
Train on 50000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0x1eadaf20748>
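
The relationship between the two arguments can be sketched in plain Python: a class weight is looked up per sample from its label, which yields a per-sample weight that scales that sample's loss. The helper names and loss values below are illustrative, and dividing by the batch size is an assumption about the reduction rather than a statement of what Keras does in every configuration.

```python
def sample_weights_from_class_weights(labels, class_weight):
    """Look up one weight per sample from its integer class label."""
    return [class_weight[int(label)] for label in labels]

def weighted_mean_loss(per_sample_losses, weights):
    """Scale each per-sample loss by its weight, then average over the batch."""
    # Assumption: divide by the number of samples, not by the sum of the weights.
    total = sum(loss * w for loss, w in zip(per_sample_losses, weights))
    return total / len(per_sample_losses)

class_weight = {i: 1.0 for i in range(10)}
class_weight[5] = 2.0                      # class 5 counts twice as much

labels = [5.0, 3.0, 5.0, 0.0]              # float labels, as in y_train
weights = sample_weights_from_class_weights(labels, class_weight)
batch_loss = weighted_mean_loss([0.5, 1.0, 0.25, 2.0], weights)
print(weights, batch_loss)
```

Sample weights are the more general of the two: they can also express weights that do not depend on the class, such as masking out individual samples with a weight of 0.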

In [32]:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train, sample_weight))

# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model = get_compiled_model()
model.fit(train_dataset, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1eadfaa18d0>

**Passing data to multi-input, multi-output models**

The examples so far had a single input and a single output, but a model can have several of each. For example:

In [33]:
from tensorflow import keras
from tensorflow.keras import layers

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])

model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]},
    loss_weights={'score_output': 2., 'class_output': 1.})

# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))

# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
          batch_size=32,
          epochs=3)

# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
          {'score_output': score_targets, 'class_output': class_targets},
          batch_size=32,
          epochs=3)

Train on 100 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Train on 100 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1eaed52bc50>
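
The effect of loss_weights can be sketched as a weighted sum of the per-output losses. The loss values below are made up, and combine_losses is a hypothetical helper, not a Keras function.

```python
def combine_losses(per_output_losses, loss_weights):
    """Weighted sum of per-output loss values, keyed by output name."""
    # Outputs without an explicit weight contribute with weight 1.0.
    return sum(loss_weights.get(name, 1.0) * value
               for name, value in per_output_losses.items())

total = combine_losses(
    {'score_output': 0.25, 'class_output': 1.5},   # made-up loss values
    {'score_output': 2.0, 'class_output': 1.0})
print(total)
```

With these weights, an error on score_output moves the total loss twice as much as an equally large error on class_output.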

**Using callbacks**

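Callbacks in Keras are called at different points during training (for example at the start of an epoch, at the end of a batch, or at the end of an epoch). They can be used to implement behaviors such as:
* Running validation during training
* Saving a checkpoint when a condition is met
* Changing the learning rate when training plateaus
* Fine-tuning the top layers when training plateaus
* Sending an email or another notification when training ends or a performance threshold is reached

Callbacks can be passed to fit as a list: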

In [34]:
model = get_compiled_model()

callbacks = [
    keras.callbacks.EarlyStopping(
        # Stop training when `val_loss` is no longer improving
        monitor='val_loss',
        # "no longer improving" being defined as "no better than 1e-2 less"
        min_delta=1e-2,
        # "no longer improving" being further defined as "for at least 2 epochs"
        patience=2,
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=20,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 00006: early stopping


<tensorflow.python.keras.callbacks.History at 0x1eaed6d82b0>
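
The interplay of min_delta and patience can be traced with a simplified plain-Python sketch. It only mirrors the idea (an epoch counts as an improvement when the monitored loss drops by more than min_delta; training stops after patience epochs without one) and leaves out other EarlyStopping options. The function name and the loss values are illustrative.

```python
def early_stopping_epoch(val_losses, min_delta, patience):
    """Return the 0-based epoch index at which training stops, or None if it never does."""
    best = float('inf')
    wait = 0  # epochs since the last sufficient improvement
    for epoch, current in enumerate(val_losses):
        if current < best - min_delta:
            # Improved by more than min_delta: remember it and reset the counter.
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

stop_at = early_stopping_epoch([0.50, 0.40, 0.395, 0.394, 0.30],
                               min_delta=1e-2, patience=2)
print(stop_at)
```

In this trace the losses 0.395 and 0.394 are lower than 0.40, but not by more than 0.01, so they count as two epochs without improvement and training stops before the final value is ever reached.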

The built-in callbacks include:
* ModelCheckpoint: saves the model
* EarlyStopping: stops training when the validation metrics stop improving
* TensorBoard: writes model logs that can be visualized in TensorBoard
* CSVLogger: streams loss and metrics data to a CSV file.

**Writing your own callback**

You can write your own callback by subclassing keras.callbacks.Callback:

In [35]:
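class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        # Start each training run with an empty list of losses.
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # Record the loss reported for this batch.
        logs = logs or {}
        self.losses.append(logs.get('loss'))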

**Checkpointing models**

When training on relatively large datasets, it is important to save checkpoints of the model. The simplest way to do this is the ModelCheckpoint callback:

In [36]:
model = get_compiled_model()

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath='mymodel_{epoch}',
        # Path where to save the model
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=3,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

Train on 40000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: val_loss improved from inf to 0.22211, saving model to mymodel_1
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: mymodel_1\assets
Epoch 2/3
Epoch 00002: val_loss improved from 0.22211 to 0.16985, saving model to mymodel_2
INFO:tensorflow:Assets written to: mymodel_2\assets
Epoch 3/3
Epoch 00003: val_loss improved from 0.16985 to 0.15109, saving model to mymodel_3
INFO:tensorflow:Assets written to: mymodel_3\assets


<tensorflow.python.keras.callbacks.History at 0x1eaeee4a4a8>
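
The effect of save_best_only=True together with the `{epoch}` placeholder can be traced in plain Python. This sketch only decides which checkpoint names would be written for a given sequence of validation losses; it saves nothing, and checkpoints_written is a hypothetical helper.

```python
def checkpoints_written(val_losses, filepath='mymodel_{epoch}'):
    """List the checkpoint names written when only improvements are saved."""
    best = float('inf')
    written = []
    for epoch_index, val_loss in enumerate(val_losses):
        if val_loss < best:
            best = val_loss
            # Epoch numbers in the file name are 1-based, as in the log above.
            written.append(filepath.format(epoch=epoch_index + 1))
    return written

# The val_loss values from the log above improve every epoch.
print(checkpoints_written([0.22211, 0.16985, 0.15109]))
# A run where epoch 2 is worse than epoch 1 skips that checkpoint.
print(checkpoints_written([0.3, 0.4, 0.2]))
```

Because the epoch number is part of the path, each improvement lands in a new directory. Using a fixed path instead would keep only the best model seen so far.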

**Using learning rate schedules**

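Gradually reducing the learning rate during training is known as learning rate decay. The schedule can be static (fixed in advance, as a function of the current epoch or the current batch index) or dynamic (responding to the current behavior of the model, in particular the validation loss).

A static learning rate decay schedule is set up by passing a schedule object as the optimizer's learning_rate argument: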

In [37]:
initial_learning_rate = 0.1
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)
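
The value such a schedule produces at a given step follows the documented formula initial_learning_rate * decay_rate ** (step / decay_steps), where staircase=True replaces the division with an integer division. The plain-Python function below is a sketch of that formula only, not the Keras implementation.

```python
def exponential_decay(step, initial_learning_rate, decay_steps, decay_rate,
                      staircase=False):
    """Learning rate at `step` under exponential decay."""
    if staircase:
        # Decay in discrete jumps: the exponent only changes every `decay_steps` steps.
        exponent = step // decay_steps
    else:
        exponent = step / decay_steps
    return initial_learning_rate * decay_rate ** exponent

for step in (0, 99999, 100000, 250000):
    print(step, exponential_decay(step, 0.1, 100000, 0.96, staircase=True))
```

With the settings used above, the learning rate stays at 0.1 for the first 100000 optimizer steps and is then multiplied by 0.96 once per further 100000 steps. A three-epoch MNIST run at batch size 64 takes far fewer steps than that, so it would never leave the first plateau.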

For dynamic learning rate adjustment, the built-in callback is ReduceLROnPlateau.

### Part II: Writing your own training & evaluation loops from scratch

**Using the GradientTape: a first end-to-end example**

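Calling a model inside a GradientTape scope lets you retrieve the gradients of the trainable weights with respect to a loss value. With an optimizer, you can then use these gradients to update the weights.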

In [38]:
# Get the model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

In [40]:
epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))
    
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch
            
            # Compute the loss value for this minibatch
            loss_value = loss_fn(y_batch_train, logits)
            
        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        
        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))

Start of epoch 0
Training loss (for one batch) at step 0: 2.4334452152252197
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.2208168506622314
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.164945125579834
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.1104955673217773
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.126343011856079
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.987076997756958
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.848045825958252
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.8498668670654297
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.7770605087280273
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.5975961685180664
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.6125603914260864
Seen so far: 256
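
The same three-step pattern (forward pass, gradient, update) can be seen on a one-parameter least-squares problem in plain Python. Here the gradient is derived by hand in place of tf.GradientTape, and the update is written out in place of optimizer.apply_gradients; the data and the function name are made up for illustration.

```python
def train_one_parameter(xs, ys, learning_rate=0.1, epochs=50):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(epochs):
        # Forward pass: predictions for the current weight.
        preds = [w * x for x in xs]
        # Gradient of mean((w*x - y)^2) with respect to w, derived by hand.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Update step: move the weight against the gradient.
        w -= learning_rate * grad
    return w

w = train_one_parameter(xs=[1.0, 2.0, 3.0], ys=[2.0, 4.0, 6.0])
print(w)
```

The data follows y = 2x exactly, so the fitted weight ends up very close to 2. GradientTape removes the need to derive the gradient manually, which is what makes the same loop practical for a multi-layer model.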

**Low-level handling of metrics**

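The steps for adding metrics to a custom loop are:
* Instantiate the metric at the start of the loop
* Call metric.update_state() after each batch
* Call metric.result() when you need to display the current value of the metric
* Call metric.reset_states() when you need to clear the state of the metric (typically at the end of an epoch)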
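
The lifecycle in the list above can be illustrated with a small stateful object in plain Python. RunningAccuracy is a hypothetical stand-in that only borrows the method names of Keras metrics; it takes predicted class indices instead of logits to keep the sketch short.

```python
class RunningAccuracy:
    """Accumulates correct/total counts across batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_state(self, y_true, y_pred):
        # Add this batch's counts to the running totals.
        self.correct += sum(1 for t, p in zip(y_true, y_pred) if t == p)
        self.total += len(y_true)

    def result(self):
        # Accuracy over everything seen since the last reset.
        return self.correct / self.total if self.total else 0.0

    def reset_states(self):
        self.correct = 0
        self.total = 0

metric = RunningAccuracy()
metric.update_state([1, 0, 2], [1, 1, 2])   # batch 1: 2 of 3 correct
metric.update_state([3], [3])               # batch 2: 1 of 1 correct
epoch_accuracy = metric.result()            # 3 of 4 over the whole epoch
metric.reset_states()                       # start the next epoch from zero
print(epoch_accuracy)
```

The result is computed over all samples seen since the last reset, not as an average of per-batch accuracies, which matters when the last batch is smaller than the others.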

In [41]:
# Get model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

In [42]:
epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        
        # Update training metric.
        train_acc_metric(y_batch_train, logits)
        
        # Log every 200 batches
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))
            
    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print('Training acc over epoch: %s' % (float(train_acc),))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()
    
    # Run a validation loop at the end of each epoch
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val)
        # Update val metrics
        val_acc_metric(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print('Validation acc: %s' % (float(val_acc),))

Start of epoch 0
Training loss (for one batch) at step 0: 2.3310351371765137
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.230015277862549
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.2347941398620605
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.1749868392944336
Seen so far: 38464 samples
Training acc over epoch: 0.22519999742507935
Validation acc: 0.41119998693466187
Start of epoch 1
Training loss (for one batch) at step 0: 1.988825798034668
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.9822006225585938
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8704516887664795
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.831594705581665
Seen so far: 38464 samples
Training acc over epoch: 0.5037800073623657
Validation acc: 0.6004999876022339
Start of epoch 2
Training loss (for one batch) at step 0: 1.7841203212738037
Seen so far: 64 samples
Traini

**Low-level handling of extra losses**

An earlier section included this example:

In [43]:
class ActivityRegularizationLayer(layers.Layer):
    def call(self, inputs):
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

When the model is called, the losses created during the forward pass are added to the model.losses attribute, for example:

In [44]:
logits = model(x_train)
print(model.losses)

[<tf.Tensor: id=693212, shape=(), dtype=float32, numpy=6521.1426>]


If the model is called several times, model.losses only holds the losses from the most recent call, for example:

In [45]:
logits = model(x_train[:64])
logits = model(x_train[64: 128])
logits = model(x_train[128: 192])
print(model.losses)

[<tf.Tensor: id=693269, shape=(), dtype=float32, numpy=8.187633>]


To take these losses into account during training, add sum(model.losses) to the loss value in each training step:

In [46]:
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))
    
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)
            
            # Add extra losses created during this forward pass:
            loss_value += sum(model.losses)
            
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        
        # Log every 200 batches:
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))

Start of epoch 0
Training loss (for one batch) at step 0: 10.775253295898438
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.5282602310180664
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.374560594558716
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.3456227779388428
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.336315870285034
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.326626777648926
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.32987904548645
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.3131821155548096
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 2.308389663696289
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3308935165405273
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.3269221782684326
Seen so far: 25664