#### Writing a Training Loop from Scratch

1. Select Batch of Data
2. Forward Pass
3. Calculate Loss
4. Calculate Gradients
5. Update Weights
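
The five steps above can be sketched without any framework. The example below is a minimal illustration on a toy linear-regression problem; the data, learning rate, and variable names are assumptions chosen for the sketch, and the gradients are written by hand rather than computed automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): targets follow y = 3x + 1 exactly.
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = 3.0 * x + 1.0

w, b = 0.0, 0.0          # model weights
learning_rate = 0.1
batch_size = 32

losses = []
for step in range(500):
    # 1. Select a batch of data
    idx = rng.choice(len(x), size=batch_size, replace=False)
    x_batch, y_batch = x[idx], y[idx]

    # 2. Forward pass
    y_pred = w * x_batch + b

    # 3. Calculate the loss (mean squared error)
    error = y_pred - y_batch
    loss = np.mean(error ** 2)
    losses.append(loss)

    # 4. Calculate the gradients of the loss with respect to w and b
    grad_w = np.mean(2.0 * error * x_batch)
    grad_b = np.mean(2.0 * error)

    # 5. Update the weights
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, final loss={losses[-1]:.6f}")
```

In the TensorFlow version of this loop, step 4 is handled by `tf.GradientTape` and step 5 by an optimizer.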

In [2]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [4]:
# Build a simple model with the Keras Functional API

inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
# No softmax here: the loss is created with from_logits=True, so the model outputs raw logits.
outputs = layers.Dense(10, name="predictions")(x2)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 digits (InputLayer)         [(None, 784)]             0         
                                                                 
 dense_2 (Dense)             (None, 64)                50240     
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
                                                                 
 predictions (Dense)         (None, 10)                650       
                                                                 
Total params: 55050 (215.04 KB)
Trainable params: 55050 (215.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [7]:
# Instantiate an optimizer to train the model
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
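
With `from_logits=True`, the loss expects raw, unnormalized scores and applies the softmax itself. The NumPy sketch below shows roughly what that computation looks like for integer labels. `sparse_ce_from_logits` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Mean cross-entropy for integer labels and raw (pre-softmax) scores."""
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log of the normalized class probabilities.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

labels = np.array([0, 2])
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])

loss = sparse_ce_from_logits(labels, logits)

# All-zero logits give a uniform prediction, so the loss equals log(3).
uniform = sparse_ce_from_logits(labels, np.zeros((2, 3)))
print(loss, uniform)
```

Because the softmax lives inside the loss, the model's last layer should not apply a softmax of its own when this option is set.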

In [5]:
# Prepare the training dataset
batch_size = 50
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))

# Normalize data to float32 in range [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Convert labels to float32
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")

# Reserve 10,000 samples for validation
# x_val = x_train[-10000:]
# y_val = y_train[-10000:]
# x_train = x_train[:-10000]
# y_train = y_train[:-10000]
num_classes = 10

# y_train = keras.utils.to_categorical(y_train, num_classes)
# y_val = keras.utils.to_categorical(y_val, num_classes)

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=2048).batch(batch_size)

# Prepare the validation dataset (the MNIST test split stands in for a validation set here)
val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
val_dataset = val_dataset.batch(batch_size)
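
To see what `shuffle(...).batch(...)` yields, here is a small sketch on toy arrays. The 10-sample data and batch size of 4 are assumptions chosen so the final, smaller batch is visible. With the 60,000 MNIST training samples and `batch_size = 50`, the same pipeline produces 1,200 batches per epoch.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for MNIST: 10 samples with 4 features each (hypothetical data).
features = np.arange(40, dtype="float32").reshape(10, 4)
labels = np.arange(10, dtype="float32")

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=10).batch(4)

# Iterating yields (features, labels) pairs, one pair per batch.
batch_shapes = []
for x_batch, y_batch in dataset:
    batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))

print(batch_shapes)
```

The last batch holds the 2 leftover samples, since 10 is not a multiple of 4.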

Here's our training loop:

- We open a `for` loop that iterates over epochs.
- For each epoch, we open a `for` loop that iterates over the dataset, in batches.
- For each batch, we open a `GradientTape()` scope.
- Inside this scope, we call the model (the forward pass) and compute the loss.
- Outside the scope, we retrieve the gradients of the loss with respect to the model's weights.
- Finally, we use the optimizer to update the model's weights based on those gradients.
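
Before the full loop, the tape mechanics can be seen in isolation. This is a minimal sketch with a single variable and a made-up loss `w * w`; the starting value and the step size of 0.1 are assumptions for illustration.

```python
import tensorflow as tf

w = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Operations on the variable are recorded while the tape is open.
    loss = w * w

# Outside the scope, ask the tape for d(loss)/d(w), which is 2 * w.
grad = tape.gradient(loss, w)

# One plain gradient-descent step (hypothetical learning rate of 0.1).
w.assign_sub(0.1 * grad)

print(float(grad), float(w))
```

In the training loop, the optimizer's `apply_gradients` plays the role of the manual `assign_sub` step.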

In [9]:
epochs = 5

for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        
        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the model.
            # The operations that the model applies to its inputs are
            # going to be recorded on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch
            
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)
            
        # Use the gradient tape to automatically retrieve
        # the gradients of the loss with respect to the trainable variables.
        grads = tape.gradient(loss_value, model.trainable_weights)
        
        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        
        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * batch_size))


Start of epoch 0
Training loss (for one batch) at step 0: 0.1064
Seen so far: 50 samples
Training loss (for one batch) at step 200: 0.0129
Seen so far: 10050 samples
Training loss (for one batch) at step 400: 0.0053
Seen so far: 20050 samples
Training loss (for one batch) at step 600: 0.0275
Seen so far: 30050 samples
Training loss (for one batch) at step 800: 0.0193
Seen so far: 40050 samples
Training loss (for one batch) at step 1000: 0.1509
Seen so far: 50050 samples

Start of epoch 1
Training loss (for one batch) at step 0: 0.1347
Seen so far: 50 samples
Training loss (for one batch) at step 200: 0.1159
Seen so far: 10050 samples
Training loss (for one batch) at step 400: 0.0274
Seen so far: 20050 samples
Training loss (for one batch) at step 600: 0.0107
Seen so far: 30050 samples
Training loss (for one batch) at step 800: 0.0038
Seen so far: 40050 samples
Training loss (for one batch) at step 1000: 0.0473
Seen so far: 50050 samples

Start of epoch 2
Training loss (for one batch) 

#### Low-Level Handling of Metrics

- Instantiate the metric before the loop starts
- Call `metric.update_state()` after each batch
- Call `metric.result()` when you need to display the current value of the metric
- Call `metric.reset_state()` when you need to clear the state of the metric (typically at the end of an epoch); older TensorFlow releases name this method `reset_states()`
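
The lifecycle above can be tried on its own, outside a training loop. The sketch below feeds two hand-made batches to a `SparseCategoricalAccuracy` metric. The labels and scores are invented for illustration, and `reset_state()` is assumed to be available (TensorFlow 2.5 or later).

```python
import numpy as np
from tensorflow import keras

# Instantiate the metric once, before any batches are processed.
metric = keras.metrics.SparseCategoricalAccuracy()

# Batch 1: two of the three predictions match the labels.
y_true_1 = np.array([0, 1, 2])
scores_1 = np.array([[0.9, 0.05, 0.05],
                     [0.1, 0.8, 0.1],
                     [0.7, 0.2, 0.1]], dtype="float32")
metric.update_state(y_true_1, scores_1)
acc_after_1 = float(metric.result())

# Batch 2: one more correct prediction; the state accumulates across batches.
y_true_2 = np.array([1])
scores_2 = np.array([[0.2, 0.8, 0.0]], dtype="float32")
metric.update_state(y_true_2, scores_2)
acc_after_2 = float(metric.result())

# Clear the accumulated state, as done at the end of an epoch.
metric.reset_state()
acc_after_reset = float(metric.result())

print(acc_after_1, acc_after_2, acc_after_reset)
```

The result after the second batch covers all four samples seen so far, not just the latest batch, which is why the metric is reset between epochs.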

In [10]:
# Get Model
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()

# Instantiate an optimizer to train the model
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 digits (InputLayer)         [(None, 784)]             0         
                                                                 
 dense_1 (Dense)             (None, 64)                50240     
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 predictions (Dense)         (None, 10)                650       
                                                                 
Total params: 55050 (215.04 KB)
Trainable params: 55050 (215.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [12]:
# Training and Evaluation loop

import time

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()
    
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        
        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the model.
            # The operations that the model applies to its inputs are
            # going to be recorded on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch
            
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)
            
        # Use the gradient tape to automatically retrieve
        # the gradients of the loss with respect to the trainable variables.
        grads = tape.gradient(loss_value, model.trainable_weights)
        
        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        
        # Update training metric.
        train_acc_metric.update_state(y_batch_train, logits)
        
        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * batch_size))
    
    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))
    
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_state()  # reset_states() is the deprecated older name
    
    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val, training=False)
        val_acc_metric.update_state(y_batch_val, val_logits)
    
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_state()  # reset_states() is the deprecated older name
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))


Start of epoch 0
Training loss (for one batch) at step 0: 1.1345
Seen so far: 50 samples
Training loss (for one batch) at step 200: 0.9039
Seen so far: 10050 samples
Training loss (for one batch) at step 400: 0.9850
Seen so far: 20050 samples
Training loss (for one batch) at step 600: 1.0626
Seen so far: 30050 samples
Training loss (for one batch) at step 800: 0.8767
Seen so far: 40050 samples
Training loss (for one batch) at step 1000: 0.9831
Seen so far: 50050 samples
Training acc over epoch: 0.8014
Validation acc: 0.8316
Time taken: 13.48s

Start of epoch 1
Training loss (for one batch) at step 0: 0.9385
Seen so far: 50 samples
Training loss (for one batch) at step 200: 0.7142
Seen so far: 10050 samples
Training loss (for one batch) at step 400: 0.7662
Seen so far: 20050 samples
Training loss (for one batch) at step 600: 0.8152
Seen so far: 30050 samples
Training loss (for one batch) at step 800: 0.9134
Seen so far: 40050 samples
Training loss (for one batch) at step 1000: 0.8010
S