vignettes-src/writing_your_own_callbacks.Rmd

---
title: Writing your own callbacks
authors: Rick Chao, Francois Chollet
date-created: 2019/03/20
last-modified: 2023/06/25
description: Complete guide to writing new Keras callbacks.
accelerator: GPU
output: rmarkdown::html_vignette
knit: ({source(here::here("tools/knit.R")); knit_vignette})
tether: https://raw.githubusercontent.com/keras-team/keras-io/master/guides/writing_your_own_callbacks.py
---

## Introduction

A callback is a powerful tool to customize the behavior of a Keras model during
training, evaluation, or inference. Examples include `keras.callbacks.TensorBoard`
to visualize training progress and results with TensorBoard, or
`keras.callbacks.ModelCheckpoint` to periodically save your model during training.

In this guide, you will learn what a Keras callback is, what it can do, and how you can
build your own. We provide a few demos of simple callback applications to get you
started.

## Setup

```{r}
library(keras3)
```

## Keras callbacks overview

All callbacks subclass the `keras.callbacks.Callback` class, and
override a set of methods called at various stages of training, testing, and
predicting. Callbacks are useful to get a view on internal states and statistics of
the model during training.

You can pass a list of callbacks (as the keyword argument `callbacks`) to the following
model methods:

- `fit()`
- `evaluate()`
- `predict()`

## An overview of callback methods

### Global methods

#### `on_(train|test|predict)_begin(logs = NULL)`

Called at the beginning of `fit`/`evaluate`/`predict`.

#### `on_(train|test|predict)_end(logs = NULL)`

Called at the end of `fit`/`evaluate`/`predict`.

### Batch-level methods for training/testing/predicting

#### `on_(train|test|predict)_batch_begin(batch, logs = NULL)`

Called right before processing a batch during training/testing/predicting.

#### `on_(train|test|predict)_batch_end(batch, logs = NULL)`

Called at the end of training/testing/predicting a batch. Within this method, `logs` is
a named list containing the metrics results.

### Epoch-level methods (training only)

#### `on_epoch_begin(epoch, logs = NULL)`

Called at the beginning of an epoch during training.

#### `on_epoch_end(epoch, logs = NULL)`

Called at the end of an epoch during training.

## A basic example

Let's take a look at a concrete example. To get started, let's import tensorflow and
define a simple Sequential Keras model:

```{r}
# Define the Keras model to add callbacks to
get_model <- function() {
  model <- keras_model_sequential()
  model |> layer_dense(units = 1)
  model |> compile(
    optimizer = optimizer_rmsprop(learning_rate = 0.1),
    loss = "mean_squared_error",
    metrics = "mean_absolute_error"
  )
  model
}
```

Then, load the MNIST data for training and testing from Keras datasets API:

```{r}
# Load example MNIST data and pre-process it
mnist <- dataset_mnist()

flatten_and_rescale <- function(x) {
  x <- array_reshape(x, c(-1, 784))
  x <- x / 255
  x
}

mnist$train$x <- flatten_and_rescale(mnist$train$x)
mnist$test$x  <- flatten_and_rescale(mnist$test$x)

# limit to 1000 samples
n <- 1000
mnist$train$x <- mnist$train$x[1:n,]
mnist$train$y <- mnist$train$y[1:n]
mnist$test$x  <- mnist$test$x[1:n,]
mnist$test$y  <- mnist$test$y[1:n]
```

Now, define a simple custom callback that logs:

- When `fit`/`evaluate`/`predict` starts & ends
- When each epoch starts & ends
- When each training batch starts & ends
- When each evaluation (test) batch starts & ends
- When each inference (prediction) batch starts & ends

```{r}
show <- function(msg, logs) {
  cat(glue::glue(msg, .envir = parent.frame()),
      "got logs: ", sep = "; ")
  str(logs); cat("\n")
}

callback_custom <- Callback(
  "CustomCallback",
  on_train_begin         = \(logs = NULL)        show("Starting training", logs),
  on_epoch_begin         = \(epoch, logs = NULL) show("Start epoch {epoch} of training", logs),
  on_train_batch_begin   = \(batch, logs = NULL) show("...Training: start of batch {batch}", logs),
  on_train_batch_end     = \(batch, logs = NULL) show("...Training: end of batch {batch}",  logs),
  on_epoch_end           = \(epoch, logs = NULL) show("End epoch {epoch} of training", logs),
  on_train_end           = \(logs = NULL)        show("Stop training", logs),


  on_test_begin          = \(logs = NULL)        show("Start testing", logs),
  on_test_batch_begin    = \(batch, logs = NULL) show("...Evaluating: start of batch {batch}", logs),
  on_test_batch_end      = \(batch, logs = NULL) show("...Evaluating: end of batch {batch}", logs),
  on_test_end            = \(logs = NULL)        show("Stop testing", logs),

  on_predict_begin       = \(logs = NULL)        show("Start predicting", logs),
  on_predict_end         = \(logs = NULL)        show("Stop predicting", logs),
  on_predict_batch_begin = \(batch, logs = NULL) show("...Predicting: start of batch {batch}", logs),
  on_predict_batch_end   = \(batch, logs = NULL) show("...Predicting: end of batch {batch}", logs),
)
```

Let's try it out:
```{r}
model <- get_model()
model |> fit(
  mnist$train$x, mnist$train$y,
  batch_size = 128,
  epochs = 2,
  verbose = 0,
  validation_split = 0.5,
  callbacks = list(callback_custom())
)

res <- model |> evaluate(
  mnist$test$x, mnist$test$y,
  batch_size = 128, verbose = 0,
  callbacks = list(callback_custom())
)

res <- model |> predict(
  mnist$test$x,
  batch_size = 128, verbose = 0,
  callbacks = list(callback_custom())
)
```

### Usage of `logs` list
The `logs` named list contains the loss value, and all the metrics at the end of a batch or
epoch. Example includes the loss and mean absolute error.
```{r}
callback_print_loss_and_mae <- Callback(
  "LossAndErrorPrintingCallback",

  on_train_batch_end = function(batch, logs = NULL)
    cat(sprintf("Up to batch %i, the average loss is %7.2f.\n",
                batch,  logs$loss)),

  on_test_batch_end = function(batch, logs = NULL)
    cat(sprintf("Up to batch %i, the average loss is %7.2f.\n",
                batch, logs$loss)),

  on_epoch_end = function(epoch, logs = NULL)
    cat(sprintf(
      "The average loss for epoch %2i is %9.2f and mean absolute error is %7.2f.\n",
      epoch, logs$loss, logs$mean_absolute_error
    ))
)


model <- get_model()
model |> fit(
  mnist$train$x, mnist$train$y,
  epochs = 2, verbose = 0, batch_size = 128,
  callbacks = list(callback_print_loss_and_mae())
)

res = model |> evaluate(
  mnist$test$x, mnist$test$y,
  verbose = 0, batch_size = 128,
  callbacks = list(callback_print_loss_and_mae())
)
```
For more information about callbacks, you can check out the [Keras callback API documentation](https://keras.posit.co/reference/index.html#callbacks).


## Usage of `self$model` attribute

In addition to receiving log information when one of their methods is called,
callbacks have access to the model associated with the current round of
training/evaluation/inference: `self$model`.

Here are of few of the things you can do with `self$model` in a callback:

- Set `self$model$stop_training <- TRUE` to immediately interrupt training.
- Mutate hyperparameters of the optimizer (available as `self$model$optimizer`),
such as `self$model$optimizer$learning_rate`.
- Save the model at period intervals.
- Record the output of `model |> predict()` on a few test samples at the end of each
epoch, to use as a sanity check during training.
- Extract visualizations of intermediate features at the end of each epoch, to monitor
what the model is learning over time.
- etc.

Let's see this in action in a couple of examples.

## Examples of Keras callback applications

### Early stopping at minimum loss

This first example shows the creation of a `Callback` that stops training when the
minimum of loss has been reached, by setting the attribute `self$model$stop_training`
(boolean). Optionally, you can provide an argument `patience` to specify how many
epochs we should wait before stopping after having reached a local minimum.

`callback_early_stopping()` provides a more complete and general implementation.

```{r}
callback_early_stopping_at_min_loss <- Callback(
  "EarlyStoppingAtMinLoss",
  `__doc__` =
    "Stop training when the loss is at its min, i.e. the loss stops decreasing.

    Arguments:
        patience: Number of epochs to wait after min has been hit. After this
        number of no improvement, training stops.
    ",

  initialize = function(patience = 0) {
    super$initialize()
    self$patience <- patience
    # best_weights to store the weights at which the minimum loss occurs.
    self$best_weights <- NULL
  },

  on_train_begin = function(logs = NULL) {
    # The number of epoch it has waited when loss is no longer minimum.
    self$wait <- 0
    # The epoch the training stops at.
    self$stopped_epoch <- 0
    # Initialize the best as infinity.
    self$best <- Inf
  },

  on_epoch_end = function(epoch, logs = NULL) {
    current <- logs$loss
    if (current < self$best) {
      self$best <- current
      self$wait <- 0L
      # Record the best weights if current results is better (less).
      self$best_weights <- get_weights(self$model)
    } else {
      add(self$wait) <- 1L
      if (self$wait >= self$patience) {
        self$stopped_epoch <- epoch
        self$model$stop_training <- TRUE
        cat("Restoring model weights from the end of the best epoch.\n")
        model$set_weights(self$best_weights)
      }
    }
  },

  on_train_end = function(logs = NULL)
    if (self$stopped_epoch > 0)
      cat(sprintf("Epoch %05d: early stopping\n", self$stopped_epoch + 1))
)
`add<-` <- `+`


model <- get_model()
model |> fit(
  mnist$train$x,
  mnist$train$y,
  epochs = 30,
  batch_size = 64,
  verbose = 0,
  callbacks = list(callback_print_loss_and_mae(),
                   callback_early_stopping_at_min_loss())
)
```


### Learning rate scheduling

In this example, we show how a custom Callback can be used to dynamically change the
learning rate of the optimizer during the course of training.

See `keras$callbacks$LearningRateScheduler` for a more general implementations (in RStudio, press F1 while the cursor is over `LearningRateScheduler` and a browser will open to [this page](https://www.tensorflow.org/versions/r2.5/api_docs/python/tf/keras/callbacks/LearningRateScheduler)).

```{r}
callback_custom_learning_rate_scheduler <- Callback(
  "CustomLearningRateScheduler",
  `__doc__` =
  "Learning rate scheduler which sets the learning rate according to schedule.

    Arguments:
        schedule: a function that takes an epoch index
            (integer, indexed from 0) and current learning rate
            as inputs and returns a new learning rate as output (float).
    ",

  initialize = function(schedule) {
    super$initialize()
    self$schedule <- schedule
  },

  on_epoch_begin = function(epoch, logs = NULL) {
    ## When in doubt about what types of objects are in scope (e.g., self$model)
    ## use a debugger to interact with the actual objects at the console!
    # browser()

    if (!"learning_rate" %in% names(self$model$optimizer))
      stop('Optimizer must have a "learning_rate" attribute.')

    # # Get the current learning rate from model's optimizer.
    # use as.numeric() to convert the keras variablea to an R numeric
    lr <- as.numeric(self$model$optimizer$learning_rate)
    # # Call schedule function to get the scheduled learning rate.
    scheduled_lr <- self$schedule(epoch, lr)
    # # Set the value back to the optimizer before this epoch starts
    optimizer <- self$model$optimizer
    optimizer$learning_rate <- scheduled_lr
    cat(sprintf("\nEpoch %03d: Learning rate is %6.4f.\n", epoch, scheduled_lr))
  }
)

LR_SCHEDULE <- tibble::tribble(
  ~start_epoch, ~learning_rate,
             0,            0.1,
             3,           0.05,
             6,           0.01,
             9,          0.005,
            12,          0.001,
  )

last <- function(x) x[length(x)]
lr_schedule <- function(epoch, learning_rate) {
  "Helper function to retrieve the scheduled learning rate based on epoch."
  with(LR_SCHEDULE, learning_rate[last(which(epoch >= start_epoch))])
}

model <- get_model()
model |> fit(
  mnist$train$x,
  mnist$train$y,
  epochs = 14,
  batch_size = 64,
  verbose = 0,
  callbacks = list(
    callback_print_loss_and_mae(),
    callback_custom_learning_rate_scheduler(lr_schedule)
  )
)
```

### Built-in Keras callbacks

Be sure to check out the existing Keras callbacks by
reading the [API docs](https://keras.posit.co/reference/index.html#callbacks).
Applications include logging to CSV, saving
the model, visualizing metrics in TensorBoard, and a lot more!