## Evaluation Callback

The evaluation callback is used to calculate performance metrics for the model being tuned.

The evaluation callback is triggered at the end of each epoch, at which point the model is evaluated using the `query_data` and `index_data` datasets that were provided when the callback was created. It is worth noting that the evaluation callback and the `eval_data` parameter of the `fit` method do not do the same thing: the `eval_data` parameter takes a `DocumentArray` (or the name of one that has been pushed to Hubble) and uses its contents to evaluate the loss of the model, whereas the evaluation callback is used to evaluate the quality of searches using metrics such as average precision and recall. These search metrics can be used by other callbacks if the evaluation callback is first in the list of callbacks when creating a run.
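As a minimal sketch of the difference, assuming the evaluation callback is exposed as `callback.EvaluationCallback` with `query_data` and `index_data` parameters, and using hypothetical dataset names:

```python
import finetuner
from finetuner import callback

run = finetuner.fit(
    model='resnet50',
    train_data='tll-train-da',
    # eval_data only supplies documents used to compute the validation loss
    eval_data='tll-eval-da',  # hypothetical dataset name
    callbacks=[
        # the evaluation callback instead searches the index data with the
        # query data after each epoch and records metrics such as average
        # precision and recall for the run
        callback.EvaluationCallback(
            query_data='tll-query-da',  # hypothetical dataset name
            index_data='tll-index-da',  # hypothetical dataset name
        ),
    ],
)
```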

## Best Model Checkpoint

This callback evaluates the performance of the model at the end of each epoch and keeps a record of the best performing model across all epochs. Once fitting is finished, the best performing model is saved instead of the most recent one. The definition of 'best' used by the callback is provided by its `monitor` parameter. By default this value is 'val_loss', the loss calculated using the evaluation data; the loss calculated on the training data can be used instead with 'train_loss', and any metric recorded by the evaluation callback can also be used. The `mode` parameter then specifies whether the monitored metric should be maximised ('max') or minimised ('min'). By default the mode is set to 'auto', meaning that it will automatically choose the correct mode depending on the chosen metric: 'min' if the metric is a loss and 'max' if it is one recorded by the evaluation callback.
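For instance, to keep the model with the lowest training loss rather than the lowest validation loss, a run could look roughly like the sketch below (assuming the callback class is exposed as `callback.BestModelCheckpoint`; the dataset name is a placeholder):

```python
import finetuner
from finetuner import callback

run = finetuner.fit(
    model='resnet50',
    train_data='tll-train-da',  # placeholder dataset name
    epochs=5,
    callbacks=[
        # monitor the loss on the training data instead of the default
        # 'val_loss'; mode could be left as 'auto' here, since a loss is
        # always minimised
        callback.BestModelCheckpoint(
            monitor='train_loss',
            mode='min',
        ),
    ],
)
```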
The console output below shows how the evaluation loss of the model is monitored at the end of each epoch, and how the best performing model is tracked. Since the final model has a higher loss than the previously recorded best model, the best model will be saved instead of the latest one.

```bash
INFO Finetuning ... __main__.py:173
[11:50:33] INFO Model improved from inf to 2.756! best_model_checkpoint.py:112
[11:50:52] INFO Model improved from 2.756 to 2.711! best_model_checkpoint.py:112
[11:51:10] INFO Model did not improve best_model_checkpoint.py:120
[11:51:28] INFO Model did not improve best_model_checkpoint.py:120
Training [4/4] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54/54 0:00:00 0:00:15 • loss: 0.496 • val_loss: 2.797
INFO Done ✨ __main__.py:194
DEBUG Finetuning took 0 days, 0 hours 1 minutes and 16 seconds __main__.py:196
INFO Building the artifact ... __main__.py:207
INFO Pushing artifact to Hubble ... __main__.py:231
```

This callback is triggered at the end of both training and evaluation batches, to record the losses on the two datasets, and again at the end of each epoch, where it evaluates the current model using the monitored metric and records it if it performs better than the best model so far.

## Early Stopping

Similarly to the best model checkpoint callback, the early stopping callback measures a given metric at the end of every epoch. Unlike the best model checkpoint callback, however, it will end the fitting process early if the metric does not improve enough between successive epochs. Since the best result is only used to assess the rate of improvement, only the monitored metric needs to be tracked, and the model itself is not saved.

Below is some example output for a run with the early stopping callback, followed by the output for the same run without the early stopping callback, and then the Python code used to create the run. The run with early stopping finished after just ten epochs, whereas the other run completed all twenty, taking nearly twice as long. That said, the final loss of the early stopping run is only 0.284, compared to the full run's 0.272, less than five percent higher. In this way, the early stopping callback can substantially reduce training time while sacrificing very little of the improvement.

```bash
Training [10/20] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54/54 0:00:00 0:00:14 • loss: 0.284
[11:19:28] INFO Done ✨ __main__.py:194
DEBUG Finetuning took 0 days, 0 hours 2 minutes and 30 seconds __main__.py:196
INFO Building the artifact ... __main__.py:207
INFO Pushing artifact to Hubble ... __main__.py:231
```

```bash
Training [20/20] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54/54 0:00:00 0:00:14 • loss: 0.272
[10:37:33] INFO Done ✨ __main__.py:194
DEBUG Finetuning took 0 days, 0 hours 4 minutes and 54 seconds __main__.py:196
INFO Building the artifact ... __main__.py:207
INFO Pushing artifact to Hubble ... __main__.py:231
```

```python
import finetuner
from finetuner import callback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',
    run_name='clip-fashion-early',
    train_data='clip-fashion-train-data',
    epochs=20,
    learning_rate=1e-5,
    loss='CLIPLoss',
    cpu=False,
    callbacks=[
        # stop the run if the training loss fails to drop by more than
        # min_delta for patience consecutive epochs
        callback.EarlyStopping(
            monitor='train_loss',
            mode='min',
            patience=2,
            min_delta=1,
            baseline=1.5,
        )
    ],
)
```


The early stopping callback triggers at the same points as the best model checkpoint callback: at the end of both training and evaluation batches, to record the training and evaluation loss respectively, and at the end of each epoch, to evaluate the model and compare it to the best result so far. It differs from the best model checkpoint callback after this point. If the best result is not improved upon by at least the amount specified by the `min_delta` parameter, then that epoch does not count as an improvement (the best result is still updated). If the model does not show improvement for the number of epochs specified by the `patience` parameter, then training is ended early. By default, `min_delta` is zero and `patience` is two.
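Conceptually, the end-of-epoch check behaves roughly as in the sketch below (an illustration of the logic described above, not Finetuner's actual implementation):

```python
def should_stop(value: float, state: dict, *,
                min_delta: float = 0.0, patience: int = 2) -> bool:
    """Illustrative early stopping check for a metric that is minimised."""
    best = state.setdefault('best', float('inf'))
    if best - value > min_delta:
        # a real improvement: record it and reset the patience counter
        state['best'] = value
        state['bad_epochs'] = 0
    else:
        # the best value is still updated, but this epoch does not count
        # as an improvement
        state['best'] = min(best, value)
        state['bad_epochs'] = state.get('bad_epochs', 0) + 1
    return state['bad_epochs'] >= patience
```

With `min_delta=1` and `patience=2`, as in the example above, training stops once the training loss fails to drop by more than 1 for two consecutive epochs.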

## Training Checkpoint

The training checkpoint callback saves a copy of the tuner at the end of each epoch, keeping only the checkpoints from the last k epochs, where k is set by the `last_k_epochs` parameter (one by default).

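A minimal usage sketch, assuming the callback is exposed as `callback.TrainingCheckpoint` and reusing the training parameters from the early stopping example above:

```python
import finetuner
from finetuner import callback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',
    train_data='clip-fashion-train-data',
    epochs=20,
    loss='CLIPLoss',
    callbacks=[
        # keep only the checkpoints from the two most recent epochs
        callback.TrainingCheckpoint(last_k_epochs=2),
    ],
)
```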

The training checkpoint callback is only triggered at the end of each epoch. At this point it saves the tuner in its current state and appends it to the list of checkpoints; if the list already contains k checkpoints, the oldest is removed.
