docs: add checkpoints to documentation (#312)
* docs: add checkpoints to documentation

* fix: commit suggestion

Co-authored-by: Wang Bo <bo.wang@jina.ai>

Tadej Svetina and bwanglzu committed Jan 4, 2022
1 parent 8ff5649 commit 0eae320
Showing 2 changed files with 49 additions and 8 deletions.
20 changes: 12 additions & 8 deletions docs/components/tuner.md
@@ -3,10 +3,11 @@
Tuner is one of the three key components of Finetuner. Given an {term}`embedding model` and a {term}`labeled dataset` (see {ref}`the guide on data formats<data-format>` for more information), Tuner trains the model to fit the data.

With Tuner, you can customize the training process to best fit your data, and track your experiments in a clear and transparent manner. You can do things like:
- choose between different loss functions, use hard negative mining for triplets/pairs
- set your own optimizers and learning rates
- track the training and evaluation metrics with Weights and Biases
- write custom callbacks
- Choose between different loss functions, use hard negative mining for triplets/pairs
- Set your own optimizers and learning rates
- Track the training and evaluation metrics with Weights and Biases
- Save checkpoints during training
- Write custom callbacks

You can read more on these different options here or in these sub-sections:

@@ -203,12 +204,13 @@ Then we can create the {class}`~finetuner.tuner.pytorch.PytorchTuner` object. In
- Triplet loss with a hard miner, using the easy positive and semihard negative strategies
- Adam optimizer with an initial learning rate of 0.0005, which will be halved every 30 epochs
- WandB for tracking the experiment
- A {class}`~finetuner.tuner.callback.training_checkpoint.TrainingCheckpoint` to save a checkpoint every epoch, so that if training is interrupted we can later continue from the checkpoint. We need to create a `checkpoints/` folder inside the current directory to store the checkpoints.
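
Since the example expects this `checkpoints/` folder to exist, you may want to create it up front. A minimal sketch (plain Python, not part of the original example):

```python
import os

# Create the folder that TrainingCheckpoint will write to (no-op if it already exists)
os.makedirs('checkpoints', exist_ok=True)
```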

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

from finetuner.tuner.callback import WandBLogger
from finetuner.tuner.callback import WandBLogger, TrainingCheckpoint
from finetuner.tuner.pytorch import PytorchTuner
from finetuner.tuner.pytorch.losses import TripletLoss
from finetuner.tuner.pytorch.miner import TripletEasyHardMiner
@@ -225,13 +227,14 @@ loss = TripletLoss(
miner=TripletEasyHardMiner(pos_strategy='easy', neg_strategy='semihard')
)
logger_callback = WandBLogger()
checkpoint = TrainingCheckpoint('checkpoints')

tuner = PytorchTuner(
embed_model,
loss=loss,
configure_optimizer=configure_optimizer,
scheduler_step='epoch',
callbacks=[logger_callback],
callbacks=[logger_callback, checkpoint],
device='cpu',
)
```
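
If training does get interrupted, the tuner can later be restored from one of the saved checkpoints before continuing. A short sketch, assuming a checkpoint file such as `checkpoints/saved_model_epoch_10` exists (the exact file name depends on the epoch at which it was saved):

```python
from finetuner.tuner.callback import TrainingCheckpoint

# Restore the tuner's state from a previously saved checkpoint;
# training can then be resumed as usual, e.g. by calling tuner.fit(...) again
TrainingCheckpoint.load(tuner, 'checkpoints/saved_model_epoch_10')
```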
@@ -244,7 +247,7 @@ from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

from finetuner.toydata import generate_fashion
from finetuner.tuner.callback import WandBLogger
from finetuner.tuner.callback import WandBLogger, TrainingCheckpoint
from finetuner.tuner.pytorch import PytorchTuner
from finetuner.tuner.pytorch.losses import TripletLoss
from finetuner.tuner.pytorch.miner import TripletEasyHardMiner
@@ -276,13 +279,14 @@ loss = TripletLoss(
miner=TripletEasyHardMiner(pos_strategy='easy', neg_strategy='semihard')
)
logger_callback = WandBLogger()
checkpoint = TrainingCheckpoint('checkpoints')

tuner = PytorchTuner(
embed_model,
loss=loss,
configure_optimizer=configure_optimizer,
scheduler_step='epoch',
callbacks=[logger_callback],
callbacks=[logger_callback, checkpoint],
device='cpu',
)

37 changes: 37 additions & 0 deletions docs/components/tuner/callbacks.md
@@ -3,6 +3,7 @@
Callbacks offer a way to integrate various auxiliary tasks into the training loop. We offer built-in callbacks for some common tasks, such as
- Showing a progress bar (which is shown by default)
- [Tracking experiments](#experiement-tracking)
- [Checkpointing training progress](#checkpoints)

You can also [write your own callbacks](#custom-callbacks).

@@ -32,6 +33,42 @@ tuner = PytorchTuner(..., callbacks=[logger])

You should then be able to see your training runs in wandb.

## Checkpoints

On long training jobs, you may want to periodically save your progress, so that you can
continue from a checkpoint later if training gets interrupted. Or you may want to save
the best model, so that you can use it after training finishes. For these purposes we
offer {class}`~finetuner.tuner.callback.training_checkpoint.TrainingCheckpoint` and {class}`~finetuner.tuner.callback.best_model_checkpoint.BestModelCheckpoint`, respectively.

To use {class}`~finetuner.tuner.callback.training_checkpoint.TrainingCheckpoint`, you would simply add it to `callbacks`. Later, you can load the tuner from the saved checkpoint, as in the example below.


```python
from finetuner.tuner.callback import TrainingCheckpoint
from finetuner.tuner.pytorch import PytorchTuner

checkpoint = TrainingCheckpoint('checkpoints')

tuner = PytorchTuner(..., callbacks=[checkpoint])

# Afterwards, load the tuner from the saved checkpoint
TrainingCheckpoint.load(tuner, 'checkpoints/saved_model_epoch_10')
```

For the {class}`~finetuner.tuner.callback.best_model_checkpoint.BestModelCheckpoint`, you would also add it to `callbacks`, and later you could load the model from it.

```python
from finetuner.tuner.callback import BestModelCheckpoint
from finetuner.tuner.pytorch import PytorchTuner

checkpoint = BestModelCheckpoint('checkpoints')

tuner = PytorchTuner(..., callbacks=[checkpoint])

# Afterwards, load the model from the saved checkpoint
BestModelCheckpoint.load_model(tuner, 'checkpoints/best_model_val_loss')
```
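
Once the best model has been loaded back into the tuner, it can be persisted like any other PyTorch model. A minimal sketch, assuming the tuner exposes the model as `tuner.embed_model` (plain `torch.save` is used here rather than a Finetuner API):

```python
import torch

# Save the weights of the restored best model for later use
torch.save(tuner.embed_model.state_dict(), 'best_model.pt')
```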

## Custom callbacks

If the existing callbacks don't provide the functionality you need, you can easily write your own.
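
As a rough illustration, a custom callback is typically a small class that overrides only the hooks it needs. The sketch below rests on assumptions (a `BaseCallback` base class in `finetuner.tuner.callback.base` and an `on_epoch_end(tuner)` hook); check the actual callback interface for the exact names:

```python
from finetuner.tuner.callback.base import BaseCallback


class PrintEpochCallback(BaseCallback):
    """Hypothetical callback that prints a message after every epoch."""

    def on_epoch_end(self, tuner):
        print('Finished another epoch')


# Hypothetical usage: tuner = PytorchTuner(..., callbacks=[PrintEpochCallback()])
```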
