While the framework is flexible enough to work with any kind of trainer, we encourage you to use a dedicated library to manage your training loops. We found that Ignite provides everything we expect from a training management system.
Ignite defines six event types that together structure a training loop:
- STARTED: start the training loop
- EPOCH_STARTED: start an epoch
- ITERATION_STARTED: start processing of one batch
- ITERATION_COMPLETED: complete processing of one batch
- EPOCH_COMPLETED: complete a full epoch
- COMPLETED: complete the training loop
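
To make the ordering of these events concrete, here is a minimal pure-Python sketch of how they nest inside a loop. It is not Ignite's implementation: `toy_run` and the `fire` callback are hypothetical names used only for illustration, and no batch is actually processed.

```python
from typing import Callable, List, Sequence


def toy_run(data: Sequence, max_epochs: int, fire: Callable[[str], None]) -> None:
    """Fire the six events in the order a training loop reaches them."""
    fire("STARTED")
    for _epoch in range(max_epochs):
        fire("EPOCH_STARTED")
        for _batch in data:
            fire("ITERATION_STARTED")
            # A real trainer would process the batch here.
            fire("ITERATION_COMPLETED")
        fire("EPOCH_COMPLETED")
    fire("COMPLETED")


# Record every event fired for one epoch over two batches.
fired: List[str] = []
toy_run(data=["batch_0", "batch_1"], max_epochs=1, fire=fired.append)
print(fired)
```

The iteration events repeat once per batch inside each epoch, while STARTED and COMPLETED fire exactly once per run.
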
Ignite lets you run actions at each of these events by attaching event handlers.
Here are some examples of what handlers can do:
- Track metrics and log them to the terminal
- Log metrics, parameter norms, histograms, distributions, etc. to TensorBoard (via tensorboardX)
- Learning rate schedulers: adapt the learning rate at different points during training. A good example is cyclical learning rate scheduling, which has proven successful in models like ULMFiT
- Model checkpointing: save your model periodically, or only when it improves
- Early stopping: stop training when a monitored metric has stopped improving
- Terminate on NaN: stop training when NaN or infinite values are encountered
- Timers: measure how long parts of the training take
- ...
We provide a BasicTrainer class that should cover most supervised, single-task settings. For more complex settings such as multi-task learning, you may want to override the _update and _inference methods to handle several task objectives or loss functions.
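
One way to picture a multi-task update is as a weighted sum of per-task losses. The sketch below is an assumption about what such a step might compute, not BasicTrainer's actual code: `make_multi_task_update`, `predict`, `loss_fns` and `weights` are hypothetical names, the `(engine, batch)` signature mirrors an Ignite update function, and the gradient step is left out so the example stays in plain Python.

```python
from typing import Callable, Dict, Sequence


def make_multi_task_update(
    predict: Callable[[Sequence[float]], Dict[str, Sequence[float]]],
    loss_fns: Dict[str, Callable[[Sequence[float], Sequence[float]], float]],
    weights: Dict[str, float],
) -> Callable:
    """Build an update function that combines one loss per task."""

    def _update(engine, batch):
        inputs, targets = batch
        predictions = predict(inputs)  # maps task name -> predictions
        per_task = {
            task: loss_fn(predictions[task], targets[task])
            for task, loss_fn in loss_fns.items()
        }
        total = sum(weights[task] * loss for task, loss in per_task.items())
        # A real trainer would backpropagate `total` and step the optimizer here.
        return {"total": total, "per_task": per_task}

    return _update


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy "model" with two heads: identity for task "a", doubling for task "b".
update = make_multi_task_update(
    predict=lambda x: {"a": list(x), "b": [2 * v for v in x]},
    loss_fns={"a": mse, "b": mse},
    weights={"a": 1.0, "b": 0.5},
)
out = update(None, ([1.0, 2.0], {"a": [1.0, 2.0], "b": [0.0, 0.0]}))
print(out)
```

Returning the per-task losses alongside the total keeps them available to logging handlers through the engine's output.
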