This PR adds support for the One Cycle policy described by Smith in https://arxiv.org/pdf/1803.09820.pdf.
It is based on the PyTorch implementation: https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.OneCycleLR.
I added only the strictly necessary parameters to the trainer (`max_lr`, `total_steps`, `cycle_momentum`) and left the others at their default values, since these usually work well. If we want to allow experimenting with them, we would need to expose all of them in the trainer interface.
In experiments on private datasets, I have observed behavior similar to that described in the paper (very fast convergence and good regularization).
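For reviewers who want to see what the defaults produce, here is a minimal pure-Python sketch of the two-phase schedule, assuming PyTorch's documented `OneCycleLR` defaults for the parameters not exposed in the trainer (`pct_start=0.3`, `anneal_strategy='cos'`, `div_factor=25`, `final_div_factor=1e4`, `base_momentum=0.85`, `max_momentum=0.95`, `three_phase=False`). `one_cycle` is a hypothetical helper for illustration only; it is not part of flair or PyTorch, which compute the same values inside the scheduler.

```python
import math


def _cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation: returns `start` at pct=0 and `end` at pct=1."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle(step, total_steps, max_lr, cycle_momentum=True,
              pct_start=0.3, div_factor=25.0, final_div_factor=1e4,
              base_momentum=0.85, max_momentum=0.95):
    """Hypothetical helper: (lr, momentum) at `step` of a two-phase one-cycle schedule.

    Mirrors the PyTorch defaults assumed above; momentum is None when
    cycle_momentum is False (PyTorch then leaves momentum untouched).
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    initial_lr = max_lr / div_factor          # LR at step 0
    min_lr = initial_lr / final_div_factor    # LR at the last step
    warmup_end = pct_start * total_steps - 1  # last step of phase 1
    if warmup_end <= 0:
        raise ValueError("total_steps is too small for this pct_start")

    if step <= warmup_end:
        # Phase 1: LR rises to max_lr while momentum falls to base_momentum.
        pct = step / warmup_end
        lr = _cos_anneal(initial_lr, max_lr, pct)
        momentum = _cos_anneal(max_momentum, base_momentum, pct)
    else:
        # Phase 2: LR decays to min_lr while momentum climbs back to max_momentum.
        pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
        lr = _cos_anneal(max_lr, min_lr, pct)
        momentum = _cos_anneal(base_momentum, max_momentum, pct)

    return lr, (momentum if cycle_momentum else None)


# Usage: inspect a 100-step schedule with max_lr=0.1.
schedule = [one_cycle(s, total_steps=100, max_lr=0.1) for s in range(100)]
```

With these defaults the LR starts at `max_lr / 25`, peaks at `max_lr` after roughly 30% of the steps, and ends near `max_lr / 250000`. Note that the PyTorch documentation says `OneCycleLR` should be stepped after every batch, so `total_steps` counts optimizer steps (e.g. epochs × batches per epoch), not epochs.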
I also tried to reproduce this example https://github.com/flairNLP/flair/blob/master/resources/docs/EXPERIMENTS.md#wnut-17-emerging-entity-detection-english and it reached
in 20 epochs.
If this PR is accepted, I'd also like to add some automated tests. @alanakbik, could you guide me here (e.g. point me to an existing training test that I could adapt for one cycle)?