Add one-cycle policy #1776

Merged: 1 commit merged into flairNLP:master on Aug 3, 2020
Conversation

lucaventurini2 (Contributor)

This PR adds support for the one-cycle policy described by Smith in https://arxiv.org/pdf/1803.09820.pdf.

It is based on the PyTorch implementation: https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.OneCycleLR.
I added only the strictly needed parameters to the trainer (max_lr, total_steps, cycle_momentum) and left the others at their default values, as they usually work well; if one wants to experiment with the rest, we would need to add them all to the trainer interface.
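For reference, a minimal sketch of what these three parameters mean at the PyTorch level; the model and optimizer below are placeholders, and the way the trainer forwards the arguments is this PR's internal detail, but the constructor itself is the standard torch.optim.lr_scheduler.OneCycleLR API:

import torch
from torch.optim.lr_scheduler import OneCycleLR

# Placeholder model and optimizer, just to show the scheduler construction.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# The three parameters exposed by this PR; everything else keeps its PyTorch default.
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,           # peak learning rate of the cycle
    total_steps=1000,     # total number of scheduler.step() calls over the run
    cycle_momentum=True,  # also anneal momentum, inversely to the learning rate
)

for _ in range(1000):
    optimizer.step()
    scheduler.step()      # one-cycle steps once per batch, not once per epoch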

In experiments on private datasets, I have seen behavior similar to what is described in the paper (very fast convergence and good regularization).

I also tried to reproduce this example https://github.com/flairNLP/flair/blob/master/resources/docs/EXPERIMENTS.md#wnut-17-emerging-entity-detection-english and it reached

- F1-score (micro) 0.4925
- F1-score (macro) 0.4034

in 20 epochs.

If this PR is accepted, I'd also like to add some automated tests; maybe @alanakbik you could guide me here (e.g. point me to an existing training test that I can adapt to one-cycle)?

@alanakbik (Collaborator)

@lucaventurini2 thanks for adding this! Could you paste a quick example of a training script that uses this? (and sorry for the late reply - I've been offline)

@lucaventurini2 (Contributor, Author)

Yes, the minimum needed change to a script is something like:

from torch.optim.lr_scheduler import OneCycleLR

trainer.train(out_folder, scheduler=OneCycleLR, max_epochs=20)
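For context, here is a sketch of how that change slots into a full flair training script; the corpus/tagger setup below just follows the usual flair recipe from the docs and is only illustrative, not part of this PR:

from torch.optim.lr_scheduler import OneCycleLR

from flair.datasets import WNUT_17
from flair.embeddings import WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Standard flair setup: corpus, tag dictionary, tagger, trainer.
corpus = WNUT_17()
tag_dictionary = corpus.make_tag_dictionary(tag_type='ner')
tagger = SequenceTagger(hidden_size=256,
                        embeddings=WordEmbeddings('glove'),
                        tag_dictionary=tag_dictionary,
                        tag_type='ner')
trainer = ModelTrainer(tagger, corpus)

# The only one-cycle-specific change: pass the scheduler class to train().
trainer.train('resources/taggers/example-ner',
              scheduler=OneCycleLR,
              max_epochs=20)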

@alanakbik (Collaborator)

Ah thanks. So 20 epochs is the recommendation? What about cycle_momentum, should this be set as well? And how did you get the F-score above?

@lucaventurini2 (Contributor, Author)

No, I chose 20 just to try a number considerably smaller than what I saw in the wnut-17 example; it's not a recommendation.

The example above is exactly how I got that result; I left the default lr. cycle_momentum should be set when we would normally want to use momentum, but I haven't yet seen a scenario where setting it was worthwhile (if you have some experiments where you use momentum, please try it!). The most critical parameters are the batch size and the lr, as explained in the paper, but values that work well with annealing should work well with one-cycle too, since they presumably don't make the loss diverge.
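To make the cycle_momentum point concrete, a sketch at the PyTorch level (placeholder model, and SGD chosen only because it actually has a momentum term; how the flag is wired through the flair trainer is this PR's detail): with cycle_momentum=True the scheduler anneals momentum in the opposite direction to the learning rate, so the flag only matters when the optimizer uses momentum in the first place.

import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 2)  # placeholder model

# cycle_momentum is only meaningful with a momentum-based optimizer such as SGD(momentum=...).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

scheduler = OneCycleLR(optimizer,
                       max_lr=0.1,
                       total_steps=500,
                       cycle_momentum=True)  # momentum goes down while the lr goes up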

@lucaventurini2 (Contributor, Author)

Just for clarity, in wnut-17 I launched the training with

trainer.train('resources/taggers/example-ner',
              train_with_dev=True, 
              scheduler=OneCycleLR, max_epochs=20)

@alanakbik (Collaborator)

@lucaventurini2 thanks again for adding this! I've tested a bit and everything looks good!

alanakbik merged commit 928a168 into flairNLP:master on Aug 3, 2020