
GH-1072: Batch-growth annealing #1138

Merged: 3 commits into master from GH-1072-accumulate-gradients on Sep 20, 2019

Conversation

@alanakbik alanakbik commented Sep 20, 2019

The paper "Don't Decay the Learning Rate, Increase the Batch Size" (Smith et al., 2017) makes the case for increasing the batch size over time instead of annealing the learning rate.

This PR adds support for arbitrarily large mini-batch sizes via a gradient accumulation strategy (closes #1072). It introduces the parameter mini_batch_chunk_size, which you can set to break large mini-batches into smaller chunks for processing.

So let's say you want to have a mini-batch size of 128, but your memory cannot handle more than 32 samples at a time. Then you can train like this:

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "path/to/experiment/folder",
    # set large mini-batch size
    mini_batch_size=128,
    # set chunk size to lower memory requirements
    mini_batch_chunk_size=32,
)
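
Under the hood this is standard gradient accumulation: each chunk gets its own forward and backward pass, the gradients sum up, and the optimizer steps once per full mini-batch. A minimal sketch of the idea (the train_step helper and the loss scaling are illustrative assumptions, not Flair's actual implementation; forward_loss is the loss method Flair models expose):

def train_step(model, optimizer, batch, chunk_size):
    # split the large mini-batch into memory-sized chunks
    chunks = [batch[i : i + chunk_size] for i in range(0, len(batch), chunk_size)]
    optimizer.zero_grad()
    for chunk in chunks:
        loss = model.forward_loss(chunk)
        # scale so the accumulated gradient matches one full-batch update
        (loss / len(chunks)).backward()
    optimizer.step()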

Because we can now raise the mini-batch size arbitrarily, we can implement the annealing strategy from the paper above. Do it like this:

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "path/to/experiment/folder",
    # set initial mini-batch size
    mini_batch_size=32,
    # choose batch growth annealing 
    batch_growth_annealing=True,
)

This will double the mini-batch size each time the learning rate anneals. You can also combine this with "annealing with restarts", in which the best model state so far is restored whenever the learning rate anneals:

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "path/to/experiment/folder",
    # set initial mini-batch size
    mini_batch_size=32,
    # choose batch growth annealing 
    batch_growth_annealing=True,
    # reset model state to best on each anneal
    anneal_with_restarts=True,
)
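
Conceptually, both options hook into the same event as the learning-rate scheduler: whenever the rate is annealed, the trainer doubles the working batch size, and with restarts it also reloads the best checkpoint. A hedged sketch of that control flow (all names hypothetical, not Flair's actual training loop):

previous_lr = learning_rate
for epoch in range(max_epochs):
    train_one_epoch(model, mini_batch_size)    # hypothetical helper
    learning_rate = scheduler_step(dev_score)  # may anneal the learning rate
    if learning_rate < previous_lr:            # the rate was just annealed
        if batch_growth_annealing:
            mini_batch_size *= 2               # grow the batch instead
        if anneal_with_restarts:
            model.load_state_dict(best_state)  # restore the best model so far
    previous_lr = learning_rate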

@yosipk commented Sep 20, 2019

👍

@alanakbik (Author) commented
👍

@alanakbik alanakbik merged commit 984c5a9 into master Sep 20, 2019
@alanakbik alanakbik deleted the GH-1072-accumulate-gradients branch September 20, 2019 10:50
@lucaventurini commented

Hi @alanakbik, how do we fix the learning rate with this method? As I understand it, we shouldn't reduce the learning rate anymore if we're already doubling the batch size, correct?

I tried setting the anneal factor to 1.0, but I got an error. Also, should there be a max_batch_size parameter, analogous to min_learning_rate, to stop training early?
