The paper *Don't Decay the Learning Rate, Increase the Batch Size* makes the case for increasing the batch size over time instead of annealing the learning rate.
This PR adds the possibility to use arbitrarily large mini-batch sizes via a gradient accumulation strategy (closes #1072). It introduces the parameter `mini_batch_chunk_size`, which you can set to break large mini-batches down into smaller chunks for processing purposes.

So let's say you want a mini-batch size of 128, but your memory cannot handle more than 32 samples at a time. Then you can train like this:
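A minimal sketch of such a call, assuming a Flair `ModelTrainer` that is already set up with a model and corpus (the output path and other hyperparameters are placeholders):

```python
from flair.trainers import ModelTrainer

# assume `tagger` and `corpus` are set up as usual for your task
trainer = ModelTrainer(tagger, corpus)

# effective mini-batch size is 128, but each forward/backward pass only
# sees 32 samples; gradients are accumulated over the 4 chunks before
# the optimizer step
trainer.train(
    'resources/taggers/example',
    learning_rate=0.1,
    mini_batch_size=128,
    mini_batch_chunk_size=32,
    max_epochs=150,
)
```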
Because we can now raise the mini-batch size arbitrarily, we can also execute the annealing strategy from the paper above. Do it like this:
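A sketch under the assumption that the option is exposed as a `batch_growth_annealing` flag on `ModelTrainer.train` (start with a small mini-batch and let it grow instead of only decaying the learning rate):

```python
# start small; the mini-batch size grows whenever the learning rate anneals
trainer.train(
    'resources/taggers/example',
    learning_rate=0.1,
    mini_batch_size=32,
    mini_batch_chunk_size=32,      # keep per-pass memory bounded
    batch_growth_annealing=True,   # double mini_batch_size on each anneal
    anneal_factor=0.5,
    patience=3,
    max_epochs=150,
)
```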
This will double the mini-batch size each time the learning rate anneals. You can also combine this with "annealing with restarts" in which the last best model state is restored each time the learning rate anneals.
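For the combination with restarts, assuming it is exposed as the existing `anneal_with_restarts` flag, the call would simply add that option:

```python
# each time the learning rate anneals, the best model state so far is
# restored and the mini-batch size is doubled (flag names are assumptions)
trainer.train(
    'resources/taggers/example',
    learning_rate=0.1,
    mini_batch_size=32,
    mini_batch_chunk_size=32,
    batch_growth_annealing=True,
    anneal_with_restarts=True,
)
```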