Do a couple of runs with a constant learning rate, varied systematically between runs, and find out whether there is some learning rate that consistently reduces the loss. After that we can experiment with more complicated schemes.
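The sweep proposed above could be sketched roughly as follows. This is a hypothetical illustration: the toy one-parameter objective and the candidate learning rates stand in for the real model and grid, which are not specified here.

```python
# Sketch of the proposed sweep: train with several constant learning
# rates and record which ones consistently reduce the loss.
# The toy objective (w - 3)^2 is a placeholder for the real model.

def train_constant_lr(lr, steps=100):
    """Plain gradient descent with a constant learning rate."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # gradient of (w - 3)^2
        w -= lr * grad
    return (w - 3.0) ** 2        # final loss

# Vary the learning rate systematically between runs.
results = {lr: train_constant_lr(lr) for lr in (1e-3, 1e-2, 1e-1, 1.0)}
for lr, loss in sorted(results.items()):
    print(f"lr={lr:g}  final loss={loss:.3e}")
```

Even this toy version shows the expected pattern: too small a rate barely moves the loss, too large a rate oscillates, and a middle value converges.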
I started a run, effortless-cosmos-19, using the cyclic scheduler with the maximum and minimum learning rates set to values that may be more sensible for this model (based on this article). So far it is at least outperforming all previous runs. I will let it run until Monday 8 a.m. (by my estimate) and then we'll see how it went.
One interesting feature already visible is the correlation between the periodic pattern in the loss and the learning rate. As the article also suggests, this might mean that some kind of decreasing maximum would help press down the big (seemingly useless) peaks in the loss and make it converge even faster.
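The "decreasing maximum" idea could look something like the schedule below: a triangular cycle whose peak decays each cycle. All constants here (`base_lr`, `max_lr`, `cycle_len`, `decay`) are placeholder assumptions, not the values used in the run.

```python
def cyclic_lr(step, base_lr=1e-4, max_lr=1e-2, cycle_len=2000, decay=0.5):
    """Triangular cyclic schedule whose maximum decays each cycle.

    The learning rate rises linearly from base_lr to the current peak
    over the first half of a cycle, then falls back; the peak itself
    shrinks by `decay` every cycle, damping the loss spikes at the top.
    """
    cycle = step // cycle_len
    pos = (step % cycle_len) / cycle_len          # position in cycle, 0..1
    peak = base_lr + (max_lr - base_lr) * decay ** cycle
    if pos < 0.5:
        return base_lr + (peak - base_lr) * (2 * pos)
    return peak - (peak - base_lr) * (2 * (pos - 0.5))
```

With these numbers the first cycle peaks at 1e-2 and the second at roughly half that, so the periodic loss spikes should shrink cycle by cycle.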