Hi, I noticed that for both unsupervised NMT training and MLM training, the learning rate is 0.0001. Is this the learning rate used when training with 8 GPUs? If I use 4 GPUs, how should I adjust the learning rate and warm-up? Thank you very much.
0.0001 is what worked best overall in our experiments on 8 GPUs. On 64 GPUs, we found the best value was between 0.0001 and 0.0003. For 4 GPUs, I think 0.0001 should do the trick. That said, the model is quite sensitive to the learning rate (unless it has only a few layers), so I would suggest trying a few values around that one to see which works best.
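For reference, here is a minimal sketch of two common heuristics touched on in this thread: scaling the peak learning rate linearly with GPU count (i.e., with the effective batch size) and a Transformer-style warm-up followed by inverse-sqrt decay. This is not code from this repo; the function names, the 4000-step warm-up default, and the baseline values are assumptions for illustration only.

```python
# Hedged sketch, not this repo's implementation: illustrates (1) the linear
# learning-rate scaling heuristic across GPU counts and (2) a Transformer-style
# warm-up + inverse-sqrt schedule. All names and defaults are assumptions.

def scaled_lr(base_lr: float, base_gpus: int, num_gpus: int) -> float:
    """Linear-scaling heuristic: peak lr grows with the effective batch size."""
    return base_lr * num_gpus / base_gpus

def lr_at_step(step: int, peak_lr: float, warmup_steps: int = 4000) -> float:
    """Linear warm-up to peak_lr, then inverse-sqrt decay after warm-up."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5

# Linear scaling would suggest 0.00005 on 4 GPUs for an 8-GPU baseline of
# 0.0001, though the answer above says simply keeping 0.0001 should work.
print(scaled_lr(1e-4, base_gpus=8, num_gpus=4))  # 5e-05
print(lr_at_step(2000, peak_lr=1e-4))            # mid warm-up: 5e-05
```

In practice, as the answer suggests, treating these heuristics as starting points and sweeping a few learning-rate values around them is the safer approach.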