
Adjust learning rate #25

Closed
liujiqiang999 opened this issue Mar 3, 2019 · 2 comments

Comments

@liujiqiang999

Hi, I noticed that for both unsupervised NMT training and MLM training the learning rate is 0.0001. Is this the learning rate used when training with 8 GPUs? If I use 4 GPUs, how should I adjust the learning rate and warm-up? Thank you very much.

@glample
Contributor

glample commented Mar 3, 2019

Hi,

0.0001 is overall what worked best in our experiments on 8 GPUs. On 64 GPUs, we found that the best value was between 0.0001 and 0.0003. For 4 GPUs, I think 0.0001 should do the trick. But it is true that the model is quite sensitive to learning rate tuning (unless it has only a few layers), so I would suggest trying a few values around that one to see which works best.
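
For readers wondering how to pick candidates for that sweep: one common heuristic when the effective batch size changes with GPU count is the linear scaling rule (scale the learning rate with the number of GPUs and stretch warmup inversely). The sketch below is illustrative only; the `scale_hyperparams` helper and its warmup handling are assumptions, not part of the XLM codebase, and as noted above 0.0001 may already work fine at 4 GPUs, so treat the scaled value as just one candidate to try.

```python
# A minimal sketch of the linear scaling heuristic (Goyal et al., 2017),
# assuming the per-GPU batch size stays fixed when the GPU count changes.
# The base values (8 GPUs, lr=0.0001) come from this thread; the function
# name, the warmup default, and the warmup rule are hypothetical.

def scale_hyperparams(base_lr=0.0001, base_gpus=8, base_warmup=4000, gpus=4):
    """Scale the learning rate linearly with GPU count; stretch warmup inversely."""
    scale = gpus / base_gpus
    lr = base_lr * scale               # smaller effective batch -> smaller lr
    warmup = int(base_warmup / scale)  # keep roughly the same number of examples seen during warmup
    return lr, warmup

lr, warmup = scale_hyperparams(gpus=4)
print(f"lr={lr:.6f}, warmup_steps={warmup}")  # lr=0.000050, warmup_steps=8000
```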

@liujiqiang999
Author

Thank you very much.

@glample glample closed this as completed Mar 3, 2019