
Is the batch size actually 128 or 16384? #21

Closed

hankcs opened this issue Oct 23, 2021 · 1 comment

hankcs commented Oct 23, 2021

I noticed that Section 2.1 of the technical report says:

We limit the length of sentences in each batch to up to 512 tokens, and the batch size is 128.

Later in the same section it also says:

The batch sizes for the two stages are 16384 and 32768, respectively

So which of these is the actual batch size? Is the former the number of sequences and the latter the number of tokens? Or is such a large batch size feasible because LAMB is used? The LAMB paper used 32768.

Ag2S1 (Contributor) commented Oct 25, 2021

Thanks for pointing this out; the wording there may not be precise enough. The 128 mentioned earlier is the per-GPU batch size, while the later figures are global batch sizes.
LAMB makes convergence more stable at large batch sizes, and we set the corresponding schedule by following its experiments. During actual training, we take as many GPUs as possible from the cluster resource pool, but never let the global batch size exceed 32768.
We will revise this part in the next version of the report to avoid confusion.
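For readers puzzling over the same numbers, here is a minimal sketch of the arithmetic implied by the reply above. The GPU counts and gradient-accumulation factor are illustrative assumptions, not the project's actual training configuration.

```python
# Sketch of the relationship described above: the report's "128" is a per-GPU
# batch size, while 16384 / 32768 are global batch sizes. The concrete GPU
# counts below are illustrative assumptions, not the project's real setup.

PER_GPU_BATCH_SIZE = 128   # sequences per GPU per step (from the report)
GLOBAL_BATCH_CAP = 32768   # upper bound on the global batch size (from the reply)

def global_batch_size(num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Global batch = per-GPU batch x number of GPUs x accumulation steps."""
    return PER_GPU_BATCH_SIZE * num_gpus * grad_accum_steps

# Example: 128 GPUs with no gradient accumulation gives a global batch of 16384;
# doubling the GPU count (or accumulating two steps) reaches the 32768 cap.
for gpus in (128, 256):
    gbs = global_batch_size(gpus)
    assert gbs <= GLOBAL_BATCH_CAP
    print(f"{gpus} GPUs -> global batch size {gbs}")
```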

Ag2S1 closed this as completed Oct 25, 2021