
Is the batch size actually 128 or 16384? #21

Closed

hankcs opened this issue Oct 23, 2021 · 1 comment

hankcs commented Oct 23, 2021

I noticed that Section 2.1 of the technical report says:

We limit the length of sentences in each batch to up to 512 tokens, and the batch size is 128.

Later in the same section it also says:

The batch sizes for the two stages are 16384 and 32768, respectively

So which of these is the actual batch size? Is the former the number of sequences and the latter the number of tokens? Or is such a large batch size feasible because LAMB is used? The LAMB paper used 32768.

Ag2S1 (Contributor) commented Oct 25, 2021

Thanks for pointing this out; the wording there may not be precise enough. The 128 mentioned earlier is the per-GPU batch size, while the later figures are global batch sizes.
LAMB makes convergence more stable at large batch sizes, and we set the corresponding schedule by following its experiments. During actual training, we take as many GPUs as possible from the cluster resource pool, but never let the global batch size exceed 32768.
We will revise this part in the next version of the report to avoid confusion.
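For readers puzzling over the same numbers, here is a minimal sketch of the arithmetic implied by the reply above. The GPU counts and gradient-accumulation factor are illustrative assumptions, not the project's actual training configuration.

```python
# Sketch of the relationship described above: the report's "128" is a per-GPU
# batch size, while 16384 / 32768 are global batch sizes. The concrete GPU
# counts below are illustrative assumptions, not the project's real setup.

PER_GPU_BATCH_SIZE = 128   # sequences per GPU per step (from the report)
GLOBAL_BATCH_CAP = 32768   # upper bound on the global batch size (from the reply)

def global_batch_size(num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Global batch = per-GPU batch x number of GPUs x accumulation steps."""
    return PER_GPU_BATCH_SIZE * num_gpus * grad_accum_steps

# Example: 128 GPUs with no gradient accumulation gives a global batch of 16384;
# doubling the GPU count (or accumulating two steps) reaches the 32768 cap.
for gpus in (128, 256):
    gbs = global_batch_size(gpus)
    assert gbs <= GLOBAL_BATCH_CAP
    print(f"{gpus} GPUs -> global batch size {gbs}")
```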

Ag2S1 closed this as completed Oct 25, 2021