
What is the expected "global batch size"? #50

Closed
ohmeow opened this issue Nov 26, 2023 · 1 comment

Comments

@ohmeow

ohmeow commented Nov 26, 2023

In the recipes README there is this statement:

If you scale up/down the number of GPUs, we recommend also scaling up the per-device batch size or number of gradient accumulation steps to keep the global batch size constant (and thus replicate our results).

Q: What is the expected "global batch size"?

For example, I'm trying to run this on 2x 3090s and need to know the expected global batch size so I can adjust the gradient accumulation steps and per-device train batch size accordingly.

Thanks much!

@timothylimyl

@ohmeow

For SFT, it's 512.

For DPO, it's 32.
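
For reference, assuming standard data-parallel training (as with the Hugging Face Trainer / accelerate), the effective global batch size works out to per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. Below is a minimal sketch of that arithmetic for the 2-GPU setup above; the helper name and the per-device batch sizes are illustrative, not values from the recipes.

```python
def grad_accum_steps(global_batch_size: int, per_device_batch_size: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach the target global batch size."""
    effective_per_step = per_device_batch_size * num_gpus
    assert global_batch_size % effective_per_step == 0, (
        "per_device_batch_size * num_gpus must evenly divide the target global batch size"
    )
    return global_batch_size // effective_per_step

# SFT target of 512 on 2 GPUs with per_device_train_batch_size=4:
print(grad_accum_steps(512, 4, 2))  # 64 accumulation steps

# DPO target of 32 on 2 GPUs with per_device_train_batch_size=2:
print(grad_accum_steps(32, 2, 2))   # 8 accumulation steps
```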

@ohmeow ohmeow closed this as completed Nov 27, 2023