
What is the expected "global batch size"? #50

Closed
ohmeow opened this issue Nov 26, 2023 · 1 comment

Comments

@ohmeow

ohmeow commented Nov 26, 2023

In the recipes README there is this statement:

If you scale up/down the number of GPUs, we recommend also scaling up the per-device batch size or number of gradient accumulation steps to keep the global batch size constant (and thus replicate our results).

Q: What is the expected "global batch size"?

For example, I'm trying to run this on 2x 3090s and need to know the expected global batch size so I can adjust the gradient accumulation steps and per-device train batch size accordingly.

Thanks much!

@timothylimyl

@ohmeow

For SFT, it's 512.

For DPO, it's 32.
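
For reference, assuming standard data-parallel training (as with the Hugging Face Trainer / accelerate), the effective global batch size works out to per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. Below is a minimal sketch of that arithmetic for the 2-GPU setup above; the helper name and the per-device batch sizes are illustrative, not values from the recipes.

```python
def grad_accum_steps(global_batch_size: int, per_device_batch_size: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach the target global batch size."""
    effective_per_step = per_device_batch_size * num_gpus
    assert global_batch_size % effective_per_step == 0, (
        "per_device_batch_size * num_gpus must evenly divide the target global batch size"
    )
    return global_batch_size // effective_per_step

# SFT target of 512 on 2 GPUs with per_device_train_batch_size=4:
print(grad_accum_steps(512, 4, 2))  # 64 accumulation steps

# DPO target of 32 on 2 GPUs with per_device_train_batch_size=2:
print(grad_accum_steps(32, 2, 2))   # 8 accumulation steps
```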

@ohmeow ohmeow closed this as completed Nov 27, 2023