In the recipes README there is this statement:

> If you scale up/down the number of GPUs, we recommend also scaling up the per-device batch size or number of gradient accumulation steps to keep the global batch size constant (and thus replicate our results).
Q: What is the expected "global batch size"?

For example, I'm trying to run this on 2x RTX 3090s and need to know the expected global batch size so I can adjust the gradient accumulation steps and per-device train batch size accordingly.

Thanks much!
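For reference, here's a minimal sketch of how I understand these settings to relate. The GPU count and batch values below are placeholders I made up, not the recipe's actual configuration:

```python
# Global batch size is the product of three training settings.
# (The values below are assumptions for illustration, not the recipe defaults.)
num_gpus = 8                      # GPUs assumed for the original recipe
per_device_train_batch_size = 8   # placeholder value
gradient_accumulation_steps = 2   # placeholder value

global_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(f"Original global batch size: {global_batch_size}")  # 128 with these placeholders

# To replicate on 2x 3090s, keep the product constant, e.g. by raising
# gradient_accumulation_steps (per-device batch size is limited by the
# 3090's 24 GB of memory):
new_num_gpus = 2
new_per_device_batch_size = 4     # whatever fits in 24 GB (assumption)
new_grad_accum = global_batch_size // (new_num_gpus * new_per_device_batch_size)
print(f"gradient_accumulation_steps on 2 GPUs: {new_grad_accum}")  # 16 here
```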