
Added trainer.gradient_accumulation_steps for increasing effective batch size #3305

Merged: 2 commits merged into master from grad-accum on Mar 31, 2023

Conversation

tgaddair (Collaborator) commented:

Benefits include:

  • Lower network bandwidth overhead by reducing the frequency of allreduce / gradient synchronization
  • A larger effective batch size, which smooths out gradient variance when training very large models
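To make the mechanics concrete, here is a minimal, generic PyTorch sketch of gradient accumulation (illustrative only, not the implementation in this PR); the model and synthetic data are placeholders:

```python
# Minimal gradient accumulation sketch (generic PyTorch, not Ludwig's code).
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

gradient_accumulation_steps = 4  # mirrors the new trainer option
# Synthetic micro-batches of size 8; effective batch size = 8 * 4 = 32.
loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient matches one large batch.
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        # Under DDP, gradients only need to be synchronized here, once per
        # accumulation window (e.g. by using no_sync() on the other steps).
        optimizer.step()
        optimizer.zero_grad()
```

Because the optimizer step (and, in distributed training, the gradient synchronization) happens only once per accumulation window, the allreduce frequency drops while each update reflects 32 samples instead of 8.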

On Mar 29, 2023, @tgaddair changed the title from "Added trainer.gradient_accumulation option for increasing effective batch size" to "Added trainer.gradient_accumulation_steps for increasing effective batch size".
github-actions bot commented on Mar 29, 2023:

Unit Test Results

  6 files ±0 · 6 suites ±0 · ⏱️ 1h 52m 27s (+1h 30m 16s)
  153 tests +141: 140 ✔️ passed (+130), 13 💤 skipped (+11), 0 failed (±0)
  193 runs +133: 172 ✔️ passed (+124), 21 💤 skipped (+9), 0 failed (±0)

Results for commit 37b6678, compared against base commit 531e024.

♻️ This comment has been updated with latest results.

@justinxzhao (Collaborator) left a comment:

C00L! 👍


```python
# Just test that training completes without error.
# TODO(travis): We may want to expand upon this in the future to include some checks on model
# convergence like gradient magnitudes, etc. Should also add distributed tests.
```
Collaborator (inline comment on the lines above):

Should we re-use the test utility for distributed tests?

```python
run_test_suite(config, dataset, "ray")
```
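For reference, a sketch of what a config exercising the new option might look like is below; the feature entries are hypothetical placeholders, and only trainer.gradient_accumulation_steps is the option this PR adds:

```python
# Hypothetical config (placeholder features; not the PR's actual test config).
config = {
    "input_features": [{"name": "text_in", "type": "text"}],
    "output_features": [{"name": "label", "type": "category"}],
    "trainer": {
        "batch_size": 8,
        "gradient_accumulation_steps": 4,  # new option: effective batch size 8 * 4 = 32
    },
}
```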

@tgaddair (Collaborator, Author) replied on Mar 31, 2023:

Yeah, I think we should refactor to make this possible in a follow-up. The reason I didn't do it here is that that function is much more expensive, since it runs many additional tests. But in general we should rely on standard test-suite functions that can run with different levels of checks.
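As a loose illustration of that idea, a tiered helper might look like the sketch below; the names and signature are hypothetical, and the repo's actual run_test_suite differs:

```python
# Hypothetical tiered test-suite helper (illustrative only).

def train(config, dataset, backend):
    # Placeholder for the real training entry point.
    return {"train_loss": 0.1}

def check_convergence(results):
    # Placeholder for deeper checks (gradient magnitudes, loss curves, ...).
    assert results["train_loss"] < 1.0

def run_suite(config, dataset, backend, level="smoke"):
    results = train(config, dataset, backend)
    assert results is not None  # "smoke": training completed without error
    if level == "full":
        check_convergence(results)  # heavier, more expensive checks
```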

@tgaddair merged commit 16fed3a into master on Mar 31, 2023.
@tgaddair deleted the grad-accum branch on Mar 31, 2023 at 17:17.
@arnavgarg1 (Contributor) left a comment:

Nice!
