v0.11.0 Gradient Accumulation and SageMaker Data Parallelism

@sgugger released this 18 Jul 13:02
eebeb59

Gradient Accumulation

Accelerate now handles gradient accumulation for you: just pass gradient_accumulation_steps=xxx when instantiating the Accelerator and put your whole training loop step under a with accelerator.accumulate(model): context. Accelerate will then handle the loss rescaling and gradient accumulation for you, avoiding slowdowns in distributed training since gradients only need to be synced when you actually want to step. More details are in the documentation; a short sketch follows the list below.

  • Add gradient accumulation doc by @muellerzr in #511
  • Make gradient accumulation work with dispatched dataloaders by @muellerzr in #510
  • Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484
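
Here is a minimal sketch of the new API, assuming a toy PyTorch model, optimizer, and dataloader (the names, shapes, and hyperparameters below are illustrative, not part of this release):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model, optimizer, and data; illustrative only.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=8)
loss_fn = nn.MSELoss()

# The new argument: accumulate gradients over 4 batches before each optimizer step.
accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # Inside accumulate(), Accelerate rescales the loss and only syncs gradients
    # and performs the optimizer step once every 4 batches.
    with accelerator.accumulate(model):
        loss = loss_fn(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```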

Support for SageMaker Data Parallelism

Accelerate now supports SageMaker's own brand of data parallelism (see the usage sketch after the list below).

  • SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by @pacman100 in #504
  • SageMaker DP Support by @pacman100 in #494
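
If you want to try this out, the workflow is the standard Accelerate CLI one; the script name below is just a placeholder:

```bash
# Run once to generate the config; choose Amazon SageMaker as the compute environment.
accelerate config
# Then launch your training script (train.py is a placeholder name); Accelerate
# packages it and submits the job to SageMaker.
accelerate launch train.py
```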

What's new?