
v0.18.0: GradientState enhancements and Big Model Inference Fixes

@muellerzr muellerzr released this 24 Mar 14:52
ecd1288

What's Changed

  • A new `GradientAccumulationPlugin` has been added to handle more configurations with the `GradientState`. In particular, it lets you optionally disable Accelerate's automatic adjustment of the scheduler length relative to the number of gradient accumulation steps. Otherwise, Accelerate now automatically ensures that schedulers built for non-gradient-accumulation training work correctly during gradient accumulation.
  • Several fixes were made to the launch configuration and TPU launches, and the `dynamo_backend` warning has been silenced.
  • Big model inference received a number of fixes related to linear layers, `drop_last` on linear layers, tied weight loading, and the handling of multiple tied parameters.
  • A new integration example with RunhouseML has been added, read more here: https://github.com/huggingface/accelerate/tree/main/examples#simple-multi-gpu-hardware-launcher
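The scheduler-adjustment behavior described above can be sketched in plain Python. This is a minimal, hypothetical illustration of the idea (the `ToyScheduler` class and `train` function below are not part of the Accelerate API): a scheduler sized for one step per batch only completes its schedule under gradient accumulation if it is advanced by the number of accumulated batches at each optimizer step.

```python
class ToyScheduler:
    """A toy LR scheduler sized for `total_steps` per-batch updates."""
    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.step_count = 0

    def step(self, n=1):
        self.step_count += n


def train(num_batches, accumulation_steps, adjust_scheduler=True):
    # Scheduler built as if every batch triggered an optimizer step.
    scheduler = ToyScheduler(total_steps=num_batches)
    for batch in range(num_batches):
        # The optimizer only steps once every `accumulation_steps` batches.
        if (batch + 1) % accumulation_steps == 0:
            if adjust_scheduler:
                # Advance the scheduler once per accumulated batch, so a
                # schedule sized for per-batch stepping still completes.
                scheduler.step(accumulation_steps)
            else:
                scheduler.step(1)
    return scheduler.step_count


# With adjustment the schedule completes (100/100 steps);
# without it, only a quarter of the schedule is traversed (25/100).
print(train(100, 4, adjust_scheduler=True))
print(train(100, 4, adjust_scheduler=False))
```

Disabling the adjustment corresponds to the opt-out the plugin exposes, for users who have already sized their scheduler in optimizer steps rather than batches.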

Breaking Changes

  • find_tied_parameters now deals with groups of tied parameters (instead of only pairs). As a result, it now returns a list of lists of strings instead of a dictionary.
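To see why the return type changed, consider weights tied three ways: a dictionary of pairs cannot represent that cleanly, while a list of groups can. The sketch below is illustrative only (the `pairs_to_groups` helper and the parameter names are hypothetical, not Accelerate's implementation); it merges old-style pairwise ties into full groups with a small union-find.

```python
def pairs_to_groups(tied_pairs):
    """Merge pairwise ties (old dict format) into full groups
    (new list-of-lists format) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in tied_pairs.items():
        union(a, b)

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    return [sorted(g) for g in groups.values()]


# Three weights tied together: the pair representation needs two entries
# pointing at the same target, while the group representation is direct.
old_style = {
    "decoder.weight": "encoder.weight",
    "lm_head.weight": "encoder.weight",
}
print(pairs_to_groups(old_style))
# [['decoder.weight', 'encoder.weight', 'lm_head.weight']]
```

Code that iterated over the old dictionary's `.items()` will need updating to iterate over groups instead.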


New Contributors

Full Changelog: v0.17.1...v0.18.0