
v0.18.0: GradientState enhancements and Big Model Inference Fixes

@muellerzr muellerzr released this 24 Mar 14:52
ecd1288

What's Changed

  • A new `GradientAccumulationPlugin` has been added to handle more configurations with the `GradientState`. In particular, it lets you optionally disable Accelerate's automatic adjustment of the scheduler length relative to the number of gradient accumulation steps. Otherwise, Accelerate now automatically ensures that schedulers built for non-gradient-accumulation training work correctly during gradient accumulation.
  • Several fixes were made to the launch configuration and TPU launches, and the `dynamo_backend` warning has been silenced.
  • Big model inference received a number of fixes related to linear layers, `drop_last` on linear layers, tied weight loading, and the handling of multiple tied parameters.
  • A new integration example with RunhouseML has been added, read more here: https://github.com/huggingface/accelerate/tree/main/examples#simple-multi-gpu-hardware-launcher
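The scheduler-adjustment behavior described above can be sketched in plain Python. This is a minimal, hypothetical illustration of the idea (the `ToyScheduler` class and `train` function below are not part of the Accelerate API): a scheduler sized for one step per batch only completes its schedule under gradient accumulation if it is advanced by the number of accumulated batches at each optimizer step.

```python
class ToyScheduler:
    """A toy LR scheduler sized for `total_steps` per-batch updates."""
    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.step_count = 0

    def step(self, n=1):
        self.step_count += n


def train(num_batches, accumulation_steps, adjust_scheduler=True):
    # Scheduler built as if every batch triggered an optimizer step.
    scheduler = ToyScheduler(total_steps=num_batches)
    for batch in range(num_batches):
        # The optimizer only steps once every `accumulation_steps` batches.
        if (batch + 1) % accumulation_steps == 0:
            if adjust_scheduler:
                # Advance the scheduler once per accumulated batch, so a
                # schedule sized for per-batch stepping still completes.
                scheduler.step(accumulation_steps)
            else:
                scheduler.step(1)
    return scheduler.step_count


# With adjustment the schedule completes (100/100 steps);
# without it, only a quarter of the schedule is traversed (25/100).
print(train(100, 4, adjust_scheduler=True))
print(train(100, 4, adjust_scheduler=False))
```

Disabling the adjustment corresponds to the opt-out the plugin exposes, for users who have already sized their scheduler in optimizer steps rather than batches.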

Breaking Changes

  • find_tied_parameters now deals with groups of tied parameters (instead of only pairs). As a result, it now returns a list of lists of strings instead of a dictionary.
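To see why the return type changed, consider weights tied three ways: a dictionary of pairs cannot represent that cleanly, while a list of groups can. The sketch below is illustrative only (the `pairs_to_groups` helper and the parameter names are hypothetical, not Accelerate's implementation); it merges old-style pairwise ties into full groups with a small union-find.

```python
def pairs_to_groups(tied_pairs):
    """Merge pairwise ties (old dict format) into full groups
    (new list-of-lists format) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in tied_pairs.items():
        union(a, b)

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    return [sorted(g) for g in groups.values()]


# Three weights tied together: the pair representation needs two entries
# pointing at the same target, while the group representation is direct.
old_style = {
    "decoder.weight": "encoder.weight",
    "lm_head.weight": "encoder.weight",
}
print(pairs_to_groups(old_style))
# [['decoder.weight', 'encoder.weight', 'lm_head.weight']]
```

Code that iterated over the old dictionary's `.items()` will need updating to iterate over groups instead.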


New Contributors

Full Changelog: v0.17.1...v0.18.0