Release v0.27.0: PyTorch 2.2.0 Support, PyTorch-Native Pipeline Parallism, DeepSpeed XPU support, and Bug Fixes · huggingface/accelerate

PyTorch 2.2.0 Support

With the latest release of PyTorch 2.2.0, we've guaranteed that there are no breaking changes regarding it

PyTorch-Native Pipeline Parallel Inference

With this release we are excited to announce support for pipeline-parallel inference by integrating PyTorch's PiPPy framework (so no need to use Megatron or DeepSpeed)! This supports automatic model-weight splitting to each device using a similar API to device_map="auto". This is still under heavy development, however the inference side is stable enough that we are ready for a release. Read more about it in our docs and check out the example zoo.

Requires pippy of version 0.2.0 or later (pip install torchpippy -U)

Example usage (combined with accelerate launch or torchrun):

from accelerate import PartialState, prepare_pippy
model = AutoModelForSequenceClassification.from_pretrained("gpt2")
model = prepare_pippy(model, split_points="auto", example_args=(input,))
input = input.to("cuda:0")
with torch.no_grad():
    output = model(input)
# The outputs are only on the final process by default
# You can pass in `gather_outputs=True` to prepare_pippy to
# make them available on all processes
if PartialState().is_last_process:
    output = torch.stack(tuple(output[0]))
    print(output.shape)

DeepSpeed

This release provides support for utilizing DeepSpeed on XPU devices thanks to @faaany

What's Changed

Convert model.hf_device_map back to Dict by @SunMarc in #2326
Fix model memory issue by @muellerzr in #2327
Fixed typos in readme files of docs folder. by @rishit5 in #2329
Disable P2P in just the 4000 series by @muellerzr in #2332
Avoid duplicating memory for tied weights in dispatch_model, and in forward with offloading by @fxmarty in #2330
Show DeepSpeed option when multi-XPU is selected in accelerate config by @faaany in #2346
FIX: add oneCCL environment variable for non-MPI launcher (accelerate launch) by @faaany in #2339
device agnostic test_accelerator/test_multigpu by @wangshuai09 in #2343
Fix mpi4py/failing deepspeed test issues by @muellerzr in #2353
Fix block_size picking in megatron_lm_gpt_pretraining example. by @nilq in #2342
Fix dispatch_model with tied weights test on T4 by @fxmarty in #2354
bugfix to allow usage of TE or MSAMP in FP8RecipeKwargs by @sudhakarsingh27 in #2355
Pin DeepSpeed until patch by @muellerzr in #2366
Remove init_hook_kwargs by @fxmarty in #2365
device agnostic optimizer testing by @statelesshz in #2363
add_hook_to_module and remove_hook_from_module compatibility with fx.GraphModule by @fxmarty in #2369
Adding requires_grad to kwargs when registering empty parameters. by @BlackSamorez in #2376
Add adapter_only option to save_fsdp_model and load_fsdp_model to only save/load PEFT weights by @AjayP13 in #2321
device agnostic cli/data_loader/grad_sync/kwargs_handlers/memory_utils testing by @wangshuai09 in #2356
Fix batch_size sanity check logic for split_batches by @izhx in #2344
Pin Torch version to <2.2.0 by @Rocketknight1 in #2394
Address PIP-632 deprecation of distutils by @AieatAssam in #2388
[don't merge yet] unpin torch by @ydshieh in #2406
Revert "[don't merge yet] unpin torch" by @muellerzr in #2407
Fix CI due to pytest by @muellerzr in #2408
Added activateEnviroment.sh to readme by @TJ-Solergibert in #2409
Fix XPU inference by @notsyncing in #2383
Fix the size of int and bool type when computing module size by @notsyncing in #2411
Adding Local SGD support for NPU by @statelesshz in #2415
Unpin torch by @muellerzr in #2418
Use Ruff for formatting too by @akx in #2400
torch-native pipeline parallelism for big models by @muellerzr in #2345
Update FSDP docs by @pacman100 in #2430
Make output end up on all GPUs at the end by @muellerzr in #2423
Migrate pippy examples over and run tests by @muellerzr in #2424
[FIX] fix the wrong nproc_per_node in the multi gpu test by @faaany in #2422
Fix fp8 things by @muellerzr in #2403
[FIX] allow Accelerator to prepare models in eval mode for XPU&CPU by @faaany in #2426
[Fix] make all tests pass on XPU by @faaany in #2427

New Contributors

@rishit5 made their first contribution in #2329
@faaany made their first contribution in #2346
@wangshuai09 made their first contribution in #2343
@nilq made their first contribution in #2342
@BlackSamorez made their first contribution in #2376
@AjayP13 made their first contribution in #2321
@Rocketknight1 made their first contribution in #2394
@AieatAssam made their first contribution in #2388
@ydshieh made their first contribution in #2406
@notsyncing made their first contribution in #2383
@akx made their first contribution in #2400

Full Changelog: v0.26.1...v0.27.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.27.0: PyTorch 2.2.0 Support, PyTorch-Native Pipeline Parallism, DeepSpeed XPU support, and Bug Fixes

PyTorch 2.2.0 Support

PyTorch-Native Pipeline Parallel Inference

DeepSpeed

What's Changed

New Contributors

Contributors