
fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator #7948

Merged
delock merged 6 commits into deepspeedai:master from delock:gma/fix_cpu_train
Apr 3, 2026
Conversation

@delock (Collaborator) commented Apr 2, 2026

fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator

The on-device flatten path (introduced in #7828) passes nn.Parameter objects with requires_grad=True to torch.cat(), creating a flat buffer with a CatBackward0 grad_fn. Later, _unflatten_dense_tensors produces SplitBackward0 views that are assigned to model params. An inplace copy_() on these views during the optimizer step raises:
RuntimeError: Output 0 of SplitBackward0 is a view and is being modified inplace.

This especially affects CPU training where CPU_Accelerator.is_available() returns True and available_memory() returns system RAM, so the on-device path is always taken.

Fix: add .detach() to the flattened buffer, matching the implicit detach behavior of the CPU-offload path (param.data.cpu() + .to(device)).

Also rename flatten_on_gpu -> flatten_on_accelerator and replace GPU-specific terminology in comments/logs with accelerator-generic equivalents.

@delock delock requested review from tjruwase and tohtana as code owners April 2, 2026 13:41
delock added 2 commits April 3, 2026 07:40
fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
Signed-off-by: Guokai Ma <guokai.ma@intel.com>
@delock delock force-pushed the gma/fix_cpu_train branch from 47b1ebc to 0bfe45e on April 2, 2026 23:41
Signed-off-by: Guokai Ma <guokai.ma@intel.com>
@delock delock requested a review from loadams as a code owner April 3, 2026 00:14
Collaborator

@tohtana tohtana left a comment


Thank you, @delock! I left a comment in the new test.

assert flat.grad_fn is None, ("Flat buffer must be detached from autograd graph"
" to prevent inplace-modification errors during optimizer step")

data_loader = random_dataloader(model=engine, total_samples=8, hidden_dim=hidden_dim, device=engine.device)
@tohtana (Collaborator) commented:

Shouldn't random_dataloader take a dtype argument? The default is preferred_dtype(), which could mismatch the test config's dtype.
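The class of error behind this review comment can be reproduced with a bare torch module (random_dataloader is a DeepSpeed test helper; this sketch only illustrates the dtype mismatch itself):

```python
import torch

model = torch.nn.Linear(8, 8)  # fp32 weights, like the test's fp32 config

# Data produced in a "preferred" dtype (bfloat16 on CPU) mismatches
# the fp32 weights and raises a RuntimeError on the matmul.
try:
    model(torch.randn(2, 8, dtype=torch.bfloat16))
except RuntimeError as err:
    print(err)

# Passing the config dtype explicitly avoids the mismatch.
out = model(torch.randn(2, 8, dtype=torch.float32))
```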

@delock (Collaborator, Author) replied:

Thanks for the catch!

delock added 2 commits April 2, 2026 19:07
…sertion

- Pass explicit dtype to random_dataloader to avoid mismatch when
  preferred_dtype() (bfloat16 on CPU) differs from the test config dtype.
  Fixes fp32 test failure on CPU-only CI where data was bfloat16 but model
  expected float32.
- Tighten log check from 'sufficient' to '(sufficient memory)' so it does
  not accidentally match '(insufficient memory)'.
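The substring pitfall in the second bullet is easy to demonstrate: 'sufficient' is a substring of 'insufficient', so only the parenthesized pattern distinguishes the two branches (the log text below is illustrative, not the exact DeepSpeed message):

```python
log_line = "flatten on accelerator skipped (insufficient memory)"

# Loose check falsely matches on the insufficient-memory branch.
assert "sufficient" in log_line

# Tightened check correctly tells the two branches apart.
assert "(sufficient memory)" not in log_line
assert "(insufficient memory)" in log_line
```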

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
@delock (Collaborator, Author) commented Apr 3, 2026

@tohtana Thanks for the comments! I also verified that the newly added test will fail before applying this PR.

@delock delock enabled auto-merge (squash) April 3, 2026 02:40
@delock delock merged commit 37e232f into deepspeedai:master Apr 3, 2026
8 of 9 checks passed
tohtana pushed a commit to tohtana/DeepSpeed that referenced this pull request Apr 4, 2026
fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator (deepspeedai#7948)