
[torchao] Fix record_stream with torchao and group offloading#13427

Closed
asomoza wants to merge 1 commit into fix-torchao-groupoffloading from torchao-group-offload-record-stream


asomoza commented Apr 6, 2026

What does this PR do?

Fixes the use of record_stream with torchao and group offloading. Without this fix, the generated image is incorrect.

Without this PR: flux_benchmark_dynamic_eager_bs1_goff_leaf_level_output_bad
With this PR: flux_benchmark_dynamic_eager_bs1_output
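For context on what record_stream does in a stream-based offloading setup, here is a minimal sketch of the general pattern, assuming a CUDA device and plain torch tensors. The function name onload_with_record_stream is hypothetical. This is not the diffusers group offloading implementation, and it does not show how torchao's quantized tensor subclasses handle record_stream.

```python
import torch


def onload_with_record_stream(
    cpu_tensor: torch.Tensor, transfer_stream: "torch.cuda.Stream"
) -> torch.Tensor:
    """Hypothetical helper: copy a (pinned) CPU tensor to the GPU on a side
    stream, then hand it to the current (compute) stream safely."""
    compute_stream = torch.cuda.current_stream()

    # Issue the host-to-device copy on the transfer stream so it can overlap
    # with compute work already queued on the compute stream.
    with torch.cuda.stream(transfer_stream):
        gpu_tensor = cpu_tensor.to("cuda", non_blocking=True)

    # Order the streams: work queued on the compute stream after this point
    # will not start until the copy on the transfer stream has finished.
    compute_stream.wait_stream(transfer_stream)

    # gpu_tensor's memory was allocated on the transfer stream. record_stream
    # tells PyTorch's caching allocator that the compute stream also uses it,
    # so the block is not reused until the work queued on the compute stream
    # at the time gpu_tensor is freed has completed.
    gpu_tensor.record_stream(compute_stream)
    return gpu_tensor


if __name__ == "__main__" and torch.cuda.is_available():
    side_stream = torch.cuda.Stream()
    weight = torch.randn(1024, 1024).pin_memory()
    weight_gpu = onload_with_record_stream(weight, side_stream)
    print(weight_gpu.device)
```

Because record_stream defers reuse of freed memory until the recorded stream catches up, applying it more broadly than needed can raise peak memory usage.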

Who can review?

@sayakpaul

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

asomoza marked this pull request as draft April 6, 2026 17:56
asomoza commented Apr 6, 2026

I double-checked, and this is not a correct fix: VRAM usage jumps from 5.67 GB to 13.71 GB.

asomoza commented Apr 6, 2026

Actually, with the torchao PR this works fine, so I'm closing this.

asomoza closed this Apr 6, 2026
asomoza deleted the torchao-group-offload-record-stream branch April 6, 2026 17:59