Fix Ulysses SP backward with SDPA #13328

Merged
sayakpaul merged 3 commits into huggingface:main from zhtmike:fix_sp_backward
Mar 30, 2026

@zhtmike
Contributor

@zhtmike zhtmike commented Mar 25, 2026

What does this PR do?

Resolves issue #13319.

There are two bugs:

  • grad_out is already in BSHD format, matching the shape of out, so it must not be permuted again.
  • autograd.Function.backward() runs with gradient tracking disabled by default, so we need to enable it manually.
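
The two points above can be illustrated with a minimal sketch. This is not the code changed in this PR: RecomputeSDPA is a hypothetical stand-in that assumes a backward pass which recomputes attention from the saved inputs, and the tensor sizes are arbitrary. It only shows why grad_out is used as-is and why grad mode has to be re-enabled inside backward().

```python
import torch
import torch.nn.functional as F


class RecomputeSDPA(torch.autograd.Function):
    """Hypothetical example: attention over BSHD tensors with a recomputing backward."""

    @staticmethod
    def forward(ctx, query, key, value):
        ctx.save_for_backward(query, key, value)
        # SDPA expects (batch, heads, seq, dim), so permute BSHD -> BHSD and back.
        q, k, v = (x.permute(0, 2, 1, 3) for x in (query, key, value))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.permute(0, 2, 1, 3)  # output is BSHD again

    @staticmethod
    def backward(ctx, grad_out):
        query, key, value = ctx.saved_tensors
        # Grad mode is off inside backward() by default, so the recomputed
        # forward must run under enable_grad() to build a graph.
        with torch.enable_grad():
            q, k, v = (x.detach().requires_grad_(True) for x in (query, key, value))
            out = F.scaled_dot_product_attention(
                q.permute(0, 2, 1, 3), k.permute(0, 2, 1, 3), v.permute(0, 2, 1, 3)
            ).permute(0, 2, 1, 3)
        # grad_out has the same BSHD layout as `out`, so it is passed unchanged.
        grad_q, grad_k, grad_v = torch.autograd.grad(out, (q, k, v), grad_out)
        return grad_q, grad_k, grad_v


# Arbitrary sizes; seq (5) differs from heads (3) so a stray permute would fail.
torch.manual_seed(0)
q, k, v = (torch.randn(2, 5, 3, 4, requires_grad=True) for _ in range(3))
weight = torch.randn(2, 5, 3, 4)
out = RecomputeSDPA.apply(q, k, v)
(out * weight).sum().backward()
```

Without the enable_grad() context, the recomputed output would not require grad and torch.autograd.grad would raise an error. Permuting grad_out before passing it would give it a shape that no longer matches out.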

After the fix, running torchrun --nproc-per-node 2 toy_train.py --enable-sp produces the expected result:

loss=1.351188

This PR also adds test coverage for backward ops with context parallelism.

Tested with TestQwenImageTransformerContextParallel, TestFluxTransformerContextParallel and TestFlux2TransformerContextParallel.

  • Fix Ulysses SP backward with SDPA

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

Member

@sayakpaul sayakpaul left a comment

Ran the tests and they pass as well. Great work!

@zhtmike
Contributor Author

zhtmike commented Mar 30, 2026

Hi @sayakpaul, is there anything else I can do to help with this PR?

@sayakpaul
Member

Gonna let the CI run and merge afterward. Sorry for the delay!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul merged commit e1e7d58 into huggingface:main Mar 30, 2026
10 of 11 checks passed

3 participants