
@Ratish1 Ratish1 commented Nov 5, 2025

What does this PR do?

This PR fixes a bug that causes the QwenImage model to crash when using context parallelism with a prompt whose sequence length is not divisible by the world size.

The fix is implemented within the QwenImageTransformer2DModel and consists of three parts:

1. Input Padding: The text prompt embeddings (encoder_hidden_states) and their attention mask are padded at the start of the forward method so that their sequence length is divisible by the world size (a rough sketch of the padding and masking logic follows this list).
2. RoPE Correction: The model now uses the new, padded sequence length to generate the rotary positional embeddings (RoPE), preventing the tensor shape mismatch that was causing a RuntimeError.
3. Attention Masking: The QwenDoubleStreamAttnProcessor2_0 is corrected to build and use a proper additive attention mask, so the new padded tokens are ignored by the attention mechanism and the numerical output of the model is preserved.

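As a rough illustration of parts 1 and 3, the sketch below pads the prompt embeddings and their mask to a multiple of the world size and converts the padding mask into an additive attention bias. This is a minimal standalone sketch, not the actual diffusers code: the helper names (`pad_to_multiple_of_world_size`, `to_additive_attention_mask`) and the tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def pad_to_multiple_of_world_size(encoder_hidden_states, encoder_attention_mask, world_size):
    # Hypothetical helper: pad the sequence dimension on the right so that
    # seq_len % world_size == 0 before the sequence is sharded across ranks.
    # encoder_hidden_states: (batch, seq_len, dim); encoder_attention_mask: (batch, seq_len)
    seq_len = encoder_hidden_states.shape[1]
    pad_len = (-seq_len) % world_size
    if pad_len > 0:
        encoder_hidden_states = F.pad(encoder_hidden_states, (0, 0, 0, pad_len))
        encoder_attention_mask = F.pad(encoder_attention_mask, (0, pad_len), value=0)
    # The padded seq_len (not the original one) must also be used when
    # generating the rotary positional embeddings (part 2 of the fix).
    return encoder_hidden_states, encoder_attention_mask


def to_additive_attention_mask(encoder_attention_mask, dtype):
    # Hypothetical helper: convert a {0, 1} key-padding mask into an additive bias:
    # 0 for real tokens, a large negative value for padded tokens, so softmax
    # assigns the padding ~0 attention weight.
    bias = torch.zeros_like(encoder_attention_mask, dtype=dtype)
    bias = bias.masked_fill(encoder_attention_mask == 0, torch.finfo(dtype).min)
    # Shape (batch, 1, 1, padded_seq_len) so it broadcasts over heads and queries.
    return bias[:, None, None, :]


if __name__ == "__main__":
    world_size = 4
    hidden = torch.randn(2, 77, 3072)                 # 77 is not divisible by 4
    mask = torch.ones(2, 77, dtype=torch.long)
    hidden, mask = pad_to_multiple_of_world_size(hidden, mask, world_size)
    print(hidden.shape, mask.shape)                   # seq dim padded from 77 to 80
    print(to_additive_attention_mask(mask, hidden.dtype).shape)
```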
A new unit test is also added to simulate a distributed environment. It verifies that the padding logic prevents the crash while ensuring the output is numerically equivalent to the baseline, non-padded run.
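The idea behind the equivalence check can be sketched as follows: with a correct additive mask, attention over zero-padded keys/values matches attention over the original, unpadded ones. This is an illustrative standalone check using scaled_dot_product_attention, not the PR's actual unit test (which exercises QwenImageTransformer2DModel in a simulated distributed setup); the shapes below are made up.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, q_len, kv_len, dim = 1, 4, 16, 77, 64
world_size = 4
pad_len = (-kv_len) % world_size  # 3 extra tokens to reach 80

q = torch.randn(batch, heads, q_len, dim)
k = torch.randn(batch, heads, kv_len, dim)
v = torch.randn(batch, heads, kv_len, dim)

# Baseline: attention over the original, unpadded keys/values.
baseline = F.scaled_dot_product_attention(q, k, v)

# Padded run: zero-pad keys/values and mask out the padding with an additive bias.
k_pad = F.pad(k, (0, 0, 0, pad_len))
v_pad = F.pad(v, (0, 0, 0, pad_len))
bias = torch.zeros(batch, 1, 1, kv_len + pad_len)
bias[..., kv_len:] = torch.finfo(bias.dtype).min
padded = F.scaled_dot_product_attention(q, k_pad, v_pad, attn_mask=bias)

# With a correct additive mask, the padded tokens contribute nothing.
assert torch.allclose(baseline, padded, atol=1e-5)
```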

Fixes #12568

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul @yiyixuxu @DN6

@Ratish1 Ratish1 changed the title from "fix(qwenimage): Correct context parallelism padding" to "fix(qwenimage): Add padding for context parallelism" on Nov 5, 2025