Fix qwen image prompt padding #12075 #12643
Fix QwenImage Prompt Embedding Padding for Deterministic Outputs
What was the issue?
Issue #12075 reported that QwenImage pipelines were producing non-deterministic outputs when using the same prompt across different batch sizes. The same text prompt would generate different images depending on whether it was batched alone or with other prompts of varying lengths.
This inconsistency violated a fundamental expectation: identical prompts with the same seed should always produce identical outputs, regardless of batch composition.
How I identified the problem
After reviewing the issue report and examining the QwenImage pipeline implementation, I discovered the root cause in the prompt embedding padding logic.
The pipelines were dynamically padding prompt embeddings to the maximum sequence length within each batch, rather than to a fixed length. As a result, a prompt's padded length, and therefore its RoPE positions, depended on the other prompts it happened to be batched with, so the same prompt could produce different outputs in different batches.
The problem existed across all 8 QwenImage pipeline variants (main, img2img, inpaint, edit, edit_inpaint, edit_plus, controlnet, controlnet_inpaint) and the modular encoder functions.
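The batch-dependent behavior can be sketched in a few lines. This is an illustrative stand-in, not the actual diffusers code; `pad_dynamic`, `pad_id`, and the token ids are hypothetical:

```python
# Minimal sketch of the buggy behavior: padding to the longest
# prompt in the batch, so the padded length depends on batch-mates.

def pad_dynamic(token_seqs, pad_id=0):
    """Pad every sequence to the longest length in this batch."""
    batch_max = max(len(seq) for seq in token_seqs)
    return [seq + [pad_id] * (batch_max - len(seq)) for seq in token_seqs]

prompt = [11, 22, 33]  # hypothetical token ids for one prompt

alone = pad_dynamic([prompt])[0]                      # batch of one
with_long = pad_dynamic([prompt, list(range(8))])[0]  # batched with a longer prompt

# The same prompt ends up with different padded lengths, so its
# embeddings and RoPE positions differ between the two batches.
print(len(alone), len(with_long))  # 3 8
```

With real embeddings the extra pad positions shift the downstream attention and RoPE computation, which is why outputs diverged.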
How I solved it
The solution involved ensuring all prompt embeddings are padded to a consistent, fixed length determined by the `max_sequence_length` parameter (default 512, configurable up to the model's 1024 token limit). I modified the padding logic in all affected locations to:

- apply consistent `max_sequence_length` padding instead of per-batch maximum padding
- pass the `max_sequence_length` parameter to all internal prompt encoding methods
- update `txt_seq_lens` to reflect the padded length instead of actual token counts
- default to `max_sequence_length=512` to preserve existing behavior for users

The fix ensures that any prompt will always receive the same padding and RoPE positions, regardless of batch composition, making outputs deterministic and reproducible.
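The change above can be sketched as follows. Again this is a simplified stand-in, not the pipeline implementation; `pad_fixed` and the token ids are hypothetical:

```python
# Sketch of the fixed behavior: every prompt is padded to the same
# fixed max_sequence_length, independent of its batch-mates.

def pad_fixed(token_seqs, max_sequence_length=512, pad_id=0):
    """Pad every sequence to the fixed max_sequence_length."""
    return [seq + [pad_id] * (max_sequence_length - len(seq))
            for seq in token_seqs]

prompt = [11, 22, 33]

alone = pad_fixed([prompt])[0]
with_long = pad_fixed([prompt, list(range(100))])[0]

# Identical padding regardless of batch composition, and the
# per-prompt length now reports the padded length rather than
# the raw token count (mirroring the txt_seq_lens change).
txt_seq_lens = [len(seq) for seq in pad_fixed([prompt])]
assert alone == with_long
print(txt_seq_lens)  # [512]
```

Because the padded length is now a constant, RoPE position indices for a given prompt are identical in every batch.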
How the fix was tested
I created a comprehensive test, `test_prompt_embeds_padding()`, that verifies three critical behaviors of the new padding logic. Additionally, I ran the entire QwenImage test suite to ensure no regressions were introduced. All structural tests pass successfully, with only expected value-assertion changes (since fixing the padding changes the numerical outputs).
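A batch-invariance check of this kind can be sketched as below. The helper `encode_prompts` is a hypothetical stub standing in for the real prompt encoder, which the actual test exercises through the pipelines:

```python
# Sketch of a batch-invariance test: encoding a prompt alone and
# encoding it alongside other prompts must yield identical rows.

def encode_prompts(token_seqs, max_sequence_length=32, pad_id=0):
    """Hypothetical encoder stub: here, 'encoding' is just fixed-length padding."""
    return [seq + [pad_id] * (max_sequence_length - len(seq))
            for seq in token_seqs]

def check_prompt_embeds_padding():
    prompt = [5, 6, 7]
    solo = encode_prompts([prompt])[0]
    batched = encode_prompts([prompt, [1] * 20, [2] * 9])[0]
    assert solo == batched   # same prompt, same result in any batch
    assert len(solo) == 32   # always padded to the fixed length

check_prompt_embeds_padding()
print("ok")
```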
Test Results
Fixes #12075
cc @sayakpaul @yiyixuxu