[Bug] QwenImagePipeline silently disables CFG when passing negative_prompt_embeds if mask is None (which encode_prompt returns by default) #13377
Describe the bug
In QwenImagePipeline, when users manually pre-compute prompt embeddings to reduce memory usage (e.g., by placing the text encoder and transformer on different GPUs), Classifier-Free Guidance (CFG) is silently disabled if negative_prompt_embeds_mask is None.
However, encode_prompt explicitly converts an all-ones mask to None as an optimization. This creates a logical contradiction where the pipeline's own encoder returns a valid state (None) that the __call__ method subsequently rejects, causing CFG to fail with a warning.
Reproduction
If you manually extract embeddings and pass them to the pipeline:
# 1. Manually encode prompts
pos_embeds, pos_mask = pipeline.encode_prompt("A photo of a cat")
neg_embeds, neg_mask = pipeline.encode_prompt("bad quality")
# Note: neg_mask is often `None` here because `encode_prompt` collapses an
# all-ones `prompt_embeds_mask` to `None`

# 2. Pass them to the pipeline
image = pipeline(
    prompt_embeds=pos_embeds,
    prompt_embeds_mask=pos_mask,
    negative_prompt_embeds=neg_embeds,
    negative_prompt_embeds_mask=neg_mask,  # This passes None
    true_cfg_scale=4.0,
).images[0]

Output warning:
true_cfg_scale is passed as 4.0, but classifier-free guidance is not enabled since no negative_prompt is provided.
Root Cause Analysis:
In pipeline_qwenimage.py:
1. The has_neg_prompt check in __call__ requires the mask to NOT be None:
has_neg_prompt = negative_prompt is not None or (
    negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
)

2. But encode_prompt intentionally sets the mask to None if it's full:
if prompt_embeds_mask is not None:
    # ... (reshape logic)
    if prompt_embeds_mask.all():
        prompt_embeds_mask = None  # <--- Here!

Because neg_mask becomes None, has_neg_prompt evaluates to False, and do_true_cfg is set to False.
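The interaction can be reproduced in isolation with a minimal sketch (plain Python, with the mask modeled as a list of ones; collapse_mask and has_neg_prompt are simplified stand-ins for the diffusers code paths, not the actual API):

```python
def collapse_mask(mask):
    # Mimics encode_prompt: an all-ones mask is collapsed to None.
    if mask is not None and all(mask):
        return None
    return mask

def has_neg_prompt(negative_prompt, negative_prompt_embeds, negative_prompt_embeds_mask):
    # Mimics the current check in __call__: the mask must also be non-None.
    return negative_prompt is not None or (
        negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
    )

def has_neg_prompt_relaxed(negative_prompt, negative_prompt_embeds):
    # Relaxed check: the presence of embeds alone is sufficient.
    return negative_prompt is not None or negative_prompt_embeds is not None

neg_mask = collapse_mask([1, 1, 1])               # -> None
print(has_neg_prompt(None, object(), neg_mask))   # False -> CFG silently disabled
print(has_neg_prompt_relaxed(None, object()))     # True  -> CFG stays enabled
```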
Expected behavior:
The presence of negative_prompt_embeds alone should be sufficient to trigger has_neg_prompt = True. The negative_prompt_embeds_mask being None should simply mean "no masking is required" (all valid), which is consistent with the behavior of encode_prompt.
The check in __call__ should probably be relaxed to:
has_neg_prompt = negative_prompt is not None or negative_prompt_embeds is not None

Temporary Workaround:
Users currently have to manually fake an all-ones mask before passing it to the transformer:
if neg_mask is None:
    neg_mask = torch.ones(neg_embeds.shape[:2], dtype=torch.long, device=device)

Logs
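The workaround can be wrapped in a small helper (a sketch; ensure_all_ones_mask is a hypothetical name, not part of the pipeline API):

```python
import torch

def ensure_all_ones_mask(embeds, mask, device="cpu"):
    # Hypothetical helper: rebuild the all-ones (batch, seq_len) mask that
    # encode_prompt collapsed to None, so __call__ keeps CFG enabled.
    if mask is None:
        mask = torch.ones(embeds.shape[:2], dtype=torch.long, device=device)
    return mask

# Usage after manual encoding:
# neg_mask = ensure_all_ones_mask(neg_embeds, neg_mask, device=pipeline.device)
```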
System Info
- 🤗 Diffusers version: 0.37.1
- Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.10.20
- PyTorch version (GPU?): 2.11.0+cu130 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 1.8.0
- Transformers version: 5.4.0
- Accelerate version: 1.13.0
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.7.0
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 3060, 12288 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Yes