[Bug] QwenImagePipeline silently disables CFG when passing negative_prompt_embeds if mask is None (which encode_prompt returns by default) #13377

@Sunhill666

Description

Describe the bug

In QwenImagePipeline, when users manually pre-compute prompt embeddings to save memory (e.g., by placing the text encoder and the transformer on different GPUs), Classifier-Free Guidance (CFG) is silently disabled whenever negative_prompt_embeds_mask is None.

However, encode_prompt explicitly converts an all-ones mask to None as an optimization. This creates a logical contradiction: the pipeline's own encoder returns a valid state (None) that the __call__ method subsequently rejects, so CFG is silently skipped with only a warning.
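The mismatch can be reduced to a few lines of plain Python. The sketch below is a simplified stand-in for the two code paths involved, not the real diffusers implementation: an encode step that collapses an all-ones mask to None, and the pipeline's has_neg_prompt check that then rejects it.

```python
# Simplified stand-ins for the two QwenImagePipeline code paths involved.
# These are NOT the real diffusers functions, just the relevant logic.

def encode_prompt_sketch(tokens_valid):
    """Mimics encode_prompt: returns (embeds, mask), collapsing an
    all-ones mask to None as an optimization."""
    embeds = [0.0] * len(tokens_valid)  # placeholder embeddings
    mask = tokens_valid
    if all(mask):                       # prompt_embeds_mask.all()
        mask = None                     # <-- the optimization
    return embeds, mask

def has_neg_prompt_sketch(negative_prompt, negative_prompt_embeds,
                          negative_prompt_embeds_mask):
    """Mimics the current check in __call__."""
    return negative_prompt is not None or (
        negative_prompt_embeds is not None
        and negative_prompt_embeds_mask is not None
    )

# All tokens valid -> encode returns mask=None ...
neg_embeds, neg_mask = encode_prompt_sketch([1, 1, 1, 1])
# ... which the check then treats as "no negative prompt":
print(has_neg_prompt_sketch(None, neg_embeds, neg_mask))  # False -> CFG disabled
```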

Reproduction

If you manually extract embeddings and pass them to the pipeline:

# 1. Manually encode prompts
pos_embeds, pos_mask = pipeline.encode_prompt("A photo of a cat")
neg_embeds, neg_mask = pipeline.encode_prompt("bad quality") 
# Note: neg_mask is often `None` here because `encode_prompt` optimizes `prompt_embeds_mask.all() -> None`

# 2. Pass them to the pipeline
image = pipeline(
    prompt_embeds=pos_embeds,
    prompt_embeds_mask=pos_mask,
    negative_prompt_embeds=neg_embeds,
    negative_prompt_embeds_mask=neg_mask,  # This passes None
    true_cfg_scale=4.0
).images[0]

Output Warning:

true_cfg_scale is passed as 4.0, but classifier-free guidance is not enabled since no negative_prompt is provided.

Root Cause Analysis:
In pipeline_qwenimage.py:

1. The has_neg_prompt check in __call__ requires the mask to NOT be None:

has_neg_prompt = negative_prompt is not None or (
    negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
)

2. But encode_prompt intentionally sets the mask to None if it's full:

if prompt_embeds_mask is not None:
    # ... (reshape logic)
    if prompt_embeds_mask.all():
        prompt_embeds_mask = None  # <--- Here!

Because neg_mask becomes None, has_neg_prompt evaluates to False, and do_true_cfg is set to False.

Expected behavior:
The presence of negative_prompt_embeds alone should be sufficient to trigger has_neg_prompt = True. The negative_prompt_embeds_mask being None should simply mean "no masking is required" (all valid), which is consistent with the behavior of encode_prompt.

The check in __call__ should probably be relaxed to:

has_neg_prompt = negative_prompt is not None or negative_prompt_embeds is not None
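Under the relaxed check, a None mask no longer disables CFG. A quick side-by-side of the two conditions (plain Python, with stand-in values for the pipeline arguments):

```python
# Hypothetical inputs: the user passed precomputed negative embeddings,
# and encode_prompt collapsed the all-ones mask to None.
negative_prompt = None
negative_prompt_embeds = [[0.1, 0.2], [0.3, 0.4]]  # stand-in tensor
negative_prompt_embeds_mask = None

# Current check: rejects a None mask, silently disabling CFG.
current = negative_prompt is not None or (
    negative_prompt_embeds is not None
    and negative_prompt_embeds_mask is not None
)

# Proposed check: embeddings alone suffice; a None mask means "all valid".
proposed = negative_prompt is not None or negative_prompt_embeds is not None

print(current, proposed)  # False True
```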

Temporary Workaround:
Users currently have to rebuild an all-ones mask by hand before passing the embeddings to the pipeline:

if neg_mask is None:  # `device` is assumed to be the transformer's device
    neg_mask = torch.ones(neg_embeds.shape[:2], dtype=torch.long, device=device)

Logs

System Info

  • 🤗 Diffusers version: 0.37.1
  • Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Running on Google Colab?: No
  • Python version: 3.10.20
  • PyTorch version (GPU?): 2.11.0+cu130 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 1.8.0
  • Transformers version: 5.4.0
  • Accelerate version: 1.13.0
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.7.0
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 3060, 12288 MiB
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help?

@yiyixuxu @DN6
