Fix: WanVideoToVideoPipeline now handles VACE control tensors safely #12576

Mathias5 · 2025-11-02T13:44:01Z

What does this PR do?

Summary:
WanVideoToVideoPipeline.__call__ currently crashes when the underlying TransformerWanVACE.forward() expects VACE control arguments that the pipeline doesn’t provide, and the pipeline offers no way for users to pass them either. This PR:

- Adds optional kwargs to the pipeline call:
  - control_hidden_states: Optional[torch.Tensor] = None
  - control_hidden_states_scale: Optional[torch.Tensor] = None
- Auto-injects a neutral control (an all-zeros control tensor with an all-ones scale) when these kwargs are not provided.
- Keeps compatibility with transformers that don’t accept control kwargs by inspecting the base transformer’s forward() signature and passing control kwargs only when they are supported.
- Adds focused tests to ensure the neutral control path does not crash (with and without CFG, in latent mode).

This makes Wan V2V robust out-of-the-box while allowing advanced users to supply real control tensors.
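The compatibility check can be sketched with Python's inspect module. This is a minimal sketch, not the PR's code: it assumes support is detected by looking for both control parameter names in forward(), and it unwraps torch.compile wrappers through their _orig_mod attribute. The helper names and the two toy transformer classes are hypothetical.

```python
import inspect

CONTROL_KWARGS = ("control_hidden_states", "control_hidden_states_scale")


def accepts_control_kwargs(model) -> bool:
    # Unwrap torch.compile-style wrappers (they keep the original module in
    # `_orig_mod`); plain objects are inspected as-is.
    base = getattr(model, "_orig_mod", model)
    params = inspect.signature(base.forward).parameters
    return all(name in params for name in CONTROL_KWARGS)


def transformer_call_kwargs(model, control, control_scale, **base_kwargs):
    # Forward the control tensors only when forward() declares them, so older
    # transformers keep receiving exactly the kwargs they received before.
    kwargs = dict(base_kwargs)
    if accepts_control_kwargs(model):
        kwargs["control_hidden_states"] = control
        kwargs["control_hidden_states_scale"] = control_scale
    return kwargs


# Hypothetical stand-ins, for illustration only.
class PlainTransformer:
    def forward(self, hidden_states, timestep, encoder_hidden_states):
        return hidden_states


class VaceLikeTransformer:
    def forward(self, hidden_states, timestep, encoder_hidden_states,
                control_hidden_states=None, control_hidden_states_scale=None):
        return hidden_states


plain_kwargs = transformer_call_kwargs(PlainTransformer(), "ctrl", "scale", timestep=1)
vace_kwargs = transformer_call_kwargs(VaceLikeTransformer(), "ctrl", "scale", timestep=1)
```

Checking parameter names rather than catching a TypeError at call time keeps the decision explicit and avoids running a forward pass just to probe support.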

Motivation & Context

Users hit AttributeError: 'NoneType' object has no attribute 'new_ones' because the model assumes control_hidden_states and control_hidden_states_scale are non-None, but the pipeline never provides them.
Passing them manually wasn’t possible either, since the pipeline didn’t expose these kwargs.
We avoid breaking older/tiny test transformers by detecting control-kwarg support via signature inspection and only passing control tensors when supported.
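The crash can be reproduced without the real model. The sketch below assumes, as the error message suggests, that the model derives its default scale by calling control_hidden_states.new_ones(...); ToyVaceTransformer and its layer list are hypothetical stand-ins, not the diffusers class.

```python
class ToyVaceTransformer:
    """Hypothetical stand-in: the default scale is derived from
    control_hidden_states, so leaving it as None fails."""

    vace_layers = [0, 5, 10]  # hypothetical VACE layer indices

    def forward(self, hidden_states, control_hidden_states=None, control_hidden_states_scale=None):
        if control_hidden_states_scale is None:
            # Mirrors a tensor call such as `tensor.new_ones(n)`; None has no such method.
            control_hidden_states_scale = control_hidden_states.new_ones(len(self.vace_layers))
        return hidden_states


try:
    ToyVaceTransformer().forward(hidden_states=[0.0])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```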

Changes

API (non-breaking): Add two optional kwargs to WanVideoToVideoPipeline.__call__.
Runtime safety:
- If the transformer’s forward() exposes control_hidden_states & control_hidden_states_scale, the pipeline:
  - Builds neutral tensors on the correct device/dtype and with patch-token shape (B, C_ctrl, pt, ph, pw).
  - Expands B if needed during the denoising loop (future-proof for batch shape changes).
- If not supported, the pipeline doesn’t pass the control kwargs (no behavior change).
Docstring: Document new kwargs and behavior.
Tests: Add two fast tests under tests/pipelines/wan/test_wan_video_to_video.py:
- test_neutral_control_injection_no_crash_latent: no CFG, latent output, asserts shape preserved.
- test_neutral_control_injection_with_cfg: with CFG (uncond branch), latent output, asserts shape preserved.
- Both tests skip automatically if the dummy transformer does not accept control kwargs.

Example usage

from diffusers import WanVideoToVideoPipeline, AutoencoderKLWan
from PIL import Image
import torch

vae = AutoencoderKLWan.from_pretrained("wan-vace-weights", subfolder="vae", torch_dtype=torch.float32)
pipe = WanVideoToVideoPipeline.from_pretrained("wan-vace-weights", vae=vae, torch_dtype=torch.bfloat16).to("cuda")

frames = [Image.new("RGB", (512, 288), "gray") for _ in range(8)]

# Baseline: no control kwargs -> neutral control auto-injected when supported
out = pipe(video=frames, prompt="studio three-point lighting", num_inference_steps=16)

# Advanced: provide your own control tensors (optional).
# B, C_ctrl, pt, ph, pw and n_layers are placeholders: batch size, control channels,
# patch-token extent, and number of VACE layers of the loaded transformer.
# control_hidden_states = torch.zeros(B, C_ctrl, pt, ph, pw, device="cuda", dtype=torch.bfloat16)
# control_hidden_states_scale = torch.ones(n_layers, device="cuda", dtype=torch.bfloat16)
# out = pipe(video=frames, prompt="...", control_hidden_states=control_hidden_states, control_hidden_states_scale=control_hidden_states_scale)

Backwards compatibility

No breaking changes.
- New kwargs are optional; existing calls work unchanged without them.
- If the underlying transformer doesn’t accept control kwargs, nothing is passed.
Dtype/device handling matches the transformer’s parameters to avoid autocast mismatches (e.g., bf16 on H100).

Performance

Neutral control allocation is tiny: a (B, C_ctrl, pt, ph, pw) tensor, i.e. one patch token per batch item, plus a per-layer scale vector; no measurable overhead.
Only computed if the transformer supports control kwargs.
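To put the allocation size in numbers, the helper below computes the neutral tensor shapes and their total size. The values used (C_ctrl = 96, patch size (1, 2, 2), 8 VACE layers, 2-byte bf16 elements) are illustrative assumptions, not read from a checkpoint, and neutral_control_sizes is a hypothetical helper.

```python
from math import prod


def neutral_control_sizes(batch_size, control_channels, patch_size, num_vace_layers, bytes_per_element=2):
    # Control tensor: one patch token per batch item, shape (B, C_ctrl, pt, ph, pw).
    control_shape = (batch_size, control_channels, *patch_size)
    # Scale: one value per VACE layer.
    scale_shape = (num_vace_layers,)
    total_elements = prod(control_shape) + prod(scale_shape)
    return control_shape, scale_shape, total_elements * bytes_per_element


# Illustrative (assumed) values: B=1, C_ctrl=96, patch (1, 2, 2), 8 layers, bf16.
control_shape, scale_shape, total_bytes = neutral_control_sizes(1, 96, (1, 2, 2), 8)
```

With those assumptions the control tensor holds 384 elements and the scale 8, about 784 bytes in total.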

Tests

✅ Added: test_neutral_control_injection_no_crash_latent
✅ Added: test_neutral_control_injection_with_cfg
Both tests:
- Use the latent path (output_type="latent") to skip VAE decoding and stay fast.
- Skip automatically if the dummy WanTransformer3DModel doesn’t expose control kwargs.
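The auto-skip behavior can be sketched with plain unittest. This assumes support is decided by signature inspection; PlainDummy, VaceDummy, and the mixin are hypothetical and stand in for the real dummy WanTransformer3DModel and the pipeline call, so the length check is only a placeholder for the latent-shape assertion.

```python
import inspect
import unittest


def supports_control(transformer) -> bool:
    params = inspect.signature(transformer.forward).parameters
    return all(n in params for n in ("control_hidden_states", "control_hidden_states_scale"))


class PlainDummy:
    """Hypothetical tiny transformer that predates the control kwargs."""
    def forward(self, hidden_states):
        return hidden_states


class VaceDummy:
    """Hypothetical tiny transformer that accepts the control kwargs."""
    def forward(self, hidden_states, control_hidden_states=None, control_hidden_states_scale=None):
        return hidden_states


class NeutralControlTestMixin:
    transformer_cls = None

    def test_neutral_control_injection_no_crash_latent(self):
        transformer = self.transformer_cls()
        if not supports_control(transformer):
            self.skipTest("transformer does not accept control kwargs")
        latents = [[0.0] * 4]  # stand-in for a latent batch
        out = transformer.forward(latents, control_hidden_states=[0.0], control_hidden_states_scale=[1.0])
        self.assertEqual(len(out), len(latents))  # placeholder for "shape preserved"


class PlainDummyTests(NeutralControlTestMixin, unittest.TestCase):
    transformer_cls = PlainDummy


class VaceDummyTests(NeutralControlTestMixin, unittest.TestCase):
    transformer_cls = VaceDummy


# Run both cases programmatically: one should pass, one should be skipped.
loader = unittest.TestLoader()
suite = unittest.TestSuite()
for case in (PlainDummyTests, VaceDummyTests):
    suite.addTests(loader.loadTestsFromTestCase(case))
result = unittest.TestResult()
suite.run(result)
```

Calling skipTest inside the test (rather than a class-level decorator) lets the decision depend on the transformer instance that the test actually builds.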

Documentation

Updated the pipeline docstring to describe the new kwargs, shapes, and default neutral behavior.

Checklist

I read the contributor guidelines
This was discussed in issue [Bug] WanVideoToVideoPipeline fails due to missing handling of control arguments required by underlying VACE transformer #12574
Added/updated docstrings
Added tests for the new behavior
Style/lint (make style) and quality checks (make quality) pass on my branch

Who can review?

Tagging pipelines maintainers who might be interested: @yiyixuxu @asomoza

Thanks!

Mathias5 added 4 commits on November 2, 2025 (12:51):
- f48d4d2 Fix WanVideoToVideoPipeline to conditionally handle VACE control inputs
- fc5634f style
- 565650b Add warnings, improve docstring
- bae6994 docs

Mathias5 mentioned this pull request on Nov 2, 2025 in issue #12574 (Open): [Bug] WanVideoToVideoPipeline fails due to missing handling of control arguments required by underlying VACE transformer
