@Mathias5 Mathias5 commented Nov 2, 2025

What does this PR do?

Fixes: #12574

Summary:
WanVideoToVideoPipeline.__call__ currently crashes when the underlying TransformerWanVACE.forward() expects VACE control arguments: the pipeline neither supplies them nor exposes a way for users to pass them. This PR:

  1. Adds optional kwargs to the pipeline call:
  • control_hidden_states: Optional[torch.Tensor] = None
  • control_hidden_states_scale: Optional[torch.Tensor] = None
  2. Auto-injects a neutral control (zeros + ones scale) when these args are not provided.
  3. Keeps compatibility with transformers that don’t accept control kwargs by checking the base transformer’s signature and only passing control args when supported.
  4. Adds focused tests to ensure the neutral control path does not crash (both with and without CFG, in latent mode).
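
The signature check described above can be sketched with the standard library's `inspect` module. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names `accepts_control_kwargs` and `filter_control_kwargs` and both dummy transformer classes are hypothetical.

```python
import inspect

CONTROL_KWARGS = ("control_hidden_states", "control_hidden_states_scale")


def accepts_control_kwargs(forward_fn) -> bool:
    """Return True if forward_fn names both control parameters explicitly."""
    params = inspect.signature(forward_fn).parameters
    return all(name in params for name in CONTROL_KWARGS)


def filter_control_kwargs(forward_fn, kwargs: dict) -> dict:
    """Drop the control kwargs when forward_fn does not accept them."""
    if accepts_control_kwargs(forward_fn):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k not in CONTROL_KWARGS}


# Hypothetical stand-ins for a VACE-style transformer and a plain one.
class DummyVaceTransformer:
    def forward(self, hidden_states, control_hidden_states=None,
                control_hidden_states_scale=None):
        return hidden_states


class DummyPlainTransformer:
    def forward(self, hidden_states):
        return hidden_states


call_kwargs = {
    "hidden_states": "latents",
    "control_hidden_states": "ctrl",
    "control_hidden_states_scale": "scale",
}
# The VACE-style forward keeps all three kwargs; the plain one keeps only hidden_states.
vace_kwargs = filter_control_kwargs(DummyVaceTransformer().forward, call_kwargs)
plain_kwargs = filter_control_kwargs(DummyPlainTransformer().forward, call_kwargs)
```

Note that this sketch treats a `forward()` that only takes `**kwargs` as unsupported, because it looks for the two names explicitly. Whether that is the right default is a design choice.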

This makes Wan V2V robust out-of-the-box while allowing advanced users to supply real control tensors.

Motivation & Context

  • Users encounter AttributeError: 'NoneType' object has no attribute 'new_ones' because control_hidden_states/_scale are assumed non-None in the model, but the pipeline never provides them.
  • Passing them manually wasn’t possible either, since the pipeline didn’t expose these kwargs.
  • We avoid breaking older/tiny test transformers by detecting control-kwarg support via signature inspection and only passing control tensors when supported.

Changes

  • API (non-breaking): Add two optional kwargs to WanVideoToVideoPipeline.__call__.
  • Runtime safety:
    • If the transformer’s forward() exposes control_hidden_states & control_hidden_states_scale, the pipeline:
      • Builds neutral tensors on the correct device/dtype and with patch-token shape (B, C_ctrl, pt, ph, pw).
      • Expands B if needed during the denoising loop (future-proof for batch shape changes).
    • If not supported, the pipeline doesn’t pass the control kwargs (no behavior change).
  • Docstring: Document new kwargs and behavior.
  • Tests: Add two fast tests under tests/pipelines/wan/test_wan_video_to_video.py:
    • test_neutral_control_injection_no_crash_latent: no CFG, latent output, asserts shape preserved.
    • test_neutral_control_injection_with_cfg: with CFG (uncond branch), latent output, asserts shape preserved.
    • Both tests skip automatically if the dummy transformer does not accept control kwargs.
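
The neutral-control construction from the runtime safety bullets above might look roughly like the following. It is a sketch, not the PR's implementation: `make_neutral_control` and `match_batch` are hypothetical names, the channel count, patch size, and layer count are illustrative values, and expanding from a batch of 1 is an assumption about how the batch dimension is matched.

```python
import torch


def make_neutral_control(batch_size, ctrl_channels, patch_size, num_layers,
                         device, dtype):
    """Build a zero control tensor and an all-ones per-layer scale."""
    pt, ph, pw = patch_size
    control = torch.zeros(batch_size, ctrl_channels, pt, ph, pw,
                          device=device, dtype=dtype)
    scale = torch.ones(num_layers, device=device, dtype=dtype)
    return control, scale


def match_batch(control, batch_size):
    """Expand a batch-1 control tensor to batch_size without copying memory."""
    if control.shape[0] == batch_size:
        return control
    if control.shape[0] != 1:
        raise ValueError(
            f"cannot expand batch {control.shape[0]} to {batch_size}"
        )
    return control.expand(batch_size, *control.shape[1:])


# Illustrative values only; a real pipeline would read these from the
# transformer's config and take device/dtype from its parameters.
control, scale = make_neutral_control(
    batch_size=1, ctrl_channels=96, patch_size=(1, 2, 2), num_layers=8,
    device="cpu", dtype=torch.float32,
)
# With CFG, the conditional and unconditional branches double the batch.
control_cfg = match_batch(control, batch_size=2)
```

`Tensor.expand` returns a view, so matching a larger batch this way allocates no extra memory.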

Example usage

from diffusers import WanVideoToVideoPipeline, AutoencoderKLWan
from PIL import Image
import torch

vae = AutoencoderKLWan.from_pretrained("wan-vace-weights", subfolder="vae", torch_dtype=torch.float32)
pipe = WanVideoToVideoPipeline.from_pretrained("wan-vace-weights", vae=vae, torch_dtype=torch.bfloat16).to("cuda")

frames = [Image.new("RGB", (512, 288), "gray") for _ in range(8)]

# Baseline: no control kwargs -> neutral control auto-injected when supported
out = pipe(video=frames, prompt="studio three-point lighting", num_inference_steps=16)

# Advanced: provide your own control tensors (optional)
# control_hidden_states = torch.zeros(B, C_ctrl, pt, ph, pw, device="cuda", dtype=torch.bfloat16)
# control_hidden_states_scale = torch.ones(n_layers, device="cuda", dtype=torch.bfloat16)
# out = pipe(video=frames, prompt="...", control_hidden_states=control_hidden_states, control_hidden_states_scale=control_hidden_states_scale)

Backwards compatibility

  • No breaking changes.
    • New kwargs are optional and default to current behavior.
    • If the underlying transformer doesn’t accept control kwargs, nothing is passed.
  • Dtype/device handling matches the transformer’s parameters to avoid autocast mismatches (e.g., bf16 on H100).

Performance

  • Neutral control allocation is tiny (one token per item: (B, C_ctrl, pt, ph, pw)); no measurable overhead.
  • Only computed if the transformer supports control kwargs.

Tests

  • ✅ Added: test_neutral_control_injection_no_crash_latent
  • ✅ Added: test_neutral_control_injection_with_cfg
  • Both tests:
    • Use the latent path (output_type="latent") for speed, which avoids the VAE decode.
    • Skip automatically if the dummy WanTransformer3DModel doesn’t expose control kwargs.
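
The automatic skip could be expressed with `unittest`'s `skipTest`, as in the sketch below. The `DummyTransformer` class and the `accepts_control_kwargs` helper are hypothetical, and the test body is a placeholder rather than the actual pipeline call used in this PR.

```python
import inspect
import unittest


class DummyTransformer:
    """Hypothetical tiny transformer whose forward() has no control kwargs."""

    def forward(self, hidden_states):
        return hidden_states


def accepts_control_kwargs(forward_fn) -> bool:
    """Return True if forward_fn names both control parameters."""
    params = inspect.signature(forward_fn).parameters
    return ("control_hidden_states" in params
            and "control_hidden_states_scale" in params)


class NeutralControlTests(unittest.TestCase):
    def setUp(self):
        self.transformer = DummyTransformer()

    def test_neutral_control_injection_no_crash_latent(self):
        if not accepts_control_kwargs(self.transformer.forward):
            self.skipTest("dummy transformer does not accept control kwargs")
        # Placeholder body: the real test would run the pipeline with
        # output_type="latent" and assert that the latent shape is preserved.
        latents = [[0.0] * 4]
        out = self.transformer.forward(
            latents,
            control_hidden_states=None,
            control_hidden_states_scale=None,
        )
        self.assertEqual(len(out), len(latents))


def run_suite() -> unittest.TestResult:
    """Run the test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NeutralControlTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A skipped test is reported as skipped rather than failed, so the suite stays green on transformers without control support.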

Documentation

  • Updated the pipeline docstring to describe the new kwargs, shapes, and default neutral behavior.

Who can review?

Tagging pipelines maintainers who might be interested: @yiyixuxu @asomoza

Thanks!

Development

Successfully merging this pull request may close these issues.

[Bug] WanVideoToVideoPipeline fails due to missing handling of control arguments required by underlying VACE transformer
