
Conversation

@sayakpaul
Member

What does this PR do?

More and more models are starting to use variable-length (varlen) flash attention; Hunyuan Video 1.5 already does. So, let's support those variants through kernels as well. With this PR, enabling a varlen backend becomes this easy:

pipe.transformer.set_attention_backend("flash_varlen_hub")
# or
pipe.transformer.set_attention_backend("_flash_3_varlen_hub")
Testing script (taken from here)
import gc
import torch

from diffusers import HunyuanVideo15Pipeline, HunyuanVideo15ImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

dtype = torch.bfloat16
device = "cuda:0"

# t2v_names = ["480p_t2v", "720p_t2v", "480p_t2v_distilled"]
t2v_names = ["480p_t2v"]
num_frames = 31  # use a minimum number for testing, 121 is default

# test t2v
prompt="A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
seed = 1
for name in t2v_names:
    print(f"Testing {name}...")
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    pipe = HunyuanVideo15Pipeline.from_pretrained(f"hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-{name}", torch_dtype=dtype)
    pipe.transformer.set_attention_backend("flash_varlen_hub")
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    generator = torch.Generator(device=device).manual_seed(seed)
    video = pipe(
        prompt=prompt,
        generator=generator,
        num_frames=num_frames,
        num_inference_steps=50,
    ).frames[0]
    export_to_video(video, f"yiyi_test_hy15_{name}_output.mp4", fps=24)
    max_allocated = torch.cuda.max_memory_allocated() / 1024**3  # GB
    print(f"Max Allocated Memory: {max_allocated:.2f} GB")

    pipe.to("cpu")
    del pipe
    gc.collect()
    
# test i2v
# i2v_names = ["480p_i2v", "720p_i2v", "480p_i2v_distilled", "720p_i2v_distilled"]
i2v_names = ["480p_i2v"]
image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG")
prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
seed = 1
for name in i2v_names:
    print(f"Testing {name}...")
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    pipe = HunyuanVideo15ImageToVideoPipeline.from_pretrained(f"hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-{name}", torch_dtype=dtype)
    pipe.transformer.set_attention_backend("flash_varlen_hub")
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    generator = torch.Generator(device=device).manual_seed(seed)
    video = pipe(
        prompt=prompt,
        generator=generator,
        image=image,
        num_frames=num_frames,
        num_inference_steps=50,
    ).frames[0]
    export_to_video(video, f"yiyi_test_hy15_{name}_output.mp4", fps=24)
    max_allocated = torch.cuda.max_memory_allocated() / 1024**3  # GB
    print(f"Max Allocated Memory: {max_allocated:.2f} GB")
    
    pipe.to("cpu")
    del pipe
    gc.collect()

Tested with the FA3 varlen backend (_flash_3_varlen_hub) as well.

Results

FA2

yiyi_test_hy15_480p_t2v_output.mp4
yiyi_test_hy15_480p_i2v_output.mp4

FA3

yiyi_test_hy15_480p_t2v_output.mp4
yiyi_test_hy15_480p_i2v_output.mp4

Cc: @MekkCyber

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu December 1, 2025 15:31
@sayakpaul sayakpaul added the performance label Dec 1, 2025
AttentionBackendName._FLASH_3_VARLEN_HUB: _HubKernelConfig(
repo_id="kernels-community/flash-attn3",
function_attr="flash_attn_varlen_func",
# revision="fake-ops-return-probs",
Member Author


This needs to be tested a bit because I am facing some problems. Checking with the kernels team.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul merged commit f48f9c2 into main Dec 3, 2025
15 checks passed
@sayakpaul sayakpaul deleted the varlen-kernels branch December 3, 2025 08:04