
Conversation

@sayakpaul
Member

What does this PR do?

More and more models are starting to use variable-length (varlen) flash attention; Hunyuan Video 1.5 already does. So, let's support those variants through kernels as well. With this PR, enabling a varlen backend becomes this easy:

pipe.transformer.set_attention_backend("flash_varlen_hub")
# or
pipe.transformer.set_attention_backend("_flash_3_varlen_hub")
Testing script (taken from here)
import gc
import torch

from diffusers import HunyuanVideo15Pipeline, HunyuanVideo15ImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

dtype = torch.bfloat16
device = "cuda:0"

# t2v_names = ["480p_t2v", "720p_t2v", "480p_t2v_distilled"]
t2v_names = ["480p_t2v"]
num_frames = 31  # use a minimum number for testing, 121 is default

# test t2v
prompt="A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
seed = 1
for name in t2v_names:
    print(f"Testing {name}...")
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    pipe = HunyuanVideo15Pipeline.from_pretrained(f"hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-{name}", torch_dtype=dtype)
    pipe.transformer.set_attention_backend("flash_varlen_hub")
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    generator = torch.Generator(device=device).manual_seed(seed)
    video = pipe(
        prompt=prompt,
        generator=generator,
        num_frames=num_frames,
        num_inference_steps=50,
    ).frames[0]
    export_to_video(video, f"yiyi_test_hy15_{name}_output.mp4", fps=24)
    max_allocated = torch.cuda.max_memory_allocated() / 1024**3  # GB
    print(f"Max Allocated Memory: {max_allocated:.2f} GB")

    pipe.to("cpu")
    del pipe
    gc.collect()
    
# test i2v
# i2v_names = ["480p_i2v", "720p_i2v", "480p_i2v_distilled", "720p_i2v_distilled"]
i2v_names = ["480p_i2v"]
image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG")
prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
seed = 1
for name in i2v_names:
    print(f"Testing {name}...")
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    pipe = HunyuanVideo15ImageToVideoPipeline.from_pretrained(f"hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-{name}", torch_dtype=dtype)
    pipe.transformer.set_attention_backend("flash_varlen_hub")
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    generator = torch.Generator(device=device).manual_seed(seed)
    video = pipe(
        prompt=prompt,
        generator=generator,
        image=image,
        num_frames=num_frames,
        num_inference_steps=50,
    ).frames[0]
    export_to_video(video, f"yiyi_test_hy15_{name}_output.mp4", fps=24)
    max_allocated = torch.cuda.max_memory_allocated() / 1024**3  # GB
    print(f"Max Allocated Memory: {max_allocated:.2f} GB")
    
    pipe.to("cpu")
    del pipe
    gc.collect()

Tested with the FA3 varlen backend (_flash_3_varlen_hub) as well.

Results

FA2

yiyi_test_hy15_480p_t2v_output.mp4
yiyi_test_hy15_480p_i2v_output.mp4

FA3

yiyi_test_hy15_480p_t2v_output.mp4
yiyi_test_hy15_480p_i2v_output.mp4

Cc: @MekkCyber

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu December 1, 2025 15:31
@sayakpaul sayakpaul added the performance label Dec 1, 2025
AttentionBackendName._FLASH_3_VARLEN_HUB: _HubKernelConfig(
repo_id="kernels-community/flash-attn3",
function_attr="flash_attn_varlen_func",
# revision="fake-ops-return-probs",
Member Author


This needs to be tested a bit because I am facing some problems. Checking with the kernels team.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul merged commit f48f9c2 into main Dec 3, 2025
15 checks passed
@sayakpaul sayakpaul deleted the varlen-kernels branch December 3, 2025 08:04