klein 9b kv #13262

Merged

yiyixuxu merged 4 commits into huggingface:main from huemin-art:klein-kv on Mar 12, 2026

Conversation

@huemin-art
Contributor

What does this PR do?

Adds Flux2KleinKVPipeline — a KV-cached reference image conditioning pipeline for Flux2 Klein 9B.

On the first denoising step, reference image tokens are included in the full transformer forward pass and their post-RoPE attention K/V projections are cached per-layer. On subsequent steps, only the noise latents are forwarded and the cached reference K/V are injected during attention, avoiding redundant recomputation of reference tokens.
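The caching scheme described above can be sketched as a small container per layer. This is an illustrative sketch only, not the PR's actual `Flux2KVLayerCache` implementation; class and method names here are hypothetical, and the `(batch, heads, seq, head_dim)` layout is an assumption.

```python
import torch

class KVLayerCache:
    """Hypothetical per-layer cache for post-RoPE reference K/V tensors."""

    def __init__(self):
        self.key = None
        self.value = None

    def store(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # Called once, on the first denoising step ("extract" mode),
        # with the reference tokens' K/V after RoPE is applied.
        self.key = key.detach()
        self.value = value.detach()

    def inject(self, key: torch.Tensor, value: torch.Tensor):
        # Called on later steps ("cached" mode): prepend the cached
        # reference K/V along the sequence dim so the noise tokens can
        # still attend to the reference, without recomputing it.
        k = torch.cat([self.key, key], dim=2)
        v = torch.cat([self.value, value], dim=2)
        return k, v

# Usage sketch: 4 reference tokens cached, 16 noise tokens per later step.
cache = KVLayerCache()
cache.store(torch.randn(1, 8, 4, 64), torch.randn(1, 8, 4, 64))
k, v = cache.inject(torch.randn(1, 8, 16, 64), torch.randn(1, 8, 16, 64))
# k.shape == (1, 8, 20, 64): reference K/V followed by noise K/V.
```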

Transformer changes (transformer_flux2.py):

  • Flux2KVLayerCache / Flux2KVCache — per-layer and global KV cache containers
  • _flux2_kv_causal_attention — causal attention dispatch (ref tokens self-attend only; txt+img attend to all)
  • Modulation blending helpers (_blend_mod_params, _blend_double_block_mods, _blend_single_block_mods) so reference tokens receive fixed-timestep modulation
  • Flux2KVAttnProcessor / Flux2KVParallelSelfAttnProcessor — KV-aware attention processors for double/single stream blocks
  • Flux2Transformer2DModel.forward extended with kv_cache, kv_cache_mode, num_ref_tokens, ref_fixed_timestep parameters
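The causal attention rule in the list above (reference tokens self-attend only; txt+img tokens attend to everything) can be expressed as a boolean mask. A minimal sketch, assuming reference tokens occupy the first positions of the sequence; the function name is illustrative, not the PR's `_flux2_kv_causal_attention`.

```python
import torch

def kv_causal_mask(num_ref: int, num_txt_img: int) -> torch.Tensor:
    """Boolean attention mask for "extract" mode. True = may attend.

    Rows are queries, columns are keys. Reference tokens (the first
    num_ref positions) attend only to each other, so their K/V stay
    independent of the current noise latents and are safe to cache.
    Text+image tokens attend to the full sequence.
    """
    n = num_ref + num_txt_img
    mask = torch.ones(n, n, dtype=torch.bool)
    # Reference rows: block attention to the txt+img columns.
    mask[:num_ref, num_ref:] = False
    return mask
```

Blocking the ref-to-noise direction is what makes the cache valid across steps: since reference K/V never depend on the evolving latents, step 0's values remain exact for every later step.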

Pipeline (pipeline_flux2_klein_kv.py):

  • Reference images are VAE-encoded and concatenated with noise latents for step 0 ("extract" mode)
  • Steps 1+ reuse cached K/V without reference tokens in the input ("cached" mode)
  • Falls back to standard forward when no reference image is provided
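The step-dependent mode switching in the pipeline bullets above amounts to a small dispatch in the denoising loop. A hedged sketch under the stated behavior; the mode strings mirror the "extract"/"cached" terminology in this PR, but the function itself is hypothetical.

```python
def kv_cache_modes(num_steps: int, has_reference: bool) -> list:
    """Return the kv_cache_mode used at each denoising step."""
    modes = []
    for step in range(num_steps):
        if not has_reference:
            modes.append("none")      # standard forward, no cache
        elif step == 0:
            modes.append("extract")   # ref tokens included, K/V cached
        else:
            modes.append("cached")    # ref K/V injected, ref tokens dropped
    return modes

# e.g. a 4-step run with a reference image:
# kv_cache_modes(4, True) -> ["extract", "cached", "cached", "cached"]
```

So the full cost of the reference tokens is paid exactly once, which is where the speedup over recomputing them every step comes from.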

Before submitting

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Mar 12, 2026

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks a ton!
I left some very small feedback!

@huemin-art huemin-art requested a review from yiyixuxu March 12, 2026 16:15
@yiyixuxu yiyixuxu merged commit 094caf3 into huggingface:main Mar 12, 2026
9 of 11 checks passed
@eugenehp

Awesome!

Pfannkuchensack added a commit to Pfannkuchensack/InvokeAI that referenced this pull request Mar 17, 2026
…onditioning)

Prepare support for the FLUX.2-klein-9b-kv model which uses KV-cached
attention for up to 2.5x faster multi-reference editing. On step 0,
reference image tokens are included in the full forward pass and their
K/V projections are cached per-layer. On subsequent steps, cached K/V
are reused without recomputing reference tokens.

Based on diffusers PR huggingface/diffusers#13262 (not yet released).
Will become functional once diffusers releases Flux2KleinKVPipeline support.

Changes:
- Add Klein9BKV variant to Flux2VariantType taxonomy
- Detect Flux2KleinKVPipeline in model config identification
- Add starter model entry for black-forest-labs/FLUX.2-klein-9b-kv
- Add denoise_kv() backend function with KV-cache denoising loop
- Add Flux2KVDenoiseInvocation node with full inpainting support
- Update Qwen3 encoder variant validation to include Klein9BKV

GaoxiangLuo added a commit to GaoxiangLuo/sglang that referenced this pull request Mar 18, 2026
Add Flux2KleinKVCachePipeline for KV-cached reference image conditioning.
On step 0, reference tokens go through the full forward pass and their
post-RoPE K/V are cached per-layer. Steps 1+ reuse cached K/V, skipping
redundant ref token computation.

Based on diffusers PR huggingface/diffusers#13262, adapted to SGLang
(USPAttention, SP, flashinfer RoPE).

Key changes:
- Flux2KVLayerCache / Flux2KVCache for per-layer K/V storage
- Causal attention in extract mode (ref tokens self-attend only)
- Modulation blending with fixed timestep for ref tokens
- Flux2KleinKVCacheDenoisingStage for step-dependent cache switching
- Expose --pipeline-class-name CLI flag (existing ServerArgs field)
- Integration test case for flux_2_klein_9b_kv_ti2i

Target: black-forest-labs/FLUX.2-klein-9b-kv (rope_theta=2000).


5 participants