klein 9b kv #13262

Merged

yiyixuxu merged 4 commits into huggingface:main from huemin-art:klein-kv on Mar 12, 2026

Conversation

@huemin-art
Contributor

What does this PR do?

Adds Flux2KleinKVPipeline — a KV-cached reference image conditioning pipeline for Flux2 Klein 9B.

On the first denoising step, reference image tokens are included in the full transformer forward pass and their post-RoPE attention K/V projections are cached per-layer. On subsequent steps, only the noise latents are forwarded and the cached reference K/V are injected during attention, avoiding redundant recomputation of reference tokens.
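The caching scheme described above can be sketched as a small container per layer. This is an illustrative sketch only, not the PR's actual `Flux2KVLayerCache` implementation; class and method names here are hypothetical, and the `(batch, heads, seq, head_dim)` layout is an assumption.

```python
import torch

class KVLayerCache:
    """Hypothetical per-layer cache for post-RoPE reference K/V tensors."""

    def __init__(self):
        self.key = None
        self.value = None

    def store(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # Called once, on the first denoising step ("extract" mode),
        # with the reference tokens' K/V after RoPE is applied.
        self.key = key.detach()
        self.value = value.detach()

    def inject(self, key: torch.Tensor, value: torch.Tensor):
        # Called on later steps ("cached" mode): prepend the cached
        # reference K/V along the sequence dim so the noise tokens can
        # still attend to the reference, without recomputing it.
        k = torch.cat([self.key, key], dim=2)
        v = torch.cat([self.value, value], dim=2)
        return k, v

# Usage sketch: 4 reference tokens cached, 16 noise tokens per later step.
cache = KVLayerCache()
cache.store(torch.randn(1, 8, 4, 64), torch.randn(1, 8, 4, 64))
k, v = cache.inject(torch.randn(1, 8, 16, 64), torch.randn(1, 8, 16, 64))
# k.shape == (1, 8, 20, 64): reference K/V followed by noise K/V.
```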

Transformer changes (transformer_flux2.py):

  • Flux2KVLayerCache / Flux2KVCache — per-layer and global KV cache containers
  • _flux2_kv_causal_attention — causal attention dispatch (ref tokens self-attend only; txt+img attend to all)
  • Modulation blending helpers (_blend_mod_params, _blend_double_block_mods, _blend_single_block_mods) so reference tokens receive fixed-timestep modulation
  • Flux2KVAttnProcessor / Flux2KVParallelSelfAttnProcessor — KV-aware attention processors for double/single stream blocks
  • Flux2Transformer2DModel.forward extended with kv_cache, kv_cache_mode, num_ref_tokens, ref_fixed_timestep parameters
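The causal attention rule in the list above (reference tokens self-attend only; txt+img tokens attend to everything) can be expressed as a boolean mask. A minimal sketch, assuming reference tokens occupy the first positions of the sequence; the function name is illustrative, not the PR's `_flux2_kv_causal_attention`.

```python
import torch

def kv_causal_mask(num_ref: int, num_txt_img: int) -> torch.Tensor:
    """Boolean attention mask for "extract" mode. True = may attend.

    Rows are queries, columns are keys. Reference tokens (the first
    num_ref positions) attend only to each other, so their K/V stay
    independent of the current noise latents and are safe to cache.
    Text+image tokens attend to the full sequence.
    """
    n = num_ref + num_txt_img
    mask = torch.ones(n, n, dtype=torch.bool)
    # Reference rows: block attention to the txt+img columns.
    mask[:num_ref, num_ref:] = False
    return mask
```

Blocking the ref-to-noise direction is what makes the cache valid across steps: since reference K/V never depend on the evolving latents, step 0's values remain exact for every later step.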

Pipeline (pipeline_flux2_klein_kv.py):

  • Reference images are VAE-encoded and concatenated with noise latents for step 0 ("extract" mode)
  • Steps 1+ reuse cached K/V without reference tokens in the input ("cached" mode)
  • Falls back to standard forward when no reference image is provided
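The step-dependent mode switching in the pipeline bullets above amounts to a small dispatch in the denoising loop. A hedged sketch under the stated behavior; the mode strings mirror the "extract"/"cached" terminology in this PR, but the function itself is hypothetical.

```python
def kv_cache_modes(num_steps: int, has_reference: bool) -> list:
    """Return the kv_cache_mode used at each denoising step."""
    modes = []
    for step in range(num_steps):
        if not has_reference:
            modes.append("none")      # standard forward, no cache
        elif step == 0:
            modes.append("extract")   # ref tokens included, K/V cached
        else:
            modes.append("cached")    # ref K/V injected, ref tokens dropped
    return modes

# e.g. a 4-step run with a reference image:
# kv_cache_modes(4, True) -> ["extract", "cached", "cached", "cached"]
```

So the full cost of the reference tokens is paid exactly once, which is where the speedup over recomputing them every step comes from.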

Before submitting

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Mar 12, 2026

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks a ton!
I left some very small feedback!

@huemin-art huemin-art requested a review from yiyixuxu March 12, 2026 16:15
@yiyixuxu yiyixuxu merged commit 094caf3 into huggingface:main Mar 12, 2026
9 of 11 checks passed
@eugenehp

Awesome!

Pfannkuchensack added a commit to Pfannkuchensack/InvokeAI that referenced this pull request Mar 17, 2026
…onditioning)

Prepare support for the FLUX.2-klein-9b-kv model which uses KV-cached
attention for up to 2.5x faster multi-reference editing. On step 0,
reference image tokens are included in the full forward pass and their
K/V projections are cached per-layer. On subsequent steps, cached K/V
are reused without recomputing reference tokens.

Based on diffusers PR huggingface/diffusers#13262 (not yet released).
Will become functional once diffusers releases Flux2KleinKVPipeline support.

Changes:
- Add Klein9BKV variant to Flux2VariantType taxonomy
- Detect Flux2KleinKVPipeline in model config identification
- Add starter model entry for black-forest-labs/FLUX.2-klein-9b-kv
- Add denoise_kv() backend function with KV-cache denoising loop
- Add Flux2KVDenoiseInvocation node with full inpainting support
- Update Qwen3 encoder variant validation to include Klein9BKV

GaoxiangLuo added a commit to GaoxiangLuo/sglang that referenced this pull request Mar 18, 2026
Add Flux2KleinKVCachePipeline for KV-cached reference image conditioning.
On step 0, reference tokens go through the full forward pass and their
post-RoPE K/V are cached per-layer. Steps 1+ reuse cached K/V, skipping
redundant ref token computation.

Based on diffusers PR huggingface/diffusers#13262, adapted to SGLang
(USPAttention, SP, flashinfer RoPE).

Key changes:
- Flux2KVLayerCache / Flux2KVCache for per-layer K/V storage
- Causal attention in extract mode (ref tokens self-attend only)
- Modulation blending with fixed timestep for ref tokens
- Flux2KleinKVCacheDenoisingStage for step-dependent cache switching
- Expose --pipeline-class-name CLI flag (existing ServerArgs field)
- Integration test case for flux_2_klein_9b_kv_ti2i

Target: black-forest-labs/FLUX.2-klein-9b-kv (rope_theta=2000).


5 participants