
Fix MLA fp8 prefill crash when max_seqlen_q > 1 #12

Merged
sunway513 merged 1 commit into main from fix/mla-fp8-prefill
Feb 20, 2026

@sunway513
Owner

Summary

  • mla_decode_fwd only supports max_seqlen_q=1 (decode). When it is called during fp8 prefill with max_seqlen_q > 1, it raises a KeyError (e.g. nhead * max_seqlen_q = 96 is not a valid key).
  • Fix: when max_q_len > 1, use mla_prefill_fwd with fp8-to-bf16 conversion. The decode path (max_q_len == 1) still uses mla_decode_fwd with native fp8 scale support.
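
The dispatch rule described above can be sketched as a small, self-contained function. This is only an illustration of the branching logic, not the code from the merged commit: `select_mla_kernel` and its return convention are hypothetical, the real `mla_decode_fwd` and `mla_prefill_fwd` are not called (their signatures are not shown in this PR), and the non-fp8 prefill branch is an assumption.

```python
def select_mla_kernel(max_q_len: int, kv_cache_is_fp8: bool) -> tuple[str, bool]:
    """Hypothetical helper: pick a kernel name and say whether the fp8
    KV cache must be converted to bf16 before the call.

    Returns (kernel_name, convert_fp8_to_bf16).
    """
    if max_q_len < 1:
        raise ValueError("max_q_len must be at least 1")

    if max_q_len == 1:
        # Decode: mla_decode_fwd handles fp8 natively via scales,
        # so no conversion is needed.
        return ("mla_decode_fwd", False)

    # Prefill (max_q_len > 1): mla_decode_fwd is not valid here, so use
    # mla_prefill_fwd. An fp8 KV cache is converted to bf16 first.
    # Assumption: a non-fp8 cache needs no conversion on this path.
    return ("mla_prefill_fwd", kv_cache_is_fp8)
```

Calling `select_mla_kernel(6, True)` selects the prefill kernel with conversion enabled, while `select_mla_kernel(1, True)` keeps the decode kernel with no conversion.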

Test plan

  • DeepSeek R1 671B fp8, 8x MI300X, Triton-only build
  • ISL=1024, OSL=1024, concurrency=128, 1280 prompts: all requests successful

mla_decode_fwd only supports max_seqlen_q=1. When fp8 KV cache is used
during prefill (max_seqlen_q > 1), use mla_prefill_fwd with fp8-to-bf16
conversion instead. Decode path (max_q_len == 1) still uses mla_decode_fwd
with native fp8 scale support.
@sunway513 sunway513 merged commit e18ede9 into main Feb 20, 2026
2 of 11 checks passed