Fix MLA fp8 prefill crash when max_seqlen_q > 1 by sunway513 · Pull Request #12 · sunway513/ATOM

sunway513 · 2026-02-20T05:43:03Z

Summary

mla_decode_fwd only supports max_seqlen_q=1 (decode). When called during fp8 prefill with max_seqlen_q > 1, it crashes with a KeyError (e.g. nhead * max_seqlen_q = 96 is not a valid key).
Fix: use mla_prefill_fwd with fp8-to-bf16 conversion when max_q_len > 1. Decode path (max_q_len == 1) still uses mla_decode_fwd with native fp8 scale support.
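
A minimal sketch of the dispatch described above, for illustration only. The two kernel functions here are hypothetical stubs, not the real mla_decode_fwd / mla_prefill_fwd signatures, and the config table keys are made up to reproduce the failure mode (16 heads x 6 query tokens = 96 is just one shape that would produce the key mentioned in the summary). The fp8-to-bf16 conversion is represented by a dtype string swap rather than an actual tensor cast.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the decode kernel's config table, keyed by
# nhead * max_seqlen_q. The real table and its keys are not shown in this PR.
_DECODE_CONFIGS: Dict[int, str] = {16: "decode_cfg_16", 128: "decode_cfg_128"}


def stub_mla_decode_fwd(nhead: int, max_seqlen_q: int, kv_dtype: str) -> str:
    """Stub decode kernel: accepts fp8 KV natively, but only knows decode shapes."""
    # Raises KeyError for any shape that is not in the table (e.g. 16 * 6 = 96).
    return _DECODE_CONFIGS[nhead * max_seqlen_q]


def stub_mla_prefill_fwd(nhead: int, max_seqlen_q: int, kv_dtype: str) -> str:
    """Stub prefill kernel: assumed here to require a bf16 KV cache."""
    if kv_dtype != "bf16":
        raise TypeError(f"prefill stub expects bf16 KV, got {kv_dtype}")
    return f"prefill(nhead={nhead}, max_seqlen_q={max_seqlen_q})"


def run_mla(nhead: int, max_q_len: int, kv_dtype: str) -> Tuple[str, str]:
    """Pick the kernel by query length, as the fix describes."""
    if max_q_len == 1:
        # Decode path: keep the fp8 KV cache and let the kernel apply the scale.
        return "decode", stub_mla_decode_fwd(nhead, max_q_len, kv_dtype)
    if kv_dtype == "fp8":
        # Prefill path: stands in for converting the fp8 KV cache to bf16.
        kv_dtype = "bf16"
    return "prefill", stub_mla_prefill_fwd(nhead, max_q_len, kv_dtype)
```

With these stubs, run_mla(16, 1, "fp8") takes the decode path and run_mla(16, 6, "fp8") takes the prefill path, whereas calling stub_mla_decode_fwd(16, 6, "fp8") directly raises KeyError: 96, mirroring the crash this PR fixes.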

Test plan

DeepSeek R1 671B fp8, 8x MI300X, Triton-only build
ISL=1024, OSL=1024, conc=128, 1280 prompts — all requests successful

mla_decode_fwd only supports max_seqlen_q=1. When fp8 KV cache is used during prefill (max_seqlen_q > 1), use mla_prefill_fwd with fp8-to-bf16 conversion instead. Decode path (max_q_len == 1) still uses mla_decode_fwd with native fp8 scale support.

sunway513 merged commit e18ede9 into main Feb 20, 2026
2 of 11 checks passed

sunway513 mentioned this pull request Feb 20, 2026

Add Tier 1 unit tests for model_ops regression guard #17

Merged

2 tasks

sunway513 merged 1 commit into main from fix/mla-fp8-prefill