refactor(attention): archive FMHA sm_120 cluster kernel by kekzl · Pull Request #295 · kekzl/imp

kekzl · 2026-05-20T08:30:09Z

Summary

Archive `src/compute/attention_fmha_sm120_cluster.cu` (1102 LOC, defines both FP16 + FP8 cluster entry points) to `docs/archive/fmha_sm120_cluster/` with a resurrection memo
Remove `try_fmha_sm120_cluster_prefill` + `try_fmha_sm120_fp8_cluster_prefill` forward decls + call sites
Remove `attention.no_fmha_cluster` config field (parser + seed_from_env)
Remove the conditional CMakeLists entry
Delete the FP16 + FP8 cluster test suites (23 `ClusterPath` / `ClusterMatchesLegacy` gtests total: 13 in `test_attention_fmha_sm120.cu` + 10 in `test_fmha_fp8.cu`). The commit body says "8"; this was a count error in the spec text that the implementer preserved verbatim, and the real number is 23.
Update `docs/roadmap.md` cluster section to reflect archival (follow-up commit `defffb3`)

Why

Two A/B refute memos established that the cluster path brings no measurable speedup and is bit-identical to legacy:

`fmha_tma_lever_refuted_2026_05_14.md` — TMA bulk-store on sm_120 underperforms cp.async 0.31×-0.79×
`m5_slice2_cluster_refuted_2026_05_17.md` — 4-model A/B sweep: noise-dominated, output bit-identical
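For readers unfamiliar with the term, "bit-identical" is a stricter claim than "numerically equal within tolerance". The sketch below shows one way such a comparison could be made on FP16 outputs copied back to the host as raw 16-bit payloads. It is an illustration only and does not reproduce the project's `ClusterMatchesLegacy` tests; the function names `bit_identical` and `first_mismatch` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true only if both buffers have the same length and
// every 16-bit payload matches exactly. Unlike a tolerance check, this
// treats +0.0 (0x0000) and -0.0 (0x8000) as different.
inline bool bit_identical(const std::vector<std::uint16_t>& a,
                          const std::vector<std::uint16_t>& b) {
    if (a.size() != b.size()) return false;
    if (a.empty()) return true;
    return std::memcmp(a.data(), b.data(),
                       a.size() * sizeof(std::uint16_t)) == 0;
}

// Hypothetical helper: index of the first differing element. If one buffer
// is a prefix of the other, returns the shorter length; for identical
// buffers that equals size().
inline std::size_t first_mismatch(const std::vector<std::uint16_t>& a,
                                  const std::vector<std::uint16_t>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return i;
    }
    return n;
}
```

A bitwise check like this is what allows a kernel variant to be retired without any accuracy argument: if the payloads match exactly, downstream results cannot differ.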

The path was default-off since #204. The test `FmhaSm120Test.ClusterPathNonAligned` was failing on main without anyone noticing — confirming the code was unexercised. This PR retires the opt-in.

Side effect

The pre-existing `FmhaSm120Test.ClusterPathNonAligned` failure on main is resolved by removing the test along with the code.

Test plan

`make build` green
`make verify-fast` green (`=== verify fast: OK ===`, no failing tests)
Pre-push hook verify-fast green
Spec compliance reviewer ✅ (FP8 expansion justified — both entry points live in the same .cu)
Code quality reviewer ✅ Approved (with two Important issues both addressed: roadmap.md updated in `defffb3`, test count documented here)

Phase 2 Task 1 of `docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md`.

🤖 Generated with Claude Code

The cluster kernel was opt-in (default off) per m5_slice2_cluster_refuted_2026_05_17.md and produced bit-identical output to the non-cluster kernel under all measured configurations. fmha_tma_lever_refuted_2026_05_14.md established that the TMA-style distributed-shared-memory pattern it relied on underperforms cp.async on sm_120.

A side effect: the test FmhaSm120Test.ClusterPathNonAligned, which was failing on main without being noticed, is removed along with its code.

Archive at docs/archive/fmha_sm120_cluster/ with a resurrection memo describing how to re-introduce on a future GPU SKU.

Removes ~1100 LOC + the no_fmha_cluster runtime config field + 8 ClusterPath* gtests.

Phase 2 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The cluster section still said "code retained as opt-in for future hardware where the signal might emerge." That stopped being true the moment the kernel was archived to docs/archive/fmha_sm120_cluster/ and the attention.no_fmha_cluster config field was removed. A reader following the original paragraph would conclude they could flip the flag back on; the flag no longer exists.

Update the section header from "investigated and shelved" to "archived", note the Phase 2 archival, and point at the resurrection memo. The original investigation history is preserved.

Follow-up to a1e221a on the same branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(attention): collapse prefill dispatcher gate

  After archiving the cluster, mxf4nvf4, and naive paths in Tasks 1-3, the dispatcher's prefill gate at executor_attention.cu collapses from a multi-clause cascade to a clean two-branch switch:

      if S-matrix fits AND not non-Gemma-4 sliding:
          cuBLAS QK^T + softmax + PV
      else:
          FMHA fallback chain

  The decode switch at the bottom of the function is untouched (already flat per Phase 2 spec). executor_attention.cu: 1277 -> 1253 LOC (-24).

  Also drops the now-orphaned attention.no_cublas knob from RuntimeConfig (src/runtime/config.{h,cpp}) and imp.conf.example. After Tasks 1-4 the flag had no remaining readers in the engine; the prefill gate that used it is now a hard policy (cuBLAS default, FMHA fallback on capacity or non-Gemma-4 sliding).

  Phase 2 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(arch): close Phase 2 of refactor roadmap

  Phase 2 (decluttering the attention dispatcher; original title "Attention-Dispatcher entrümpeln") is done. PRs that landed:

  - #295 archive FMHA sm_120 cluster (~1100 LOC + 23 cluster gtests)
  - #296 archive FMHA mxf4nvf4 + remove attention.fmha_blockscale orphan
  - #297 archive attention_naive + inline parity ref into test
  - #298 collapse prefill gate to 2-branch switch + remove attention.no_cublas

  Closeout updates:

  1. docs/superpowers/specs/...-roadmap-design.md: Phase 2 status line with PR list + deferred soft PRs (5/6/7).
  2. docs/architecture.md: refresh "Attention dispatcher" section to show the new two-branch gate, drop the stale 4-clause snippet.
  3. docs/architecture.dot: same in the diagram source (drop ad_naive node, rename cluster label to "cuBLAS default / FMHA fallback", simplify ad_gate label).
  4. docs/architecture.svg + .png: regenerated from the new .dot via `docker run --rm -v $(pwd)/docs:/d nshine/dot dot -T{svg,png} ...`.

  Phase 3 (cleaning up Pre-Dequant + Quant-Zoo; original title "Pre-Dequant + Quant-Zoo aufräumen") may now begin. A new writing-plans output is required.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
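The two-branch prefill gate described in the first commit above can be sketched in host-side C++ roughly as follows. This is a minimal illustration under stated assumptions, not the code in executor_attention.cu: the names `PrefillRequest`, `PrefillPath`, `s_matrix_fits`, and `select_prefill_path` are hypothetical, and the reading of "S-matrix fits" as "the materialized score matrix fits in a scratch byte budget" is an assumption, since the PR does not show the real capacity test.

```cpp
#include <cstddef>

// All names below are hypothetical; the actual gate and its inputs live in
// executor_attention.cu and are not shown in this PR.
enum class PrefillPath { CublasQkSoftmaxPv, FmhaFallback };

struct PrefillRequest {
    std::size_t n_heads;   // attention heads in the layer
    std::size_t q_len;     // query tokens in this prefill chunk
    std::size_t kv_len;    // key/value tokens visible to the chunk
    bool sliding_window;   // layer uses sliding-window attention
    bool is_gemma4;        // model-family flag (assumed to exist)
};

// Assumption: "S-matrix fits" means n_heads * q_len * kv_len scores of
// bytes_per_score bytes each fit within scratch_bytes.
inline bool s_matrix_fits(const PrefillRequest& r,
                          std::size_t scratch_bytes,
                          std::size_t bytes_per_score) {
    if (bytes_per_score == 0) return false;
    if (r.n_heads == 0 || r.q_len == 0 || r.kv_len == 0) return true;
    // Divide the budget down instead of multiplying the dimensions up,
    // so the check cannot overflow size_t.
    const std::size_t max_scores = scratch_bytes / bytes_per_score;
    return r.kv_len <= (max_scores / r.n_heads) / r.q_len;
}

// Two-branch gate: cuBLAS by default, FMHA fallback on capacity or on
// sliding-window layers of non-Gemma-4 models.
inline PrefillPath select_prefill_path(const PrefillRequest& r,
                                       std::size_t scratch_bytes,
                                       std::size_t bytes_per_score) {
    const bool non_gemma4_sliding = r.sliding_window && !r.is_gemma4;
    if (s_matrix_fits(r, scratch_bytes, bytes_per_score) && !non_gemma4_sliding) {
        return PrefillPath::CublasQkSoftmaxPv;
    }
    return PrefillPath::FmhaFallback;
}
```

With a 1 MiB scratch budget and 4-byte scores, a request with 8 heads and 128 x 128 tokens would take the cuBLAS branch in this sketch, while the same request on a sliding-window layer of a non-Gemma-4 model would take the FMHA fallback. Because the gate is a hard policy, no runtime knob appears in the signature, which mirrors the removal of `attention.no_cublas`.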

kekzl and others added 2 commits May 20, 2026 10:24

github-actions Bot enabled auto-merge (squash) May 20, 2026 08:30

github-actions Bot merged commit 6025108 into main May 20, 2026
3 checks passed

This was referenced May 20, 2026

docs(arch): close Phase 2 of refactor roadmap #299

Closed

docs(arch): close Phase 2 of refactor roadmap #327

Closed

github-actions[bot] merged 2 commits into main from refactor/archive-fmha-cluster
