Skip to content

fix(attention): Track E PV repack — m16n8k16 A-operand layout#352

Merged
github-actions[bot] merged 1 commit into
mainfrom
fix/track-e-disable-degeneration
May 21, 2026
Merged

fix(attention): Track E PV repack — m16n8k16 A-operand layout#352
github-actions[bot] merged 1 commit into
mainfrom
fix/track-e-disable-degeneration

Conversation

@kekzl
Copy link
Copy Markdown
Owner

@kekzl kekzl commented May 21, 2026

What this PR does

Fixes the Track E degeneration bug. Branch contains 2 commits:

  1. `6f2ce41` — temporary disable (return false at launcher entry) as production hotfix while debugging
  2. `99ec716` — actual root-cause fix (re-enables Track E with corrected PV repack)

Net effect: Track E shipped fixed.

Root cause

The PV repack (S_frag → P_frag for the second mma) used the wrong A-operand interleaving for `mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32`.

Correct PTX m16n8k16 A-frag per-lane layout (row × k-half interleaved):

  • a[0] = (row_a, k<8) → accumulates into d[0,1] (row_a output)
  • a[1] = (row_b, k<8) → accumulates into d[2,3] (row_b output)
  • a[2] = (row_a, k≥8) → accumulates into d[0,1]
  • a[3] = (row_b, k≥8) → accumulates into d[2,3]

The shipped code grouped by row-then-k instead of by k-then-row at positions a[1]/a[2]:

  • a[1] = S[row_a, k≥8] ← WRONG (should be row_b, k<8)
  • a[2] = S[row_b, k<8] ← WRONG (should be row_a, k≥8)

This put row_a softmax weights into d[2,3] (row_b output slot) and vice versa. With the original uniform test fill (magnitude 0.125), all row values were similar enough that the swap produced only ~5e-3 error — within the test tolerance.

On real Qwen3 attention weights (non-uniform, magnitude ~1.0), the swap produces ~0.116 abs error → garbage logits → degenerate output (`,,,` or `ffff...` infinite repetition).

Fix

  1. `P_frag` construction — corrected interleaving order.
  2. `O_frag` rescale & epilogue store — restored standard mappings (frag[0,1] → row_a = lane/4, frag[2,3] → row_b = lane/4 + 8).
  3. Test fill magnitude — bumped from `* 0.125f` to `* 1.0f` so future bugs of this class don't slip through. Max abs error after fix: 0.000244 (fp16 rounding floor only) on all 9 `TrackE_*` tests.

Verification

```
Prompt: "What is 17 + 25?"
Before fix: ",,,,,,,,,,..."
After fix: "Okay, I need to figure out how to solve this problem. Let me read it again." (coherent)
```

Tests

  • 9/9 `TrackE_*` tests pass with stricter magnitude=1.0 fill (max_abs = 0.000244)
  • Real Qwen3-8B Q8_0 inference produces coherent output
  • verify-fast pre-push hook green
  • full test-attention suite passes

Followup

🤖 Generated with Claude Code

Track E numerical tests pass (max_abs<5e-3 vs cuBLAS reference) on the
synthetic deterministic test fill, but actual inference on Qwen3-8B Q8_0
produces degenerate output:

  Prompt: "What is 17 + 25?"
  cuBLAS: "<think>\nOkay, let's see. The user is asking what 17 plus 25..."
  Track E: ",,,,,,,,,,,,,,,,,,,,..." (infinite Chinese comma)

The synthetic fill_fp16_deterministic produces uniform-ish softmax
distributions where mask/precision/layout errors cancel within the
5e-3 tolerance. Real Qwen3 attention scores are highly non-uniform;
something in Track E (suspect: m16n8k16 D-frag layout invert assumption
from Task 13, or short-seq Q-row beyond-seq_q handling) breaks on real
distributions.

Disabling Track E by early-returning false at the launcher entry. Code
stays in tree for the bug investigation. cuBLAS / FMHA fallback paths
unchanged — production attention reverts to its pre-#350 behavior.

Hotfix; investigation continues. Will re-enable once root-caused and
test fill is updated to catch the bug class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot enabled auto-merge (squash) May 21, 2026 12:30
@github-actions github-actions Bot merged commit 7094b66 into main May 21, 2026
3 checks passed
@kekzl kekzl changed the title fix(attention): disable Track E — degeneration on real model weights fix(attention): Track E PV repack — m16n8k16 A-operand layout May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant