fix(distill): pre-warm rmsnorm at both 1e-6 (Qwen2) and 1e-5 (Llama) eps (PMAT-698n) by noahgift · Pull Request #1827 · paiml/aprender

noahgift · 2026-05-20T05:03:24Z

Summary

PMAT-698k added the _eps{:08x} suffix to the rmsnorm pre-warm key but used the wrong default value:

Pre-warm: 1.0e-5_f32.to_bits() = 0x3727c5ac (Llama/Mistral default)
Runtime (Qwen2.5-Coder-0.5B): rms_norm_eps = 1e-6 = 0x358637bd

Different bits → different cache key → still cache-misses on Qwen2 family. Live Phase 3 dispatch v11 confirmed batched_rmsnorm_fwd_896_eps358637bd still JIT'd.
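To make the mismatch concrete, here is a minimal sketch of how such a key could be derived from the eps bit pattern. The key shape is taken from the logged `batched_rmsnorm_fwd_896_eps358637bd`; the helper name `rmsnorm_cache_key` and its signature are hypothetical and not taken from the aprender source.

```rust
/// Hypothetical helper: builds a key in the shape seen in the dispatch log,
/// `batched_rmsnorm_fwd_{hidden}_eps{bits:08x}`.
/// The name and signature are assumptions for illustration only.
fn rmsnorm_cache_key(hidden: usize, eps: f32) -> String {
    // `to_bits` exposes the raw IEEE-754 pattern, so 1e-5 and 1e-6
    // produce different hex suffixes and therefore different keys.
    format!("batched_rmsnorm_fwd_{}_eps{:08x}", hidden, eps.to_bits())
}
```

With hidden size 896, `1.0e-5` yields the suffix `eps3727c5ac` while `1.0e-6` yields `eps358637bd`, so an entry pre-warmed under the first key can never satisfy a lookup for the second.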

Fix

Pre-warm BOTH eps values. Cost: ~30 KB extra cache. Benefit: zero rmsnorm cache misses for either Qwen2 family OR Llama family without per-model dispatch logic.
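The idea can be sketched as follows, assuming the pre-warm step simply registers one entry per eps value. The names `PREWARM_EPS`, `eps_key`, and `prewarm_rmsnorm_keys` are hypothetical, and a `HashSet` of keys stands in for the real kernel cache, which is not shown in this PR.

```rust
use std::collections::HashSet;

/// Eps values to pre-warm: 1e-6 (Qwen2 / Qwen2.5) and 1e-5 (Llama / Mistral).
/// Constant name is an assumption for this sketch.
const PREWARM_EPS: [f32; 2] = [1.0e-6, 1.0e-5];

/// Hypothetical key builder matching the logged key shape.
fn eps_key(hidden: usize, eps: f32) -> String {
    format!("batched_rmsnorm_fwd_{}_eps{:08x}", hidden, eps.to_bits())
}

/// Stand-in for the pre-warm step: returns the set of keys that would be
/// compiled ahead of time for a given hidden size.
fn prewarm_rmsnorm_keys(hidden: usize) -> HashSet<String> {
    PREWARM_EPS.iter().map(|&eps| eps_key(hidden, eps)).collect()
}
```

Because both keys are present up front, a runtime lookup hits regardless of which model family supplies `rms_norm_eps`, and no per-model branch is needed in the dispatch path.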

Test plan

cargo check --features cuda clean
Live gx10 dispatch: zero [FWD-CACHE] Compiling events for batched_rmsnorm_fwd_*
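The second check could be automated along these lines. This is a sketch that assumes each JIT event appears on a single log line containing both `[FWD-CACHE] Compiling` and the kernel name; the exact log line layout and the function name `rmsnorm_compile_events` are assumptions.

```rust
/// Hypothetical log filter: returns every line that reports a forward-cache
/// compile of a batched rmsnorm kernel. An empty result means the check passes.
fn rmsnorm_compile_events(log: &str) -> Vec<&str> {
    log.lines()
        .filter(|line| {
            line.contains("[FWD-CACHE] Compiling") && line.contains("batched_rmsnorm_fwd_")
        })
        .collect()
}
```

Running this over the training log from a live dispatch should return an empty vector once both eps values are pre-warmed.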

Cascade stage 10

Final forward-cache JIT hygiene fix. Phase 3 pipeline already works end-to-end (#1817 PMAT-698j unblocked it); this just eliminates the last cache miss.

🤖 Generated with Claude Code

…eps (PMAT-698n)

PMAT-698k added the eps-suffix to the rmsnorm pre-warm key but used the wrong default value: 1.0e-5_f32.to_bits() = 0x3727c5ac, whereas the Qwen2 / Qwen2.5 family uses rms_norm_eps = 1e-6 = 0x358637bd. Live Phase 3 dispatch v11 confirmed the runtime key was batched_rmsnorm_fwd_896_eps358637bd, still cache-missing on the PMAT-698k pre-warm entry.

Fix: pre-warm BOTH eps values (1e-6 for Qwen2/Qwen2.5, 1e-5 for Llama/Mistral).
Cost: ~30 KB extra cache headroom.
Benefit: zero cache misses for either model family without per-model dispatch logic.

Test plan:
- [x] cargo check --features cuda: clean build
- [ ] Live gx10 dispatch: no [FWD-CACHE] Compiling for batched_rmsnorm_fwd_*

Stage 10 of cascade hygiene: last forward-cache JIT event eliminated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-20: real distillation 1.5B teacher → 0.5B student on Blackwell GB10 with the full PMAT-698e..n + PMAT-700-B cascade active.

initial_loss = 7.6746
final_loss = 7.2036 ← LESS THAN initial
62 steps, 122.7s, no errors

F-DISTILL-SMOKE-001 ("final_loss < initial_loss") discharged. Phase 3 of SPEC-DISTILL-001 is COMPLETE.

Evidence:
- evidence/distill-phase-3-real-kd/dispatch.json: dispatch manifest
- evidence/distill-phase-3-real-kd/launch-final-pass.txt: full training log

Run dir on gx10: /home/noah/runs/distill-smoke-20260520-070404/
Trained student checkpoint: student-trained.apr/model.safetensors

Cascade summary (all merged):
- #1804 PMAT-700-B (cuBLAS prewarm skip)
- #1808 PMAT-698e (workspace cap)
- #1809 PMAT-698f (APR magic in weights loader)
- #1810 PMAT-698g (non-LoRA backward pre-warm)
- #1813 PMAT-698h (rms_norm_gamma_reduce pre-warm)
- #1815 PMAT-698i (FWD-CACHE diagnostic logging)
- #1817 PMAT-698j (THE root cause: warm! macro key)
- #1820 PMAT-698k (cache-key alignment: rope fwd + rmsnorm eps)
- #1823 PMAT-698m (smoke setup: non-degenerate batch)
- #1824 (post-mortem doc)
- #1827 PMAT-698n (rmsnorm pre-warm at both 1e-6 + 1e-5 eps)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 20, 2026 05:03

noahgift mentioned this pull request May 20, 2026

evidence(distill): Phase 3 F-DISTILL-SMOKE-001 PASS on gx10 GB10 #1828

Merged

noahgift merged commit c3c68b5 into main May 20, 2026
11 checks passed

noahgift deleted the fix/distill-rmsnorm-eps-1e6-pmat-698n branch May 20, 2026 05:27

noahgift merged 1 commit into main from fix/distill-rmsnorm-eps-1e6-pmat-698n
