docs(SHIP-TWO-001 §60): SHIP-007 §22 FULLY CLOSED — H1 CONFIRMED apples-to-apples — spec v3.04.0 → v3.05.0 by noahgift · Pull Request #1551 · paiml/aprender

noahgift · 2026-05-07T05:56:10Z

Summary

Spec banner update recording the SHIP-007 §22 cascade closure with the actual fix landing.

What landed

aprender PR #1548 (M102) — M-FFN-GGUF-7 multi-layer real-teacher chain — MERGED 2026-05-07T05:15
aprender PR #1550 (M103) — M-FFN-GGUF-5 SHIP-007 §22 fix — MERGED 2026-05-07T05:50
parity #89 — bundled M102+M103 record — MERGED 2026-05-07T05:53

MAJOR PLOT TWIST

§27's 18.23× std-ratio was a TEST METHODOLOGY ARTIFACT, NOT a numerical bug. GGUF's forward_traced only captures stats on the LAST token; APR's forward_traced captured stats across ALL 7 tokens. Comparing a 7-token APR std against a 1-token GGUF std inflated the apparent magnitude.
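The effect can be reproduced on toy data. The sketch below is illustrative only: `std_dev`, `last_token`, and `demo` are hypothetical names rather than aprender APIs, the activations are made up, and a row-major `[tokens][hidden]` layout is assumed. It shows how a std taken over all tokens can differ sharply from a std taken over the last token, even when the last-token values agree exactly.

```rust
/// Population standard deviation; returns 0.0 for an empty slice.
fn std_dev(xs: &[f32]) -> f32 {
    if xs.is_empty() {
        return 0.0;
    }
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    var.sqrt()
}

/// Hypothetical helper: activations are assumed row-major as [tokens][hidden];
/// returns the slice belonging to the last token only.
fn last_token(acts: &[f32], hidden: usize) -> &[f32] {
    assert!(hidden > 0 && acts.len() >= hidden && acts.len() % hidden == 0);
    &acts[acts.len() - hidden..]
}

/// Toy comparison with made-up numbers: returns (mismatched_ratio, matched_ratio).
fn demo() -> (f32, f32) {
    let hidden = 4;
    // Two tokens: the first has large activations, the last has small ones.
    let apr_all: [f32; 8] = [10.0, -10.0, 10.0, -10.0, 1.0, -1.0, 1.0, -1.0];
    // The reference side only ever recorded the last token.
    let gguf_last: [f32; 4] = [1.0, -1.0, 1.0, -1.0];

    // All-token std vs last-token std: different populations, inflated ratio.
    let mismatched = std_dev(&apr_all) / std_dev(&gguf_last);
    // Last-token std vs last-token std: same population on both sides.
    let matched = std_dev(last_token(&apr_all, hidden)) / std_dev(&gguf_last);
    (mismatched, matched)
}
```

In this toy case the last-token values are identical on both sides, so the matched ratio is 1, while the mismatched ratio is driven entirely by the earlier token.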

Empirical end-to-end (2026-05-07, RTX 4090, 178s)

layer-3 ratio = 1.245× → H1 CONFIRMED
All 28 layers within H1 band [0.5, 2.0]
15,233 lib tests pass; production hot paths byte-unchanged
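A band check of this kind could look like the sketch below. Only the band `[0.5, 2.0]` and the two quoted ratios come from this PR; the function names are hypothetical, and treating the band as inclusive at both ends is an assumption based on the bracket notation.

```rust
/// H1 band as quoted in the PR text; inclusive bounds are an assumption.
const H1_BAND: (f32, f32) = (0.5, 2.0);

/// True when an APR/GGUF std ratio lies inside the H1 band.
/// NaN fails both comparisons, so it is reported as out of band.
fn in_h1_band(ratio: f32) -> bool {
    ratio >= H1_BAND.0 && ratio <= H1_BAND.1
}

/// Hypothetical helper: indices of layers whose ratio falls outside the band.
fn layers_outside_band(ratios: &[f32]) -> Vec<usize> {
    ratios
        .iter()
        .enumerate()
        .filter(|(_, r)| !in_h1_band(**r))
        .map(|(i, _)| i)
        .collect()
}
```

An empty result from `layers_outside_band` over the per-layer ratios would correspond to the "all layers within band" outcome reported above.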

Total session

28 PRs across 2 days including 1 actual fix landing. The cascade's findings (M94 0.077% per-matvec mechanism, M95 5.70× compounding, M-FFN-GGUF-7 1.81× real-saturating) ARE real numerical findings — but §27's 1723% magnitude that made the bug look severe was test-methodology-inflated.
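The 18.23× and 1723% figures appear to describe the same measurement. Reading the percentage as the relative excess over a ratio of 1 (an interpretation; the PR does not spell out the derivation), the arithmetic is:

```latex
\frac{\sigma_{\mathrm{APR}}}{\sigma_{\mathrm{GGUF}}} = 18.23
\quad\Longrightarrow\quad
(18.23 - 1) \times 100\% = 1723\%
```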

Discharge potential

5 MODEL-1 PARTIALs (SHIP-002/005/006/007/008) ready for individual discharge follow-ups per §17.5. MODEL-1 ship % 91% → 96% pending those.

Methodology lesson #7 NEW

feedback_test_methodology_can_fake_bugs.md — when comparing two implementations via summary statistics, VERIFY both sides measure the same distribution shape BEFORE trusting the comparison.
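One way to enforce this lesson in a harness is to carry the sample shape alongside each statistic and refuse to compare when the shapes differ. The following is a minimal sketch under that assumption: `TraceStats`, `CompareError`, and `compare_std` are hypothetical names, not types from aprender or the M89 harness, and the std values used in any call are placeholders.

```rust
/// Hypothetical record: a summary statistic plus the shape it was computed over.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TraceStats {
    tokens: usize,   // number of token positions included in the statistic
    elements: usize, // total number of values included in the statistic
    std: f32,        // standard deviation over those values
}

/// Error returned when two statistics were computed over different shapes.
#[derive(Debug, PartialEq)]
enum CompareError {
    ShapeMismatch {
        left: (usize, usize),
        right: (usize, usize),
    },
}

/// Returns the std ratio a/b only when both sides cover the same shape.
fn compare_std(a: &TraceStats, b: &TraceStats) -> Result<f32, CompareError> {
    if a.tokens != b.tokens || a.elements != b.elements {
        return Err(CompareError::ShapeMismatch {
            left: (a.tokens, a.elements),
            right: (b.tokens, b.elements),
        });
    }
    Ok(a.std / b.std)
}
```

With a guard like this, a 7-token statistic compared against a 1-token statistic would surface as a `ShapeMismatch` error instead of a plausible-looking ratio.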

Status changes

Field	Before	After
Spec version	3.04.0	3.05.0
§27 verdict	H2 (apparent bug)	H1 CONFIRMED
Layer-3 ratio	18.23×	1.245×
SHIP-007 §22	ACTIVE_ALGORITHM_LEVEL	FUNCTIONALLY DISCHARGED
MODEL-1 ship %	91%	96% pending individual discharges

Test plan

aprender PR feat(M-FFN-GGUF-7): multi-layer real-teacher chain — SATURATES at 1.81× (not exponential) #1548 (M102) MERGED
aprender PR fix(M-FFN-GGUF-5): SHIP-007 §22 H1 CONFIRMED — APR layer-3 matches GGUF apples-to-apples — bug was test methodology #1550 (M103) MERGED
parity #89 (bundled M102+M103 record) MERGED
CI ci/gate green
Auto-merge once required checks pass

🤖 Generated with Claude Code

…es-to-apples — spec v3.04.0 → v3.05.0

M-FFN-GGUF-5 fix shipped (aprender PR #1550, MERGED 2026-05-07T05:50) + M-FFN-GGUF-7 multi-layer chain (PR #1548, MERGED 2026-05-07T05:15).

MAJOR PLOT TWIST in M-FFN-GGUF-5 fix PR: §27's 18.23× std-ratio was a TEST METHODOLOGY ARTIFACT, NOT a numerical bug. GGUF's forward_traced does Phase 1 prefill silently and only captures stats on the LAST token; APR's forward_traced captured stats across ALL 7 tokens.

The §27 measurement compared:
APR std across 7 tokens × 28672 elements
GGUF std across 1 token × 4096 elements

Fundamentally incomparable. Different counts, different distributions.

Two coherent fixes in PR #1550:
1. forward_traced uses Q4K+Q8K dispatch (matches production semantics; 7 call sites updated via new matmul_q4k_or_f32_traced helper)
2. M89 harness compares apples-to-apples last-token-only stats

EMPIRICAL END-TO-END (2026-05-07, RTX 4090, 178s):
layer-3 ratio = 1.245× → H1 CONFIRMED
All 28 layers within H1 band [0.5, 2.0]
15,233 lib tests pass; production hot paths byte-unchanged

The cascade's per-tensor mechanism (M94 0.077%) and compounding (M95 5.70× / M-FFN-GGUF-7 1.81× saturation) ARE real but didn't explain §27's 1723%; that was methodology-inflated.

Methodology lesson #7 NEW (feedback_test_methodology_can_fake_bugs.md): when comparing two implementations via summary statistics, VERIFY both sides measure the same distribution shape BEFORE trusting the comparison. Mismatched shapes can fake bugs.

Total session: 28 PRs / 2 days including 1 actual fix landing.

Discharge potential per §17.5: 5 MODEL-1 PARTIALs (SHIP-002/005/006/007/008) ready for individual discharge follow-ups. MODEL-1 ship % 91% → 96% pending those.

Spec v3.04.0 → v3.05.0. Atomic next action banner update only; full §60 narrative deferred to deliberate session.

Refs PMAT-CCPA, SHIP-007 §22, M91-M103, M-FFN-GGUF-5 PR #1550.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 7, 2026 05:56

noahgift force-pushed the feat/ship-two-001-v3.05-ship007-closed-h1 branch 3 times, most recently from 70bf0a2 to f563d5e May 7, 2026 07:52

noahgift force-pushed the feat/ship-two-001-v3.05-ship007-closed-h1 branch from f563d5e to f987be8 May 7, 2026 08:20

noahgift merged commit 502d7e4 into main May 7, 2026
10 checks passed

noahgift deleted the feat/ship-two-001-v3.05-ship007-closed-h1 branch May 7, 2026 08:38

noahgift merged 1 commit into main from feat/ship-two-001-v3.05-ship007-closed-h1
