docs(SHIP-TWO-001 §60): SHIP-007 §22 FULLY CLOSED — H1 CONFIRMED apples-to-apples — spec v3.04.0 → v3.05.0 #1551

Merged
noahgift merged 1 commit into main from feat/ship-two-001-v3.05-ship007-closed-h1
May 7, 2026

@noahgift noahgift commented May 7, 2026

Summary

Updates the spec banner to record that the SHIP-007 §22 cascade is closed and that the actual fix has landed.

What landed

  • aprender PR #1548 (M102) — M-FFN-GGUF-7 multi-layer real-teacher chain — MERGED 2026-05-07T05:15
  • aprender PR #1550 (M103) — M-FFN-GGUF-5 SHIP-007 §22 fix — MERGED 2026-05-07T05:50
  • parity #89 — bundled M102+M103 record — MERGED 2026-05-07T05:53

MAJOR PLOT TWIST

§27's 18.23× std-ratio was a TEST METHODOLOGY ARTIFACT, NOT a numerical bug. GGUF's forward_traced captures stats only on the LAST token, while APR's forward_traced captured stats across ALL 7 tokens. Comparing a 7-token APR std against a 1-token GGUF std inflated the apparent magnitude.
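To make the artifact concrete, here is a minimal sketch on synthetic data. It is not the project's harness: the activation values are invented, and the assumption that one early token has a much larger magnitude than the last token is purely illustrative. Both "sides" see identical activations, so any ratio other than 1.0 comes from the measurement, not the numerics.

```python
import statistics

# Hypothetical activations: 7 tokens x 4 elements each. All values are made up.
# Assumption for illustration only: the first token has a much larger magnitude.
activations = [[40.0, -38.0, 42.0, -41.0]] + [[1.0, -1.0, 1.0, -1.0] for _ in range(6)]


def std_all_tokens(acts):
    """Population std over every element of every token."""
    return statistics.pstdev([x for row in acts for x in row])


def std_last_token(acts):
    """Population std over the elements of the final token only."""
    return statistics.pstdev(acts[-1])


# Same data on both sides; only the set of samples being summarized differs.
mismatched_ratio = std_all_tokens(activations) / std_last_token(activations)
matched_ratio = std_last_token(activations) / std_last_token(activations)

print(f"mismatched: {mismatched_ratio:.2f}x, matched: {matched_ratio:.2f}x")
```

The mismatched ratio is far above 1 even though nothing differs numerically, while the last-token-only ratio is exactly 1.0.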

Empirical end-to-end (2026-05-07, RTX 4090, 178s)

layer-3 ratio = 1.245× → H1 CONFIRMED
All 28 layers within H1 band [0.5, 2.0]
15,233 lib tests pass; production hot paths byte-unchanged
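A band check of this kind could be sketched as follows. The function name and the list layout are hypothetical; only the band [0.5, 2.0] and the layer-3 value 1.245 come from this PR, and the other ratios are placeholders rather than measured data.

```python
H1_BAND = (0.5, 2.0)  # band stated in the PR


def layers_outside_band(ratios, band=H1_BAND):
    """Return (layer_index, ratio) pairs whose ratio falls outside the band."""
    lo, hi = band
    return [(i, r) for i, r in enumerate(ratios) if not (lo <= r <= hi)]


# Placeholder ratios for layers 0..3; only index 3 (1.245) is from the PR.
example_ratios = [1.0, 1.0, 1.0, 1.245]
print(layers_outside_band(example_ratios))  # empty list means every layer is in band
```

Returning the offending layers, rather than a single boolean, makes a failing run point directly at the layer to inspect.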

Total session

28 PRs across 2 days, including 1 that landed an actual fix. The cascade's findings (M94 0.077% per-matvec mechanism, M95 5.70× compounding, M-FFN-GGUF-7 1.81× real-saturating) ARE real numerical findings, but §27's 1723% magnitude, which made the bug look severe, was inflated by test methodology.
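The 18.23× ratio and the 1723% figure appear to be the same measurement written two ways. Assuming the percentage denotes relative excess over parity (an interpretation, not something the spec text here states), the arithmetic lines up, and the corrected layer-3 ratio corresponds to a much smaller excess:

```latex
\[
(18.23 - 1) \times 100\% = 1723\%,
\qquad
(1.245 - 1) \times 100\% = 24.5\%
\]
```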

Discharge potential

Five MODEL-1 PARTIALs (SHIP-002/005/006/007/008) are ready for individual discharge follow-ups per §17.5. MODEL-1 ship % moves from 91% to 96% once those land.

Methodology lesson #7 NEW

feedback_test_methodology_can_fake_bugs.md — when comparing two implementations via summary statistics, VERIFY both sides measure the same distribution shape BEFORE trusting the comparison.
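One way to turn this lesson into a guard is to refuse to compute a ratio unless both traces have the same shape. The sketch below is an assumption about how such a check might look, not the M89 harness: `shape` and `checked_std_ratio` are hypothetical names, and traces are modeled as plain lists of per-token rows.

```python
import statistics


def shape(trace):
    """Return (token_count, elements_per_token) for a list of equal-length rows."""
    if not trace:
        raise ValueError("empty trace")
    widths = {len(row) for row in trace}
    if len(widths) != 1:
        raise ValueError("ragged trace: rows have different lengths")
    return len(trace), widths.pop()


def checked_std_ratio(trace_a, trace_b):
    """Std ratio of two traces, computed only if their shapes match exactly."""
    shape_a, shape_b = shape(trace_a), shape(trace_b)
    if shape_a != shape_b:
        raise ValueError(f"shape mismatch: {shape_a} vs {shape_b}")
    flat_a = [x for row in trace_a for x in row]
    flat_b = [x for row in trace_b for x in row]
    return statistics.pstdev(flat_a) / statistics.pstdev(flat_b)
```

With a guard like this, a comparison of 7 tokens × 28672 elements against 1 token × 4096 elements would fail loudly with a shape mismatch instead of producing a misleading ratio.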

Status changes

| Field | Before | After |
|---|---|---|
| Spec version | 3.04.0 | 3.05.0 |
| §27 verdict | H2 (apparent bug) | H1 CONFIRMED |
| Layer-3 ratio | 18.23× | 1.245× |
| SHIP-007 §22 | ACTIVE_ALGORITHM_LEVEL | FUNCTIONALLY DISCHARGED |
| MODEL-1 ship % | 91% | 96% pending individual discharges |

Test plan

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 7, 2026 05:56
@noahgift noahgift force-pushed the feat/ship-two-001-v3.05-ship007-closed-h1 branch 3 times, most recently from 70bf0a2 to f563d5e Compare May 7, 2026 07:52
…es-to-apples — spec v3.04.0 → v3.05.0

M-FFN-GGUF-5 fix shipped (aprender PR #1550, MERGED 2026-05-07T05:50)
+ M-FFN-GGUF-7 multi-layer chain (PR #1548, MERGED 2026-05-07T05:15).

MAJOR PLOT TWIST in M-FFN-GGUF-5 fix PR: §27's 18.23× std-ratio
was a TEST METHODOLOGY ARTIFACT, NOT a numerical bug.

GGUF's forward_traced does Phase 1 prefill silently and only captures
stats on the LAST token; APR's forward_traced captured stats across
ALL 7 tokens. The §27 measurement compared:
  APR std across 7 tokens × 28672 elements
  GGUF std across 1 token × 4096 elements

Fundamentally incomparable. Different counts, different distributions.

Two coherent fixes in PR #1550:
1. forward_traced uses Q4K+Q8K dispatch (matches production semantics;
   7 call sites updated via new matmul_q4k_or_f32_traced helper)
2. M89 harness compares apples-to-apples last-token-only stats

EMPIRICAL END-TO-END (2026-05-07, RTX 4090, 178s):
  layer-3 ratio = 1.245× → H1 CONFIRMED
  All 28 layers within H1 band [0.5, 2.0]
  15,233 lib tests pass; production hot paths byte-unchanged

The cascade's per-tensor mechanism (M94 0.077%) and compounding
(M95 5.70× / M-FFN-GGUF-7 1.81× saturation) ARE real but didn't
explain §27's 1723% — that was methodology-inflated.

Methodology lesson #7 NEW (feedback_test_methodology_can_fake_bugs.md):
when comparing two implementations via summary statistics, VERIFY
both sides measure the same distribution shape BEFORE trusting the
comparison. Mismatched shapes can fake bugs.

Total session: 28 PRs / 2 days including 1 actual fix landing.

Discharge potential per §17.5: 5 MODEL-1 PARTIALs (SHIP-002/005/006/
007/008) ready for individual discharge follow-ups. MODEL-1 ship %
91% → 96% pending those.

Spec v3.04.0 → v3.05.0. Atomic next action banner update only;
full §60 narrative deferred to deliberate session.

Refs PMAT-CCPA, SHIP-007 §22, M91-M103, M-FFN-GGUF-5 PR #1550.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/ship-two-001-v3.05-ship007-closed-h1 branch from f563d5e to f987be8 Compare May 7, 2026 08:20
@noahgift noahgift merged commit 502d7e4 into main May 7, 2026
10 checks passed
@noahgift noahgift deleted the feat/ship-two-001-v3.05-ship007-closed-h1 branch May 7, 2026 08:38