
docs(spec): §79 + §80 + §81 consolidated — audit retrospective + priority queue + P0 packaging gaps #1702

Merged
noahgift merged 2 commits into main from spec/79-80-81-consolidated on May 15, 2026

Conversation

@noahgift (Contributor)

Consolidates PRs #1695 / #1697 / #1698 (all DIRTY against main from overlapping header edits) into a single mergeable amendment. Adds §79 (external audit + Five-Whys), §80 (prioritized backlog → 92% ceiling), and §81 (Class 3 packaging defects surfaced by the P0 dispatch trio). Companion code PRs #1699 (P0-F arch case) and #1701 (P0-D embed tokenizer + P0-E arch metadata) close all three defects identified in §81. Methodology lessons #26-#28 are new. Spec v3.24.0 → v3.27.0.


Triple-amendment to SPEC-SHIP-TWO-001 capturing the §78 → §80 dispatch
arc that revealed a Class 3 packaging-defect wave in apr pretrain output.

Consolidates the content of PRs #1695 (§79), #1697 (§80), #1698 (§81)
which were all DIRTY against main due to overlapping spec-header edits.

§79 — External audit + Five-Whys retrospective on MODEL-2 convergence
  Synthesizes docs/specifications/two-model-spec-audit.md. Identifies
  three compounding root causes for the val_loss=9.75 plateau:
    1. Data starvation (0.24% of Chinchilla-optimal token count)
    2. False plateau hypothesis (LR-budget falsification)
    3. Infrastructure masking bugs (silent CPU fallback, exhaustion
       placeholder, premature early-stop)
  Five-Whys for Case A (silent corpus exhaustion), Case B (early stop),
  Case C (val_loss=9.75 plateau). Reconciles audit Recommendations 1-3
  with §78's §49-pivot path.

§80 — Prioritized open-follow-up backlog
  Ranks all open SHIP-TWO-001 work by ship-% delta ÷ effort. P0 trio
  (apr qa / bench / export against epoch-004.apr) + P1 Chinchilla gate
  + P1 python validity + P1 HumanEval + P2-A long train = MODEL-2
  theoretical ceiling 92% at ~6-10h RTX 4090 compute.
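The §80 ranking rule above can be sketched in a few lines. This is an illustrative sketch only: the item names, ship-% deltas, and effort figures below are placeholders, not values taken from the spec backlog.

```python
from dataclasses import dataclass

@dataclass
class BacklogItem:
    name: str
    ship_delta_pp: float   # expected ship-% gain, in percentage points
    effort_hours: float    # estimated compute/engineering hours

def prioritize(items: list[BacklogItem]) -> list[BacklogItem]:
    # Rank by ship-% delta per hour of effort, highest first. Cheap,
    # high-leverage dispatches (like the P0 trio) naturally sort ahead
    # of long training runs, matching lesson #27's ordering.
    return sorted(items,
                  key=lambda i: i.ship_delta_pp / i.effort_hours,
                  reverse=True)

backlog = [
    BacklogItem("P2-A long train", ship_delta_pp=10.0, effort_hours=10.0),
    BacklogItem("P0 dispatch trio", ship_delta_pp=2.0, effort_hours=0.1),
]
ranked = prioritize(backlog)
# The P0 trio sorts first: 2.0 / 0.1 = 20 pp/h vs 10.0 / 10.0 = 1 pp/h.
```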

§81 — P0 dispatch surfaced 3 systemic packaging-defect gaps
  Dispatching §80's P0 trio against §78's epoch-004.apr revealed:
    - P0-A apr qa     → "APR missing embedded tokenizer"
    - P0-B apr bench  → "C-03: APR model missing 'hidden_size' metadata"
    - P0-C apr export → PASSED, but llama-cli refused with
                        "unknown model architecture: 'LlamaForCausalLM'"
                        (GGUF expects lowercase "llama")
  Companion code PRs:
    - #1699 P0-F      → HF→GGUF arch case mapping in apr export
    - #1701 P0-D + P0-E → embed tokenizer + write arch metadata in
                          apr pretrain output
  AC-SHIP2-010 → DISCHARGED (315.5 tok/s on Qwen-0.5B fine-tune;
  3.15× over the 100 tok/s floor).
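The P0-C failure mode above comes down to a name-mapping step: HF `config.json` records class names like `"LlamaForCausalLM"`, while GGUF consumers key on lowercase family names like `"llama"`. A hypothetical sketch of the kind of mapping PR #1699's fix implies (the table contents and function name here are assumptions, not the actual apr export code):

```python
# Map HF architecture class names to GGUF architecture identifiers.
# Passing the HF name through unmapped is what produced the
# "unknown model architecture: 'LlamaForCausalLM'" refusal from llama-cli.
HF_TO_GGUF_ARCH = {
    "LlamaForCausalLM": "llama",
    "Qwen2ForCausalLM": "qwen2",  # illustrative additional entry
}

def gguf_arch_name(hf_arch: str) -> str:
    # Fail loudly at export time on unmapped architectures, rather than
    # emitting a GGUF file the loader will refuse later.
    try:
        return HF_TO_GGUF_ARCH[hf_arch]
    except KeyError:
        raise ValueError(f"no GGUF architecture mapping for {hf_arch!r}")
```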

Methodology lessons added:
  #26 NEW: Three-class root-cause taxonomy for ML convergence failures
          (data starvation / optimization defects / infrastructure
          masking). Diagnose which class is binding before tuning.
  #27 NEW: Prioritize by ship-% delta ÷ effort, not alphabetical AC
          order. P0 dispatches are 0.1% the compute cost of P2-A.
  #28 NEW: Class 3 defects come in waves. Training works ≠ checkpoint
          is usable. Each lifecycle stage needs its own surfacing
          dispatch.
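Lesson #28's "each lifecycle stage needs its own surfacing dispatch" can be sketched as a loop over the §81 trio. The stage names mirror apr qa / bench / export; the stage runner is injected as a callable so this stays a testable illustration rather than a binding to the real apr CLI.

```python
from typing import Callable, Dict, Tuple

# Stages mirroring §81's P0 trio; each gets its own dispatch against
# the checkpoint, since a clean training run does not imply a usable
# artifact at every later lifecycle stage.
STAGES = ("qa", "bench", "export")

def surface_defects(
    checkpoint: str,
    run_stage: Callable[[str, str], Tuple[bool, str]],
) -> Dict[str, Tuple[bool, str]]:
    # Run every stage even after a failure: Class 3 defects come in
    # waves, so stopping at the first one hides the rest of the wave.
    return {stage: run_stage(stage, checkpoint) for stage in STAGES}
```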

Ship-% movement:
  MODEL-1: 100% (unchanged)
  MODEL-2: 75% (unchanged in this PR; +2pp expected on #1701 merge)

Spec v3.24.0 → v3.27.0.

Replaces PRs #1695, #1697, #1698 (all DIRTY against main).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 749e0c4 into main May 15, 2026
10 checks passed
@noahgift noahgift deleted the spec/79-80-81-consolidated branch May 15, 2026 15:27