docs(spec): §81 — P0 dispatch surfaces 3 systemic apr pretrain metadata gaps #1698

Closed

noahgift wants to merge 1 commit into main from spec/81-p0-metadata-gaps

Conversation

@noahgift
Contributor

Summary

Surfaces a Class 3 (infrastructure/packaging) defect wave. §80's predicted +6pp from the P0 trio (apr qa / bench / export against §78's epoch-004.apr) was BLOCKED on three packaging defects in the MODEL-2 checkpoint output.

The three defects

| P0 | Predicted | Actual blocker |
|---|---|---|
| apr qa | +2pp, AC-SHIP2-006 | `Validation failed: APR missing embedded tokenizer` |
| apr bench | +2pp, AC-SHIP2-010 | `C-03: APR model missing 'hidden_size' metadata` |
| apr export → llama-cli | +2pp, AC-SHIP2-009 | Export PASSED; llama-cli refused: `unknown model architecture: 'LlamaForCausalLM'` (GGUF wants lowercase `llama`) |

`apr inspect` still reports the file structure as sound (`valid=true / format="APR v2" / tensor_count=291 / checksum_valid=true`) — only downstream-tool-required metadata is missing.

Root cause (per §79 lesson #26)

All three are Class 3 (infrastructure/packaging) defects, not Class 1 (data) or Class 2 (optimization). §22's wave hid the data-starvation signal. §81's wave hides packaging readiness.

Revised priority queue

§80's order is invalidated mid-flight. New blockers go first:

| New item | Effort | Scope |
|---|---|---|
| P0-D embed tokenizer | ~50 LOC | `pretrain.rs`: read `tokenizer.json` from `--tokenizer` dir, embed via `AprWriter::add_tokenizer` |
| P0-E arch metadata | ~30 LOC | `pretrain.rs`: persist `hidden_size` + heads + ffn + layers via `AprWriter::set_metadata` |
| P0-F arch case mapping | ~10 LOC | `export.rs`: map HF arch names (LlamaForCausalLM, Qwen2ForCausalLM, ...) to GGUF lowercase |

Total: ~90 LOC + 3 tests, 1-2 days, 0 compute. Then re-dispatch P0-A/B/C for the predicted +6pp.
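For orientation, a minimal sketch of what P0-D/E/F could look like. `AprWriter::add_tokenizer` and `AprWriter::set_metadata` are the methods named in the table above, but the stub types, exact signatures, and `ModelConfig` field names below are illustrative assumptions, not the actual apr codebase:

```rust
// Sketch only: placeholder stand-ins so the snippet compiles; the real
// AprWriter / ModelConfig live in the apr crates and may differ.
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

#[derive(Default)]
struct AprWriter {
    tokenizer_json: Option<String>,
    metadata: HashMap<String, String>,
}

impl AprWriter {
    fn add_tokenizer(&mut self, json: &str) {
        self.tokenizer_json = Some(json.to_owned());
    }
    fn set_metadata(&mut self, key: &str, value: String) {
        self.metadata.insert(key.to_owned(), value);
    }
}

struct ModelConfig {
    hidden_size: u32,
    num_attention_heads: u32,
    intermediate_size: u32,
    num_hidden_layers: u32,
}

// P0-D (pretrain.rs): read tokenizer.json from the --tokenizer dir and embed it.
fn embed_tokenizer(writer: &mut AprWriter, tokenizer_dir: &Path) -> io::Result<()> {
    let json = fs::read_to_string(tokenizer_dir.join("tokenizer.json"))?;
    writer.add_tokenizer(&json);
    Ok(())
}

// P0-E (pretrain.rs): persist the architecture fields downstream tools expect.
fn write_arch_metadata(writer: &mut AprWriter, cfg: &ModelConfig) {
    writer.set_metadata("hidden_size", cfg.hidden_size.to_string());
    writer.set_metadata("num_attention_heads", cfg.num_attention_heads.to_string());
    writer.set_metadata("intermediate_size", cfg.intermediate_size.to_string());
    writer.set_metadata("num_hidden_layers", cfg.num_hidden_layers.to_string());
}

// P0-F (export.rs): map HF-convention architecture names to GGUF lowercase.
fn gguf_arch_name(hf_arch: &str) -> &str {
    match hf_arch {
        "LlamaForCausalLM" => "llama",
        "Qwen2ForCausalLM" => "qwen2",
        other => other, // unknown arches pass through and still fail loudly in llama-cli
    }
}
```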

Methodology lesson #28 NEW

Surface defects in waves; each lifecycle stage needs its own dispatch. Training works ≠ checkpoint is usable. Adding a P2-C smoke test (cargo test pretrain_e2e — train tiny model 1 step + run qa+bench+export against output) would catch P0-D/E/F at PR time instead of via field discovery.
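A hypothetical shape for that smoke test, written as a Rust integration test that shells out to the CLI. The subcommand names come from this dispatch; the exact flags (`--steps`, `--output`) and fixture paths are assumptions:

```rust
// tests/pretrain_e2e.rs (sketch): train a tiny model for one step, then run
// qa / bench / export against the checkpoint it wrote. This is the shape of
// test that would have caught P0-D/E/F at PR time instead of in the field.
use std::process::Command;

fn run(args: &[&str]) {
    let status = Command::new("apr")
        .args(args)
        .status()
        .expect("failed to spawn apr");
    assert!(status.success(), "`apr {}` failed", args.join(" "));
}

#[test]
#[ignore] // opt-in: needs the apr binary and a tiny tokenizer fixture on disk
fn pretrain_e2e() {
    run(&["pretrain", "--steps", "1",
          "--tokenizer", "tests/fixtures/tokenizer",
          "--output", "target/tmp/tiny.apr"]);
    run(&["qa", "target/tmp/tiny.apr"]);
    run(&["bench", "target/tmp/tiny.apr"]);
    run(&["export", "target/tmp/tiny.apr", "--format", "gguf",
          "--output", "target/tmp/tiny.gguf"]);
}
```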

Ship-% movement

  • MODEL-1: 100% (unchanged)
  • MODEL-2: 75% (unchanged; §80's +6pp delta blocked until P0-D/E/F land)

Spec v3.26.0 → v3.27.0.

🤖 Generated with Claude Code

…ta gaps

§80's predicted +6pp from the P0 dispatch trio (apr qa / bench / export)
against §78's epoch-004.apr was BLOCKED on three packaging defects in
the MODEL-2 checkpoint output:

  P0-A apr qa     → "APR missing embedded tokenizer"
                     (apr pretrain doesn't embed --tokenizer dir's
                      tokenizer.json into output .apr)

  P0-B apr bench  → "C-03: APR model missing 'hidden_size' metadata"
                     (apr pretrain doesn't write hidden_size +
                      num_attention_heads + ... to .apr metadata)

  P0-C apr export → PASSED — 2.35 GiB GGUF / 291 tensors
       llama-cli → FAILED — "unknown model architecture: 'LlamaForCausalLM'"
                     (apr export --format gguf writes HF-convention
                      architecture string; GGUF convention is lowercase
                      "llama")

apr inspect still reports valid=true / format="APR v2" / tensor_count=291 /
checksum_valid=true — file structure is sound, only metadata is missing.

Root cause: All three are Class 3 (infrastructure / packaging) defects
per §79 lesson #26. §22's wave hid the data-starvation signal. §81's
wave hides packaging readiness. Each wave needs its own surfacing
dispatch — running P0-A/B/C against a real checkpoint is what surfaced
this wave.

Revised priority — §80's queue is invalidated mid-flight:

  P0-D NEW: embed tokenizer in apr pretrain output  (~50 LOC)
  P0-E NEW: write arch metadata (hidden_size, ...)   (~30 LOC)
  P0-F NEW: HF→GGUF arch case mapping in apr export  (~10 LOC)
  --- then ---
  P0-A: apr qa             (was originally P0-A in §80)
  P0-B: apr bench          (was originally P0-B in §80)
  P0-C: apr export→llama   (was originally P0-C in §80)

Total: ~90 LOC + 3 tests, 1-2 days code work, 0 compute. After landing,
re-dispatch P0-A/B/C to reach §80's predicted +6pp (75% → 81%).

Methodology lesson #28 NEW: Surface defects in waves; each lifecycle
stage needs its own dispatch. Training works ≠ checkpoint is usable.
Add a P2-C smoke pipeline test (`cargo test pretrain_e2e`) that runs
train → qa → bench → export end-to-end on a tiny model.

Spec v3.26.0 → v3.27.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift
Contributor Author

Consolidated into #1702 (or successor); the original went DIRTY against main due to overlapping header edits with §77/§78. Content is preserved verbatim in the consolidated PR.

@noahgift noahgift closed this May 15, 2026
auto-merge was automatically disabled May 15, 2026 13:31


@noahgift noahgift deleted the spec/81-p0-metadata-gaps branch May 15, 2026 13:31
noahgift added a commit that referenced this pull request May 15, 2026
…0 packaging gaps (#1702)

Triple-amendment to SPEC-SHIP-TWO-001 capturing the §78 → §80 dispatch
arc that revealed a Class 3 packaging-defect wave in apr pretrain output.

Consolidates the content of PRs #1695 (§79), #1697 (§80), #1698 (§81)
which were all DIRTY against main due to overlapping spec-header edits.

§79 — External audit + Five-Whys retrospective on MODEL-2 convergence
  Synthesizes docs/specifications/two-model-spec-audit.md. Identifies
  three compounding root causes for the val_loss=9.75 plateau:
    1. Data starvation (0.24% of Chinchilla-optimal token count)
    2. False plateau hypothesis (LR-budget falsification)
    3. Infrastructure masking bugs (silent CPU fallback, exhaustion
       placeholder, premature early-stop)
  Five-Whys for Case A (silent corpus exhaustion), Case B (early stop),
  Case C (val_loss=9.75 plateau). Reconciles audit Recommendations 1-3
  vs §78's §49-pivot path.

§80 — Prioritized open-follow-up backlog
  Ranks all open SHIP-TWO-001 work by ship-% delta ÷ effort. P0 trio
  (apr qa / bench / export against epoch-004.apr) + P1 Chinchilla gate
  + P1 python validity + P1 HumanEval + P2-A long train = MODEL-2
  theoretical ceiling 92% at ~6-10h RTX 4090 compute.

§81 — P0 dispatch surfaced 3 systemic packaging-defect gaps
  Dispatching §80's P0 trio against §78's epoch-004.apr revealed:
    - P0-A apr qa     → "APR missing embedded tokenizer"
    - P0-B apr bench  → "C-03: APR model missing 'hidden_size' metadata"
    - P0-C apr export → PASSED, but llama-cli refused with
                        "unknown model architecture: 'LlamaForCausalLM'"
                        (GGUF expects lowercase "llama")
  Companion code PRs:
    - #1699 P0-F      → HF→GGUF arch case mapping in apr export
    - #1701 P0-D + P0-E → embed tokenizer + write arch metadata in
                          apr pretrain output
  AC-SHIP2-010 → DISCHARGED (315.5 tok/s on Qwen-0.5B fine-tune;
  3.15× over the 100 tok/s floor).

Methodology lessons added:
  #26 NEW: Three-class root-cause taxonomy for ML convergence failures
          (data starvation / optimization defects / infrastructure
          masking). Diagnose which class is binding before tuning.
  #27 NEW: Prioritize by ship-% delta ÷ effort, not alphabetical AC
          order. P0 dispatches are 0.1% the compute cost of P2-A.
  #28 NEW: Class 3 defects come in waves. Training works ≠ checkpoint
          is usable. Each lifecycle stage needs its own surfacing
          dispatch.

Ship-% movement:
  MODEL-1: 100% (unchanged)
  MODEL-2: 75% (unchanged in this PR; +2pp expected on #1701 merge)

Spec v3.24.0 → v3.27.0.

Replaces PRs #1695, #1697, #1698 (all DIRTY against main).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
