docs(spec): §81 — P0 dispatch surfaces 3 systemic apr pretrain metadata gaps #1698

Closed

noahgift wants to merge 1 commit into main from spec/81-p0-metadata-gaps

Conversation

@noahgift
Contributor

Summary

Surfaces a Class 3 (infrastructure/packaging) defect wave. §80's predicted +6pp from the P0 trio (apr qa / bench / export against §78's epoch-004.apr) was BLOCKED on three packaging defects in the MODEL-2 checkpoint output.

The three defects

| P0 | Predicted | Actual blocker |
|---|---|---|
| apr qa | +2pp, AC-SHIP2-006 | `Validation failed: APR missing embedded tokenizer` |
| apr bench | +2pp, AC-SHIP2-010 | `C-03: APR model missing 'hidden_size' metadata` |
| apr export → llama-cli | +2pp, AC-SHIP2-009 | Export PASSED; llama-cli refused: `unknown model architecture: 'LlamaForCausalLM'` (GGUF wants lowercase `llama`) |

`apr inspect` still reports the file structure as sound (`valid=true / format="APR v2" / tensor_count=291 / checksum_valid=true`) — only downstream-tool-required metadata is missing.

Root cause (per §79 lesson #26)

All three are Class 3 (infrastructure/packaging) defects, not Class 1 (data) or Class 2 (optimization). §22's wave hid the data-starvation signal. §81's wave hides packaging readiness.

Revised priority queue

§80's order is invalidated mid-flight. New blockers go first:

| New item | Effort | Scope |
|---|---|---|
| P0-D embed tokenizer | ~50 LOC | `pretrain.rs`: read `tokenizer.json` from `--tokenizer` dir, embed via `AprWriter::add_tokenizer` |
| P0-E arch metadata | ~30 LOC | `pretrain.rs`: persist `hidden_size` + heads + ffn + layers via `AprWriter::set_metadata` |
| P0-F arch case mapping | ~10 LOC | `export.rs`: map HF arch names (LlamaForCausalLM, Qwen2ForCausalLM, ...) to GGUF lowercase |

Total: ~90 LOC + 3 tests, 1-2 days, 0 compute. Then re-dispatch P0-A/B/C for the predicted +6pp.
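For orientation, a minimal sketch of what P0-D/E/F could look like. `AprWriter::add_tokenizer` and `AprWriter::set_metadata` are the methods named in the table above, but the stub types, exact signatures, and `ModelConfig` field names below are illustrative assumptions, not the actual apr codebase:

```rust
// Sketch only: placeholder stand-ins so the snippet compiles; the real
// AprWriter / ModelConfig live in the apr crates and may differ.
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

#[derive(Default)]
struct AprWriter {
    tokenizer_json: Option<String>,
    metadata: HashMap<String, String>,
}

impl AprWriter {
    fn add_tokenizer(&mut self, json: &str) {
        self.tokenizer_json = Some(json.to_owned());
    }
    fn set_metadata(&mut self, key: &str, value: String) {
        self.metadata.insert(key.to_owned(), value);
    }
}

struct ModelConfig {
    hidden_size: u32,
    num_attention_heads: u32,
    intermediate_size: u32,
    num_hidden_layers: u32,
}

// P0-D (pretrain.rs): read tokenizer.json from the --tokenizer dir and embed it.
fn embed_tokenizer(writer: &mut AprWriter, tokenizer_dir: &Path) -> io::Result<()> {
    let json = fs::read_to_string(tokenizer_dir.join("tokenizer.json"))?;
    writer.add_tokenizer(&json);
    Ok(())
}

// P0-E (pretrain.rs): persist the architecture fields downstream tools expect.
fn write_arch_metadata(writer: &mut AprWriter, cfg: &ModelConfig) {
    writer.set_metadata("hidden_size", cfg.hidden_size.to_string());
    writer.set_metadata("num_attention_heads", cfg.num_attention_heads.to_string());
    writer.set_metadata("intermediate_size", cfg.intermediate_size.to_string());
    writer.set_metadata("num_hidden_layers", cfg.num_hidden_layers.to_string());
}

// P0-F (export.rs): map HF-convention architecture names to GGUF lowercase.
fn gguf_arch_name(hf_arch: &str) -> &str {
    match hf_arch {
        "LlamaForCausalLM" => "llama",
        "Qwen2ForCausalLM" => "qwen2",
        other => other, // unknown arches pass through and still fail loudly in llama-cli
    }
}
```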

Methodology lesson #28 NEW

Surface defects in waves; each lifecycle stage needs its own dispatch. Training works ≠ checkpoint is usable. Adding a P2-C smoke test (cargo test pretrain_e2e — train tiny model 1 step + run qa+bench+export against output) would catch P0-D/E/F at PR time instead of via field discovery.
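A hypothetical shape for that smoke test, written as a Rust integration test that shells out to the CLI. The subcommand names come from this dispatch; the exact flags (`--steps`, `--output`) and fixture paths are assumptions:

```rust
// tests/pretrain_e2e.rs (sketch): train a tiny model for one step, then run
// qa / bench / export against the checkpoint it wrote. This is the shape of
// test that would have caught P0-D/E/F at PR time instead of in the field.
use std::process::Command;

fn run(args: &[&str]) {
    let status = Command::new("apr")
        .args(args)
        .status()
        .expect("failed to spawn apr");
    assert!(status.success(), "`apr {}` failed", args.join(" "));
}

#[test]
#[ignore] // opt-in: needs the apr binary and a tiny tokenizer fixture on disk
fn pretrain_e2e() {
    run(&["pretrain", "--steps", "1",
          "--tokenizer", "tests/fixtures/tokenizer",
          "--output", "target/tmp/tiny.apr"]);
    run(&["qa", "target/tmp/tiny.apr"]);
    run(&["bench", "target/tmp/tiny.apr"]);
    run(&["export", "target/tmp/tiny.apr", "--format", "gguf",
          "--output", "target/tmp/tiny.gguf"]);
}
```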

Ship-% movement

  • MODEL-1: 100% (unchanged)
  • MODEL-2: 75% (unchanged; §80's +6pp delta blocked until P0-D/E/F land)

Spec v3.26.0 → v3.27.0.

🤖 Generated with Claude Code

…ta gaps

§80's predicted +6pp from the P0 dispatch trio (apr qa / bench / export)
against §78's epoch-004.apr was BLOCKED on three packaging defects in
the MODEL-2 checkpoint output:

  P0-A apr qa     → "APR missing embedded tokenizer"
                     (apr pretrain doesn't embed --tokenizer dir's
                      tokenizer.json into output .apr)

  P0-B apr bench  → "C-03: APR model missing 'hidden_size' metadata"
                     (apr pretrain doesn't write hidden_size +
                      num_attention_heads + ... to .apr metadata)

  P0-C apr export → PASSED — 2.35 GiB GGUF / 291 tensors
       llama-cli → FAILED — "unknown model architecture: 'LlamaForCausalLM'"
                     (apr export --format gguf writes HF-convention
                      architecture string; GGUF convention is lowercase
                      "llama")

apr inspect still reports valid=true / format="APR v2" / tensor_count=291 /
checksum_valid=true — file structure is sound, only metadata is missing.

Root cause: All three are Class 3 (infrastructure / packaging) defects
per §79 lesson #26. §22's wave hid the data-starvation signal. §81's
wave hides packaging readiness. Each wave needs its own surfacing
dispatch — running P0-A/B/C against a real checkpoint is what surfaced
this wave.

Revised priority — §80's queue is invalidated mid-flight:

  P0-D NEW: embed tokenizer in apr pretrain output  (~50 LOC)
  P0-E NEW: write arch metadata (hidden_size, ...)   (~30 LOC)
  P0-F NEW: HF→GGUF arch case mapping in apr export  (~10 LOC)
  --- then ---
  P0-A: apr qa             (was originally P0-A in §80)
  P0-B: apr bench          (was originally P0-B in §80)
  P0-C: apr export→llama   (was originally P0-C in §80)

Total: ~90 LOC + 3 tests, 1-2 days code work, 0 compute. After landing,
re-dispatch P0-A/B/C to reach §80's predicted +6pp (75% → 81%).

Methodology lesson #28 NEW: Surface defects in waves; each lifecycle
stage needs its own dispatch. Training works ≠ checkpoint is usable.
Add a P2-C smoke pipeline test (`cargo test pretrain_e2e`) that runs
train → qa → bench → export end-to-end on a tiny model.

Spec v3.26.0 → v3.27.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift
Contributor Author

Consolidated into #1702 (or successor); the original went DIRTY against main due to overlapping header edits with §77/§78. Content is preserved verbatim in the consolidated PR.

@noahgift noahgift closed this May 15, 2026
auto-merge was automatically disabled May 15, 2026 13:31


@noahgift noahgift deleted the spec/81-p0-metadata-gaps branch May 15, 2026 13:31
noahgift added a commit that referenced this pull request May 15, 2026
…0 packaging gaps (#1702)

Triple-amendment to SPEC-SHIP-TWO-001 capturing the §78 → §80 dispatch
arc that revealed a Class 3 packaging-defect wave in apr pretrain output.

Consolidates the content of PRs #1695 (§79), #1697 (§80), #1698 (§81)
which were all DIRTY against main due to overlapping spec-header edits.

§79 — External audit + Five-Whys retrospective on MODEL-2 convergence
  Synthesizes docs/specifications/two-model-spec-audit.md. Identifies
  three compounding root causes for the val_loss=9.75 plateau:
    1. Data starvation (0.24% of Chinchilla-optimal token count)
    2. False plateau hypothesis (LR-budget falsification)
    3. Infrastructure masking bugs (silent CPU fallback, exhaustion
       placeholder, premature early-stop)
  Five-Whys for Case A (silent corpus exhaustion), Case B (early stop),
  Case C (val_loss=9.75 plateau). Reconciles audit Recommendations 1-3
  vs §78's §49-pivot path.

§80 — Prioritized open-follow-up backlog
  Ranks all open SHIP-TWO-001 work by ship-% delta ÷ effort. P0 trio
  (apr qa / bench / export against epoch-004.apr) + P1 Chinchilla gate
  + P1 python validity + P1 HumanEval + P2-A long train = MODEL-2
  theoretical ceiling 92% at ~6-10h RTX 4090 compute.

§81 — P0 dispatch surfaced 3 systemic packaging-defect gaps
  Dispatching §80's P0 trio against §78's epoch-004.apr revealed:
    - P0-A apr qa     → "APR missing embedded tokenizer"
    - P0-B apr bench  → "C-03: APR model missing 'hidden_size' metadata"
    - P0-C apr export → PASSED, but llama-cli refused with
                        "unknown model architecture: 'LlamaForCausalLM'"
                        (GGUF expects lowercase "llama")
  Companion code PRs:
    - #1699 P0-F      → HF→GGUF arch case mapping in apr export
    - #1701 P0-D + P0-E → embed tokenizer + write arch metadata in
                          apr pretrain output
  AC-SHIP2-010 → DISCHARGED (315.5 tok/s on Qwen-0.5B fine-tune;
  3.15× over the 100 tok/s floor).

Methodology lessons added:
  #26 NEW: Three-class root-cause taxonomy for ML convergence failures
          (data starvation / optimization defects / infrastructure
          masking). Diagnose which class is binding before tuning.
  #27 NEW: Prioritize by ship-% delta ÷ effort, not alphabetical AC
          order. P0 dispatches are 0.1% the compute cost of P2-A.
  #28 NEW: Class 3 defects come in waves. Training works ≠ checkpoint
          is usable. Each lifecycle stage needs its own surfacing
          dispatch.

Ship-% movement:
  MODEL-1: 100% (unchanged)
  MODEL-2: 75% (unchanged in this PR; +2pp expected on #1701 merge)

Spec v3.24.0 → v3.27.0.

Replaces PRs #1695, #1697, #1698 (all DIRTY against main).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
