feat(apr-cli): wire apr pretrain --init <model.apr> — §49 step 4 (#1471)

Merged
noahgift merged 1 commit into main from feat/apr-pretrain-init-flag-step4 on May 4, 2026

Conversation


@noahgift noahgift commented May 4, 2026

Summary

Implements apr pretrain --init <PATH> per the contract authored in #1470. This is §49 step 4 of the MODEL-2 pretrained-init pivot.

§49 (spec v2.94.0, 2026-05-04, #1461) retired the from-scratch MODEL-2 strategy after §24.8's 80K-step LR-budget falsification confirmed val_loss=9.75 as a corpus-bottleneck floor. Fine-tuning a Qwen2.5-class pretrained checkpoint (which has already paid the 1T-token data tax) is the load-bearing path.

What this PR adds

  • Clap field init: Option<PathBuf> on ExtendedCommands::Pretrain (extended_commands.rs:635). Optional — absence preserves existing pretrain behavior.
  • Plumbing through dispatch_analysis.rs:346 → commands::pretrain::run (new init: Option<&Path> param)
  • Helper validate_init_apr_path() in pretrain.rs (sketched after this list):
    • File open → FALSIFY-003 (missing file → exit non-zero)
    • Read 4 magic bytes → FALSIFY-004 (read failure → exit non-zero)
    • Compare magic vs APR\0 / APRN → FALSIFY-004
    • Valid magic → "not yet wired" error pointing at §49 step 5 (no silent random-init fallback)
  • 7 new unit tests (all green)
  • Contract bump PROPOSED → PARTIAL_ALGORITHM_LEVEL (v1.0.0 → v1.1.0) with changelog
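
A rough sketch of that helper, following the bullets above (the real error type, exact messages, and module layout in `pretrain.rs` may differ):

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Sketch of the magic-byte gate described above.
fn validate_init_apr_path(path: &Path) -> Result<(), String> {
    // FALSIFY-003: a missing file must fail loudly, never fall back to random init.
    let mut file = File::open(path)
        .map_err(|e| format!("--init {}: cannot open APR file: {e}", path.display()))?;

    // FALSIFY-004: the first 4 bytes must be a recognised APR magic.
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)
        .map_err(|e| format!("--init {}: cannot read magic bytes: {e}", path.display()))?;

    if &magic != b"APR\0" && &magic != b"APRN" {
        return Err(format!(
            "--init {}: not an APR file (magic {:02X?})",
            path.display(),
            magic
        ));
    }

    // Valid APR: refuse until the weight load lands (spec §49 step 5).
    // No silent random-init fallback.
    Err(format!(
        "--init {}: APR recognised, but weight loading is not yet wired (see spec §49 step 5)",
        path.display()
    ))
}
```

The final arm is the point of the partial state: a recognised APR still errors until step 5 wires the weight load, which is exactly the no-silent-fallback invariant.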

Test results

$ cargo test -p apr-cli --lib commands::pretrain::tests::
running 21 tests
test commands::pretrain::tests::pretrain_init_empty_file_errors ... ok
test commands::pretrain::tests::pretrain_init_bad_magic_errors ... ok
test commands::pretrain::tests::pretrain_init_missing_file_errors ... ok
test commands::pretrain::tests::pretrain_init_v1_magic_aprn_recognised ... ok
test commands::pretrain::tests::pretrain_init_valid_apr_rejected_until_step5 ... ok
test commands::pretrain::tests::pretrain_init_flag_absent_parses_to_none ... ok
test commands::pretrain::tests::pretrain_init_flag_parses_path ... ok
... (14 existing tests still pass)

test result: ok. 21 passed; 0 failed

Help output

$ apr pretrain --help | grep -A2 init
      --init <PATH>
          Initial weights from a pretrained APR file (contract `apr-pretrain-from-init-v1`).
          Per spec §49's MODEL-2 pretrained-init pivot: ...
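
That help text is produced by the clap derive on the `Pretrain` variant; roughly like the sketch below (surrounding fields and exact attributes are elided, so treat this as a sketch rather than the literal `extended_commands.rs` code):

```rust
use std::path::PathBuf;

#[derive(clap::Subcommand)]
enum ExtendedCommands {
    /// Pretrain a model (other flags elided in this sketch).
    Pretrain {
        /// Initial weights from a pretrained APR file
        /// (contract `apr-pretrain-from-init-v1`).
        #[arg(long, value_name = "PATH")]
        init: Option<PathBuf>,
        // ... existing pretrain flags, unchanged ...
    },
}
```

Because the field is `Option<PathBuf>`, omitting `--init` parses to `None` and the existing pretrain path is untouched.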

Step 5 follow-up scope (~150 LOC)

  • Architecture matching: read APR header, compare vocab/hidden/layers/heads → discharges FALSIFY-005 (see the sketch after this list)
  • Actual weight load: read tensor shards, materialize as optimizer initial state → discharges FALSIFY-006/009/010
  • LIVE 500-step fine-tune on Qwen2.5-Coder-0.5B-Instruct.apr → DISCHARGED (val_loss < 9.38)
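
A minimal sketch of that architecture check, assuming a hypothetical `AprArch` struct for the extracted header fields (the real header-reading API is step-5 work; FALSIFY-005 only requires that mismatches fail fast and name the offending fields):

```rust
/// Architecture fields extracted from an APR header (hypothetical struct;
/// the real header layout is defined by the APR format, not this sketch).
struct AprArch {
    vocab_size: usize,
    hidden_size: usize,
    num_layers: usize,
    num_attention_heads: usize,
}

/// Fail fast on any mismatch, naming every offending field (FALSIFY-005).
fn check_arch(init: &AprArch, target: &AprArch) -> Result<(), String> {
    let mut mismatches = Vec::new();
    if init.vocab_size != target.vocab_size {
        mismatches.push(format!("vocab_size {} != {}", init.vocab_size, target.vocab_size));
    }
    if init.hidden_size != target.hidden_size {
        mismatches.push(format!("hidden_size {} != {}", init.hidden_size, target.hidden_size));
    }
    if init.num_layers != target.num_layers {
        mismatches.push(format!("num_layers {} != {}", init.num_layers, target.num_layers));
    }
    if init.num_attention_heads != target.num_attention_heads {
        mismatches.push(format!(
            "num_attention_heads {} != {}",
            init.num_attention_heads, target.num_attention_heads
        ));
    }
    if mismatches.is_empty() {
        Ok(())
    } else {
        Err(format!("--init architecture mismatch: {}", mismatches.join(", ")))
    }
}
```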

Plain ship-% update

  • MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  • MODEL-2: unchanged at 57% — first ship-% movement gated on §49 step 5 (LIVE 500-step fine-tune)

Five Whys

  1. Why a small partial-state PR instead of full step 4+5? §49 step 4 was scoped at ~50 LOC for "wire the flag"; step 5 does the full weight load. Splitting keeps each PR small and reviewable.
  2. Why reject EVERY valid APR right now? Honors the contract's no-silent-fallback invariant. If we accepted valid APRs and silently used random init while step 5 is open, an operator could ship a "fine-tune" run that's actually random — exactly the §24 silent-default defect class.
  3. Why a custom error naming "§49 step 5" instead of just "not implemented"? Operators tracing failure to the source PR can grep the spec; "not implemented" gives no thread to pull. The error message IS the breadcrumb.
  4. Why bump contract from PROPOSED to PARTIAL_ALGORITHM_LEVEL in the same PR? Atomicity: contract status describes the impl level we have evidence for. Leaving it at PROPOSED while impl is on main creates drift between status and reality.
  5. Why not implement step 5 in the same PR? MappedAprModel architecture extraction is deeper plumbing (header reading, qtype decoding, optimizer state init) warranting its own commit. Single-piece flow per Toyota Way.

Test plan

  • cargo test -p apr-cli --lib commands::pretrain::tests:: — 21/21 pass
  • cargo check -p apr-cli --lib — clean
  • pv validate contracts/apr-pretrain-from-init-v1.yaml — 0 errors
  • apr pretrain --help | grep init — help text rendered
  • CI checks (gate, test, lint, coverage)
  • pmat quality-gates (pre-commit hook)

Refs

🤖 Generated with Claude Code

Implements `apr pretrain --init <PATH>` per the contract authored in
PR #1470 (`apr-pretrain-from-init-v1` v1.0.0 PROPOSED). This is §49
step 4 of the MODEL-2 pretrained-init pivot.

Spec §49 (v2.94.0, 2026-05-04, PR #1461) retired the from-scratch
MODEL-2 strategy after §24.8's 80K-step LR-budget falsification
confirmed val_loss=9.75 as a corpus-bottleneck floor on the 565M-token
corpus. Fine-tuning a Qwen2.5-class pretrained checkpoint (which has
already paid the 1T-token data tax) is the load-bearing path. This PR
adds the flag that loads weights from an APR file as the initial
weights for the pretrain optimizer.

What this PR adds:

  1. Clap field `init: Option<PathBuf>` on the `Pretrain` variant
     in `extended_commands.rs:635`. Optional — absence preserves
     existing pretrain behavior (no §24/§25 regression).
  2. Plumbing through `dispatch_analysis.rs:346` to
     `commands::pretrain::run` (new `init: Option<&Path>` param).
  3. New helper `validate_init_apr_path()` in `pretrain.rs`:
       a. Open file → FALSIFY-003 (missing → exit non-zero)
       b. Read 4 magic bytes → FALSIFY-004 (read fails → exit non-zero)
        c. Compare magic bytes vs APR\0 / APRN → FALSIFY-004
       d. If valid magic → return "not yet wired" error pointing at
          §49 step 5 (no silent random-init fallback)
  4. 7 new unit tests in `pretrain::tests`:
       - pretrain_init_flag_absent_parses_to_none (FALSIFY-001/002)
       - pretrain_init_flag_parses_path           (FALSIFY-001)
       - pretrain_init_missing_file_errors        (FALSIFY-003)
       - pretrain_init_bad_magic_errors           (FALSIFY-004)
       - pretrain_init_empty_file_errors          (FALSIFY-004 edge)
       - pretrain_init_valid_apr_rejected_until_step5 (partial-state guard)
       - pretrain_init_v1_magic_aprn_recognised   (v1 magic acceptance)
  5. Contract status bump: PROPOSED → PARTIAL_ALGORITHM_LEVEL via
     v1.0.0 → v1.1.0 metadata update + changelog entry.

Test results (cargo test -p apr-cli --lib commands::pretrain::):
    21 passed; 0 failed; 0 ignored
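
For illustration, the bad-magic case might look roughly like the sketch below; it reuses the hypothetical `validate_init_apr_path` signature from the sketch earlier in this description, and the real test in `commands::pretrain::tests` may use a different temp-file helper and error type:

```rust
// Sketch only: signature and error strings are assumptions carried over
// from the validator sketch above.
#[test]
fn pretrain_init_bad_magic_errors_sketch() {
    use std::io::Write;

    let path = std::env::temp_dir().join("bad_magic_sketch.apr");
    std::fs::File::create(&path)
        .and_then(|mut f| f.write_all(b"GGUF")) // 4 bytes, but neither APR\0 nor APRN
        .unwrap();

    // FALSIFY-004: a wrong magic must produce an error, never a silent
    // fallback to random init.
    let err = validate_init_apr_path(&path).expect_err("bad magic must be rejected");
    assert!(err.contains("not an APR file"));

    let _ = std::fs::remove_file(&path);
}
```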

Step 5 follow-up scope (~150 LOC):

  - Architecture matching: read APR header, compare vocab/hidden/
    layers/heads against pretrain target → discharges FALSIFY-005
  - Actual weight load: read tensor shards, materialize into
    optimizer's initial state → discharges FALSIFY-006/009/010
  - LIVE 500-step fine-tune on Qwen2.5-Coder-0.5B-Instruct.apr →
    DISCHARGED (val_loss < 9.38)

Five Whys:

  1. Why a small partial-state PR instead of full step 4+5? §49
     step 4 was scoped at ~50 LOC for "wire the flag"; step 5
     does the full weight load. Splitting keeps each PR small,
     reviewable, and lets CI catch silent-fallback regressions
     between the two steps.

  2. Why have validate_init_apr_path() reject EVERY valid APR right
     now? Honors the contract's no-silent-fallback invariant. If we
     accepted valid APRs and silently used random init while step 5
     is open, an operator could ship a "fine-tune" run that's
     actually a from-scratch run — exactly the §24 silent-default
     defect class this PR is built to prevent.

  3. Why a custom error message naming "§49 step 5" instead of just
     "not implemented"? Operators tracing a failure to the source
     PR can find the next-step contract obligations by grep'ing the
     spec; "not implemented" gives them no thread to pull. The
     error message IS the breadcrumb to the next-cycle work.

  4. Why bump the contract from PROPOSED to PARTIAL_ALGORITHM_LEVEL
     in the same PR? Atomicity: the contract describes the flag's
     algorithm at the LEVEL we have impl evidence for. PROPOSED
     means "no impl"; PARTIAL means "compile-bound + algorithm-bound
     at sub-falsifier granularity". This PR delivers exactly that.
     Leaving the contract at PROPOSED while the impl is on main
     creates a drift between status and reality.

  5. Why not implement step 5 in the same PR? The MappedAprModel
     architecture extraction is a deeper plumbing question (header
     reading, GGUF qtype decoding, optimizer state initialization)
     that warrants its own commit + review. Going small + atomic
     is the Toyota Way single-piece flow.

Plain ship-% update:

  - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  - MODEL-2: unchanged at 57% — first ship-% movement gated on §49
    step 5 weight-load impl + LIVE 500-step fine-tune (FALSIFY-006)

Refs:

  - SPEC-SHIP-TWO-001 §49 — MODEL-2 strategy pivot (#1461)
  - contracts/apr-pretrain-from-init-v1.yaml v1.0.0 → v1.1.0
  - PR #1470 — contract authoring (merged)
  - feedback_cli_subcommand_three_surface_drift.md (3-surface rule)
  - feedback_no_guessing.md
  - memory:project_qwen2_0_5b_is_ship_007_manifestation.md (orthogonal)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 4, 2026 13:46
@noahgift noahgift merged commit 6e7cb35 into main May 4, 2026
11 checks passed
@noahgift noahgift deleted the feat/apr-pretrain-init-flag-step4 branch May 4, 2026 14:09
noahgift added a commit that referenced this pull request May 4, 2026
…oupling finding (#1472)

Adds §50 documenting the architecture-mismatch finding caught after §49.6
steps 3+4 landed (PR #1470 contract + PR #1471 wire-up). The remaining
§49.6 step 5 was scoped at "0 LOC, just run apr pretrain --init" — that
assumption is empirically wrong.

Empirical finding (§50.1):
  pretrain_real.rs:38-46 HARDCODES Llama370MConfig::* for every
  architectural constant. Qwen2.5-Coder-0.5B-Instruct has different
  shape across the board:

    Param               | Llama370M   | Qwen2.5-Coder-0.5B
    --------------------|-------------|--------------------
    hidden_size         | 1024        | 896
    num_attention_heads | 16          | 14
    num_kv_heads        | 4 (GQA-4:1) | 2 (GQA-7:1)
    intermediate_size   | 2816        | 4864
    vocab_size          | 50_257      | 151_936
    rope_theta          | 10_000      | 1_000_000

  Every tensor mismatches. Loading Qwen2.5 weights into a Llama370M-
  shaped optimizer is a category error.

Three options surfaced (§50.3):
  A: Find/build a Llama-shaped 0.5B pretrained checkpoint
     (~5K LOC + multi-week training; recreates §24/§25 corpus problem)
  B: Make trainer architecture-polymorphic
     (~200-400 LOC; preserves §24/§25 falsification; recommended)
  C: Replace Llama370MConfig with Qwen2_5_Coder_0_5B_Config outright
     (~300 LOC; deletes a working falsification path)

Recommendation (§50.5): Option B — preserves §24/§25 falsification
evidence, exercises TransformerConfig's designed polymorphism, binds
each new component (qwen2_0_5b constructor, GQA-7:1 attention, Qwen
tokenizer surface) to its own falsifier.
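
In sketch form, Option B boils down to the trainer consuming a TransformerConfig value rather than Llama370MConfig constants. Constructor names other than qwen2_0_5b() are hypothetical here, and the real dispatch will come from the extracted APR header rather than a vocab probe:

```rust
// Reuses the hypothetical AprArch from the earlier sketch; llama_370m() is
// also hypothetical — the point is only that the trainer takes a
// TransformerConfig instead of hardcoding Llama370MConfig constants.
fn select_pretrain_config(init_arch: Option<&AprArch>) -> Result<TransformerConfig, String> {
    match init_arch {
        // --init absent: the §24/§25 from-scratch Llama370M baseline is untouched.
        None => Ok(TransformerConfig::llama_370m()),
        // --init present and shaped like Qwen2.5-Coder-0.5B.
        Some(a) if a.vocab_size == 151_936 && a.hidden_size == 896 => {
            Ok(TransformerConfig::qwen2_0_5b())
        }
        // Anything else fails fast with the offending shape, never a silent default.
        Some(a) => Err(format!(
            "unsupported --init architecture (vocab_size={}, hidden_size={})",
            a.vocab_size, a.hidden_size
        )),
    }
}
```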

Re-scoped roadmap (§50.4) — 8 sub-steps replacing original step 5:
  5a. Author apr-pretrain-arch-polymorphic-v1.yaml contract  (~80 LOC)
  5b. TransformerConfig::qwen2_0_5b() constructor           (~40 LOC)
  5c. Extract arch from init APR file metadata              (~80 LOC)
  5d. Qwen tokenizer-vocab compatibility check              (~30 LOC)
  5e. GQA-7:1 attention forward-pass verification           (~50 LOC)
  5f. Wire actual weight load                              (~120 LOC)
  5g. LIVE 500-step smoke fine-tune (operator dispatch)        0 LOC
  5h. Stamp + publish as MODEL-2 v2                         (~10 LOC)

  Total: ~410 LOC + 1 LIVE training run.

Five Whys (§50.6):
  1. Why didn't §49 catch this? §49 was authored from strategy/
     data-budget reasoning; the 0-LOC step-5 cost implicitly
     assumed polymorphism. Live source inspection (this section's
     empirical move) revealed pretrain_real.rs:38-46 predates the
     assumption.
  2. Why catch this NOW and not in step 5 implementation? Per
     feedback_no_guessing.md: read live source before forming
     implementation plan. Surfacing the mismatch BEFORE writing
     200 LOC of weight-load code that fails at runtime is the
     cheapest place to pay cost-of-defect. The §50-prior wrong-
     premise PRs (#1466/#1467/#1468 closed) on the SHIP-007 / 0.5B
     gibberish track were the same defect class.
  3. Why option B over A or C? Preserves §24/§25 falsification
     evidence (we KEEP knowing from-scratch fails at 9.75; we just
     don't ship it as MODEL-2). Exercises the polymorphism
     TransformerConfig was designed for. Each new component becomes
     its own falsifier rather than a hidden coupling.
  4. Why is FALSIFY-005 the right place to fail-fast? PR #1470
     already pinned "Architecture mismatch is FAIL-FAST, not silent-
     truncate". Step 4 (PR #1471) doesn't enforce arch matching yet
     — returns "not yet wired" before getting there. So FALSIFY-005
     is currently UNBOUND but its discharge gate is well-defined:
     read APR header, compare against pretrain target, error with
     names of mismatched fields.
  5. Why isn't this a "punt"? A punt would say "blocked, await
     operator". This amendment names three options with LOC
     estimates, recommends one with reasoning, gives a concrete 8-
     step roadmap with falsifier discharge mapped to each sub-step.
     The work IS shippable; it's just bigger than 0 LOC.

Plain ship-% update:
  - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4
    step 5g (LIVE 500-step fine-tune producing val_loss < 9.38).
    Sub-steps 5a-5f can each individually move 1% with falsifier
    discharge (architecture-polymorphic infrastructure shipped ==
    evidence that the §49 path is REACHABLE, not just theoretical).

Refs:
  - §49 — MODEL-2 strategy pivot (PR #1461)
  - PR #1470 — apr-pretrain-from-init-v1 v1.0.0 PROPOSED contract
  - PR #1471 — apr pretrain --init clap field + magic-byte validate
  - feedback_no_guessing.md — read source before forming hypothesis
  - feedback_fix_root_cause_never_route_around.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 4, 2026
… 5b + DEFECT FIX (#1474)

§50.4 step 5b authored a contract assuming `qwen2_0_5b()` did not
exist. Live source inspection during impl revealed the constructor
ALREADY EXISTS at `transformer/config.rs:156`. Reading the HF config
byte-for-byte (per `feedback_no_guessing.md`) revealed a real defect:

  HF config (Qwen2.5-Coder-0.5B-Instruct):  tie_word_embeddings: true
  Existing code (qwen2_0_5b):                tie_word_embeddings: false

Fix: 1 LOC change `false → true`. Per Qwen scaling-law convention
verified against the HF cache:

  - Qwen2.5-Coder-0.5B: tie=true   (HF cache 2026-05-04 ✓)
  - Qwen2.5-Coder-1.5B: tie=true   (HF cache 2026-05-04 ✓; inherits
                                    via `..Self::qwen2_0_5b()` spread)
  - Qwen2.5-Coder-7B:   tie=false  (HF cache 2026-05-04 ✓; explicit
                                    in qwen2_7b())

Why the defect matters: tied vs untied embeddings is a load-bearing
architectural property. With tie=false (current bug), if an operator
fine-tunes from a Qwen2.5-0.5B init checkpoint, the lm_head will be
allocated as a separate tensor that doesn't get loaded (because the
APR file only contains the embed_tokens tensor — they share weights).
The result: lm_head random-initialized and untrained, producing
silent gibberish at val time. This is exactly the §49 / §50 failure
class the contract was authored to prevent.
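
A sketch of the corrected constructor and the spread inheritance it feeds (field names follow the table above; the `..Self::default()` fallback and the 1.5B override shown are assumptions of this sketch, not the real code):

```rust
impl TransformerConfig {
    /// Qwen2.5-Coder-0.5B-Instruct, checked against the HF config.json
    /// cached 2026-05-04. tie_word_embeddings is the 1-LOC fix: with
    /// tie=true, lm_head shares embed_tokens' weights instead of being a
    /// separate (and, from an APR init, never-loaded) tensor.
    pub fn qwen2_0_5b() -> Self {
        Self {
            vocab_size: 151_936,
            hidden_size: 896,
            num_attention_heads: 14,
            num_kv_heads: 2,            // GQA-7:1
            intermediate_size: 4864,
            rope_theta: 1_000_000.0,
            use_bias: true,
            tie_word_embeddings: true,  // was false — the defect this PR fixes
            ..Self::default()           // remaining fields elided in this sketch
        }
    }

    /// 1.5B inherits tie_word_embeddings through the spread; the new
    /// drift-prevention test pins exactly this inheritance.
    pub fn qwen2_1_5b() -> Self {
        Self {
            hidden_size: 1536,          // illustrative override for the sketch
            ..Self::qwen2_0_5b()
        }
    }
}
```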

What this PR adds:

  1. Fix `tie_word_embeddings: false → true` in `qwen2_0_5b()` at
     `transformer/config.rs:156-174`
  2. Add docstring noting the empirical verification + HF cache path
     + Qwen scaling-law quirk
  3. Add 3 new unit tests in `transformer::config::tests`:
       - `qwen2_0_5b_matches_hf_config_2026_05_04` (FALSIFY-001 byte-
         identity verification — 11 fields)
       - `qwen2_1_5b_inherits_tie_word_embeddings_from_0_5b` (drift-
         prevention; catches future spread-split refactors)
       - `qwen2_7b_does_not_tie_embeddings` (drift-prevention; pins
         the 7B Qwen scaling-law quirk against silent flips)

Test results (cargo test -p aprender-train --lib transformer::config::tests::qwen2):
    3 passed; 0 failed; 0 ignored

Discharges FALSIFY-APR-PRETRAIN-ARCH-001 in PR #1473's contract.

Five Whys:

  1. Why was the constructor already there but with the wrong tie
     setting? Likely authored before the spec-§49 use case became the
     load-bearing target. The constants for `qwen2_0_5b` were correct
     for inference, but tie_word_embeddings is mostly a training-
     pipeline concern — it determines whether lm_head is a separate
     trainable parameter or shares with embed_tokens.

  2. Why didn't pmat query / cargo test catch this earlier? Existing
     tests pinned shape (hidden, layers, heads, etc.) but no test
     verified `tie_word_embeddings`. This PR adds the missing
     drift-prevention test that catches the defect class.

  3. Why fix this in the same PR as the test (not a separate fix)?
     Toyota Way: the test IS the discharge mechanism for FALSIFY-001.
     A test that passed against the (defective) status quo would be a
     liar. Fixing first + testing second guarantees the test pins
     correct behavior, not whatever happened to be in the code.

  4. Why also pin qwen2_1_5b (inheritance) and qwen2_7b (anti-spread)?
     Those are drift-prevention. The spread-inheritance pattern
     `..Self::qwen2_0_5b()` is fragile — a future refactor could
     split the inheritance chain and silently flip tie_word_embeddings
     back to false on 1.5B. Test catches that. Similarly, an over-
     enthusiastic refactor could homogenize 7B with 0.5B (incorrectly
     setting 7B's tie=true). Test catches that too.

  5. Why was §50.4 step 5b overscoped at ~40 LOC? §50 was authored
     under the assumption that the constructor didn't exist. Live
     source inspection (per `feedback_no_guessing.md`) revealed the
     foundation was already there, just with one defect. This is the
     same lesson as §50 itself — read source before authoring scope.
     The contract from PR #1473 is still valid; only the LOC estimate
     in §50.4's table was wrong.

Plain ship-% update:
  - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4
    step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
  - SPEC-SHIP-TWO-001 §49 — MODEL-2 strategy pivot (#1461)
  - SPEC-SHIP-TWO-001 §50 — architecture-coupling finding (#1472, in flight)
  - PR #1470 — apr-pretrain-from-init-v1 contract (merged)
  - PR #1471 — apr pretrain --init wire-up (merged)
  - PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
  - HF config: ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-{0.5B,1.5B,7B}-Instruct/.../config.json
  - feedback_no_guessing.md — read source before forming hypothesis

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 4, 2026
…tep 5a (#1473)

Authors `contracts/apr-pretrain-arch-polymorphic-v1.yaml` v1.0.0
PROPOSED — the contract layer driving §50.4 steps 5b-5f (the
architecture-polymorphic pretrain trainer that unblocks fine-tuning
from a Qwen2.5-class init checkpoint).

Per §50 (PR #1472), the existing pretrain trainer
`pretrain_real.rs:38-46` HARDCODES every architectural constant from
`Llama370MConfig`. Loading Qwen2.5-Coder-0.5B-Instruct weights into
this fixed shape is a category error (vocab 50K vs 152K, hidden 1024
vs 896, GQA 4:1 vs 7:1, etc.). This contract pins the polymorphic
builder + 4 invariants:

  1. arch_extraction_signature — init=None preserves §24/§25 baseline;
     init=Some extracts all 10 fields from APR header, no silent defaults
  2. qwen2_0_5b_constructor — TransformerConfig::qwen2_0_5b() returns
     a config matching HF config.json byte-for-byte (vocab=151_936,
     hidden=896, GQA-7:1, rope_theta=1e6, use_bias=true,
     tie_word_embeddings=true)
  3. gqa_7_to_1_invariants — attention kernel handles GQA-7:1 without
     per-ratio specialization; cosine ≥ 0.9999 vs GQA-1:1 reference
     (head-group mapping sketched after this list)
  4. qwen_tokenizer_vocab_compatibility — preflight gates by EXTRACTED
     vocab (151_936 for Qwen) when --init present, falls back to
     Llama370MConfig::VOCAB_SIZE (50_257) when absent
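
The gqa_7_to_1_invariants equation reduces to one mapping from query heads to KV heads; a minimal sketch of the ratio-agnostic form (function name is hypothetical):

```rust
/// GQA maps each query head onto one of the (fewer) KV heads.
/// Llama370M uses 16 query heads over 4 KV heads (4:1);
/// Qwen2.5-Coder-0.5B uses 14 over 2 (7:1). A kernel that only
/// relies on this mapping needs no per-ratio specialization.
fn kv_head_for_query_head(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert_eq!(num_q_heads % num_kv_heads, 0, "GQA needs an integer group size");
    q_head / (num_q_heads / num_kv_heads)
}

#[test]
fn gqa_7_to_1_groups_query_heads_correctly() {
    // Qwen2.5-Coder-0.5B: query heads 0..=6 share KV head 0, 7..=13 share KV head 1.
    assert_eq!(kv_head_for_query_head(6, 14, 2), 0);
    assert_eq!(kv_head_for_query_head(7, 14, 2), 1);
    // Llama370M (4:1) keeps working with the same mapping.
    assert_eq!(kv_head_for_query_head(5, 16, 4), 1);
}
```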

8 falsifiers (FALSIFY-APR-PRETRAIN-ARCH-001..008), 6 proof obligations,
2 kani harnesses. `pv validate` exits 0 with 0 errors / 0 warnings.

This contract DOES NOT replace apr-pretrain-from-init-v1; the two
compose. apr-pretrain-from-init-v1 pins the --init flag's CLI surface
+ magic-byte validation; this contract pins the architecture
extraction algorithm that --init's weight load depends on.
FALSIFY-APR-PRETRAIN-INIT-005 (arch mismatch) becomes DISCHARGED when
this contract's FALSIFY-007 lands.

Five Whys:

  1. Why a contract before the impl? §50.4 step 5a is THE first step
     of the re-scoped roadmap. The contract pins what 5b-5f must
     satisfy — without it, the impl PRs would each pick their own
     arbitrary semantics for "extract arch from APR". Contract-first
     prevents 5-PR scope drift.

  2. Why 8 falsifiers, not 4? Each of the 4 equations decomposes into
     2 falsifiable claims: (existence + correctness) for the
     constructor, (init=None + init=Some) for the builder, (forward-
     pass + reference-comparison) for GQA-7:1, (positive + negative
     case) for the tokenizer surface. 8 covers every silent-failure
     mode the §24 retrospective showed is possible.

  3. Why also pin GQA-7:1 here, not just in gqa-kernel-v1? The
     existing gqa-kernel-v1 covers GQA generally; what's NEW is that
     the Llama370M codepath empirically only exercised 4:1 (kv=4,
     q=16). Qwen2.5 exercises 7:1 (kv=2, q=14). FALSIFY-004 makes
     this transition contract-bound rather than tribal knowledge.

  4. Why not just delete Llama370MConfig outright? Per §50.3 Option C
     analysis: that deletes the §24/§25 falsification evidence (we
     KEEP knowing from-scratch fails at val_loss=9.75 on the existing
     corpus). The polymorphic builder preserves both paths — Llama370M
     for the from-scratch baseline, Qwen2.5 (or any future init) for
     the fine-tune path.

  5. Why is FALSIFY-007 (encoder/decoder mismatch) load-bearing?
     Without it, an operator who points --init at e.g. CodeBERT (an
     encoder) would silently load weights into a decoder-shaped
     trainer, producing nonsense gradients. The error message must
     name the architecture-family mismatch, not crash later with
     cryptic shape errors during the first forward pass.

Plain ship-% update:
  - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4
    step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
  - SPEC-SHIP-TWO-001 §50 — MODEL-2 architecture-coupling finding (#1472)
  - SPEC-SHIP-TWO-001 §50.4 — re-scoped roadmap (steps 5a-5h)
  - contracts/apr-pretrain-from-init-v1.yaml v1.1.0 PARTIAL (#1471, sibling)
  - contracts/training-loop-pretrain-v1.yaml v1.5.0 ACTIVE (parent)
  - contracts/architecture-requirements-v1.yaml (sibling)
  - contracts/gqa-kernel-v1.yaml (sibling — GQA ratio invariants)
  - feedback_no_guessing.md
  - feedback_fix_root_cause_never_route_around.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at
2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now
end-to-end functional on CPU; the legacy "not yet wired" Err is
RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-%
can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007
  via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass
  smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED;
  5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ship statistics: 11 PRs over 2 days
  (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%
  (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade
  (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge
  behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the
  user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial →
  functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical
  500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr
  producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>