spec(ship-two-models): v3.01 → v3.02 — §57 drift sweep + 5g.1 throughput characterization#1508
Merged
Merged
Conversation
…oughput characterization §56 closed with 5g.1 full-corpus retokenization dispatched (PID 2767124, ~17hr wall projected). §57 records the parallel drift-sweep work that landed during the 5g.1 wait + throughput characterization of 5g.1 mid-run. ## Drift sweep (4 PRs) While 5g.1 ran in the background, a sweep of the §50.4 cascade contracts surfaced THE SAME drift class across multiple contracts: cited test names that didn't match what the impl PR actually authored. PR | Contract | v_old → v_new | Drift --- | --- | --- | --- #1502 | apr-pretrain-arch-polymorphic-v1 | v1.3 → v1.4 | CUDA-001 was REFERENCED in changelog but had no formal falsification_test entry #1504 | apr-pretrain-from-init-v1 | v1.1 → v1.2 | 7 of 8 cited test names didn't exist; re-aligned to existing tests #1505 | apr-pretrain-arch-polymorphic-v1 | v1.4 → v1.5 | FALSIFY-005/006 cited names diverged from PR #1476's actual authoring #1506 | apr-cli-tokenize-import-hf-v1 | v1.0 → v1.1 | FALSIFY-001 cited "or equivalent" — no real test name After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001 errors across all 870+ contracts. The drift class is fully closed. ## 5g.1 throughput (real-time mid-run) Shard | Closed at | Δ from prev 0 | 07:08 | (start) 1 | 07:24 | 16 min 2 | 07:39 | 15 min 3 | 07:55 | 16 min ... 12 | 10:16 | (in progress) Mean wall: 16.3 min/shard. Linear projection: 57 shards × 16.3 min = 929 min = ~15.5 hr total → ETA ~22:30Z (slightly under §56's 17hr smoke estimate). ## Methodology takeaway When a contract is authored in PR_A alongside its impl, AND the impl's test names are stamped in the contract's `test:` field BEFORE the impl PR finalizes the names, the names diverge at the cascade boundary. Happened in 3 of 4 §50.4 cascade contracts. Prevention rule: when authoring a new contract that cites tests, EITHER reference tests that already exist on main, OR mark them `PENDING_PR_<N>:` with the impl PR ref so PV-VER-001 lint can flag dangling refs at contract-merge time. A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that blocks contract merge when any `test:` field doesn't resolve to an existing test invocation. Out of §57 scope. ## Net effects - Spec v3.01.0 → v3.02.0. - Three contract bumps land cleanly (apr-pretrain-arch-polymorphic-v1 v1.3→v1.4→v1.5, apr-pretrain-from-init-v1 v1.1→v1.2, apr-cli-tokenize-import-hf-v1 v1.0→v1.1). - pv lint contracts/ 0 PV-VER-001 errors across 870+ contracts. - 5g.1 full corpus run progressing at 16.3 min/shard; ETA ~22:30Z. - MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% until step 5g.3 produces val_loss < 9.38. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PRs #1502/#1504/#1505/#1506 (drift sweep), apr-cookbook spec v5.1.0 (companion update — operator-facing recipe) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Records the parallel drift-sweep work that landed during the 5g.1 corpus retokenization wait + characterizes 5g.1's mid-run throughput.
Drift sweep (4 PRs this session)
After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001 errors across all 870+ contracts.
5g.1 throughput
Mean wall: 16.3 min/shard. Linear projection: 57 shards = ~15.5hr → ETA ~22:30Z.
Companion spec
apr-cookbook v5.1.0 (commit 26415568) adds the operator-facing 4-step recipe for SHIP-TWO-001 fine-tune-from-init, mapping each step to its contract.
Methodology takeaway
When a contract is authored alongside its impl in the same cascade, AND the test names are stamped before the impl PR finalizes them, names diverge at the boundary. Happened in 3 of 4 §50.4 cascade contracts. Prevention rule documented in §57.4.
Net effects
Test plan
🤖 Generated with Claude Code