
spec(ship-two-models): v3.01 → v3.02 — §57 drift sweep + 5g.1 throughput characterization #1508

Merged: noahgift merged 2 commits into main from spec/section-57-drift-sweep-summary on May 5, 2026

@noahgift noahgift commented May 5, 2026

Summary

Records the parallel drift-sweep work that landed while the 5g.1 corpus retokenization ran, and characterizes 5g.1's mid-run throughput.

Drift sweep (4 PRs this session)

| PR | Contract | v_old → v_new | Drift |
| --- | --- | --- | --- |
| #1502 | apr-pretrain-arch-polymorphic-v1 | v1.3 → v1.4 | CUDA-001 referenced but not formally bound |
| #1504 | apr-pretrain-from-init-v1 | v1.1 → v1.2 | 7 of 8 cited test names didn't exist |
| #1505 | apr-pretrain-arch-polymorphic-v1 | v1.4 → v1.5 | FALSIFY-005/006 names diverged from impl |
| #1506 | apr-cli-tokenize-import-hf-v1 | v1.0 → v1.1 | FALSIFY-001 cited "or equivalent" |

After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001 errors across all 870+ contracts.

5g.1 throughput

Mean wall: 16.3 min/shard. Linear projection: 57 shards = ~15.5hr → ETA ~22:30Z.

Companion spec

apr-cookbook v5.1.0 (commit 26415568) adds the operator-facing 4-step recipe for SHIP-TWO-001 fine-tune-from-init, mapping each step to its contract.

Methodology takeaway

When a contract is authored alongside its impl in the same cascade, AND the test names are stamped before the impl PR finalizes them, names diverge at the boundary. Happened in 3 of 4 §50.4 cascade contracts. Prevention rule documented in §57.4.

Net effects

  • Spec v3.01.0 → v3.02.0.
  • 0 PV-VER-001 errors across 870+ contracts.
  • MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% until 5g.3.

Test plan

  • PMAT pre-commit quality gates pass
  • No code changes; spec-only
  • CI gate green
  • Auto-merge fires on green CI

🤖 Generated with Claude Code

…oughput characterization

§56 closed with 5g.1 full-corpus retokenization dispatched (PID
2767124, ~17hr wall projected). §57 records the parallel drift-sweep
work that landed during the 5g.1 wait + throughput characterization
of 5g.1 mid-run.

## Drift sweep (4 PRs)

While 5g.1 ran in the background, a sweep of the §50.4 cascade
contracts surfaced THE SAME drift class across multiple contracts:
cited test names that didn't match what the impl PR actually authored.

  PR     | Contract                              | v_old → v_new | Drift
  ---    | ---                                   | ---           | ---
  #1502  | apr-pretrain-arch-polymorphic-v1      | v1.3 → v1.4   | CUDA-001 was REFERENCED in changelog but had no formal falsification_test entry
  #1504  | apr-pretrain-from-init-v1             | v1.1 → v1.2   | 7 of 8 cited test names didn't exist; re-aligned to existing tests
  #1505  | apr-pretrain-arch-polymorphic-v1      | v1.4 → v1.5   | FALSIFY-005/006 cited names diverged from PR #1476's actual authoring
  #1506  | apr-cli-tokenize-import-hf-v1         | v1.0 → v1.1   | FALSIFY-001 cited "or equivalent" — no real test name

After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001
errors across all 870+ contracts. The drift class is fully closed.

## 5g.1 throughput (real-time mid-run)

  Shard | Closed at | Δ from prev
  ---   | ---       | ---
  0     | 07:08     | (start)
  1     | 07:24     | 16 min
  2     | 07:39     | 15 min
  3     | 07:55     | 16 min
  ...   | ...       | ...
  12    | 10:16     | (in progress)

Mean wall: 16.3 min/shard. Linear projection: 57 shards × 16.3 min =
929 min = ~15.5 hr total → ETA ~22:30Z (slightly under §56's 17hr
smoke estimate).
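
The linear projection above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the spec tooling: `project_eta` is a hypothetical helper, and it assumes the clock starts at the 07:08 shard-0 timestamp and that the table's times are UTC on the PR date.

```python
from datetime import datetime, timedelta, timezone


def project_eta(start, shards, minutes_per_shard):
    """Linear projection: total wall time and finish time for a sharded run."""
    total_minutes = shards * minutes_per_shard
    return total_minutes, start + timedelta(minutes=total_minutes)


# Assumption: 07:08 (shard 0 in the table) is treated as the run start, in UTC.
start = datetime(2026, 5, 5, 7, 8, tzinfo=timezone.utc)
total_minutes, eta = project_eta(start, shards=57, minutes_per_shard=16.3)

print(f"{total_minutes:.0f} min = {total_minutes / 60:.1f} hr, ETA {eta:%H:%MZ}")
```

Under those assumptions the sketch gives 929 min (about 15.5 hr) and a finish time a few minutes after 22:30Z, consistent with the rounded ETA quoted above.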

## Methodology takeaway

When a contract is authored in PR_A alongside its impl, AND the
impl's test names are stamped in the contract's `test:` field BEFORE
the impl PR finalizes the names, the names diverge at the cascade
boundary. Happened in 3 of 4 §50.4 cascade contracts.

Prevention rule: when authoring a new contract that cites tests,
EITHER reference tests that already exist on main, OR mark them
`PENDING_PR_<N>:` with the impl PR ref so PV-VER-001 lint can flag
dangling refs at contract-merge time.
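
As an illustration of the prevention rule, the sketch below classifies a contract's cited test reference as bound, pending, or dangling. It is not the `pv lint` implementation: `classify_test_ref`, the exact `PENDING_PR_<N>: <test name>` layout, and the example test names are all assumptions made for the example.

```python
import re

# Assumed marker layout: "PENDING_PR_<N>: <test name>".
PENDING = re.compile(r"^PENDING_PR_(\d+):\s*(.+)$")


def classify_test_ref(ref, existing_tests):
    """Return (status, pr_number, test_name) for one cited test reference."""
    match = PENDING.match(ref)
    if match:
        # Deferred to an impl PR; a lint pass can re-check it once that PR merges.
        return ("pending", int(match.group(1)), match.group(2))
    if ref in existing_tests:
        # The cited name resolves to a test that already exists on main.
        return ("bound", None, ref)
    # Neither an existing test nor an explicit pending marker.
    return ("dangling", None, ref)


# Hypothetical test names, for illustration only.
existing = {"test_init_shapes"}
results = [
    classify_test_ref("test_init_shapes", existing),
    classify_test_ref("PENDING_PR_1476: test_falsify_005", existing),
    classify_test_ref("or equivalent", existing),
]
for status, pr_number, name in results:
    print(status, pr_number, name)
```

A reference such as "or equivalent" falls through to `dangling`, which is the case the drift sweep had to repair by hand.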

A future spec amendment could codify a `pv lint --strict-test-binding`
enforcement that blocks contract merge when any `test:` field doesn't
resolve to an existing test invocation. Out of §57 scope.

## Net effects

- Spec v3.01.0 → v3.02.0.
- Version bumps to three contracts land cleanly across four PRs
  (apr-pretrain-arch-polymorphic-v1 v1.3→v1.4→v1.5,
  apr-pretrain-from-init-v1 v1.1→v1.2,
  apr-cli-tokenize-import-hf-v1 v1.0→v1.1).
- `pv lint contracts/` reports 0 PV-VER-001 errors across 870+ contracts.
- 5g.1 full corpus run progressing at 16.3 min/shard; ETA ~22:30Z.
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%
  until step 5g.3 produces val_loss < 9.38.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      PRs #1502/#1504/#1505/#1506 (drift sweep),
      apr-cookbook spec v5.1.0 (companion update — operator-facing recipe)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 975e39b into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the spec/section-57-drift-sweep-summary branch May 5, 2026 09:49