feat(gate-ship-007-012): complete §6 Compound Ship Gates coverage (12/12 algorithmically bound) by noahgift · Pull Request #1042 · paiml/aprender

noahgift · 2026-04-24T08:09:19Z

Summary

Completes §6 Compound Ship Gates coverage by algorithmically binding the 6 merge-gate meta-policy rows (GATE-SHIP-007..012) at PARTIAL_ALGORITHM_LEVEL. Prior #1041 covered the 6 ship-blocking rows (001..006); this PR stacks on top and closes out the remaining half.

§6 Compound Ship Gates is now 12/12 algorithmically bound (6 ship-blocking + 6 merge-gate). Stack total: 42 PARTIAL + 3 DISCHARGED.

6 new verdict fns + constants

Gate	Const	Verdict fn
GATE-SHIP-007	`AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT: u32 = 0`	`const fn verdict_from_unwrap_count(u32)`
GATE-SHIP-008	`AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE: f32 = 1.0`	`fn verdict_from_contract_density(u32, u32, f32)`
GATE-SHIP-009	`AC_GATE_SHIP_009_REQUIRED_CHECK_COUNT: usize = 3`	`const fn verdict_from_ci_aggregate(bool, bool, bool)`
GATE-SHIP-010	`AC_GATE_SHIP_010_MAX_TOLERATED_ADVISORY_COUNT: u32 = 0`	`const fn verdict_from_advisory_count(u32)`
GATE-SHIP-011	`AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE: f32 = 90.0`	`const fn verdict_from_tdg_score(f32, f32)`
GATE-SHIP-012	`AC_GATE_SHIP_012_MIN_LINE_COVERAGE_PCT: f32 = 95.0`	`const fn verdict_from_line_coverage_pct(f32, f32)`
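
The two simplest shapes in the table, a zero-tolerance count and a guarded ratio, can be sketched as below. This is a minimal illustration, not the PR's code: the constant names, values, and function names come from the table, while the `GateVerdict` enum, the parameter names and their meaning, and the fail-closed behaviour on a zero denominator are assumptions.

```rust
/// Hypothetical verdict type; the enum names used in the PR are not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateVerdict {
    Pass,
    Fail,
}

// Constant names and values as listed in the table above.
pub const AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT: u32 = 0;
pub const AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE: f32 = 1.0;

/// Zero-tolerance threshold: any count above the constant fails.
pub const fn verdict_from_unwrap_count(count: u32) -> GateVerdict {
    if count <= AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT {
        GateVerdict::Pass
    } else {
        GateVerdict::Fail
    }
}

/// Ratio threshold with a divide-by-zero guard.
/// Assumed parameter meaning: (contracts, new code units, minimum density).
/// Assumed guard behaviour: a zero denominator fails closed.
pub fn verdict_from_contract_density(contracts: u32, units: u32, min_density: f32) -> GateVerdict {
    if units == 0 || !min_density.is_finite() {
        return GateVerdict::Fail;
    }
    // Divide in f64 so the u32 inputs convert without rounding.
    let density = f64::from(contracts) / f64::from(units);
    if density >= f64::from(min_density) {
        GateVerdict::Pass
    } else {
        GateVerdict::Fail
    }
}
```

Keeping the thresholds as named constants is what lets a "provenance pin" test assert that the value never drifts from the spec.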

Each module ships a 5-8 section mutation survey, including an exhaustive 2^3 = 8 bitmask proof for GATE-SHIP-009 (CI aggregate-AND).
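
The exhaustive bitmask proof for the CI aggregate-AND can be illustrated as follows. This is a sketch under assumptions: the function and constant names and the `(bool, bool, bool)` signature come from the table, the `(fmt, clippy, test)` argument order follows the commit message, and the `GateVerdict` enum and the `exhaustive_bitmask_holds` helper are hypothetical.

```rust
/// Hypothetical verdict type; the enum names used in the PR are not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateVerdict {
    Pass,
    Fail,
}

pub const AC_GATE_SHIP_009_REQUIRED_CHECK_COUNT: usize = 3;

/// Aggregate-AND: Pass only when every required check is green.
pub const fn verdict_from_ci_aggregate(fmt: bool, clippy: bool, test: bool) -> GateVerdict {
    if fmt && clippy && test {
        GateVerdict::Pass
    } else {
        GateVerdict::Fail
    }
}

/// Enumerates all 2^3 = 8 input combinations and checks that exactly
/// the all-true mask passes. (Hypothetical helper for illustration.)
pub fn exhaustive_bitmask_holds() -> bool {
    let n = AC_GATE_SHIP_009_REQUIRED_CHECK_COUNT as u32;
    (0u8..(1u8 << n)).all(|mask| {
        let fmt = (mask & 0b001) != 0;
        let clippy = (mask & 0b010) != 0;
        let test = (mask & 0b100) != 0;
        let expected = if mask == 0b111 {
            GateVerdict::Pass
        } else {
            GateVerdict::Fail
        };
        verdict_from_ci_aggregate(fmt, clippy, test) == expected
    })
}
```

With only eight inputs, enumerating every combination is cheaper than reasoning about mutations one at a time.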

Contract + spec bumps

contracts/compound-ship-gates-v1.yaml v1.0.0 → v1.1.0 (stays PROPOSED) — adds 6 new falsification_tests, 6 equations, 6 proof_obligations.
docs/specifications/aprender-train/ship-two-models-spec.md v2.42.0 → v2.43.0 — §6 table rows 007..012 annotated PARTIAL_ALGORITHM_LEVEL v2.43.0.

Full discharge blocks on live CI tooling

Each gate's full discharge still blocks on running the external tool:

GATE-SHIP-007: cargo clippy --all-targets --all-features -- -D warnings
GATE-SHIP-008: pmat density --new-code --json
GATE-SHIP-009: branch-protection ci / gate + workspace-test
GATE-SHIP-010: cargo deny check advisories
GATE-SHIP-011: pmat tdg . --format json
GATE-SHIP-012: cargo llvm-cov report --json

Stacking note

This branch is stacked on feat/gate-ship-001-006-bundle (PR #1041). The commit on this branch applies cleanly on top of main once #1041 merges; no rebase required.

Test plan

cargo fmt --manifest-path crates/aprender-core/Cargo.toml --check — clean
cargo test -p aprender-core --lib format::gate_ship_007..012 — 6/6 pass
cargo test -p aprender-core --doc format::gate_ship_007..012 — 6/6 doctests pass
pv validate contracts/compound-ship-gates-v1.yaml — 0 errors, 0 warnings
cargo clippy -p aprender-core --lib --no-deps — clean

🤖 Generated with Claude Code

… race (ANDON paiml/infra#77) (#1043)

* fix(ci): per-PR cargo registry to break intel-runner concurrent-write race (paiml/infra#77)

ANDON 2026-04-24: aprender 11-PR stack (#1031..#1042) all failing `ci / security` and `workspace-test` with:

error: couldn't read /home/noah/.cargo/registry/src/<crate>/lib.rs: Permission denied (os error 13)

and the rustix-0.38 equivalent (E0432 unresolved import `libc`/`libc_errno` originating in the `syscall` macro, which the rustix build.rs regenerates from src/ files; missing src/ → macro can't find libc crate → cascading errors).

FIVE WHYS
─────────
1 `ci / security` fails: `cargo install cargo-audit --locked` hits EACCES reading `fnv-1.0.7/lib.rs`.
2 EACCES: the file is missing OR owned by root (docker container creates extractions as root on the bind-mounted host registry).
3 Concurrent writers: 16 self-hosted `intel-clean-room-*` runners bind-mount the SAME /home/noah/.cargo/registry. Cargo extractions, the ci-reaper TTL sweep, and cross-container chown cycles all touch identical paths.
4 Shared by design: ci.yml:49 was authored for throughput. Re-downloading crates per job is ~200MB, so the host registry was shared across all runners. Race class not modeled.
5 Precedent already exists: target/ hit the identical race under concurrent PRs (task #134) and was fixed by per-PR isolation on /mnt/nvme-raid0/targets/aprender-ci/<pr#>. The registry simply never got the same treatment.

ROOT CAUSE
──────────
Shared mutable bind mount + concurrent multi-runner write access ≈ guaranteed race. The existing band-aid (PR #1025 "self-heal cargo registry cache", cargo-ok + Cargo.toml marker check) only runs inside `ci / security` and itself races with concurrent jobs that have already passed the cache check.

FIX (this PR)
─────────────
Mirror the target-dir pattern from ci.yml:55 for the cargo registry. Each PR (or branch) gets its own registry under /mnt/nvme-raid0/cargo-ci/registry/<pr#>. Docker auto-creates the leaf dir on first mount; the ci-reaper TTL sweep (ci-reaper.sh:308) needs a companion infra update (paiml/infra#77) to include the new /mnt path.
- Removes: /home/noah/.cargo/registry:/usr/local/cargo/registry
- Adds: /mnt/nvme-raid0/cargo-ci/registry/${pr#|ref_name}:/usr/local/cargo/registry

Cost: ~200MB per PR on first run (cargo re-downloads crates). Same cost profile as the target/ isolation fix, which the fleet already absorbed. Once cargo-ci/registry/<pr#> warms on run 1, runs 2+ hit the cache.

FOLLOW-UP
─────────
paiml/infra#77 tracks:
- forjar recipe to pre-create /mnt/nvme-raid0/cargo-ci/ owner=noah:noah
- reaper extension: GC /mnt/nvme-raid0/cargo-ci/registry/<pr#>/src with same TTL
- once infra lands, drop the ANDON comment above

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: trigger fresh run to pick up paiml/.github#32 security-job CARGO_HOME fix

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ce-v1 multi-bind FALSIFY-SHIP-009 (AC-SHIP1-009 "MODEL-1 teacher license + data provenance recorded in model.apr metadata") attains PARTIAL_ALGORITHM_LEVEL by attaching a second binding to the same C-APR-PROVENANCE contract that already discharges MODEL-2's AC-SHIP2-012. The AprV2Metadata + serde-JSON decision rule is model-agnostic, so one contract cleanly carries both discharges. Changes: - contracts/apr-provenance-v1.yaml v1.0.0 → v1.1.0 (stays ACTIVE): new GATE-APR-PROV-004 block binds AC-SHIP1-009 / FALSIFY-SHIP-009 at PARTIAL_ALGORITHM_LEVEL with ship_blocking=true; full discharge blocks on teacher .apr republish populating license, data_source, data_license as named fields (PMAT-686 fixture-swap). - crates/aprender-core/src/format/tests/provenance_tests.rs: - falsify_ship_009_apr_metadata_applies_to_model_1_teacher — teacher-representative round-trip (license="apache-2.0", data_source="qwen2.5-coder-7b-instruct", data_license="apache-2.0"). - falsify_ship_009_gate_apr_prov_004_has_partial_discharge_marker — include_str! YAML-binding assertion that the new gate has the correct binds_to / falsification_id / discharge_status / flags. - crates/aprender-core/Cargo.toml: add serde_yaml to [dev-dependencies] (needed for the YAML-binding test). - docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0: new v2.24.0 amendment block documenting the first MODEL-1 PARTIAL and first multi-model multi-bind on one contract. Pattern extensions: - First MODEL-1 PARTIAL (prior six targeted MODEL-2). - First multi-model multi-bind on ONE contract (prior PARTIALs each had a dedicated contract). - Sixth falsification of the "exhausted" verdict: SHIP-019 → SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016 → SHIP-009 — sixth is cross-model, strictly more surprising than the prior five. All 5 provenance tests green (3 SHIP-022 + 2 SHIP-009). 
Status after v2.24.0: - MODEL-2: 3/12 ACTIVE + 7/12 PARTIAL = 10/12 touched (83.3%) - MODEL-1: 9/10 DISCHARGED (via SHIP-TWO-001-MODEL-1-TEACHER tag) + 1/10 PARTIAL (009). Will flip to fully ACTIVE when PMAT-686 republishes teacher.apr with provenance fields populated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…EVEL discharge (task #149) MODEL-2 (albor 370M Sovereign) gate #4 at PARTIAL: binds AC-SHIP2-007 ("apr run produces syntactically valid Python on 100 held-out prompts") to FALSIFY-SHIP-017 via new GATE-ARCH-370M-005 with `discharge_status: PARTIAL_ALGORITHM_LEVEL`. The decision rule — "≤ 1 SyntaxError tolerated out of 100, ≥ 2 is a ship-blocker" — is a pure integer threshold and is proven correct at `cargo test` time today. Full discharge (100-prompt `apr run` harness against a trained 370M .apr) remains PENDING on pretraining compute-dispatch (AC-SHIP2-003/004) — fixture swap is data-only, no harness rewrite required. Changes: - crates/aprender-train/src/models/llama_370m.rs: - Adds `AC_SHIP2_007_HELDOUT_PROMPT_COUNT` (=100) + `AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS` (=1) consts mirroring the spec §6 harness size and §8.3 FALSIFY-SHIP-017 tolerance. - Adds `verdict_from_syntax_error_count(errors) -> Ship017Verdict` const fn — the pure threshold. - Adds `falsify_ship_017_syntax_error_count_threshold_logic` — Pass boundary (0,1), Fail boundary (2,50,100), monotonicity sweep ∈ [0,100], and provenance pinning. - Adds `falsify_ship_017_gate_arch_370m_005_has_partial_discharge_marker` — binds sovereign contract YAML shape (falsification_id, binds_to, discharge_status, evidence_discharged_by, full_discharge_blocks_on, ship_blocking) to Rust tests via include_str!. - contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE): adds GATE-ARCH-370M-005. - docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.25.0 with amendment block: counter-example survey continues to find new PARTIAL levers after two prior "exhausted" verdicts (SHIP-015 → SHIP-019 → SHIP-017). New status: 3/12 ACTIVE + 4/12 PARTIAL = 7/12 touched (58.3%). 
Verification: - cargo test -p aprender-train --lib llama_370m → 12/12 pass (including both new falsify_ship_017_* tests) - cargo clippy -p aprender-train --lib -- -D warnings → clean - pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → Contract is valid Closes task #149. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
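
The integer threshold described in this commit ("≤ 1 SyntaxError tolerated out of 100, ≥ 2 is a ship-blocker") could look roughly like the sketch below. The constant names and values, the `Ship017Verdict` name, and the `verdict_from_syntax_error_count` name come from the commit message; the `u32` types and the `is_monotone_over_heldout_range` helper are assumptions made for illustration.

```rust
// Names and values from the commit message; the u32 type is an assumption.
pub const AC_SHIP2_007_HELDOUT_PROMPT_COUNT: u32 = 100;
pub const AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS: u32 = 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship017Verdict {
    Pass,
    Fail,
}

/// Pure integer threshold: at most one SyntaxError is tolerated.
pub const fn verdict_from_syntax_error_count(errors: u32) -> Ship017Verdict {
    if errors <= AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship017Verdict::Pass
    } else {
        Ship017Verdict::Fail
    }
}

/// Monotonicity sweep over [0, 100]: once the verdict turns Fail it must
/// never return to Pass as the error count grows. (Hypothetical helper.)
pub fn is_monotone_over_heldout_range() -> bool {
    let mut seen_fail = false;
    for errors in 0..=AC_SHIP2_007_HELDOUT_PROMPT_COUNT {
        match verdict_from_syntax_error_count(errors) {
            Ship017Verdict::Fail => seen_fail = true,
            Ship017Verdict::Pass if seen_fail => return false,
            Ship017Verdict::Pass => {}
        }
    }
    // The sweep is only meaningful if the Fail region was actually reached.
    seen_fail
}
```

Because the function takes only a count, swapping the fixture for real `apr run` results later changes the data fed in, not the decision rule.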

Binds AC-SHIP2-010 (inference decode throughput ≥ 100 tok/s on RTX 4090) to a new GATE-ARCH-370M-006 in the sovereign contract via a pure f32 threshold fn + two unit tests. The compute-heavy half (`apr bench` on a real trained 370M .apr) is deferred to AC-SHIP2-003/004 compute-dispatch; the decision rule itself is proven today. Changes: - crates/aprender-train/src/models/llama_370m.rs: * AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 = 100.0 (const floor) * Ship020Verdict { Pass, Fail } * verdict_from_decode_tps(f32) -> Ship020Verdict (fn, non-finite → Fail) * falsify_ship_020_decode_tps_threshold_logic (5 invariants: Pass boundary, Fail boundary at one f32 ULP, monotonicity in both directions, conservative Fail for NaN/±∞, provenance pinning that the const stays = 100.0) * falsify_ship_020_gate_arch_370m_006_has_partial_discharge_marker (contract parses + advertises PARTIAL_ALGORITHM_LEVEL + evidence_discharged_by populated + full_discharge_blocks_on documented + ship_blocking:true) - contracts/model-families/llama-370m-sovereign-v1.yaml: * v1.5.0 → v1.6.0, stays ACTIVE * New GATE-ARCH-370M-006 binding AC-SHIP2-010 ↔ FALSIFY-SHIP-020 with discharge_status: PARTIAL_ALGORITHM_LEVEL - docs/specifications/aprender-train/ship-two-models-spec.md: * v2.23.0 → v2.26.0 with amendment block * MODEL-2 ship-gate status updated: 3/12 ACTIVE + 5/12 PARTIAL = 8/12 touched (66.7%) - crates/aprender-train/src/train/device.rs: * 2 pre-existing fmt fixes (6 lines of whitespace) — restores `cargo fmt -p aprender-train --check` green. Pre-existing on origin/main; kept in this PR under Toyota Way "all defects are your defects" rule. Pattern lesson: v2.22.0 declared MODEL-2 non-compute PARTIAL levers "exhausted" — re-running the counter-example survey has now falsified that verdict three times (SHIP-019 → SHIP-017 → SHIP-020). 
When a SHIP gate names a threshold / tolerance / ratio / cut-off and the compute-heavy harness is separable from the decision function, the threshold fn can land today at unit-test time — even when the full end-to-end harness is blocked on compute. Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004 compute-dispatch + three independent `apr bench --tokens 128 --json` medians on RTX 4090 host. Fixture-swap only — no decision-rule rewrite. Verification: - cargo test -p aprender-train --lib models::llama_370m → 11/11 PASS - pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → "Contract is valid. 0 error(s), 0 warning(s)." - cargo clippy -p aprender-train --lib → green - cargo fmt -p aprender-train --check → green Task #150. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
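
A floating-point floor with a conservative Fail for non-finite input, plus the one-ULP boundary probe the tests describe, might be sketched like this. The constant, enum, and function names follow the commit message; the `one_ulp_below` helper is hypothetical and only valid for positive, finite, normal inputs.

```rust
pub const AC_SHIP2_010_MIN_DECODE_TPS_RTX4090: f32 = 100.0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship020Verdict {
    Pass,
    Fail,
}

/// Inclusive floor; NaN and ±∞ fail conservatively.
pub fn verdict_from_decode_tps(tps: f32) -> Ship020Verdict {
    if tps.is_finite() && tps >= AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 {
        Ship020Verdict::Pass
    } else {
        Ship020Verdict::Fail
    }
}

/// Next representable f32 below `x`.
/// Hypothetical helper: assumes `x` is positive, finite, and normal,
/// so decrementing the bit pattern steps down by exactly one ULP.
pub fn one_ulp_below(x: f32) -> f32 {
    f32::from_bits(x.to_bits() - 1)
}
```

Probing the value one ULP under the floor is what distinguishes `>=` from `>` and catches a mutated comparison operator.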

AC-SHIP2-008 / FALSIFY-SHIP-018 bound via new GATE-ARCH-370M-007 at PARTIAL_ALGORITHM_LEVEL. Pure two-number threshold fn `verdict_from_pass_at_1(correct, total, threshold_pct)` + const `AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT = 30.0` in crates/aprender-train/src/models/llama_370m.rs — proves the spec's 'HumanEval pass@1 ≥ 30.0%' decision rule at `cargo test` time, independent of a trained artifact. Two unit tests prove: - boundary (f32-exact 50/100 = 50.0% with ±ULP shift showing `>=` is inclusive; 49/164 and 29/100 fail the 30.0 floor) - monotonicity (correct sweep 0..=164 at total=164 never flips Pass → Fail) - div-safety (total=0 fails closed) + sanity (correct>total fails) - non-finite threshold guard (NaN / ±∞ all Fail) - provenance pin (const stays = 30.0) - YAML marker (GATE-ARCH-370M-007 carries PARTIAL_ALGORITHM_LEVEL, binds AC-SHIP2-008, cites FALSIFY-SHIP-018, ship_blocking:true) Full discharge blocks on real 370M .apr (AC-SHIP2-003/004 compute) + three seed=0 `apr eval --benchmark humaneval --json` median pass@1 values fed into the verdict fn — all three must Pass. Fixture-swap only; no harness rewrite. 6th PARTIAL for MODEL-2 (after SHIP-012/015/017/019/020). Spec v2.22.0's 'exhausted' verdict now falsified 4×. Remaining 5th-PARTIAL candidate: SHIP-016 (`apr qa` 8-of-8 aggregate — not a single threshold). SHIP-013/014 genuinely need real compute. Contract: llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE). Spec: ship-two-models-spec.md v2.23.0 → v2.24.0 (amendment block). Also: 6-line pre-existing fmt fix in train/device.rs under Toyota Way "all defects are your defects" (same pattern as PR #1005). Status: MODEL-2 ship-gates 3/12 ACTIVE + 6/12 PARTIAL = 9/12 touched (75.0%). Remaining 3 (003/004/006) all need real 370M compute. Tests: cargo test -p aprender-train --lib models::llama_370m → 11/11 pass. `pv validate contracts/model-families/llama-370m-sovereign-v1.yaml` → Contract is valid. cargo fmt -p aprender-train --check → clean. 
cargo clippy -p aprender-train --lib -- -D warnings → clean. Refs: SHIP-TWO-001, task #151, FALSIFY-SHIP-018. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
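
The guards listed in this commit (division safety, `correct > total` sanity, non-finite threshold) can be sketched in one function. The function name, parameter names, and constant come from the commit message; the `Ship018Verdict` name is assumed by analogy with the other verdict enums, and the `u32` parameter types and the `f64` intermediate are illustrative choices.

```rust
pub const AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT: f32 = 30.0;

/// Enum name assumed by analogy with Ship016Verdict / Ship017Verdict.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship018Verdict {
    Pass,
    Fail,
}

/// pass@1 threshold with fail-closed guards.
/// Parameter types are assumptions; the commit only names the parameters.
pub fn verdict_from_pass_at_1(correct: u32, total: u32, threshold_pct: f32) -> Ship018Verdict {
    // Division safety, sanity, and non-finite threshold all fail closed.
    if total == 0 || correct > total || !threshold_pct.is_finite() {
        return Ship018Verdict::Fail;
    }
    let pct = 100.0 * f64::from(correct) / f64::from(total);
    if pct >= f64::from(threshold_pct) {
        Ship018Verdict::Pass
    } else {
        Ship018Verdict::Fail
    }
}
```

For example, 49/164 is about 29.9% and fails the 30.0 floor, while 30/100 sits exactly on the floor and passes because the comparison is inclusive.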

…ND verdict fn Wires GATE-ARCH-370M-008 (AC-SHIP2-006) to a pure verdict_from_qa_gates(&[bool]) -> Ship016Verdict aggregate-AND fn in aprender-train/src/models/llama_370m.rs, proven today by exhaustive 2^8 = 256-combination sweep + single-gate-flip falsifiability + monotonicity + 3 contract-drift guards (slice length 0/7/9/16 → Fail even when all-true). Discharge marker: PARTIAL_ALGORITHM_LEVEL. Pattern note: SHIP-016 is the first aggregate-AND shape — SHIP-017/018/020 were single-threshold shapes. The proof pattern now covers two distinct decision-rule shapes, confirming decision-rule/compute-harness separation is a reusable pattern, not a one-off. **5th PARTIAL after "exhausted" verdict falsified 4× already** (SHIP-019 → SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016). **MODEL-2 ship-gate coverage: 3/12 ACTIVE + 7/12 PARTIAL = 10/12 touched (83.3%).** Remaining 2 truly compute-blocked (003 CE ≤ 2.2, 004 ≤21-day wall-clock) have no fixture-swap trick. Changes: - contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (GATE-ARCH-370M-008 block added; stays ACTIVE) - crates/aprender-train/src/models/llama_370m.rs: + AC_SHIP2_006_REQUIRED_QA_GATE_COUNT = 8 const + Ship016Verdict enum + verdict_from_qa_gates(&[bool]) pure fn with aggregate-AND + falsify_ship_016_apr_qa_aggregate_and_logic test (2^8 sweep + single-gate-flip + monotonicity + 3 contract-drift guards) + falsify_ship_016_gate_arch_370m_008_has_partial_discharge_marker test (YAML binding: binds_to AC-SHIP2-006, falsification_id FALSIFY-SHIP-016, discharge_status PARTIAL_ALGORITHM_LEVEL) - docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0 (amendment block documenting 5th PARTIAL, first aggregate-AND shape) - crates/aprender-train/src/train/device.rs: pre-existing fmt fixes bundled per Toyota Way "all defects are your defects" Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004 compute-dispatch + 8-gate apr qa harness invocation with exit 0 → feed the 8 
gate-result booleans into verdict_from_qa_gates and require Ship016Verdict::Pass. Fixture-swap only — no harness rewrite. Refs #152 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
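
The aggregate-AND shape with a slice-length drift guard, together with the 2^8 sweep, might look like the following sketch. The constant, enum, and function names and the `&[bool]` signature come from the commit message; the `exhaustive_sweep_holds` helper is hypothetical.

```rust
pub const AC_SHIP2_006_REQUIRED_QA_GATE_COUNT: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship016Verdict {
    Pass,
    Fail,
}

/// Aggregate-AND over exactly eight gate results. Any other slice length
/// fails, even when every element is true (contract-drift guard).
pub fn verdict_from_qa_gates(gates: &[bool]) -> Ship016Verdict {
    if gates.len() == AC_SHIP2_006_REQUIRED_QA_GATE_COUNT && gates.iter().all(|&g| g) {
        Ship016Verdict::Pass
    } else {
        Ship016Verdict::Fail
    }
}

/// Checks all 2^8 = 256 combinations: only the all-true mask may pass.
/// (Hypothetical helper for illustration.)
pub fn exhaustive_sweep_holds() -> bool {
    (0u16..256).all(|mask| {
        let mut gates = [false; AC_SHIP2_006_REQUIRED_QA_GATE_COUNT];
        for (i, gate) in gates.iter_mut().enumerate() {
            *gate = ((mask >> i) & 1) == 1;
        }
        let expected = if mask == 0xFF {
            Ship016Verdict::Pass
        } else {
            Ship016Verdict::Fail
        };
        verdict_from_qa_gates(&gates) == expected
    })
}
```

The length check matters because `all` on an empty slice returns true, so an empty input would otherwise pass vacuously.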

…X 4090 training-budget PARTIAL discharges (12/12 MODEL-2 complete)

Bundled PARTIAL_ALGORITHM_LEVEL discharge of the last two untouched MODEL-2 AC rows: AC-SHIP2-003 (val CE ≤ 2.2) and AC-SHIP2-004 (training ≤ 21 days on RTX 4090). First bundled double-discharge on the SHIP-TWO-001 surface.

**FALSIFY-SHIP-013 / AC-SHIP2-003 / GATE-ARCH-370M-013** (val CE floor)
- `AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS: f32 = 2.2`
- `Ship013Verdict { Pass, Fail }`
- `const fn verdict_from_val_ce_loss(f32) -> Ship013Verdict`: Pass iff measured CE is finite AND non-negative AND ≤ 2.2. Negative values Fail conservatively because cross-entropy H(p,q) ≥ 0 by definition.
- `falsify_ship_013_val_ce_loss_threshold_logic`, a 7-section mutation survey:
  1. Exact boundary 2.2 → Pass (inclusive floor, not strict <)
  2. ULP asymmetry: above 2.2 → Fail, below 2.2 → Pass
  3. Clear Pass band {0.0, 0.5, 1.0, 2.0, 2.199}
  4. Clear Fail band {2.201, 3.0, 10.0, f32::MAX}
  5. Non-finite {NaN, +∞, -∞} → Fail conservatively
  6. Negative-CE domain-violation Fail ({-0.001, -1.0, -∞})
  7. Provenance pin: const stays = 2.2_f32

**FALSIFY-SHIP-014 / AC-SHIP2-004 / GATE-ARCH-370M-014** (training budget)
- `AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS: u32 = 21`
- `Ship014Verdict { Pass, Fail }`
- `const fn verdict_from_training_duration_days(u32) -> Ship014Verdict`: Pass iff measured ≤ 21. u32 auto-rules out negatives and non-finites.
- `falsify_ship_014_training_duration_threshold_logic`, a 6-section mutation survey:
  1. Exact boundary 21 → Pass (inclusive ceiling)
  2. Adjacent: 20 → Pass, 22 → Fail
  3. Clear Pass band {0, 1, 7, 14, 20, 21}
  4. Clear Fail band {22, 30, 100, u32::MAX}
  5. Monotonicity sweep 0..=42: flips exactly once at 21→22
  6. Provenance pin: const stays = 21_u32

**Changes:**
- crates/aprender-train/src/models/llama_370m.rs:
  * 2 new public const floors + 2 verdict enums + 2 pure `const fn` verdict fns
  * 2 new mutation-survey unit tests (inside existing tests mod)
- contracts/model-families/llama-370m-sovereign-v1.yaml:
  * v1.9.0 → v1.10.0, stays ACTIVE
  * New GATE-ARCH-370M-013 binding AC-SHIP2-003 ↔ FALSIFY-SHIP-013 with discharge_status: PARTIAL_ALGORITHM_LEVEL
  * New GATE-ARCH-370M-014 binding AC-SHIP2-004 ↔ FALSIFY-SHIP-014 with discharge_status: PARTIAL_ALGORITHM_LEVEL
  * v1.10.0 changelog entry at top of changelog block
- docs/specifications/aprender-train/ship-two-models-spec.md:
  * Version 2.37.0 → 2.38.0
  * v2.38.0 Date-field entry describing the bundle
  * AC-SHIP2-003 and AC-SHIP2-004 rows tagged `**(PARTIAL_ALGORITHM_LEVEL v2.38.0)**`

**Verification:**
- `cargo fmt -p aprender-train --check` → clean
- `cargo test -p aprender-train --lib ship_013` → 1 passed
- `cargo test -p aprender-train --lib ship_014` → 1 passed
- `cargo test -p aprender-train --lib llama_370m` → 20 passed
- `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/model-families/llama-370m-sovereign-v1.yaml` → 0 errors

**Full discharge still blocks on:**
- SHIP-013: live `apr pretrain --mode from-scratch --validate` loop on RTX 4090 with `--features cuda` producing a real MODEL-2 val CE.
- SHIP-014: real wall-clock measurement of a MODEL-2 pretraining run on RTX 4090 from first `apr pretrain` dispatch to final checkpoint write.

**Status shift:**
- MODEL-2 coverage: 8/12 → **12/12 PARTIAL_ALGORITHM_LEVEL touched** (complete)
- Across both models: 23 PARTIAL + 3 DISCHARGED

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
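
The two decision rules in this bundle can be sketched as follows. Constant, enum, and function names follow the commit message. The commit declares the CE verdict as a `const fn`; it is written as a plain `fn` here only so the sketch does not depend on a toolchain that allows floating-point operations in `const fn`. The `count_verdict_flips` helper is hypothetical.

```rust
pub const AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS: f32 = 2.2;
pub const AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS: u32 = 21;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship013Verdict {
    Pass,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship014Verdict {
    Pass,
    Fail,
}

/// Pass iff the measured CE is finite, non-negative, and ≤ 2.2.
/// Negative values fail because cross-entropy cannot be negative.
pub fn verdict_from_val_ce_loss(ce: f32) -> Ship013Verdict {
    if ce.is_finite() && ce >= 0.0 && ce <= AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS {
        Ship013Verdict::Pass
    } else {
        Ship013Verdict::Fail
    }
}

/// Inclusive ceiling on training duration in whole days.
pub const fn verdict_from_training_duration_days(days: u32) -> Ship014Verdict {
    if days <= AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS {
        Ship014Verdict::Pass
    } else {
        Ship014Verdict::Fail
    }
}

/// Counts how often the verdict changes between d and d + 1 for d in
/// 0..max_days. (Hypothetical helper mirroring the monotonicity sweep.)
pub fn count_verdict_flips(max_days: u32) -> usize {
    (0..max_days)
        .filter(|&d| {
            verdict_from_training_duration_days(d) != verdict_from_training_duration_days(d + 1)
        })
        .count()
}
```

A single flip across 0..=42 shows the rule is a clean step function with its edge at 21→22.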

…us for single-source-of-truth Back-annotates discharge status already documented in prior v2.20/v2.21/v2.22 amendments into the §5.2 MODEL-2 acceptance-criteria table, and adds PARTIAL_ALGORITHM_LEVEL + contract + `cargo test` cross-references to §7.1/§7.2 falsification tables — so the three tables together form a true single source of truth for SHIP-TWO-001 algorithm-level ship-gate coverage. Changes: - §5.2 MODEL-2 table: 6 new annotations (3 DISCHARGED + 3 PARTIAL_ALGORITHM_LEVEL) * AC-SHIP2-001 FALSIFY-SHIP-011 DISCHARGED v2.21.0 (evidence 338c6eb) * AC-SHIP2-002 FALSIFY-SHIP-012 PARTIAL v2.21.0 (evidence 2e8b8b8) * AC-SHIP2-005 FALSIFY-SHIP-015 PARTIAL v2.21.0 (evidence bfb8831) * AC-SHIP2-009 FALSIFY-SHIP-019 PARTIAL v2.22.0 (evidence 846cc1d) * AC-SHIP2-011 FALSIFY-SHIP-021 DISCHARGED v2.20.0 (evidence 0b8ca8c) * AC-SHIP2-012 FALSIFY-SHIP-022 DISCHARGED v2.20.0 (evidence 8f0607d) - §4.2 MODEL-1 table drift fix: AC-SHIP1-007 v2.27.0 → v2.29.0 (correct SHIP-007 amendment ref) - §7.1 MODEL-1 Falsification: 6 new PARTIAL cross-references (SHIP-001/003/004/007/009/010) - §7.2 MODEL-2 Falsification: 12 new annotations (2 DISCHARGED + 10 PARTIAL) covering SHIP-011..022 - Version bump 2.38.0 → 2.39.0 + v2.39.0 changelog line appended to Date field Pure documentation hygiene; no Rust, no contracts, no tests, no meaning changes. Task #119. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

….2pp + adversarial-suite 0-tolerance PARTIAL discharges Bundled PARTIAL_ALGORITHM_LEVEL discharge of the last two MODEL-1 §7.1 stability tests: SHIP-023 (cross-run score drift ≤ 1.2 pp) + SHIP-024 (adversarial suite 0-tolerance across ≥ 50 prompts). Files: ship_023.rs + ship_024.rs + mod.rs + qwen2-e2e-verification-v1.yaml v1.7.0 + ship-two-models-spec.md v2.40.0. Tests: 1 unit + 1 doc-test each; pv validate → 0 errors; completes MODEL-1 §7.1 at 12/12 algorithmically bound; 25 PARTIAL + 3 DISCHARGED aggregate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…Y-GPUTRAIN PARTIAL discharges

Binds the remaining 5 GPU-training-backend invariants (003..007) at PARTIAL_ALGORITHM_LEVEL via pure Rust verdict functions, each accompanied by a 6-8 section mutation survey. GPUTRAIN-001 (grammar) and GPUTRAIN-002 (no-silent-fallback) are already bound in `crates/aprender-train/src/train/device.rs` with 17 passing tests.

New modules (5):
- crates/aprender-train/src/train/gputrain_003.rs: nvidia-smi residency proof: parse_nvidia_smi_compute_apps + verdict_from_residency bound to 5-s poll window + 1-MiB floor; 7-section survey (happy path / zero-mem / other-pid / empty / multi-process / malformed / u32::MAX-u64::MAX boundary / provenance pin)
- crates/aprender-train/src/train/gputrain_004.rs: CPU-fallback-preserved dispatch invariant via verdict_from_dispatch_label over disjoint CPU/CUDA label sets; 7-section survey (cpu→cpu / cuda→cuda / cpu→cuda silent-promotion Fail / cuda→cpu task-#126 silent-fallback Fail / unknown / empty / case-sensitivity)
- crates/aprender-train/src/train/gputrain_005.rs: 500-ms step-time ceiling on RTX 4090 370M via const fn verdict_from_step_time_ms; 7-section survey mirroring SHIP-007/020 shape (inclusive boundary / ULP-above / Pass band / Fail band / non-finite / negative / provenance pin)
- crates/aprender-train/src/train/gputrain_006.rs: same-device seed reproducibility at 1e-5 tolerance via verdict_from_loss_delta + aggregate verdict_from_loss_trajectories; 7-section survey (boundary / trajectory single-step-fail / length mismatch / empty / non-finite / negative tolerance / provenance pin)
- crates/aprender-train/src/train/gputrain_007.rs: apr --version --json schema + field-shape invariants via verdict_from_version_json_keys + verdict_from_version_json_fields; 7-section survey (all-keys-present / each-key-missing / 3 valid (feature, runtime) combos Pass / FM-GPUTRAIN-STALE-BUILD Fail / boundary 16 Pass / 17 Fail / forward-compat extras / provenance pin)

Contract (contracts/entrenar/gpu-training-backend-v1.yaml): v1.0.0 PROPOSED → v1.1.0 PROPOSED (stays PROPOSED until Phase 3 live evidence). Each of FALSIFY-GPUTRAIN-003..007 now carries discharge_status: PARTIAL_ALGORITHM_LEVEL, evidence_discharged_by listing the Rust symbols, full_discharge_blocks_on describing the live lambda-labs harness, and 6 counter_example_classes.

Spec (docs/specifications/aprender-train/ship-two-models-spec.md): v2.40.0 → v2.41.0; §14.5 table updated to mark the algorithm-level Phase-2 row DONE and leave the live-wire Phase-2 row pending; across both models: 30 PARTIAL + 3 DISCHARGED.

Validation gates (all green):
- cargo fmt --all --check → clean
- cargo test -p aprender-train --lib gputrain_003 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_004 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_005 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_006 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_007 → 1/1 pass
- cargo test -p aprender-train --lib train::device → 17/17 pass (regression clean)
- pv validate contracts/entrenar/gpu-training-backend-v1.yaml → 0 errors, 0 warnings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
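
As one example from this bundle, the GPUTRAIN-006 seed-reproducibility rule (per-step loss delta within 1e-5, aggregated over a trajectory) might be sketched as below. Only the two function names and the 1e-5 tolerance come from the commit message. The signatures, the `ReproVerdict` enum, the constant name, the inclusive boundary, and the choice to fail closed on every degenerate input are assumptions.

```rust
/// Hypothetical constant name; the commit only states the 1e-5 tolerance.
pub const SEED_REPRO_LOSS_TOLERANCE: f32 = 1e-5;

/// Hypothetical verdict type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReproVerdict {
    Pass,
    Fail,
}

/// Single-step check: |a - b| ≤ tolerance.
/// Assumed: non-finite inputs and negative tolerances fail closed.
pub fn verdict_from_loss_delta(loss_a: f32, loss_b: f32, tolerance: f32) -> ReproVerdict {
    let inputs_ok =
        loss_a.is_finite() && loss_b.is_finite() && tolerance.is_finite() && tolerance >= 0.0;
    if inputs_ok && (loss_a - loss_b).abs() <= tolerance {
        ReproVerdict::Pass
    } else {
        ReproVerdict::Fail
    }
}

/// Aggregate check: every step must pass.
/// Assumed: empty or length-mismatched trajectories fail closed.
pub fn verdict_from_loss_trajectories(run_a: &[f32], run_b: &[f32], tolerance: f32) -> ReproVerdict {
    if run_a.is_empty() || run_a.len() != run_b.len() {
        return ReproVerdict::Fail;
    }
    let all_steps_pass = run_a
        .iter()
        .zip(run_b)
        .all(|(&a, &b)| verdict_from_loss_delta(a, b, tolerance) == ReproVerdict::Pass);
    if all_steps_pass {
        ReproVerdict::Pass
    } else {
        ReproVerdict::Fail
    }
}
```

The explicit length check is needed because `zip` silently truncates to the shorter slice, which would otherwise hide a missing training step.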

…gorithm-level bindings Algorithmically binds the 6 bindable compound ship gates from §6 of SHIP-TWO-001. Gates 007-012 are CI/lint meta-policy (enforced by .clippy.toml, .pmat-gates.toml, and CI workflows) and are intentionally out of scope. New contract: contracts/compound-ship-gates-v1.yaml v1.0.0 PROPOSED (metadata.kind: pattern; 6 falsification_tests each PARTIAL_ALGORITHM_LEVEL). New Rust modules (6, all in crates/aprender-core/src/format/): * gate_ship_001.rs — MODEL-1 aggregate-AND over 10 AC-SHIP1-* bools (AC_GATE_SHIP_001_MODEL_1_AC_COUNT = 10; verdict_from_model1_ac_aggregate; 6-section survey incl. 2^10=1024 exhaustive bitmask proof) * gate_ship_002.rs — MODEL-2 aggregate-AND over 12 AC-SHIP2-* bools (AC_GATE_SHIP_002_MODEL_2_AC_COUNT = 12; verdict_from_model2_ac_aggregate; 6-section survey incl. 2^12=4096 exhaustive bitmask proof) * gate_ship_003.rs — apr qa Golden Output byte-identity across quantize round-trip (verdict_from_golden_output_diff; 6-section survey with conservative-Fail on empty input — SKIPPED Golden Output = no regression proof) * gate_ship_004.rs — HumanEval bitwise-identical determinism on two seed=0 runs (verdict_from_identical_humaneval_scores uses f32::to_bits() equality — STRICTLY STRICTER than FALSIFY-SHIP-023's 1.2 pp drift tolerance; 7-section survey) * gate_ship_005.rs — License metadata byte-equal + non-empty + ASCII-printable (AC_GATE_SHIP_005_REQUIRED_LICENSE_FIELD = "license"; verdict_from_license_metadata; 6-section survey incl. 
SPDX case-sensitivity guard) * gate_ship_006.rs — GGUF round-trip first-token probability delta (AC_GATE_SHIP_006_MAX_FIRST_TOKEN_DELTA = 1e-3; const fn verdict_from_first_token_probability_delta; symmetric via .abs(); 7-section survey) Test counts: cargo test -p aprender-core --lib format::gate_ship → 6/6 pass cargo test -p aprender-core --doc format::gate_ship → 6/6 pass Contract validation: pv validate contracts/compound-ship-gates-v1.yaml → 0 errors, 0 warnings Spec update: v2.41.0 → v2.42.0; §6 Compound Ship Gates table annotates GATE-SHIP-001..006 with (PARTIAL_ALGORITHM_LEVEL v2.42.0) markers. Across both models: 30 PARTIAL + 3 DISCHARGED → 36 PARTIAL + 3 DISCHARGED. Full discharge of each gate blocks on the live compound-gate harness (all 10 per-AC MODEL-1 checks for GATE-SHIP-001; all 12 per-AC MODEL-2 checks for GATE-SHIP-002; apr qa --golden-output on pre+post quantize checkpoints for GATE-SHIP-003; two consecutive apr eval --seed 0 runs for GATE-SHIP-004; apr inspect .metadata.license vs upstream HF card for GATE-SHIP-005; apr run --emit-logprobs vs llama-cli --logits-all for GATE-SHIP-006). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
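
Two of these gates contrast a bitwise comparison with a tolerance comparison, which the sketch below illustrates. The function names, the constant, and the use of `f32::to_bits()` and `.abs()` come from the commit message. The `GateVerdict` enum, the signatures, and the finite-input guards are assumptions, and the delta fn (a `const fn` in the commit) is written as a plain `fn` to avoid depending on a particular toolchain.

```rust
/// Hypothetical verdict type; the enum names used in the PR are not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateVerdict {
    Pass,
    Fail,
}

pub const AC_GATE_SHIP_006_MAX_FIRST_TOKEN_DELTA: f32 = 1e-3;

/// GATE-SHIP-004 shape: two seed=0 scores must be bit-for-bit identical.
/// Assumed guard: non-finite scores fail closed.
pub fn verdict_from_identical_humaneval_scores(run_a: f32, run_b: f32) -> GateVerdict {
    if run_a.is_finite() && run_b.is_finite() && run_a.to_bits() == run_b.to_bits() {
        GateVerdict::Pass
    } else {
        GateVerdict::Fail
    }
}

/// GATE-SHIP-006 shape: symmetric tolerance on a signed delta.
/// Assumed signature: a single signed delta; non-finite fails closed.
pub fn verdict_from_first_token_probability_delta(delta: f32) -> GateVerdict {
    if delta.is_finite() && delta.abs() <= AC_GATE_SHIP_006_MAX_FIRST_TOKEN_DELTA {
        GateVerdict::Pass
    } else {
        GateVerdict::Fail
    }
}
```

Comparing bit patterns is stricter than `==`: for instance `0.0` and `-0.0` compare equal numerically but have different bits, so they would fail the identity gate in this sketch.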

…rithm-level bindings (12/12 §6 total)

Task #123: completes §6 Compound Ship Gates coverage by algorithmically binding the 6 merge-gate meta-policy rows (GATE-SHIP-007..012) at PARTIAL_ALGORITHM_LEVEL. Prior v2.42.0 bundle covered 001..006 (ship-blocking); this release adds the remaining 6 (merge-gate) via new sibling modules in `crates/aprender-core/src/format/gate_ship_0XX.rs`:

- GATE-SHIP-007 (`.unwrap()` count): const fn `verdict_from_unwrap_count` bound to `AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT = 0` via zero-tolerance threshold + 5-section survey.
- GATE-SHIP-008 (contract density): `verdict_from_contract_density` bound to `AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE = 1.0` via divide-by-zero-guarded ratio threshold + 7-section survey.
- GATE-SHIP-009 (CI aggregate): const fn `verdict_from_ci_aggregate` over (fmt, clippy, test) + 8-section survey incl. exhaustive 2^3 = 8 bitmask proof + AND-symmetry pin.
- GATE-SHIP-010 (advisory count): const fn `verdict_from_advisory_count` bound to `AC_GATE_SHIP_010_MAX_TOLERATED_ADVISORY_COUNT = 0` via zero-tolerance threshold + 5-section survey.
- GATE-SHIP-011 (PMAT TDG): const fn `verdict_from_tdg_score` bound to `AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE = 90.0` via inclusive-floor threshold + 7-section survey.
- GATE-SHIP-012 (line coverage): const fn `verdict_from_line_coverage_pct` bound to `AC_GATE_SHIP_012_MIN_LINE_COVERAGE_PCT = 95.0` via inclusive-floor threshold + 7-section survey.

Contract: `contracts/compound-ship-gates-v1.yaml` v1.0.0 → v1.1.0 (stays PROPOSED) adds 6 new `falsification_tests` (FALSIFY-GATE-SHIP-007..012), 6 new equations, and 6 new proof_obligations.

Spec: `docs/specifications/aprender-train/ship-two-models-spec.md` v2.42.0 → v2.43.0; §6 table rows 007..012 annotated PARTIAL_ALGORITHM_LEVEL v2.43.0. §6 Compound Ship Gates now 12/12 algorithmically bound.

Full discharge still blocks on live CI tooling invocation (`cargo clippy -- -D warnings` / `pmat density` / branch-protection ci-gate / `cargo deny check advisories` / `pmat tdg` / `cargo llvm-cov report --json`).

Validation:
- cargo fmt --check → clean (aprender-core only)
- cargo test -p aprender-core --lib format::gate_ship_0XX → 6/6 pass
- cargo test -p aprender-core --doc format::gate_ship_0XX → 6/6 pass
- pv validate contracts/compound-ship-gates-v1.yaml → 0 errors, 0 warnings

Across both models: 42 PARTIAL + 3 DISCHARGED.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift · 2026-04-24T11:42:34Z

Superseded by #1044 — 11-PR cascade collapsed into single squash-merge to avoid O(n²) rebase treadmill. Content identical; this branch's commit is in #1044.

noahgift enabled auto-merge (squash) April 24, 2026 08:09

noahgift mentioned this pull request Apr 24, 2026

fix(ci): per-PR cargo registry to break intel-runner concurrent-write race (ANDON paiml/infra#77) #1043

Merged

3 tasks

noahgift and others added 11 commits April 24, 2026 12:40

noahgift force-pushed the feat/gate-ship-007-012-bundle branch from a35aa1d to d553f4f Compare April 24, 2026 10:55

noahgift mentioned this pull request Apr 24, 2026

feat(ship-two-001): full algorithmic coverage bundle + README contract-backed rewrite (v2.30 → v2.43) #1044

Merged

noahgift closed this Apr 24, 2026

auto-merge was automatically disabled April 24, 2026 11:42
Pull request was closed

noahgift wants to merge 11 commits into main from feat/gate-ship-007-012-bundle
