feat(gate-ship-007-012): complete §6 Compound Ship Gates coverage (12/12 algorithmically bound)#1042

Closed
noahgift wants to merge 11 commits into main from feat/gate-ship-007-012-bundle

Conversation

@noahgift
Contributor

Summary

Completes §6 Compound Ship Gates coverage by algorithmically binding the 6 merge-gate meta-policy rows (GATE-SHIP-007..012) at PARTIAL_ALGORITHM_LEVEL. Prior #1041 covered the 6 ship-blocking rows (001..006); this PR stacks on top and closes out the remaining half.

§6 Compound Ship Gates is now 12/12 algorithmically bound (6 ship-blocking + 6 merge-gate). Stack total: 42 PARTIAL + 3 DISCHARGED.

6 new verdict fns + constants

| Gate | Const | Verdict fn |
|---|---|---|
| GATE-SHIP-007 | `AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT: u32 = 0` | `const fn verdict_from_unwrap_count(u32)` |
| GATE-SHIP-008 | `AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE: f32 = 1.0` | `fn verdict_from_contract_density(u32, u32, f32)` |
| GATE-SHIP-009 | `AC_GATE_SHIP_009_REQUIRED_CHECK_COUNT: usize = 3` | `const fn verdict_from_ci_aggregate(bool, bool, bool)` |
| GATE-SHIP-010 | `AC_GATE_SHIP_010_MAX_TOLERATED_ADVISORY_COUNT: u32 = 0` | `const fn verdict_from_advisory_count(u32)` |
| GATE-SHIP-011 | `AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE: f32 = 90.0` | `const fn verdict_from_tdg_score(f32, f32)` |
| GATE-SHIP-012 | `AC_GATE_SHIP_012_MIN_LINE_COVERAGE_PCT: f32 = 95.0` | `const fn verdict_from_line_coverage_pct(f32, f32)` |

Each module ships a 5-8 section mutation survey, including an exhaustive 2^3 = 8 bitmask proof for GATE-SHIP-009 (CI aggregate-AND).
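
For readers unfamiliar with the aggregate-AND shape, here is a minimal sketch of how the GATE-SHIP-009 verdict and its exhaustive sweep could look. The const and function names come from the table above; the `Gate009Verdict` enum, the parameter names, and the `count_passing_bitmasks` helper are assumptions made for illustration and may differ from the real module.

```rust
/// Hypothetical verdict type; the real enum name is not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate009Verdict {
    Pass,
    Fail,
}

/// Number of required CI checks, as listed in the table above.
pub const AC_GATE_SHIP_009_REQUIRED_CHECK_COUNT: usize = 3;

/// Aggregate-AND: Pass only when every required check passed.
/// Parameter names are assumed; the table only gives (bool, bool, bool).
pub const fn verdict_from_ci_aggregate(fmt: bool, clippy: bool, test: bool) -> Gate009Verdict {
    if fmt && clippy && test {
        Gate009Verdict::Pass
    } else {
        Gate009Verdict::Fail
    }
}

/// Illustrative helper: sweeps all 2^3 = 8 input combinations and
/// counts how many of them produce Pass.
pub fn count_passing_bitmasks() -> usize {
    (0u8..8)
        .filter(|&mask| {
            let fmt = (mask & 0b001) != 0;
            let clippy = (mask & 0b010) != 0;
            let test = (mask & 0b100) != 0;
            verdict_from_ci_aggregate(fmt, clippy, test) == Gate009Verdict::Pass
        })
        .count()
}
```

Under these assumptions, exactly one of the eight bitmasks (all three checks green) should pass, which is the property an exhaustive bitmask proof pins down.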

Contract + spec bumps

  • contracts/compound-ship-gates-v1.yaml v1.0.0 → v1.1.0 (stays PROPOSED) — adds 6 new falsification_tests, 6 equations, 6 proof_obligations.
  • docs/specifications/aprender-train/ship-two-models-spec.md v2.42.0 → v2.43.0 — §6 table rows 007..012 annotated PARTIAL_ALGORITHM_LEVEL v2.43.0.

Full discharge blocks on live CI tooling

Each gate's full discharge still blocks on running the external tool:

  • GATE-SHIP-007: cargo clippy --all-targets --all-features -- -D warnings
  • GATE-SHIP-008: pmat density --new-code --json
  • GATE-SHIP-009: branch-protection ci / gate + workspace-test
  • GATE-SHIP-010: cargo deny check advisories
  • GATE-SHIP-011: pmat tdg . --format json
  • GATE-SHIP-012: cargo llvm-cov report --json

Stacking note

This branch is stacked on feat/gate-ship-001-006-bundle (PR #1041). The commit on this branch applies cleanly on top of main once #1041 merges; no rebase required.

Test plan

  • cargo fmt --manifest-path crates/aprender-core/Cargo.toml --check — clean
  • cargo test -p aprender-core --lib format::gate_ship_007..012 — 6/6 pass
  • cargo test -p aprender-core --doc format::gate_ship_007..012 — 6/6 doctests pass
  • pv validate contracts/compound-ship-gates-v1.yaml — 0 errors, 0 warnings
  • cargo clippy -p aprender-core --lib --no-deps — clean

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) April 24, 2026 08:09
noahgift added a commit that referenced this pull request Apr 24, 2026
… race (ANDON paiml/infra#77) (#1043)

* fix(ci): per-PR cargo registry to break intel-runner concurrent-write race (paiml/infra#77)

ANDON 2026-04-24 — aprender 11-PR stack (#1031..#1042) all failing `ci / security`
and `workspace-test` with:

  error: couldn't read /home/noah/.cargo/registry/src/<crate>/lib.rs:
         Permission denied (os error 13)

and the rustix-0.38 equivalent (E0432 unresolved import `libc`/`libc_errno`
originating in the `syscall` macro, which the rustix build.rs regenerates from
src/ files — missing src/ → macro can't find libc crate → cascading errors).

FIVE WHYS
─────────
 1 `ci / security` fails: `cargo install cargo-audit --locked` hits EACCES
   reading `fnv-1.0.7/lib.rs`.
 2 EACCES: the file is missing OR owned by root (docker container creates
   extractions as root on the bind-mounted host registry).
 3 Concurrent writers: 16 self-hosted `intel-clean-room-*` runners bind-mount
   the SAME /home/noah/.cargo/registry — cargo extractions, the ci-reaper
   TTL sweep, and cross-container chown cycles all touch identical paths.
 4 Shared by design: ci.yml:49 was authored for throughput — re-downloading
   crates per job is ~200MB, so the host registry was shared across all
   runners. Race class not modeled.
 5 Precedent already exists: target/ hit the identical race under concurrent
   PRs (task #134) and was fixed by per-PR isolation on
   /mnt/nvme-raid0/targets/aprender-ci/<pr#>. The registry simply never got
   the same treatment.

ROOT CAUSE
──────────
Shared mutable bind mount + concurrent multi-runner write access ≈ guaranteed
race. The existing band-aid (PR #1025 "self-heal cargo registry cache",
cargo-ok + Cargo.toml marker check) only runs inside `ci / security` and
itself races with concurrent jobs that have already passed the cache check.

FIX (this PR)
─────────────
Mirror the target-dir pattern from ci.yml:55 for the cargo registry. Each
PR (or branch) gets its own registry under /mnt/nvme-raid0/cargo-ci/registry/<pr#>.
Docker auto-creates the leaf dir on first mount; the ci-reaper TTL sweep
(ci-reaper.sh:308) needs a companion infra update (paiml/infra#77) to include
the new /mnt path.

 - Removes: /home/noah/.cargo/registry:/usr/local/cargo/registry
 - Adds:    /mnt/nvme-raid0/cargo-ci/registry/${pr#|ref_name}:/usr/local/cargo/registry

Cost: ~200MB per PR on first run (cargo re-downloads crates). Same cost
profile as the target/ isolation fix, which the fleet already absorbed.
Once cargo-ci/registry/<pr#> warms on run 1, run 2+ hit the cache.

FOLLOW-UP
─────────
paiml/infra#77 tracks:
  - forjar recipe to pre-create /mnt/nvme-raid0/cargo-ci/ owner=noah:noah
  - reaper extension: GC /mnt/nvme-raid0/cargo-ci/registry/<pr#>/src with same TTL
  - once infra lands, drop the ANDON comment above

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: trigger fresh run to pick up paiml/.github#32 security-job CARGO_HOME fix

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift and others added 11 commits April 24, 2026 12:40
…ce-v1 multi-bind

FALSIFY-SHIP-009 (AC-SHIP1-009 "MODEL-1 teacher license + data
provenance recorded in model.apr metadata") attains
PARTIAL_ALGORITHM_LEVEL by attaching a second binding to the same
C-APR-PROVENANCE contract that already discharges MODEL-2's
AC-SHIP2-012. The AprV2Metadata + serde-JSON decision rule is
model-agnostic, so one contract cleanly carries both discharges.

Changes:
- contracts/apr-provenance-v1.yaml v1.0.0 → v1.1.0 (stays ACTIVE):
  new GATE-APR-PROV-004 block binds AC-SHIP1-009 / FALSIFY-SHIP-009
  at PARTIAL_ALGORITHM_LEVEL with ship_blocking=true; full discharge
  blocks on teacher .apr republish populating license, data_source,
  data_license as named fields (PMAT-686 fixture-swap).
- crates/aprender-core/src/format/tests/provenance_tests.rs:
  - falsify_ship_009_apr_metadata_applies_to_model_1_teacher —
    teacher-representative round-trip (license="apache-2.0",
    data_source="qwen2.5-coder-7b-instruct", data_license="apache-2.0").
  - falsify_ship_009_gate_apr_prov_004_has_partial_discharge_marker —
    include_str! YAML-binding assertion that the new gate has the
    correct binds_to / falsification_id / discharge_status / flags.
- crates/aprender-core/Cargo.toml: add serde_yaml to [dev-dependencies]
  (needed for the YAML-binding test).
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0
  → v2.24.0: new v2.24.0 amendment block documenting the first
  MODEL-1 PARTIAL and first multi-model multi-bind on one contract.

Pattern extensions:
- First MODEL-1 PARTIAL (prior six targeted MODEL-2).
- First multi-model multi-bind on ONE contract (prior PARTIALs each
  had a dedicated contract).
- Sixth falsification of the "exhausted" verdict: SHIP-019 →
  SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016 → SHIP-009 — sixth is
  cross-model, strictly more surprising than the prior five.

All 5 provenance tests green (3 SHIP-022 + 2 SHIP-009).

Status after v2.24.0:
- MODEL-2: 3/12 ACTIVE + 7/12 PARTIAL = 10/12 touched (83.3%)
- MODEL-1: 9/10 DISCHARGED (via SHIP-TWO-001-MODEL-1-TEACHER tag) +
  1/10 PARTIAL (009). Will flip to fully ACTIVE when PMAT-686
  republishes teacher.apr with provenance fields populated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…EVEL discharge (task #149)

MODEL-2 (albor 370M Sovereign) gate #4 at PARTIAL: binds
AC-SHIP2-007 ("apr run produces syntactically valid Python on 100
held-out prompts") to FALSIFY-SHIP-017 via new GATE-ARCH-370M-005
with `discharge_status: PARTIAL_ALGORITHM_LEVEL`.

The decision rule — "≤ 1 SyntaxError tolerated out of 100, ≥ 2 is
a ship-blocker" — is a pure integer threshold and is proven correct
at `cargo test` time today. Full discharge (100-prompt `apr run`
harness against a trained 370M .apr) remains PENDING on pretraining
compute-dispatch (AC-SHIP2-003/004) — fixture swap is data-only, no
harness rewrite required.
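
As a rough illustration of the integer threshold described above, the sketch below uses the const and function names from this commit. The `u32` const types and the exact enum derive list are assumptions, since the message does not state them.

```rust
/// Held-out prompt count from the spec (type assumed to be u32).
pub const AC_SHIP2_007_HELDOUT_PROMPT_COUNT: u32 = 100;
/// At most this many SyntaxErrors are tolerated (type assumed to be u32).
pub const AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS: u32 = 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship017Verdict {
    Pass,
    Fail,
}

/// Pure threshold: Pass iff the error count is within tolerance.
pub const fn verdict_from_syntax_error_count(errors: u32) -> Ship017Verdict {
    if errors <= AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship017Verdict::Pass
    } else {
        Ship017Verdict::Fail
    }
}
```

Because the function only sees a count, swapping the test fixture for a real 100-prompt run would not require changing the rule itself.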

Changes:
- crates/aprender-train/src/models/llama_370m.rs:
  - Adds `AC_SHIP2_007_HELDOUT_PROMPT_COUNT` (=100) +
    `AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS` (=1) consts mirroring
    the spec §6 harness size and §8.3 FALSIFY-SHIP-017 tolerance.
  - Adds `verdict_from_syntax_error_count(errors) -> Ship017Verdict`
    const fn — the pure threshold.
  - Adds `falsify_ship_017_syntax_error_count_threshold_logic` —
    Pass boundary (0,1), Fail boundary (2,50,100), monotonicity
    sweep ∈ [0,100], and provenance pinning.
  - Adds `falsify_ship_017_gate_arch_370m_005_has_partial_discharge_marker`
    — binds sovereign contract YAML shape (falsification_id,
    binds_to, discharge_status, evidence_discharged_by,
    full_discharge_blocks_on, ship_blocking) to Rust tests via
    include_str!.
- contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 →
  v1.6.0 (stays ACTIVE): adds GATE-ARCH-370M-005.
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0
  → v2.25.0 with amendment block: counter-example survey continues
  to find new PARTIAL levers after two prior "exhausted" verdicts
  (SHIP-015 → SHIP-019 → SHIP-017). New status: 3/12 ACTIVE + 4/12
  PARTIAL = 7/12 touched (58.3%).

Verification:
- cargo test -p aprender-train --lib llama_370m → 12/12 pass
  (including both new falsify_ship_017_* tests)
- cargo clippy -p aprender-train --lib -- -D warnings → clean
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml
  → Contract is valid

Closes task #149.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Binds AC-SHIP2-010 (inference decode throughput ≥ 100 tok/s on RTX
4090) to a new GATE-ARCH-370M-006 in the sovereign contract via a pure
f32 threshold fn + two unit tests. The compute-heavy half (`apr bench`
on a real trained 370M .apr) is deferred to AC-SHIP2-003/004
compute-dispatch; the decision rule itself is proven today.
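
A minimal sketch of such an f32 floor, assuming the names listed in this commit; the parameter name and the exact guard order are illustrative only.

```rust
/// Minimum decode throughput in tokens per second on RTX 4090.
pub const AC_SHIP2_010_MIN_DECODE_TPS_RTX4090: f32 = 100.0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship020Verdict {
    Pass,
    Fail,
}

/// Pass iff the measured throughput is finite and at or above the floor.
/// NaN and both infinities fail conservatively, as the commit describes.
pub fn verdict_from_decode_tps(tps: f32) -> Ship020Verdict {
    if tps.is_finite() && tps >= AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 {
        Ship020Verdict::Pass
    } else {
        Ship020Verdict::Fail
    }
}
```

The explicit `is_finite` check matters for `+∞`, which would otherwise satisfy `>=` and pass a gate that is meant to reject non-measurements.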

Changes:
- crates/aprender-train/src/models/llama_370m.rs:
  * AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 = 100.0 (const floor)
  * Ship020Verdict { Pass, Fail }
  * verdict_from_decode_tps(f32) -> Ship020Verdict (fn, non-finite → Fail)
  * falsify_ship_020_decode_tps_threshold_logic (5 invariants:
    Pass boundary, Fail boundary at one f32 ULP, monotonicity in
    both directions, conservative Fail for NaN/±∞, provenance
    pinning that the const stays = 100.0)
  * falsify_ship_020_gate_arch_370m_006_has_partial_discharge_marker
    (contract parses + advertises PARTIAL_ALGORITHM_LEVEL +
    evidence_discharged_by populated + full_discharge_blocks_on
    documented + ship_blocking:true)

- contracts/model-families/llama-370m-sovereign-v1.yaml:
  * v1.5.0 → v1.6.0, stays ACTIVE
  * New GATE-ARCH-370M-006 binding AC-SHIP2-010 ↔ FALSIFY-SHIP-020
    with discharge_status: PARTIAL_ALGORITHM_LEVEL

- docs/specifications/aprender-train/ship-two-models-spec.md:
  * v2.23.0 → v2.26.0 with amendment block
  * MODEL-2 ship-gate status updated: 3/12 ACTIVE + 5/12 PARTIAL =
    8/12 touched (66.7%)

- crates/aprender-train/src/train/device.rs:
  * 2 pre-existing fmt fixes (6 lines of whitespace) — restores
    `cargo fmt -p aprender-train --check` green. Pre-existing on
    origin/main; kept in this PR under Toyota Way "all defects are
    your defects" rule.

Pattern lesson: v2.22.0 declared MODEL-2 non-compute PARTIAL levers
"exhausted" — re-running the counter-example survey has now falsified
that verdict three times (SHIP-019 → SHIP-017 → SHIP-020). When a
SHIP gate names a threshold / tolerance / ratio / cut-off and the
compute-heavy harness is separable from the decision function, the
threshold fn can land today at unit-test time — even when the full
end-to-end harness is blocked on compute.

Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004
compute-dispatch + three independent `apr bench --tokens 128 --json`
medians on RTX 4090 host. Fixture-swap only — no decision-rule rewrite.

Verification:
- cargo test -p aprender-train --lib models::llama_370m → 11/11 PASS
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml
  → "Contract is valid. 0 error(s), 0 warning(s)."
- cargo clippy -p aprender-train --lib → green
- cargo fmt -p aprender-train --check → green

Task #150.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AC-SHIP2-008 / FALSIFY-SHIP-018 bound via new GATE-ARCH-370M-007 at
PARTIAL_ALGORITHM_LEVEL. Pure two-number threshold fn
`verdict_from_pass_at_1(correct, total, threshold_pct)` + const
`AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT = 30.0` in
crates/aprender-train/src/models/llama_370m.rs — proves the spec's
'HumanEval pass@1 ≥ 30.0%' decision rule at `cargo test` time,
independent of a trained artifact. Two unit tests prove:

  - boundary (f32-exact 50/100 = 50.0% with ±ULP shift showing `>=`
    is inclusive; 49/164 and 29/100 fail the 30.0 floor)
  - monotonicity (correct sweep 0..=164 at total=164 never flips
    Pass → Fail)
  - div-safety (total=0 fails closed) + sanity (correct>total fails)
  - non-finite threshold guard (NaN / ±∞ all Fail)
  - provenance pin (const stays = 30.0)
  - YAML marker (GATE-ARCH-370M-007 carries PARTIAL_ALGORITHM_LEVEL,
    binds AC-SHIP2-008, cites FALSIFY-SHIP-018, ship_blocking:true)
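
The properties listed above can be illustrated with a short sketch. The function and const names are taken from this commit; the `Ship018Verdict` enum name, the parameter types, and the order of the guards are assumptions.

```rust
/// HumanEval pass@1 floor in percent.
pub const AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT: f32 = 30.0;

/// Hypothetical verdict enum; the commit does not name it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship018Verdict {
    Pass,
    Fail,
}

/// Pass iff correct/total, expressed in percent, meets the threshold.
pub fn verdict_from_pass_at_1(correct: u32, total: u32, threshold_pct: f32) -> Ship018Verdict {
    // Fail closed on an empty benchmark, impossible counts, or a
    // non-finite threshold (NaN, +inf, -inf).
    if total == 0 || correct > total || !threshold_pct.is_finite() {
        return Ship018Verdict::Fail;
    }
    // Multiply before dividing so exact cases such as 50/100 stay exact.
    let pct = (correct as f32 * 100.0) / total as f32;
    if pct >= threshold_pct {
        Ship018Verdict::Pass
    } else {
        Ship018Verdict::Fail
    }
}
```

The early return keeps the division safe and makes every malformed input fail rather than pass by accident.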

Full discharge blocks on real 370M .apr (AC-SHIP2-003/004 compute)
+ three seed=0 `apr eval --benchmark humaneval --json` median
pass@1 values fed into the verdict fn — all three must Pass.
Fixture-swap only; no harness rewrite.

6th PARTIAL for MODEL-2 (after SHIP-012/015/017/019/020). Spec
v2.22.0's 'exhausted' verdict now falsified 4×. Remaining 5th-PARTIAL
candidate: SHIP-016 (`apr qa` 8-of-8 aggregate — not a single
threshold). SHIP-013/014 genuinely need real compute.

Contract: llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE).
Spec: ship-two-models-spec.md v2.23.0 → v2.24.0 (amendment block).
Also: 6-line pre-existing fmt fix in train/device.rs under Toyota
Way "all defects are your defects" (same pattern as PR #1005).

Status: MODEL-2 ship-gates 3/12 ACTIVE + 6/12 PARTIAL = 9/12 touched
(75.0%). Remaining 3 (003/004/006) all need real 370M compute.

Tests: cargo test -p aprender-train --lib models::llama_370m → 11/11
pass. `pv validate contracts/model-families/llama-370m-sovereign-v1.yaml`
→ Contract is valid. cargo fmt -p aprender-train --check → clean.
cargo clippy -p aprender-train --lib -- -D warnings → clean.

Refs: SHIP-TWO-001, task #151, FALSIFY-SHIP-018.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ND verdict fn

Wires GATE-ARCH-370M-008 (AC-SHIP2-006) to a pure
verdict_from_qa_gates(&[bool]) -> Ship016Verdict aggregate-AND fn in
aprender-train/src/models/llama_370m.rs, proven today by exhaustive
2^8 = 256-combination sweep + single-gate-flip falsifiability +
monotonicity + 3 contract-drift guards (slice length 0/7/9/16 → Fail
even when all-true). Discharge marker: PARTIAL_ALGORITHM_LEVEL.
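
A sketch of the slice-based aggregate-AND with its length guard, assuming the names given in this commit. The `count_passing_masks` helper is hypothetical and only stands in for the exhaustive sweep.

```rust
/// Number of `apr qa` gates that must all pass.
pub const AC_SHIP2_006_REQUIRED_QA_GATE_COUNT: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship016Verdict {
    Pass,
    Fail,
}

/// Pass iff exactly 8 gate results are supplied and all are true.
/// A wrong-length slice fails even when every element is true.
pub fn verdict_from_qa_gates(gates: &[bool]) -> Ship016Verdict {
    if gates.len() == AC_SHIP2_006_REQUIRED_QA_GATE_COUNT && gates.iter().all(|&g| g) {
        Ship016Verdict::Pass
    } else {
        Ship016Verdict::Fail
    }
}

/// Illustrative helper: sweeps all 2^8 = 256 combinations and counts
/// how many yield Pass.
pub fn count_passing_masks() -> usize {
    (0u16..256)
        .filter(|&mask| {
            let gates: Vec<bool> = (0..8u32).map(|bit| ((mask >> bit) & 1) == 1).collect();
            verdict_from_qa_gates(&gates) == Ship016Verdict::Pass
        })
        .count()
}
```

The length check is what the contract-drift guards exercise: if the harness ever reports 7 or 9 gates, the verdict fails instead of silently accepting a changed gate set.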

Pattern note: SHIP-016 is the first aggregate-AND shape —
SHIP-017/018/020 were single-threshold shapes. The proof pattern now
covers two distinct decision-rule shapes, confirming
decision-rule/compute-harness separation is a reusable pattern, not a
one-off.

**5th PARTIAL after "exhausted" verdict falsified 4× already**
(SHIP-019 → SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016).

**MODEL-2 ship-gate coverage: 3/12 ACTIVE + 7/12 PARTIAL = 10/12
touched (83.3%).** Remaining 2 truly compute-blocked (003 CE ≤ 2.2,
004 ≤21-day wall-clock) have no fixture-swap trick.

Changes:
- contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0
  (GATE-ARCH-370M-008 block added; stays ACTIVE)
- crates/aprender-train/src/models/llama_370m.rs:
  + AC_SHIP2_006_REQUIRED_QA_GATE_COUNT = 8 const
  + Ship016Verdict enum
  + verdict_from_qa_gates(&[bool]) pure fn with aggregate-AND
  + falsify_ship_016_apr_qa_aggregate_and_logic test (2^8 sweep +
    single-gate-flip + monotonicity + 3 contract-drift guards)
  + falsify_ship_016_gate_arch_370m_008_has_partial_discharge_marker
    test (YAML binding: binds_to AC-SHIP2-006, falsification_id
    FALSIFY-SHIP-016, discharge_status PARTIAL_ALGORITHM_LEVEL)
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0
  → v2.24.0 (amendment block documenting 5th PARTIAL, first
  aggregate-AND shape)
- crates/aprender-train/src/train/device.rs: pre-existing fmt fixes
  bundled per Toyota Way "all defects are your defects"

Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004
compute-dispatch + 8-gate apr qa harness invocation with exit 0 →
feed the 8 gate-result booleans into verdict_from_qa_gates and
require Ship016Verdict::Pass. Fixture-swap only — no harness rewrite.

Refs #152

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…X 4090 training-budget PARTIAL discharges (12/12 MODEL-2 complete)

Bundled PARTIAL_ALGORITHM_LEVEL discharge of the last two untouched MODEL-2
AC rows: AC-SHIP2-003 (val CE ≤ 2.2) and AC-SHIP2-004 (training ≤ 21 days
on RTX 4090). First bundled double-discharge on the SHIP-TWO-001 surface.

**FALSIFY-SHIP-013 / AC-SHIP2-003 / GATE-ARCH-370M-013** — val CE floor

- `AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS: f32 = 2.2`
- `Ship013Verdict { Pass, Fail }`
- `const fn verdict_from_val_ce_loss(f32) -> Ship013Verdict` — Pass iff
  measured CE is finite AND non-negative AND ≤ 2.2. Negative values Fail
  conservatively because cross-entropy H(p,q) ≥ 0 by definition.
- `falsify_ship_013_val_ce_loss_threshold_logic` — 7-section mutation survey:
  1. Exact boundary 2.2 → Pass (inclusive ceiling, not strict <)
  2. ULP asymmetry — above 2.2 → Fail, below 2.2 → Pass
  3. Clear Pass band {0.0, 0.5, 1.0, 2.0, 2.199}
  4. Clear Fail band {2.201, 3.0, 10.0, f32::MAX}
  5. Non-finite {NaN, +∞, -∞} → Fail conservatively
  6. Negative-CE domain-violation Fail ({-0.001, -1.0, -∞})
  7. Provenance pin: const stays = 2.2_f32
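
The survey sections above can be read against a sketch like the following. Names follow this commit; the sketch uses a plain `fn` rather than the `const fn` the commit describes, on the assumption that the reader's toolchain may not yet allow floating-point methods in const contexts.

```rust
/// Maximum tolerated validation cross-entropy.
pub const AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS: f32 = 2.2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship013Verdict {
    Pass,
    Fail,
}

/// Pass iff the measured CE is finite, non-negative, and at most 2.2.
/// Negative values fail because cross-entropy cannot be below zero.
pub fn verdict_from_val_ce_loss(ce: f32) -> Ship013Verdict {
    if ce.is_finite() && ce >= 0.0 && ce <= AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS {
        Ship013Verdict::Pass
    } else {
        Ship013Verdict::Fail
    }
}
```

Comparing against the const rather than a literal keeps the boundary test meaningful: the value one ULP above `2.2_f32` fails, and the const itself passes.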

**FALSIFY-SHIP-014 / AC-SHIP2-004 / GATE-ARCH-370M-014** — training budget

- `AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS: u32 = 21`
- `Ship014Verdict { Pass, Fail }`
- `const fn verdict_from_training_duration_days(u32) -> Ship014Verdict` —
  Pass iff measured ≤ 21. u32 auto-rules out negatives and non-finites.
- `falsify_ship_014_training_duration_threshold_logic` — 6-section mutation survey:
  1. Exact boundary 21 → Pass (inclusive ceiling)
  2. Adjacent: 20 → Pass, 22 → Fail
  3. Clear Pass band {0, 1, 7, 14, 20, 21}
  4. Clear Fail band {22, 30, 100, u32::MAX}
  5. Monotonicity sweep 0..=42 — flips exactly once at 21→22
  6. Provenance pin: const stays = 21_u32
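
The "flips exactly once" property in section 5 can be sketched as below. The const, enum, and verdict function names come from this commit; `count_verdict_flips` is a hypothetical helper introduced only to show the sweep.

```rust
/// Maximum training duration in days.
pub const AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS: u32 = 21;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship014Verdict {
    Pass,
    Fail,
}

/// Pass iff the measured duration is within the 21-day budget.
pub const fn verdict_from_training_duration_days(days: u32) -> Ship014Verdict {
    if days <= AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS {
        Ship014Verdict::Pass
    } else {
        Ship014Verdict::Fail
    }
}

/// Illustrative helper: counts how often the verdict changes between
/// adjacent day values across the given range.
pub fn count_verdict_flips(range: std::ops::RangeInclusive<u32>) -> usize {
    let verdicts: Vec<Ship014Verdict> = range.map(verdict_from_training_duration_days).collect();
    verdicts.windows(2).filter(|w| w[0] != w[1]).count()
}
```

A single flip across `0..=42` rules out mutations such as a reversed comparison paired with a second threshold, which a plain boundary check might miss.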

**Changes:**

- crates/aprender-train/src/models/llama_370m.rs:
  * 2 new public const floors + 2 verdict enums + 2 pure `const fn` verdict fns
  * 2 new mutation-survey unit tests (inside existing tests mod)

- contracts/model-families/llama-370m-sovereign-v1.yaml:
  * v1.9.0 → v1.10.0, stays ACTIVE
  * New GATE-ARCH-370M-013 binding AC-SHIP2-003 ↔ FALSIFY-SHIP-013 with
    discharge_status: PARTIAL_ALGORITHM_LEVEL
  * New GATE-ARCH-370M-014 binding AC-SHIP2-004 ↔ FALSIFY-SHIP-014 with
    discharge_status: PARTIAL_ALGORITHM_LEVEL
  * v1.10.0 changelog entry at top of changelog block

- docs/specifications/aprender-train/ship-two-models-spec.md:
  * Version 2.37.0 → 2.38.0
  * v2.38.0 Date-field entry describing the bundle
  * AC-SHIP2-003 and AC-SHIP2-004 rows tagged
    `**(PARTIAL_ALGORITHM_LEVEL v2.38.0)**`

**Verification:**

- `cargo fmt -p aprender-train --check` — clean
- `cargo test -p aprender-train --lib ship_013` → 1 passed
- `cargo test -p aprender-train --lib ship_014` → 1 passed
- `cargo test -p aprender-train --lib llama_370m` → 20 passed
- `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate
  contracts/model-families/llama-370m-sovereign-v1.yaml` → 0 errors

**Full discharge still blocks on:**

- SHIP-013: live `apr pretrain --mode from-scratch --validate` loop on
  RTX 4090 with `--features cuda` producing a real MODEL-2 val CE.
- SHIP-014: real wall-clock measurement of a MODEL-2 pretraining run on
  RTX 4090 from first `apr pretrain` dispatch to final checkpoint write.

**Status shift:**

- MODEL-2 coverage: 8/12 → **12/12 PARTIAL_ALGORITHM_LEVEL touched** (complete)
- Across both models: 23 PARTIAL + 3 DISCHARGED

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…us for single-source-of-truth

Back-annotates discharge status already documented in prior v2.20/v2.21/v2.22
amendments into the §5.2 MODEL-2 acceptance-criteria table, and adds
PARTIAL_ALGORITHM_LEVEL + contract + `cargo test` cross-references to
§7.1/§7.2 falsification tables — so the three tables together form a true
single source of truth for SHIP-TWO-001 algorithm-level ship-gate coverage.

Changes:
- §5.2 MODEL-2 table: 6 new annotations (3 DISCHARGED + 3 PARTIAL_ALGORITHM_LEVEL)
  * AC-SHIP2-001 FALSIFY-SHIP-011 DISCHARGED v2.21.0 (evidence 338c6eb)
  * AC-SHIP2-002 FALSIFY-SHIP-012 PARTIAL v2.21.0 (evidence 2e8b8b8)
  * AC-SHIP2-005 FALSIFY-SHIP-015 PARTIAL v2.21.0 (evidence bfb8831)
  * AC-SHIP2-009 FALSIFY-SHIP-019 PARTIAL v2.22.0 (evidence 846cc1d)
  * AC-SHIP2-011 FALSIFY-SHIP-021 DISCHARGED v2.20.0 (evidence 0b8ca8c)
  * AC-SHIP2-012 FALSIFY-SHIP-022 DISCHARGED v2.20.0 (evidence 8f0607d)
- §4.2 MODEL-1 table drift fix: AC-SHIP1-007 v2.27.0 → v2.29.0 (correct SHIP-007 amendment ref)
- §7.1 MODEL-1 Falsification: 6 new PARTIAL cross-references (SHIP-001/003/004/007/009/010)
- §7.2 MODEL-2 Falsification: 12 new annotations (2 DISCHARGED + 10 PARTIAL) covering SHIP-011..022
- Version bump 2.38.0 → 2.39.0 + v2.39.0 changelog line appended to Date field

Pure documentation hygiene; no Rust, no contracts, no tests, no meaning
changes. Task #119.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
….2pp + adversarial-suite 0-tolerance PARTIAL discharges

Bundled PARTIAL_ALGORITHM_LEVEL discharge of the last two MODEL-1 §7.1 stability
tests: SHIP-023 (cross-run score drift ≤ 1.2 pp) + SHIP-024 (adversarial suite
0-tolerance across ≥ 50 prompts). Files: ship_023.rs + ship_024.rs + mod.rs +
qwen2-e2e-verification-v1.yaml v1.7.0 + ship-two-models-spec.md v2.40.0.
Tests: 1 unit + 1 doc-test each; pv validate → 0 errors; completes MODEL-1
§7.1 at 12/12 algorithmically bound; 25 PARTIAL + 3 DISCHARGED aggregate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Y-GPUTRAIN PARTIAL discharges

Binds the remaining 5 GPU-training-backend invariants (003..007) at
PARTIAL_ALGORITHM_LEVEL via pure Rust verdict functions, each accompanied
by a 6-8 section mutation survey. GPUTRAIN-001 (grammar) and
GPUTRAIN-002 (no-silent-fallback) are already bound in
`crates/aprender-train/src/train/device.rs` with 17 passing tests.

New modules (5):
  - crates/aprender-train/src/train/gputrain_003.rs — nvidia-smi residency
    proof: parse_nvidia_smi_compute_apps + verdict_from_residency bound to
    5-s poll window + 1-MiB floor; 7-section survey (happy path /
    zero-mem / other-pid / empty / multi-process / malformed /
    u32::MAX-u64::MAX boundary / provenance pin)
  - crates/aprender-train/src/train/gputrain_004.rs — CPU-fallback-
    preserved dispatch invariant via verdict_from_dispatch_label over
    disjoint CPU/CUDA label sets; 7-section survey (cpu→cpu /
    cuda→cuda / cpu→cuda silent-promotion Fail / cuda→cpu task-#126
    silent-fallback Fail / unknown / empty / case-sensitivity)
  - crates/aprender-train/src/train/gputrain_005.rs — 500-ms step-time
    ceiling on RTX 4090 370M via const fn verdict_from_step_time_ms;
    7-section survey mirroring SHIP-007/020 shape (inclusive boundary /
    ULP-above / Pass band / Fail band / non-finite / negative /
    provenance pin)
  - crates/aprender-train/src/train/gputrain_006.rs — same-device seed
    reproducibility at 1e-5 tolerance via verdict_from_loss_delta +
    aggregate verdict_from_loss_trajectories; 7-section survey (boundary
    / trajectory single-step-fail / length mismatch / empty / non-finite
    / negative tolerance / provenance pin)
  - crates/aprender-train/src/train/gputrain_007.rs — apr --version
    --json schema + field-shape invariants via
    verdict_from_version_json_keys + verdict_from_version_json_fields;
    7-section survey (all-keys-present / each-key-missing / 3 valid
    (feature, runtime) combos Pass / FM-GPUTRAIN-STALE-BUILD Fail /
    boundary 16 Pass / 17 Fail / forward-compat extras / provenance pin)
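
To make the gputrain_006 shape concrete, here is a hedged sketch of a per-step delta check combined with a trajectory aggregate. Only the two function names and the 1e-5 tolerance come from this commit; the signatures, the `Gputrain006Verdict` enum, the const name, and the choice of an inclusive boundary are assumptions.

```rust
/// Hypothetical const name for the 1e-5 tolerance cited in the commit.
pub const GPUTRAIN_006_MAX_LOSS_DELTA: f32 = 1e-5;

/// Hypothetical verdict enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gputrain006Verdict {
    Pass,
    Fail,
}

/// Pass iff both losses are finite, the tolerance is finite and
/// non-negative, and |a - b| <= tolerance (inclusive, by assumption).
pub fn verdict_from_loss_delta(a: f32, b: f32, tolerance: f32) -> Gputrain006Verdict {
    let inputs_valid = a.is_finite() && b.is_finite() && tolerance.is_finite() && tolerance >= 0.0;
    if inputs_valid && (a - b).abs() <= tolerance {
        Gputrain006Verdict::Pass
    } else {
        Gputrain006Verdict::Fail
    }
}

/// Pass iff both runs are non-empty, equally long, and every step
/// passes the per-step delta check.
pub fn verdict_from_loss_trajectories(run_a: &[f32], run_b: &[f32], tolerance: f32) -> Gputrain006Verdict {
    if run_a.is_empty() || run_a.len() != run_b.len() {
        return Gputrain006Verdict::Fail;
    }
    let all_steps_ok = run_a
        .iter()
        .zip(run_b)
        .all(|(&a, &b)| verdict_from_loss_delta(a, b, tolerance) == Gputrain006Verdict::Pass);
    if all_steps_ok {
        Gputrain006Verdict::Pass
    } else {
        Gputrain006Verdict::Fail
    }
}
```

Checking the lengths before zipping matters because `zip` stops at the shorter input, so a truncated run would otherwise pass unnoticed.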

Contract (contracts/entrenar/gpu-training-backend-v1.yaml):
  v1.0.0 PROPOSED → v1.1.0 PROPOSED (stays PROPOSED until Phase 3 live
  evidence). Each of FALSIFY-GPUTRAIN-003..007 now carries
  discharge_status: PARTIAL_ALGORITHM_LEVEL, evidence_discharged_by
  listing the Rust symbols, full_discharge_blocks_on describing the
  live lambda-labs harness, and 6 counter_example_classes.

Spec (docs/specifications/aprender-train/ship-two-models-spec.md):
  v2.40.0 → v2.41.0; §14.5 table updated to mark the algorithm-level
  Phase-2 row DONE and leave the live-wire Phase-2 row pending; across
  both models: 30 PARTIAL + 3 DISCHARGED.

Validation gates (all green):
  - cargo fmt --all --check — clean
  - cargo test -p aprender-train --lib gputrain_003 → 1/1 pass
  - cargo test -p aprender-train --lib gputrain_004 → 1/1 pass
  - cargo test -p aprender-train --lib gputrain_005 → 1/1 pass
  - cargo test -p aprender-train --lib gputrain_006 → 1/1 pass
  - cargo test -p aprender-train --lib gputrain_007 → 1/1 pass
  - cargo test -p aprender-train --lib train::device → 17/17 pass
    (regression clean)
  - pv validate contracts/entrenar/gpu-training-backend-v1.yaml →
    0 errors, 0 warnings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gorithm-level bindings

Algorithmically binds the 6 bindable compound ship gates from §6 of
SHIP-TWO-001. Gates 007-012 are CI/lint meta-policy (enforced by
.clippy.toml, .pmat-gates.toml, and CI workflows) and are intentionally
out of scope.

New contract: contracts/compound-ship-gates-v1.yaml v1.0.0 PROPOSED
(metadata.kind: pattern; 6 falsification_tests each PARTIAL_ALGORITHM_LEVEL).

New Rust modules (6, all in crates/aprender-core/src/format/):

  * gate_ship_001.rs — MODEL-1 aggregate-AND over 10 AC-SHIP1-* bools
    (AC_GATE_SHIP_001_MODEL_1_AC_COUNT = 10;
     verdict_from_model1_ac_aggregate; 6-section survey incl. 2^10=1024
     exhaustive bitmask proof)

  * gate_ship_002.rs — MODEL-2 aggregate-AND over 12 AC-SHIP2-* bools
    (AC_GATE_SHIP_002_MODEL_2_AC_COUNT = 12;
     verdict_from_model2_ac_aggregate; 6-section survey incl. 2^12=4096
     exhaustive bitmask proof)

  * gate_ship_003.rs — apr qa Golden Output byte-identity across quantize
    round-trip (verdict_from_golden_output_diff; 6-section survey with
    conservative-Fail on empty input — SKIPPED Golden Output = no
    regression proof)

  * gate_ship_004.rs — HumanEval bitwise-identical determinism on two
    seed=0 runs (verdict_from_identical_humaneval_scores uses
    f32::to_bits() equality — STRICTLY STRICTER than FALSIFY-SHIP-023's
    1.2 pp drift tolerance; 7-section survey)

  * gate_ship_005.rs — License metadata byte-equal + non-empty +
    ASCII-printable (AC_GATE_SHIP_005_REQUIRED_LICENSE_FIELD = "license";
    verdict_from_license_metadata; 6-section survey incl. SPDX
    case-sensitivity guard)

  * gate_ship_006.rs — GGUF round-trip first-token probability delta
    (AC_GATE_SHIP_006_MAX_FIRST_TOKEN_DELTA = 1e-3;
     const fn verdict_from_first_token_probability_delta; symmetric via
     .abs(); 7-section survey)
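
The bitwise-identity check in gate_ship_004 is stricter than `==` on floats, which a small sketch can show. The function name comes from this commit; the `Gate004Verdict` enum, the parameter names, and the non-finite guard are assumptions.

```rust
/// Hypothetical verdict enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate004Verdict {
    Pass,
    Fail,
}

/// Pass iff both scores are finite and share the same IEEE-754 bit pattern.
pub fn verdict_from_identical_humaneval_scores(run_a: f32, run_b: f32) -> Gate004Verdict {
    // Assumed guard: a non-finite score is treated as no evidence at all.
    if !run_a.is_finite() || !run_b.is_finite() {
        return Gate004Verdict::Fail;
    }
    if run_a.to_bits() == run_b.to_bits() {
        Gate004Verdict::Pass
    } else {
        Gate004Verdict::Fail
    }
}
```

With `to_bits()`, `0.0` and `-0.0` count as different even though `0.0 == -0.0` is true, and a one-ULP difference fails, so no drift tolerance is implied.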

Test counts:
  cargo test -p aprender-core --lib format::gate_ship → 6/6 pass
  cargo test -p aprender-core --doc format::gate_ship → 6/6 pass

Contract validation:
  pv validate contracts/compound-ship-gates-v1.yaml → 0 errors, 0 warnings

Spec update: v2.41.0 → v2.42.0; §6 Compound Ship Gates table annotates
GATE-SHIP-001..006 with (PARTIAL_ALGORITHM_LEVEL v2.42.0) markers.

Across both models: 30 PARTIAL + 3 DISCHARGED → 36 PARTIAL + 3 DISCHARGED.

Full discharge of each gate blocks on the live compound-gate harness
(all 10 per-AC MODEL-1 checks for GATE-SHIP-001; all 12 per-AC MODEL-2
checks for GATE-SHIP-002; apr qa --golden-output on pre+post quantize
checkpoints for GATE-SHIP-003; two consecutive apr eval --seed 0 runs
for GATE-SHIP-004; apr inspect .metadata.license vs upstream HF card
for GATE-SHIP-005; apr run --emit-logprobs vs llama-cli --logits-all
for GATE-SHIP-006).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rithm-level bindings (12/12 §6 total)

Task #123: completes §6 Compound Ship Gates coverage by algorithmically
binding the 6 merge-gate meta-policy rows (GATE-SHIP-007..012) at
PARTIAL_ALGORITHM_LEVEL. Prior v2.42.0 bundle covered 001..006 (ship-
blocking); this release adds the remaining 6 (merge-gate) via new
sibling modules in `crates/aprender-core/src/format/gate_ship_0XX.rs`:

- GATE-SHIP-007 (`.unwrap()` count): const fn `verdict_from_unwrap_count`
  bound to `AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT = 0` via zero-
  tolerance threshold + 5-section survey.
- GATE-SHIP-008 (contract density): `verdict_from_contract_density` bound
  to `AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE = 1.0` via divide-
  by-zero-guarded ratio threshold + 7-section survey.
- GATE-SHIP-009 (CI aggregate): const fn `verdict_from_ci_aggregate` over
  (fmt, clippy, test) + 8-section survey incl. exhaustive 2^3 = 8 bitmask
  proof + AND-symmetry pin.
- GATE-SHIP-010 (advisory count): const fn `verdict_from_advisory_count`
  bound to `AC_GATE_SHIP_010_MAX_TOLERATED_ADVISORY_COUNT = 0` via zero-
  tolerance threshold + 5-section survey.
- GATE-SHIP-011 (PMAT TDG): const fn `verdict_from_tdg_score` bound to
  `AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE = 90.0` via inclusive-floor
  threshold + 7-section survey.
- GATE-SHIP-012 (line coverage): const fn `verdict_from_line_coverage_pct`
  bound to `AC_GATE_SHIP_012_MIN_LINE_COVERAGE_PCT = 95.0` via inclusive-
  floor threshold + 7-section survey.
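
As one example of the shapes listed above, here is a hedged sketch of the divide-by-zero-guarded ratio for GATE-SHIP-008. The const and function names are from this commit; the meaning of the two `u32` arguments, the parameter names, the `Gate008Verdict` enum, and the choice to fail closed on a zero denominator are all assumptions.

```rust
/// Minimum contract density required for new code.
pub const AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE: f32 = 1.0;

/// Hypothetical verdict enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate008Verdict {
    Pass,
    Fail,
}

/// Pass iff contract_count / new_item_count meets the minimum density.
/// Argument meanings are assumed; the PR only gives (u32, u32, f32).
pub fn verdict_from_contract_density(
    contract_count: u32,
    new_item_count: u32,
    min_density: f32,
) -> Gate008Verdict {
    // Guard the division and reject NaN or infinite thresholds.
    // Failing closed on zero new items is an assumption of this sketch.
    if new_item_count == 0 || !min_density.is_finite() {
        return Gate008Verdict::Fail;
    }
    let density = contract_count as f32 / new_item_count as f32;
    if density >= min_density {
        Gate008Verdict::Pass
    } else {
        Gate008Verdict::Fail
    }
}
```

Whether an empty diff should pass or fail is a policy choice; this sketch picks the conservative option so that missing data never reads as compliance.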

Contract: `contracts/compound-ship-gates-v1.yaml` v1.0.0 → v1.1.0 (stays
PROPOSED) adds 6 new `falsification_tests` (FALSIFY-GATE-SHIP-007..012),
6 new equations, and 6 new proof_obligations.

Spec: `docs/specifications/aprender-train/ship-two-models-spec.md`
v2.42.0 → v2.43.0; §6 table rows 007..012 annotated PARTIAL_ALGORITHM_LEVEL
v2.43.0. §6 Compound Ship Gates now 12/12 algorithmically bound.

Full discharge still blocks on live CI tooling invocation (`cargo clippy
-- -D warnings` / `pmat density` / branch-protection ci-gate / `cargo
deny check advisories` / `pmat tdg` / `cargo llvm-cov report --json`).

Validation:
  - cargo fmt --check — clean (aprender-core only)
  - cargo test -p aprender-core --lib format::gate_ship_0XX — 6/6 pass
  - cargo test -p aprender-core --doc format::gate_ship_0XX — 6/6 pass
  - pv validate contracts/compound-ship-gates-v1.yaml — 0 errors, 0 warnings

Across both models: 42 PARTIAL + 3 DISCHARGED.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift
Contributor Author

Superseded by #1044 — 11-PR cascade collapsed into single squash-merge to avoid O(n²) rebase treadmill. Content identical; this branch's commit is in #1044.

@noahgift noahgift closed this Apr 24, 2026
auto-merge was automatically disabled April 24, 2026 11:42

Pull request was closed
