feat(falsify-ship-006): MODEL-1 apr qa 8-gate aggregate PARTIAL discharge#1013
Merged
noahgift merged 1 commit intoApr 22, 2026
Conversation
…arge Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound to the 8-gate ship criterion from `docs/specifications/components/qa.md` §3 (golden / throughput / ollama parity / gpu speedup / tensor contracts / format parity / ptx parity / metadata). Files: - `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines) — `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with 7-section mutation survey: all-Pass→Pass, all-Fail→Fail, single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail monotonicity, length-drift counter-examples (0 / 7 / 9 / 16), provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8). - `crates/aprender-core/src/qa/mod.rs` — register `pub mod ship_006;`. - `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0 — adds `FALSIFY-QA-SHIP-006` with `ship_blocking: true`, `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by` pointing at ship_006.rs + the harness test, and `full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1 --json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body). - `docs/specifications/aprender-train/ship-two-models-spec.md` v2.24.0 → v2.25.0 — annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry. Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016 (task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on main). Authored self-contained because SHIP-016 hasn't landed; once both ship, the two `verdict_from_qa_gates_*` fns should be deduplicated into a single parameterized helper. Required gate count differs by model (both 8 today — the spec's "All must Pass" is model-independent). MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) → **3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL. Full discharge blocks on a live `apr qa` run against the teacher weights on RTX 4090; the compute-heavy portion is intentionally out of scope here. Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed. Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors. Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec v2.25.0 builds on v2.24.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0457857
into
feat/falsify-ship-008-partial-discharge
1 check passed
4 tasks
noahgift
added a commit
that referenced
this pull request
Apr 22, 2026
* feat(falsify-ship-008): MODEL-1 chat template PARTIAL discharge
Discharge FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL.
- contracts/chat-template-v1.yaml v1.0.0 -> v1.1.0: adds
GATE-CHAT-SHIP-008 binding ChatMLTemplate::format_conversation to
the canonical Qwen2.5-Coder-7B (system, user) golden via a pure
verdict_from_chat_template_render const fn. ship_blocking: true,
discharge_status: PARTIAL_ALGORITHM_LEVEL; full discharge blocks
on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion
diff against golden.
- crates/aprender-core/src/text/chat_template/ship_008.rs (new):
AC_SHIP1_008_CANONICAL_{SYSTEM,USER,GOLDEN} constants +
Ship008Verdict enum + verdict_from_chat_template_render const fn
(byte-equality, UTF-8-safe) + 5-section mutation survey
(engine-binding, empty Fail, missing-gen-prompt Fail, wrong-delim
Fail, swapped-roles Fail, single-byte flip Fail) + symmetry +
provenance pin.
- crates/aprender-core/src/text/chat_template/mod.rs: include!
ship_008.rs alongside existing template.rs, raw_template.rs.
- docs/specifications/aprender-train/ship-two-models-spec.md
v2.23.0 -> v2.24.0: AC-SHIP1-008 row + FALSIFY-SHIP-008 row
annotated PARTIAL_ALGORITHM_LEVEL; v2.24.0 amendment entry
records MODEL-1 coverage 1/10 -> 2/10 (first MODEL-1
non-provenance PARTIAL; mirrors SHIP-016/017/018/020 pattern).
Test: cargo test -p aprender-core --lib
falsify_ship_008_chat_template_render_bind -> 1 passed
Contract: pv validate contracts/chat-template-v1.yaml -> Contract is valid
Refs: SHIP-TWO-001, task #155
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(falsify-ship-006): MODEL-1 apr qa 8-gate aggregate PARTIAL discharge (#1013)
Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at
PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound
to the 8-gate ship criterion from `docs/specifications/components/qa.md`
§3 (golden / throughput / ollama parity / gpu speedup / tensor contracts
/ format parity / ptx parity / metadata).
Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines) —
`verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with
7-section mutation survey: all-Pass→Pass, all-Fail→Fail,
single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail
monotonicity, length-drift counter-examples (0 / 7 / 9 / 16),
provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8).
- `crates/aprender-core/src/qa/mod.rs` — register `pub mod ship_006;`.
- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0 — adds
`FALSIFY-QA-SHIP-006` with `ship_blocking: true`,
`discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
pointing at ship_006.rs + the harness test, and
`full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1
--json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body).
- `docs/specifications/aprender-train/ship-two-models-spec.md`
v2.24.0 → v2.25.0 — annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows
with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry.
Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016
(task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on
main). Authored self-contained because SHIP-016 hasn't landed;
once both ship, the two `verdict_from_qa_gates_*` fns should be
deduplicated into a single parameterized helper. Required gate
count differs by model (both 8 today — the spec's "All must Pass"
is model-independent).
MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) →
**3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL.
Full discharge blocks on a live `apr qa` run against the teacher
weights on RTX 4090; the compute-heavy portion is intentionally
out of scope here.
Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.
Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec
v2.25.0 builds on v2.24.0.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wires AC-SHIP1-006 (`apr qa ` — all 8 gates PASS) at `PARTIAL_ALGORITHM_LEVEL`:
a pure aggregate-AND verdict fn bound to the 8-gate ship criterion from
`docs/specifications/components/qa.md` §3.
MODEL-1 AC-SHIP1 coverage: 2/10 → 3/10 touched (after SHIP-008 + SHIP-009).
First MODEL-1 aggregate-AND PARTIAL shape.
What changed
Design
Test plan
Stacked on #1012
Base = `feat/falsify-ship-008-partial-discharge` (PR #1012). When #1012 merges,
GitHub will automatically retarget this PR to `main`.
🤖 Generated with Claude Code