Skip to content

feat(falsify-ship-006): MODEL-1 apr qa 8-gate aggregate PARTIAL discharge#1013

Merged
noahgift merged 1 commit into
feat/falsify-ship-008-partial-dischargefrom
feat/falsify-ship-006-partial-discharge
Apr 22, 2026
Merged

feat(falsify-ship-006): MODEL-1 apr qa 8-gate aggregate PARTIAL discharge#1013
noahgift merged 1 commit into
feat/falsify-ship-008-partial-dischargefrom
feat/falsify-ship-006-partial-discharge

Conversation

@noahgift
Copy link
Copy Markdown
Contributor

Summary

Wires AC-SHIP1-006 (`apr qa ` — all 8 gates PASS) at `PARTIAL_ALGORITHM_LEVEL`:
a pure aggregate-AND verdict fn bound to the 8-gate ship criterion from
`docs/specifications/components/qa.md` §3.

MODEL-1 AC-SHIP1 coverage: 2/10 → 3/10 touched (after SHIP-008 + SHIP-009).
First MODEL-1 aggregate-AND PARTIAL shape.

What changed

File Change
`crates/aprender-core/src/qa/ship_006.rs` NEW — `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn + 7-section mutation survey
`crates/aprender-core/src/qa/mod.rs` register `pub mod ship_006;`
`contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0, adds `FALSIFY-QA-SHIP-006` (PARTIAL_ALGORITHM_LEVEL)
`docs/specifications/aprender-train/ship-two-models-spec.md` v2.24.0 → v2.25.0, annotates AC/FALSIFY rows + amendment entry

Design

  • Aggregate-AND: `Pass` iff every gate is `true` AND the slice has exactly 8 entries.
  • 7-section mutation survey in `falsify_ship_006_apr_qa_eight_gates_aggregate`:
    1. all-Pass 8-long array → Pass.
    2. all-Fail 8-long array → Fail.
    3. single-gate-flip × 8: every index flipped individually → Fail (rejects OR / majority / 7-of-8 drift).
    4. Exhaustive 2^8=256-mask proof: Pass iff mask == 0xFF.
    5. Monotonicity: Pass → single-gate-flip must yield Fail.
    6. Length drift: 0, 7, 9, 16-long inputs all → Fail (catches `--skip-metadata`, added-gate, double-count bugs).
    7. Provenance pin on `AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8` lockstepping with spec §4.2 + `docs/specifications/components/qa.md` §3.
  • Mirrors MODEL-2 SHIP-016 aggregate shape (task feat(serve): Add --verbose flag for request/response payload logging #152 on `feat/falsify-ship-016-partial-discharge`). SHIP-016 isn't on main yet, so SHIP-006 is authored self-contained; once both ship the two verdict fns should be deduped into a single parameterized helper.
  • Full discharge blocks on a live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1 --json` run on an RTX 4090 host.

Test plan

  • `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
  • `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → "Contract is valid" (0 errors, 0 warnings).
  • Workspace clippy on the SHIP-006 crate: `cargo clippy -p aprender-core --lib --no-deps -- -D warnings` → clean on new code.
  • CI `ci / test` + `workspace-test` green on this stacked branch.
  • Full discharge: live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1 --json` on RTX 4090 with 8× `"pass": true` in the JSON body.

Stacked on #1012

Base = `feat/falsify-ship-008-partial-discharge` (PR #1012). When #1012 merges,
GitHub will automatically retarget this PR to `main`.

🤖 Generated with Claude Code

…arge

Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at
PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound
to the 8-gate ship criterion from `docs/specifications/components/qa.md`
§3 (golden / throughput / ollama parity / gpu speedup / tensor contracts
/ format parity / ptx parity / metadata).

Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines) —
  `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with
  7-section mutation survey: all-Pass→Pass, all-Fail→Fail,
  single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail
  monotonicity, length-drift counter-examples (0 / 7 / 9 / 16),
  provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8).

- `crates/aprender-core/src/qa/mod.rs` — register `pub mod ship_006;`.

- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0 — adds
  `FALSIFY-QA-SHIP-006` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_006.rs + the harness test, and
  `full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1
  --json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body).

- `docs/specifications/aprender-train/ship-two-models-spec.md`
  v2.24.0 → v2.25.0 — annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows
  with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry.

Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016
(task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on
main). Authored self-contained because SHIP-016 hasn't landed;
once both ship, the two `verdict_from_qa_gates_*` fns should be
deduplicated into a single parameterized helper. Required gate
count differs by model (both 8 today — the spec's "All must Pass"
is model-independent).

MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) →
**3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL.

Full discharge blocks on a live `apr qa` run against the teacher
weights on RTX 4090; the compute-heavy portion is intentionally
out of scope here.

Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.

Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec
v2.25.0 builds on v2.24.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 0457857 into feat/falsify-ship-008-partial-discharge Apr 22, 2026
1 check passed
@noahgift noahgift deleted the feat/falsify-ship-006-partial-discharge branch April 22, 2026 16:37
noahgift added a commit that referenced this pull request Apr 22, 2026
* feat(falsify-ship-008): MODEL-1 chat template PARTIAL discharge

Discharge FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL.

- contracts/chat-template-v1.yaml v1.0.0 -> v1.1.0: adds
  GATE-CHAT-SHIP-008 binding ChatMLTemplate::format_conversation to
  the canonical Qwen2.5-Coder-7B (system, user) golden via a pure
  verdict_from_chat_template_render const fn. ship_blocking: true,
  discharge_status: PARTIAL_ALGORITHM_LEVEL; full discharge blocks
  on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion
  diff against golden.
- crates/aprender-core/src/text/chat_template/ship_008.rs (new):
  AC_SHIP1_008_CANONICAL_{SYSTEM,USER,GOLDEN} constants +
  Ship008Verdict enum + verdict_from_chat_template_render const fn
  (byte-equality, UTF-8-safe) + 5-section mutation survey
  (engine-binding, empty Fail, missing-gen-prompt Fail, wrong-delim
  Fail, swapped-roles Fail, single-byte flip Fail) + symmetry +
  provenance pin.
- crates/aprender-core/src/text/chat_template/mod.rs: include!
  ship_008.rs alongside existing template.rs, raw_template.rs.
- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.23.0 -> v2.24.0: AC-SHIP1-008 row + FALSIFY-SHIP-008 row
  annotated PARTIAL_ALGORITHM_LEVEL; v2.24.0 amendment entry
  records MODEL-1 coverage 1/10 -> 2/10 (first MODEL-1
  non-provenance PARTIAL; mirrors SHIP-016/017/018/020 pattern).

Test: cargo test -p aprender-core --lib
  falsify_ship_008_chat_template_render_bind -> 1 passed
Contract: pv validate contracts/chat-template-v1.yaml -> Contract is valid

Refs: SHIP-TWO-001, task #155

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(falsify-ship-006): MODEL-1 apr qa 8-gate aggregate PARTIAL discharge (#1013)

Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at
PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound
to the 8-gate ship criterion from `docs/specifications/components/qa.md`
§3 (golden / throughput / ollama parity / gpu speedup / tensor contracts
/ format parity / ptx parity / metadata).

Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines) —
  `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with
  7-section mutation survey: all-Pass→Pass, all-Fail→Fail,
  single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail
  monotonicity, length-drift counter-examples (0 / 7 / 9 / 16),
  provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8).

- `crates/aprender-core/src/qa/mod.rs` — register `pub mod ship_006;`.

- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0 — adds
  `FALSIFY-QA-SHIP-006` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_006.rs + the harness test, and
  `full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1
  --json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body).

- `docs/specifications/aprender-train/ship-two-models-spec.md`
  v2.24.0 → v2.25.0 — annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows
  with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry.

Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016
(task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on
main). Authored self-contained because SHIP-016 hasn't landed;
once both ship, the two `verdict_from_qa_gates_*` fns should be
deduplicated into a single parameterized helper. Required gate
count differs by model (both 8 today — the spec's "All must Pass"
is model-independent).

MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) →
**3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL.

Full discharge blocks on a live `apr qa` run against the teacher
weights on RTX 4090; the compute-heavy portion is intentionally
out of scope here.

Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.

Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec
v2.25.0 builds on v2.24.0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant