feat(falsify-ship-002): MODEL-1 Python syntax PARTIAL discharge (clean branch) by noahgift · Pull Request #1017 · paiml/aprender

noahgift · 2026-04-22T20:07:51Z

Summary

Binds AC-SHIP1-002 (MODEL-1 teacher `apr run` emits syntactically valid Python on the canonical `def fib(n):` prompt) to a zero-tolerance algorithm-level verdict rule via a pure `const fn verdict_from_syntax_error_count(usize) -> Ship002Verdict` in `crates/aprender-core/src/qa/ship_002.rs`.

Supersedes #1016 — clean rebuild on fresh branch from main (no stacked SHIP-005/007 dependency).

Deltas

Contract — contracts/qwen2-e2e-verification-v1.yaml v1.0.0 → v1.1.0: adds FALSIFY-QW2E-SHIP-002 with discharge_status: PARTIAL_ALGORITHM_LEVEL, evidence_discharged_by (3 symbols), full_discharge_blocks_on (live apr run + rustpython/ruff AST parse on RTX 4090), and 5 counter_example_classes.
Spec — docs/specifications/aprender-train/ship-two-models-spec.md v2.25.0 → v2.26.0: AC-SHIP1-002 + FALSIFY-SHIP-002 rows annotated (PARTIAL_ALGORITHM_LEVEL v2.26.0).
New module — crates/aprender-core/src/qa/ship_002.rs (156 lines) — const AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS = 0, Ship002Verdict {Pass, Fail}, verdict_from_syntax_error_count const fn.
Module registration — crates/aprender-core/src/qa/mod.rs gains pub mod ship_002;.
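As a rough illustration of the module shape listed above (not the merged source), the sketch below uses the constant, enum, and function names given in this PR. The derives, doc comments, and function body are assumptions; the `<=` comparison and the Clippy allow attribute follow the description in the follow-up commit.

```rust
/// Maximum number of Python syntax errors tolerated for AC-SHIP1-002.
/// Zero, because spec §4.2 carries no noise allowance for this criterion.
pub const AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS: usize = 0;

/// Algorithm-level verdict for the canonical `def fib(n):` prompt.
/// The derives are an assumption made so the sketch is easy to test.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship002Verdict {
    Pass,
    Fail,
}

/// Maps a parse-error count to a verdict.
/// With a tolerance of 0, `<=` behaves like `==`, which is why Clippy's
/// `absurd_extreme_comparisons` lint is allowed here (assumed placement).
#[allow(clippy::absurd_extreme_comparisons)]
pub const fn verdict_from_syntax_error_count(syntax_errors: usize) -> Ship002Verdict {
    if syntax_errors <= AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship002Verdict::Pass
    } else {
        Ship002Verdict::Fail
    }
}
```

Because the function is a `const fn` over a plain `usize`, the verdict can also be computed in a `const` context, which keeps the rule free of I/O and model dependencies.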

Mutation Survey (6 sections)

zero-errors → Pass (only shipping scenario)
exactly-one-error → Fail (boundary; tighter than MODEL-2 SHIP-017 which tolerates 1/100)
many-errors Fail band {2, 10, 100}
monotonicity sweep 0..=256 (no Fail→Pass flip once Fail observed)
usize::MAX sanity Fail
provenance pin locks tolerance = 0 (spec §4.2)
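The six sections above could be exercised roughly as follows. This is a sketch, not the body of `falsify_ship_002_python_syntax_error_threshold_logic`: the helper name `check_mutation_survey` is hypothetical, and the verdict function is re-declared in assumed form so the block stands alone.

```rust
pub const AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS: usize = 0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship002Verdict {
    Pass,
    Fail,
}

// Assumed body; only the signature is stated in the PR.
#[allow(clippy::absurd_extreme_comparisons)]
pub const fn verdict_from_syntax_error_count(syntax_errors: usize) -> Ship002Verdict {
    if syntax_errors <= AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship002Verdict::Pass
    } else {
        Ship002Verdict::Fail
    }
}

/// Hypothetical helper: panics if any of the six survey sections fails.
pub fn check_mutation_survey() {
    // 1. Zero errors is the only shipping scenario.
    assert_eq!(verdict_from_syntax_error_count(0), Ship002Verdict::Pass);

    // 2. Boundary: a single error already fails.
    assert_eq!(verdict_from_syntax_error_count(1), Ship002Verdict::Fail);

    // 3. Many-errors Fail band.
    for n in [2usize, 10, 100] {
        assert_eq!(verdict_from_syntax_error_count(n), Ship002Verdict::Fail);
    }

    // 4. Monotonicity: once Fail is observed, no later count may Pass.
    let mut seen_fail = false;
    for n in 0..=256usize {
        match verdict_from_syntax_error_count(n) {
            Ship002Verdict::Fail => seen_fail = true,
            Ship002Verdict::Pass => assert!(!seen_fail, "Fail→Pass flip at {}", n),
        }
    }

    // 5. Extreme input still fails.
    assert_eq!(verdict_from_syntax_error_count(usize::MAX), Ship002Verdict::Fail);

    // 6. Provenance pin: the tolerance stays locked at zero.
    assert_eq!(AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS, 0);
}
```

Each section targets a different mutation: loosening the constant trips sections 2 and 6, while inverting the comparison trips sections 1 and 4.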

Why

4th MODEL-1 PARTIAL discharged from main's baseline (after SHIP-009 + SHIP-008 + SHIP-006). Uniquely tight rule — zero tolerance on the single canonical def fib(n): prompt — because spec §4.2 AC-SHIP1-002 "emits valid Python" carries no noise allowance, unlike MODEL-2 SHIP-017 which tolerates ≤1/100.

Full discharge blocks on

apr run paiml/qwen2.5-coder-7b-apache-q4k-v1.safetensors --prompt "def fib(n):"

on RTX 4090 + rustpython/ruff AST parse of the emitted completion. Any parse error count > 0 is a ship-blocker.

Test plan

cargo fmt -p aprender-core -- --check — exit 0
cargo test -p aprender-core --lib falsify_ship_002_python_syntax_error_threshold_logic — green (1 passed, 0 failed)
cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/qwen2-e2e-verification-v1.yaml — 0 error(s), 0 warning(s). Contract is valid.
CI: ci / gate + workspace-test

Refs: task #159

🤖 Generated with Claude Code

Bind AC-SHIP1-002 ("teacher emits syntactically valid Python on canonical `def fib(n):` prompt") to a zero-tolerance algorithm-level verdict rule via a pure `const fn verdict_from_syntax_error_count(usize) -> Ship002Verdict` in `crates/aprender-core/src/qa/ship_002.rs`.

Contract delta: `contracts/qwen2-e2e-verification-v1.yaml` v1.0.0 → v1.1.0 adds `FALSIFY-QW2E-SHIP-002` with `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by` (const + fn + test), and `full_discharge_blocks_on` (live `apr run` + `rustpython`/`ruff` AST parse on RTX 4090).

Spec delta: `docs/specifications/aprender-train/ship-two-models-spec.md` v2.25.0 → v2.26.0; AC-SHIP1-002 + FALSIFY-SHIP-002 rows annotated `(PARTIAL_ALGORITHM_LEVEL v2.26.0)`.

Mutation survey: 6 sections (single test) exercised by `falsify_ship_002_python_syntax_error_threshold_logic`:

1. zero-errors → Pass (only shipping scenario)
2. exactly-one-error → Fail (boundary; tighter than MODEL-2 SHIP-017 which tolerates 1/100)
3. many-errors Fail band {2, 10, 100}
4. monotonicity sweep 0..=256 (no Fail→Pass flip)
5. `usize::MAX` sanity Fail
6. provenance pin locks tolerance = 0 (spec §4.2)

Why: 4th MODEL-1 PARTIAL touched from main's baseline (SHIP-009 + SHIP-008 + SHIP-006 + now SHIP-002). The rule is uniquely tight (zero tolerance on the single canonical `def fib(n):` prompt) because spec §4.2 AC-SHIP1-002 "emits valid Python" carries no noise allowance, unlike MODEL-2 SHIP-017 which tolerates ≤1 error across 100 held-out prompts.

Full discharge blocks on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1.safetensors --prompt "def fib(n):"` on RTX 4090 + `rustpython`/`ruff` AST parse of the emitted completion.

Refs: task #159

Verify:

    cargo test -p aprender-core --lib \
        falsify_ship_002_python_syntax_error_threshold_logic
    cargo run --quiet -p aprender-contracts-cli --bin pv -- \
        validate contracts/qwen2-e2e-verification-v1.yaml

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…rance <=

Clippy flags `syntax_errors <= AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS` because the constant is `0` (unsigned minimum), so `<=` is vacuously equivalent to `==`. CI ci/lint went red on this.

Keep the `<=` shape intentionally: it mirrors MODEL-2 SHIP-017's `verdict_from_syntax_error_count` (tolerance = 1, where `<=` is non-vacuous) so the two can be deduplicated into a single parameterized helper once both PRs land. Add `#[allow(clippy::absurd_extreme_comparisons)]` with an inline justification block above the attribute.

Verify: cargo clippy -p aprender-core --lib -- -D warnings → clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
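To make the deduplication rationale concrete, here is one way the single parameterized helper mentioned in this commit might look. It is purely a sketch: `Verdict`, `verdict_with_tolerance`, and the two tolerance constants are hypothetical names, not identifiers from either PR; only the tolerance values (0 for SHIP-002, 1 for SHIP-017) come from the text.

```rust
/// Hypothetical shared verdict type for both ship gates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Hypothetical tolerance constants; the values are taken from the PR text.
pub const SHIP_002_TOLERANCE: usize = 0; // MODEL-1: zero tolerance
pub const SHIP_017_TOLERANCE: usize = 1; // MODEL-2: at most one error

/// Hypothetical shared helper: the same `<=` shape serves both tolerances.
pub const fn verdict_with_tolerance(syntax_errors: usize, max_tolerated: usize) -> Verdict {
    if syntax_errors <= max_tolerated {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

With a tolerance of 0 the comparison accepts only an error count of 0, so `<=` and `==` agree; with a tolerance of 1 the comparison accepts both 0 and 1, which is where `<=` does real work.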

noahgift force-pushed the feat/falsify-ship-002-clean branch from 6bbf1ce to 9d39857 April 22, 2026 20:27

noahgift merged commit f615148 into main Apr 22, 2026
10 checks passed

noahgift deleted the feat/falsify-ship-002-clean branch April 22, 2026 21:10

noahgift mentioned this pull request Apr 22, 2026

CI infra: disk-guard cross-runner race wipes target/ mid-build (ci/coverage persistent flake) #1020

Closed

noahgift merged 2 commits into main from feat/falsify-ship-002-clean