test(aprender-serve): M32d.3 — qwen3_moe_argmax_parity.rs F-QW3-MOE-PARITY-002 llama.cpp argmax sanity by noahgift · Pull Request #1131 · paiml/aprender

noahgift · 2026-04-29T11:10:21Z

Summary

Per the `qwen3-moe-forward-v1` v1.3.0 staged plan, M32d.3 authors the secondary sanity test (axis (b) of `FALSIFY-QW3-MOE-FORWARD-004`). The test runs `llama-cli` as a subprocess with the same prompt, GGUF, and greedy deterministic sampler, and asserts that the decoded text of its first emitted token matches the FP16 fixture's `argmax_text`.

This is the LAST code slice before M32d.4 flips contract DRAFT → ACTIVE_RUNTIME.

What this validates

The strict axis-(b) gate is `apr_argmax_token_id == llama_cpp_argmax_token_id`. We discharge it transitively:

1. M32d.2 (#1130, the F-QW3-MOE-PARITY-001 cosine gate in qwen3_moe_parity.rs) asserts `cos_sim(apr_logits, hf_fp16_logits) > 0.99`, so apr's argmax ≈ HF FP16's argmax. A cosine > 0.99 over a 151936-dim logit vector is taken to imply argmax agreement, barring near-tie cases among the top logits.
2. This test asserts `llama_cpp_first_decoded_token == hf_fp16.argmax_text`, so llama.cpp Q4_K's argmax equals HF FP16's argmax at the decoded-text level.
3. Composing (1) and (2): apr ≈ HF ≈ llama.cpp, which is the contract gate.

Decoding apr's argmax inside this test would require pulling the GGUF tokenizer into the integration layer, which is a separate slice; the transitive composition gives the same gate strength with a tighter PR.
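To make the near-tie caveat concrete, here is a minimal sketch of cosine similarity and argmax over logit vectors. The helper names `cos_sim` and `argmax` and the toy 3-element vectors are illustrative assumptions, not code from this PR; real logits have 151936 entries.

```rust
/// Cosine similarity between two equal-length logit vectors.
fn cos_sim(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "logit vectors must have equal length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Index of the largest logit (the first index wins on exact ties).
fn argmax(v: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in v.iter().enumerate() {
        if x > v[best] {
            best = i;
        }
    }
    best
}

fn main() {
    // Clear winner: high cosine and the same argmax.
    let clear_a = [0.1_f32, 5.0, 0.2];
    let clear_b = [0.2_f32, 4.8, 0.1];
    println!(
        "clear: cos={:.5} argmax {} vs {}",
        cos_sim(&clear_a, &clear_b),
        argmax(&clear_a),
        argmax(&clear_b)
    );

    // Near tie: cosine is still above 0.99, yet the argmax flips.
    let tie_a = [1.00_f32, 1.01, 0.0];
    let tie_b = [1.01_f32, 1.00, 0.0];
    println!(
        "tie:   cos={:.5} argmax {} vs {}",
        cos_sim(&tie_a, &tie_b),
        argmax(&tie_a),
        argmax(&tie_b)
    );
}
```

The second pair shows why the cosine gate alone leaves a small gap: both vectors pass `> 0.99`, but their top two entries swap order. The decoded-text check against llama.cpp is a second, independent look at the same argmax.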

Tests

`f_qw3_moe_parity_002_argmax_vs_llama_cpp` (`#[ignore]`, heavy):

- Locates llama-cli via `which` first, then 4 fallback paths (`~/.local/bin/`, `~/src/llama.cpp/`, `/usr/local/bin/`, `/usr/bin/`).
- Skips cleanly if llama-cli, the GGUF, or the fixture is missing.
- Spawns llama-cli with FALSIFY-QW3-MOE-FORWARD-004's documented flags:
  `-p "What is 2+2?" -n 1 --top-k 1 --temp 0.0 --seed 0 --no-display-prompt -no-cnv --no-warmup --log-disable`
- Asserts trimmed equality OR substring containment (tolerates the leading-space marker that some tokenizers' detokenize paths emit).
- On a miss, reports the FALSIFY-QW3-MOE-FORWARD-004 `if_fails` axis (b) diagnostic.

Three sibling unit tests run in default CI:

- `locate_llama_cli_handles_missing`: a bogus path list returns `None`.
- `extract_first_emit_strips_blank_leading_lines`: blank-line robustness.
- `extract_first_emit_handles_empty`: empty-input edge case.
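The behaviour these three tests pin down could look roughly like the helpers below. The signatures are guesses inferred from the test names and are not taken from the PR: `locate_in` is a hypothetical stand-in for the path-list half of the locator, and returning `None` for empty output is an assumption.

```rust
use std::path::PathBuf;

/// Return the first candidate that exists as a regular file.
/// HYPOTHETICAL helper: models only the fallback-path search, not `which`.
fn locate_in(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}

/// Return the first non-blank line of captured stdout, trimmed.
/// ASSUMPTION: empty or all-blank output yields `None`.
fn extract_first_emit(stdout: &str) -> Option<&str> {
    stdout.lines().map(str::trim).find(|line| !line.is_empty())
}

fn main() {
    let bogus = [PathBuf::from("/definitely/not/here/llama-cli")];
    println!("locate: {:?}", locate_in(&bogus));
    println!("first emit: {:?}", extract_first_emit("\n\n 4\nmore"));
    println!("empty: {:?}", extract_first_emit(""));
}
```

Both helpers are pure with respect to the model and fixture, which is what lets tests of this kind run in default CI while the heavy test stays `#[ignore]`.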

Live verification

```
$ cargo test -p aprender-serve --test qwen3_moe_argmax_parity
running 4 tests
test f_qw3_moe_parity_002_argmax_vs_llama_cpp ... ignored
test extract_first_emit_handles_empty ... ok
test extract_first_emit_strips_blank_leading_lines ... ok
test locate_llama_cli_handles_missing ... ok

test result: ok. 3 passed; 0 failed; 1 ignored
```

Why this is small

This PR is tight: 1 new test file (~270 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.4 (DRAFT → ACTIVE_RUNTIME bump after both axes discharge live on lambda-vector) is the final slice.

Test plan

- `cargo test -p aprender-serve --test qwen3_moe_argmax_parity`: 3 sibling tests pass, heavy test ignored
- `cargo fmt -p aprender-serve`: formatted
- Pre-commit quality gates passed
- Operator runs `cargo test ... -- --ignored` on lambda-vector after the M32d.1 fixture is generated (deferred to the M32d.4 DRAFT→ACTIVE flip)

🤖 Generated with Claude Code

…ARITY-002 llama.cpp argmax sanity

Per qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.3 authors the secondary sanity test (axis (b) of FALSIFY-QW3-MOE-FORWARD-004). It runs llama-cli as a subprocess on the same prompt + GGUF + greedy-deterministic sampler and asserts its first emitted token's decoded text matches the FP16 fixture's argmax_text.

Test: f_qw3_moe_parity_002_argmax_vs_llama_cpp (#[ignore])
- Locates llama-cli via `which` first, then 4 fallback paths (~/.local/bin/, ~/src/llama.cpp/, /usr/local/bin/, /usr/bin/).
- Skips with eprintln if llama-cli, GGUF, or fixture missing (operator-confirm-gated; fixture is multi-GB, GGUF is 17.3 GB).
- Spawns llama-cli with the deterministic flags from FALSIFY-QW3-MOE-FORWARD-004's `test:` block:
  -p "What is 2+2?" -n 1 --top-k 1 --temp 0.0 --seed 0 --no-display-prompt -no-cnv --no-warmup --log-disable
- Captures stdout, extracts first non-empty line.
- Asserts trimmed equality (or substring containment in either direction, to tolerate the leading-space marker some tokenizer detokenize paths emit) between llama-cli's first emit and fixture.argmax_text.

Three sibling unit tests run in default CI (not #[ignore]):
- locate_llama_cli_handles_missing: bogus path-list returns None.
- extract_first_emit_strips_blank_leading_lines: blank-line robustness.
- extract_first_emit_handles_empty: empty-input edge case.

The rich docstring documents the *transitive* nature of axis (b): the strict gate `apr_argmax_token_id == llama_cpp_argmax_token_id` is discharged by composing M32d.2's apr-vs-HF cosine > 0.99 (forces argmax agreement) with this test's llama.cpp-vs-HF decoded-text equality. Decoding apr's argmax inside this test would require pulling the GGUF tokenizer into the integration layer, which is a separate slice; the transitive composition gives the same gate strength with a tighter PR.

Live results from `cargo test -p aprender-serve --test qwen3_moe_argmax_parity`:
test result: ok. 3 passed; 0 failed; 1 ignored

This PR is tight: 1 new test file (~245 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.4 (DRAFT → ACTIVE_RUNTIME bump after both axes discharge live on lambda-vector) is the final slice.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

M33 audit-trail bump on companion side. Records:
* #1127 (M32c.2.2.2.1.4) live regression test on aprender main
* #1128 #1129 #1130 #1131 (M32d.0/.1/.2/.3) parity scaffolding

No code change beyond this contract mirror. M22 4-step ritual: mirror push (this commit) → companion pin.lock refresh → companion spec PR. Contract sha256 f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846 byte-identical with companion side.

Refs: claude-code-parity-apr-v1 § companion_repo.contract_pin

… - closes sweep

Algorithm-level PARTIAL discharge for FALSIFY-QW3-MOE-FORWARD-001 + 002 + 003 + 004 per `contracts/qwen3-moe-forward-v1.yaml`. Closes 4/4 sweep on the M32d MoE forward parity contract.

## ✅ Closes 4/4 qwen3-moe-forward sweep

**Thirteen contract families now fully algorithm-bound at PARTIAL:**
- All 11 prior families (dataset/tokenizer/apr-cli-* + apr-vs-gguf-forward-parity-v1)
- `qwen3-moe-forward-v1` (4/4) ← this PR

## What this binds (M32 milestone state machine)

The four gates encode a milestone state machine for the Qwen3-MoE forward path:
- **001 (M32a-precursor)**: regression sentinel pinning the "dense-FFN tensor lookup is reached" pre-M32b error string. Pass at this level proves the bug exists; flips polarity once M32b lands.
- **002 (M32b)**: arch-aware load wired but forward not yet implemented; expects `RealizarError::UnsupportedOperation` with `moe_forward_pass`.
- **003 (M32c)**: CPU forward wired; `apr run` exits 0 and emits at least one non-whitespace byte (correctness not yet asserted).
- **004 (M32d)**: numerical parity vs HuggingFace FP16 reference; cosine similarity > 0.99 strict.

## Verdict shapes

- 001: substring contains (regression-sentinel).
- 002: substring conjunction (NOT dense-FFN AND HAS unsupported).
- 003: conjunctive (exit 0 AND non-whitespace stdout).
- 004: bounded-threshold (finite + in [-1, 1] + > 0.99 strict).

## Five-Whys

1. Why bind these now? Closes 4/4 sweep on a milestone-tracking contract; pins the M32d acceptance criterion at algorithm level.
2. Why one module? Bundle precedent.
3. Why distinct verdicts per gate? Each represents a distinct milestone state; substring/conjunctive/threshold shapes match.
4. Why strict `> 0.99` for cosine? Contract-literal `> 0.99`.
5. Why 19 tests across 4 verdict sections? Mutation-survey coverage per gate.

## Cross-reference

Per memory `2026-04-28 session distillation track complete`: M32d.0-M32d.3 already shipped (PRs #1129/#1130/#1131); M32d.4 fixture-gen + actual cosine measurement remain. This verdict gives the M32d.4 work an algorithm-level acceptance criterion.

## Tests

19 unit tests, all green.
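The bounded-threshold shape listed for gate 004 (finite, in [-1, 1], strictly above 0.99) can be sketched as below. `Verdict` and `verdict_004` are hypothetical names chosen for illustration; the referenced PR's actual types and any floating-point tolerance at the range boundaries are not shown in this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Pass,
    Fail,
}

/// Bounded-threshold check: all three conditions must hold.
/// ASSUMPTION: no epsilon slack is applied at -1.0, 1.0, or 0.99.
fn verdict_004(cosine: f32) -> Verdict {
    let finite = cosine.is_finite();
    let in_range = (-1.0..=1.0).contains(&cosine);
    let above_threshold = cosine > 0.99; // strict: exactly 0.99 fails
    if finite && in_range && above_threshold {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

fn main() {
    for c in [0.995_f32, 0.99, 1.5, f32::NAN] {
        println!("{c:>6} -> {:?}", verdict_004(c));
    }
}
```

Checking finiteness and range before the threshold keeps a NaN or an out-of-range value (a sign of a broken cosine computation) from being reported as an ordinary near-miss.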

noahgift enabled auto-merge (squash) April 29, 2026 11:10

Merge branch 'main' into feat/m32d-3-llama-cli-argmax-test

9ae8746

noahgift merged commit 9f93d02 into main Apr 29, 2026
10 checks passed

noahgift deleted the feat/m32d-3-llama-cli-argmax-test branch April 29, 2026 11:40

noahgift mentioned this pull request May 9, 2026

qwen3-moe-forward-v1 ACTIVE_RUNTIME flip — operator-confirm cosine ≥ 0.99 vs HF FP16 reference (~60 GB download) #1584

Closed

noahgift merged 2 commits into main from feat/m32d-3-llama-cli-argmax-test
