Skip to content

test(realizar): M32c.2.2.2.1.4 — live apr run falsifier pinning FALSIFY-QW3-MOE-FORWARD-003#1127

Merged
noahgift merged 1 commit into
mainfrom
feat/m32c-2-2-2-1-4-live-falsifier
Apr 29, 2026
Merged

test(realizar): M32c.2.2.2.1.4 — live apr run falsifier pinning FALSIFY-QW3-MOE-FORWARD-003#1127
noahgift merged 1 commit into
mainfrom
feat/m32c-2-2-2-1-4-live-falsifier

Conversation

@noahgift
Copy link
Copy Markdown
Contributor

Summary

Adds a live integration test (F-QW3-MOE-C22214-001) that invokes
the user-facing apr binary as a subprocess against the cached
Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf and asserts:

  1. exit 0
  2. stdout contains ≥1 non-whitespace character

This pins the M32c.2.2.2.1.3 dispatch flip (PR #1126, squash
a902eea93) in CI / regression-prevention. Without it, a future
regression that re-routed qwen3_moe back to the dense
run_gguf_generate path (which produces garbage on MoE weights)
would slip through CI silently — there'd be no signal at the
apr run user-facing surface.

Live evidence (lambda-vector RTX 4090, 2026-04-29)

test f_qw3_moe_c22214_001_apr_run_emits_at_least_one_non_whitespace_char ...
F-QW3-MOE-C22214-001: elapsed = 130.945370974s
  stdout: === APR Run === ... Output: . ... Completed in 130.83s (cached)
F-QW3-MOE-C22214-001: PASS
test result: ok. 1 passed

The single emitted character . (period) is enough to discharge
FALSIFY-QW3-MOE-FORWARD-003 — quality vs llama.cpp Q4_K (cosine on
logits) is M32d's job.

Skip path

CI runners (no cached GGUF) print
F-QW3-MOE-C22214-001: SKIP — no cached Qwen3-Coder GGUF at any of [...]
and return success. Same skip pattern as
crates/aprender-serve/tests/qwen3_moe_forward_one_token.rs
(M32c.2.2.2.1.1 in-process forward primitive).

Contract chain status

Slice Status PR
M32a — contract scaffold SHIPPED #1099
M32b — arch-aware FFN load refuses qwen3_moe SHIPPED #1100
M32c.1+ — MoE descriptor load + byte slicer SHIPPED
M32c.2.2.2.1.1 — forward_qwen3_moe method SHIPPED #1124
M32c.2.2.2.1.2 — run_qwen3_moe_generate SHIPPED #1125
M32c.2.2.2.1.3 — dispatch flip + Q4_K_M qtype SHIPPED #1126
M32c.2.2.2.1.4 — live apr run falsifier THIS PR
M32d — numerical parity vs llama.cpp PENDING

After M32d, qwen3-moe-forward-v1 flips DRAFT → ACTIVE_RUNTIME,
unblocking the companion-repo FALSIFY-CCPA-013 measured
tool-dispatch parity gate.

Test plan

  • cargo test -p apr-cli --test qwen3_moe_apr_run_live_falsifier --no-run — compiles
  • cargo clippy -p apr-cli --test qwen3_moe_apr_run_live_falsifier -- -D warnings — clean
  • Live run on lambda-vector against cached 17.3 GB GGUF — PASS (130.83s, emitted ".")
  • CI workspace-test — passes via SKIP path (no GGUF on runners)
  • Lambda-vector full workspace test post-merge

🤖 Generated with Claude Code

…SIFY-QW3-MOE-FORWARD-003

## What ships

Adds `crates/apr-cli/tests/qwen3_moe_apr_run_live_falsifier.rs` —
F-QW3-MOE-C22214-001, an integration test that invokes the user-facing
`apr` binary as a subprocess and asserts:

  1. exit 0
  2. stdout contains ≥1 non-whitespace character

against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
with a fresh date-tagged prompt.

This pins the M32c.2.2.2.1.3 dispatch flip (PR #1126,
squash a902eea) in CI / regression-prevention. Without it, a
future regression that re-routed qwen3_moe back to the dense
`run_gguf_generate` path (which produces garbage on MoE weights)
would slip through CI silently — there'd be no signal at the
`apr run` user-facing surface.

## Live evidence (lambda-vector RTX 4090, 2026-04-29)

```
running 1 test
test f_qw3_moe_c22214_001_apr_run_emits_at_least_one_non_whitespace_char ...
F-QW3-MOE-C22214-001: live `apr run` against /home/noah/.cache/pacha/models/2b88b180a790988f.gguf
F-QW3-MOE-C22214-001: elapsed = 130.945370974s
  stdout (first 200B): === APR Run ===

Source: /home/noah/.cache/pacha/models/2b88b180a790988f.gguf

Output:
.

Completed in 130.83s (cached)

  stderr (first 200B): [BOS-FALLBACK] No tokenizer.ggml.bos_token_id in GGUF — using architecture default for 'qwen3moe'
[BOS-FALLBACK] No tokenizer.ggml.bos_token_id in GGUF — using architecture default for 'qwen3moe'

F-QW3-MOE-C22214-001: PASS
ok

test result: ok. 1 passed; 0 failed; 0 ignored
```

Token quality vs llama.cpp Q4_K (cosine on logits) is M32d. This
test asserts ONLY emit/exit-0 — the discharge gate for
FALSIFY-QW3-MOE-FORWARD-003.

## Skip path

CI runners (and any host without the cached GGUF) print:

  F-QW3-MOE-C22214-001: SKIP — no cached Qwen3-Coder GGUF at any of [...]

and return success. Same skip pattern as
`crates/aprender-serve/tests/qwen3_moe_forward_one_token.rs`
(M32c.2.2.2.1.1 in-process forward primitive).

## Contract chain status

  M32a    qwen3-moe-forward-v1 contract scaffold        SHIPPED (#1099)
  M32b    arch-aware FFN load refuses qwen3_moe          SHIPPED (#1100)
  M32c.1+ MoE descriptor load + per-expert byte slicer   SHIPPED
  M32c.2.2.2.1.1 forward_qwen3_moe method                SHIPPED (#1124)
  M32c.2.2.2.1.2 run_qwen3_moe_generate function         SHIPPED (#1125)
  M32c.2.2.2.1.3 dispatch flip + Q4_K_M qtype dispatch   SHIPPED (#1126)
  M32c.2.2.2.1.4 live `apr run` falsifier               THIS PR
  M32d           numerical parity vs llama.cpp           PENDING

After M32d the contract flips DRAFT → ACTIVE_RUNTIME, which
unblocks the companion-repo FALSIFY-CCPA-013 measured tool-dispatch
parity gate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 0392b18 into main Apr 29, 2026
11 checks passed
@noahgift noahgift deleted the feat/m32c-2-2-2-1-4-live-falsifier branch April 29, 2026 08:34
noahgift added a commit that referenced this pull request May 1, 2026
M33 audit-trail bump on companion side. Records:
  * #1127 (M32c.2.2.2.1.4) live regression test on aprender main
  * #1128 #1129 #1130 #1131 (M32d.0/.1/.2/.3) parity scaffolding

No code change beyond this contract mirror. M22 4-step ritual:
mirror push (this commit) → companion pin.lock refresh → companion
spec PR. Contract sha256
f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846
byte-identical with companion side.

Refs: claude-code-parity-apr-v1 § companion_repo.contract_pin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant