chore(scripts): M32d.1 — generate_qwen3_moe_fp16_logits.py fixture script by noahgift · Pull Request #1129 · paiml/aprender

noahgift · 2026-04-29T10:21:34Z

Summary

Per the qwen3-moe-forward-v1 v1.3.0 staged plan, M32d.1 adds the one-time script that generates the HuggingFace FP16 reference fixture consumed by M32d.2's parity test. M32d.0 (the parity-strategy decision) shipped in #1128.

The script (scripts/generate_qwen3_moe_fp16_logits.py):

- Loads HuggingFace Qwen/Qwen3-Coder-30B-A3B-Instruct at FP16 (BF16 fallback) with device_map="auto", so accelerate can split parameters across GPU, CPU, and disk on the 24 GB-VRAM lambda-vector RTX 4090 (one forward pass takes ~10–30 min with offload).
- Tokenizes the canonical M32d prompt "What is 2+2?" and runs ONE greedy decode step with use_cache=False.
- Dumps the full 151936-dim logit vector at the last sequence position (the next-token distribution after the prompt) into a JSON fixture, together with the argmax token, its decoded text, dtype, tokens, git_sha, transformers/torch versions, a UTC timestamp, and vocab_size.
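A minimal sketch of what such a generator can look like, assuming the standard transformers `AutoModelForCausalLM` / `AutoTokenizer` API. The function names, the JSON keys other than model_name, prompt, tokens, vocab_size, logits, and argmax_token (which the M32d.2 test reads), and the raw-prompt tokenization (no chat template) are assumptions, not the actual script; the BF16 fallback, git_sha, and version fields are omitted.

```python
import json
from datetime import datetime, timezone


def build_fixture_record(model_name, prompt, tokens, logits, argmax_text, dtype):
    """Package one last-position logit vector as a JSON-serializable dict."""
    # The single greedy "decode step" is just the argmax over the logits.
    argmax_token = max(range(len(logits)), key=logits.__getitem__)
    return {
        "model_name": model_name,
        "prompt": prompt,
        "tokens": list(tokens),
        "vocab_size": len(logits),
        "logits": [float(x) for x in logits],
        "argmax_token": argmax_token,
        "argmax_text": argmax_text,  # hypothetical key name
        "dtype": dtype,  # hypothetical key name
        "generated_utc": datetime.now(timezone.utc).isoformat(),
    }


def run_hf_forward(model_id, prompt):
    """One no-cache forward pass; returns (token ids, last-position logits, tokenizer).

    torch/transformers are imported lazily so build_fixture_record works without them.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # accelerate places weights on GPU, CPU, and disk
    )
    model.eval()
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"].to(model.device), use_cache=False)
    # out.logits has shape [batch, seq_len, vocab]; keep batch 0, last position.
    last = out.logits[0, -1].float().cpu().tolist()
    return enc["input_ids"][0].tolist(), last, tokenizer


def write_fixture(path, record):
    """Serialize the fixture record to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
```

In use, the operator would call `run_hf_forward("Qwen/Qwen3-Coder-30B-A3B-Instruct", "What is 2+2?")`, decode the argmax with `tokenizer.decode([argmax])`, and pass everything to `build_fixture_record` and `write_fixture`. Keeping the packaging step free of torch lets it be unit-tested in CI without the 60 GB download.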

This is a fixture generator only; it does NOT validate parity. M32d.2 follows in the next slice: crates/aprender-serve/tests/qwen3_moe_parity.rs reads the JSON, computes cosine similarity against apr's CPU forward pass, and asserts > 0.99 per AC_QW3_MOE_005.
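The real gate lives in Rust (qwen3_moe_parity.rs); the Python mirror below only illustrates the arithmetic, using hypothetical names. It computes cos = (a · b) / (‖a‖ ‖b‖), returns 0.0 when either vector has zero norm (matching the zero-vector behavior described for the Rust helper), and applies the strict > 0.99 threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either norm is zero."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid NaN from dividing by zero
    return dot / (norm_a * norm_b)


def passes_parity_gate(apr_logits, hf_logits, threshold=0.99):
    """Strict comparison: exactly 0.99 does NOT pass."""
    return cosine_similarity(apr_logits, hf_logits) > threshold
```

Cosine ignores a global scale factor, so a uniform logit offset in magnitude does not fail the gate, while a change in the distribution's shape does.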

Operator-confirm gate (one-time, separate step)

Running this script (separate from committing it) requires:

- ~60 GB of disk in ~/.cache/huggingface for the FP16 weights download.
- ~10–30 min of wall time per forward pass with accelerate offload.

The fixture is captured once and committed verbatim; downstream tests read the JSON.
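The ~60 GB figure follows from FP16 storing 2 bytes per parameter: assuming roughly 30.5B total parameters for the 30B-A3B model, 30.5e9 × 2 B ≈ 61 GB. A hypothetical operator preflight (not part of this PR) could check free space before starting the download; note that the HF cache location can be moved via environment variables, so the path is an assumption.

```python
import shutil
from pathlib import Path

BYTES_PER_FP16_PARAM = 2


def fp16_weight_gb(n_params):
    """Decimal GB needed to store n_params weights at 2 bytes each."""
    return n_params * BYTES_PER_FP16_PARAM / 1e9


def has_free_space(path, required_gb):
    """True if the filesystem holding `path` has at least required_gb free."""
    p = Path(path).expanduser()
    # The cache dir may not exist yet; check the nearest existing ancestor.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 1e9 >= required_gb
```

For example, `has_free_space("~/.cache/huggingface", fp16_weight_gb(30.5e9))` asks whether ~61 GB is available on that filesystem.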

Why this is small

This PR is tight: 1 file (185 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.2 (parity test) and M32d.3 (llama-cli sanity) are subsequent slices. M32d.4 flips DRAFT → ACTIVE_RUNTIME after both gates pass.

Test plan

- python3 -c "import ast; ast.parse(...)" confirms the syntax is valid
- Pre-commit quality gates passed
- Operator runs the script on lambda-vector to produce the fixture (deferred to M32d.2)
- M32d.2 integration test reads the fixture and gates cosine > 0.99
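The syntax check in the first item can be expressed as a small helper; the name `check_source` is hypothetical. `ast.parse` only verifies that the file parses, so it catches syntax errors but not missing imports or runtime failures.

```python
import ast


def check_source(source, filename="<string>"):
    """Return (True, None) if source parses as Python, else (False, message)."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return False, f"{exc.filename}:{exc.lineno}: {exc.msg}"
    return True, None
```

Applied to the script, this would be `check_source(Path(p).read_text(), p)` with `p = "scripts/generate_qwen3_moe_fp16_logits.py"`.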

🤖 Generated with Claude Code

…ript

Per qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.1 authors the one-time HF FP16 fixture-generation script that M32d.2's parity test consumes. The script:

- Loads HuggingFace Qwen3-Coder-30B-A3B-Instruct at FP16 (BF16 fallback) with `device_map="auto"` so accelerate can split params across GPU + CPU + disk on a 24 GB-VRAM lambda-vector RTX 4090 (one forward pass: ~10-30 min with offload).
- Tokenizes the canonical M32d prompt "What is 2+2?" and runs ONE greedy decode step with `use_cache=False`.
- Dumps the full 151936-dim logit vector at the seq-end position (the next-token-after-prompt distribution), plus argmax token, argmax decoded text, model dtype, prompt, tokens, git_sha, transformers version, torch version, generated_utc, vocab_size.

This is a fixture-generator only; it does NOT validate parity. M32d.2 authors `crates/aprender-serve/tests/qwen3_moe_parity.rs`, which loads the JSON fixture and computes cosine similarity vs apr's CPU forward (target > 0.99 per AC_QW3_MOE_005).

Operator-confirm gate before running (one-time):

- ~60 GB disk in ~/.cache/huggingface for the FP16 weights download.
- ~10-30 min wall time per forward pass with accelerate offload.

This PR is tight: 1 file (185 LOC), no behavior change to the binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.2 follows in the next slice (parity integration test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…01 cosine gate (#1130)

Per qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.2 authors the cosine-similarity parity test that consumes the JSON fixture from M32d.1 (PR #1129) and exercises OwnedQuantizedModel::forward_qwen3_moe end-to-end on the canonical 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF.

Test: f_qw3_moe_parity_001_cosine_vs_hf_fp16 (#[ignore])

- Skips with eprintln if no cached Qwen3-Coder GGUF or no FP16 fixture file is present (operator-confirm-gated; the FP16 fixture is multi-GB).
- Loads the fixture (model_name, prompt, tokens, vocab_size, logits[151936], argmax_token) via serde_json.
- Asserts vocab_size == 151936 and logits.len() == 151936 to catch fixture drift.
- Loads the GGUF via MappedGGUFModel + OwnedQuantizedModel::from_mapped, loads all 48 MoE layer descriptors, and runs ONE forward pass on the fixture's prompt tokens.
- Computes cosine_similarity(apr_logits, hf_fp16_logits).
- Asserts cos_sim > 0.99 per AC_QW3_MOE_005.
- Reports per the FALSIFY-QW3-MOE-FORWARD-004 if_fails diagnostic order.

Three sibling unit tests run in default CI (not #[ignore]):

- fixture_loader_handles_missing_path: load_fixture returns None on an absent path (no panic).
- cosine_similarity_unit_vectors: parallel/orthogonal/anti-parallel unit-vector cases.
- cosine_similarity_handles_zero_vector: the zero-vector edge case returns 0.0 (no NaN from divide-by-zero).

Live results from `cargo test -p aprender-serve --test qwen3_moe_parity`:

test result: ok. 3 passed; 0 failed; 1 ignored

This is a tight one-PR slice: 1 new test file (~230 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.3 (llama-cli argmax sanity) and M32d.4 (DRAFT → ACTIVE_RUNTIME bump) follow.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
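The fixture-loading behavior described for the Rust test (None on a missing path, a hard error on vocab drift) can be mirrored in Python for quick local checks. This is an illustrative sketch with hypothetical names, not the serde_json code in qwen3_moe_parity.rs.

```python
import json
from pathlib import Path

EXPECTED_VOCAB = 151936  # vocab size asserted by the M32d.2 test


def load_fixture(path):
    """Return the parsed fixture dict, or None if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return None  # mirror the "skip, don't panic" behavior
    with p.open(encoding="utf-8") as f:
        return json.load(f)


def check_fixture_shape(fixture, expected_vocab=EXPECTED_VOCAB):
    """Raise ValueError if vocab_size or the logits length drifted."""
    if fixture["vocab_size"] != expected_vocab:
        raise ValueError(f"vocab_size {fixture['vocab_size']} != {expected_vocab}")
    if len(fixture["logits"]) != expected_vocab:
        raise ValueError(f"len(logits) {len(fixture['logits'])} != {expected_vocab}")
```

Checking both the declared vocab_size and the actual logits length catches a fixture whose metadata and payload disagree, not just one regenerated with a different tokenizer.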

M33 audit-trail bump on companion side. Records:

* #1127 (M32c.2.2.2.1.4) live regression test on aprender main
* #1128 #1129 #1130 #1131 (M32d.0/.1/.2/.3) parity scaffolding

No code change beyond this contract mirror. M22 4-step ritual: mirror push (this commit) → companion pin.lock refresh → companion spec PR.

Contract sha256 f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846 byte-identical with companion side.

Refs: claude-code-parity-apr-v1 § companion_repo.contract_pin
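"Byte-identical" here means both sides hash the same contract bytes to the same SHA-256 digest. A minimal sketch of that comparison using hashlib; the helper names are hypothetical, and the contract file path is left to the caller since the commit does not name it.

```python
import hashlib

# Pinned digest quoted in the commit message above.
PINNED_SHA256 = "f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846"


def contract_sha256(data: bytes) -> str:
    """Hex SHA-256 of the raw contract bytes (no newline or encoding normalization)."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pinned_hex: str = PINNED_SHA256) -> bool:
    """True only if the bytes hash exactly to the pinned digest."""
    return contract_sha256(data) == pinned_hex.strip().lower()
```

Because the hash covers raw bytes, a whitespace or line-ending change on either side breaks the pin, which is the point of a byte-identity check.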

… — closes sweep

Algorithm-level PARTIAL discharge for FALSIFY-QW3-MOE-FORWARD-001 + 002 + 003 + 004 per `contracts/qwen3-moe-forward-v1.yaml`. Closes 4/4 sweep on the M32d MoE forward parity contract.

## ✅ Closes 4/4 qwen3-moe-forward sweep

**Thirteen contract families now fully algorithm-bound at PARTIAL:**

- All 11 prior families (dataset/tokenizer/apr-cli-* + apr-vs-gguf-forward-parity-v1)
- `qwen3-moe-forward-v1` (4/4) ← this PR

## What this binds (M32 milestone state machine)

The four gates encode a milestone state machine for the Qwen3-MoE forward path:

- **001 (M32a-precursor)**: regression sentinel pinning the "dense-FFN tensor lookup is reached" pre-M32b error string. A pass at this level proves the bug exists; the gate flips polarity once M32b lands.
- **002 (M32b)**: arch-aware load wired but forward not yet implemented; expects `RealizarError::UnsupportedOperation` with `moe_forward_pass`.
- **003 (M32c)**: CPU forward wired; `apr run` exits 0 and emits at least one non-whitespace byte (correctness not yet asserted).
- **004 (M32d)**: numerical parity vs the HuggingFace FP16 reference; cosine similarity > 0.99 strict.

## Verdict shapes

- 001: substring contains (regression sentinel).
- 002: substring conjunction (NOT dense-FFN AND HAS unsupported).
- 003: conjunctive (exit 0 AND non-whitespace stdout).
- 004: bounded threshold (finite + in [-1, 1] + > 0.99 strict).

## Five-Whys

1. Why bind these now? Closes 4/4 sweep on a milestone-tracking contract; pins the M32d acceptance criterion at algorithm level.
2. Why one module? Bundle precedent.
3. Why distinct verdicts per gate? Each represents a distinct milestone state; the substring/conjunctive/threshold shapes match.
4. Why strict `> 0.99` for cosine? Contract-literal `> 0.99`.
5. Why 19 tests across 4 verdict sections? Mutation-survey coverage per gate.

## Cross-reference

Per memory `2026-04-28 session distillation track complete`: M32d.0-M32d.3 already shipped (PRs #1129/#1130/#1131); M32d.4 fixture-gen + actual cosine measurement remain. This verdict gives the M32d.4 work an algorithm-level acceptance criterion.

## Tests

19 unit tests, all green.
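The four verdict shapes can be sketched as plain predicates. The marker strings are hypothetical stand-ins (the contract's exact literals are not quoted here), and the real verdicts live in the Rust module; this only shows how each shape composes its checks.

```python
import math

# Hypothetical marker strings; the contract's exact literals may differ.
DENSE_FFN_MARKER = "dense-FFN tensor lookup"
UNSUPPORTED_MARKER = "UnsupportedOperation"
MOE_OP_MARKER = "moe_forward_pass"


def verdict_001(stderr: str) -> bool:
    """Regression sentinel: the pre-M32b dense-FFN error string is present."""
    return DENSE_FFN_MARKER in stderr


def verdict_002(stderr: str) -> bool:
    """Substring conjunction: NOT dense-FFN AND HAS unsupported moe_forward_pass."""
    return (
        DENSE_FFN_MARKER not in stderr
        and UNSUPPORTED_MARKER in stderr
        and MOE_OP_MARKER in stderr
    )


def verdict_003(exit_code: int, stdout: bytes) -> bool:
    """Conjunctive: exit 0 AND at least one non-whitespace byte on stdout."""
    return exit_code == 0 and stdout.strip() != b""


def verdict_004(cos_sim: float) -> bool:
    """Bounded threshold: finite, within [-1, 1], and strictly above 0.99."""
    return math.isfinite(cos_sim) and -1.0 <= cos_sim <= 1.0 and cos_sim > 0.99
```

The finite and range checks in 004 reject NaN or a mis-normalized similarity before the threshold comparison, so a broken cosine computation cannot pass by accident.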

noahgift enabled auto-merge (squash) April 29, 2026 10:21

noahgift merged commit 87a2a61 into main Apr 29, 2026
11 checks passed

noahgift deleted the feat/m32d-1-hf-fp16-fixture branch April 29, 2026 10:43

noahgift mentioned this pull request Apr 29, 2026

test(aprender-serve): M32d.2 — qwen3_moe_parity.rs F-QW3-MOE-PARITY-001 cosine gate #1130

Merged

4 tasks

noahgift mentioned this pull request May 9, 2026

qwen3-moe-forward-v1 ACTIVE_RUNTIME flip — operator-confirm cosine ≥ 0.99 vs HF FP16 reference (~60 GB download) #1584

Closed

noahgift merged 1 commit into main from feat/m32d-1-hf-fp16-fixture
