
chore(scripts): M32d.1 — generate_qwen3_moe_fp16_logits.py fixture script#1129

Merged
noahgift merged 1 commit into main from feat/m32d-1-hf-fp16-fixture on Apr 29, 2026

@noahgift (Contributor)

Summary

Per the qwen3-moe-forward-v1 v1.3.0 staged plan, M32d.1 authors the one-time script that generates the HF FP16 reference fixture consumed by M32d.2's parity test. M32d.0 (the parity-strategy decision) shipped in #1128; this PR is M32d.1.

The script (scripts/generate_qwen3_moe_fp16_logits.py):

  • Loads HuggingFace Qwen/Qwen3-Coder-30B-A3B-Instruct at FP16 (BF16 fallback) with device_map="auto" so that accelerate can split parameters across GPU, CPU, and disk on lambda-vector's 24 GB-VRAM RTX 4090 (one forward pass takes ~10–30 min with offload).
  • Tokenizes the canonical M32d prompt "What is 2+2?" and runs ONE greedy decode step with use_cache=False.
  • Dumps the full 151936-dim logit vector at the final sequence position (the next-token distribution after the prompt) into a JSON fixture, together with the argmax token, its decoded text, the dtype, the prompt tokens, git_sha, the transformers/torch versions, a UTC timestamp, and vocab_size.
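A minimal, stdlib-only sketch of the serialization half of the script, assuming the last-position logits have already been pulled out of the model as a flat list (in the real script they would presumably come from something like `model(input_ids, use_cache=False).logits[0, -1]`). `build_fixture` and the field names `dtype`, `argmax_text`, `transformers_version`, and `torch_version` are hypothetical; `model_name`, `prompt`, `tokens`, `vocab_size`, `logits`, `argmax_token`, `git_sha`, and `generated_utc` are the names used in the commit messages on this page.

```python
import json
from datetime import datetime, timezone

VOCAB_SIZE = 151936  # Qwen3 vocabulary size quoted in this PR


def build_fixture(logits, tokens, prompt, decode, dtype, git_sha,
                  transformers_version, torch_version):
    """Assemble a JSON-serializable fixture dict (hypothetical layout).

    `logits` is the next-token distribution at the last prompt position;
    `decode` stands in for something like `tokenizer.decode`.
    """
    if len(logits) != VOCAB_SIZE:
        raise ValueError(f"expected {VOCAB_SIZE} logits, got {len(logits)}")
    # One greedy step: the predicted token is the index of the largest logit.
    argmax_token = max(range(len(logits)), key=logits.__getitem__)
    return {
        "model_name": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "prompt": prompt,
        "tokens": list(tokens),
        "dtype": dtype,
        "vocab_size": len(logits),
        "logits": [float(x) for x in logits],
        "argmax_token": argmax_token,
        "argmax_text": decode(argmax_token),
        "git_sha": git_sha,
        "transformers_version": transformers_version,
        "torch_version": torch_version,
        "generated_utc": datetime.now(timezone.utc).isoformat(),
    }


# Usage (hypothetical filename): write the fixture once; tests only read it.
# with open("qwen3_moe_fp16_logits.json", "w") as f:
#     json.dump(build_fixture(...), f)
```

Storing the logits as plain JSON floats keeps the fixture readable from the Rust side via serde_json without any Python-specific format.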

This is a fixture generator only; it does NOT validate parity. M32d.2 follows in the next slice: crates/aprender-serve/tests/qwen3_moe_parity.rs reads the JSON, computes cosine similarity against apr's CPU forward, and asserts > 0.99 per AC_QW3_MOE_005.

Operator-confirm gate (one-time, separate step)

Running this script (separate from committing it) requires:

  • ~60 GB disk in ~/.cache/huggingface for the FP16 weights download.
  • ~10–30 min wall time per forward pass with accelerate offload.
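As a rough check on the ~60 GB disk figure, assume the model has about 30.5 × 10⁹ total parameters (the commonly published count for the 30B-A3B family; treat this as an assumption) stored at 2 bytes each in FP16/BF16. Only about 3B parameters are active per token (the "A3B"), but every expert's weights still have to be downloaded:

$$
\text{disk} \approx N_{\text{params}} \times 2\ \text{B} \approx 30.5\times10^{9} \times 2\ \text{B} = 61\times10^{9}\ \text{B} \approx 56.8\ \text{GiB}
$$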

The fixture is captured once and committed verbatim; downstream tests read the JSON.

Why this is small

This PR is tight: 1 file (185 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.2 (parity test) and M32d.3 (llama-cli sanity) are subsequent slices. M32d.4 flips DRAFT → ACTIVE_RUNTIME after both gates pass.

Test plan

  • python3 -c "import ast; ast.parse(...)" — syntax valid
  • Pre-commit quality gates passed
  • Operator runs the script on lambda-vector to produce the fixture (deferred to M32d.2)
  • M32d.2 integration test reads fixture and gates cosine > 0.99
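The first test-plan item only checks that the script parses. A minimal sketch of the same check as a reusable helper (the name `is_valid_python` is hypothetical), using the standard-library `ast` module:

```python
import ast


def is_valid_python(source: str) -> bool:
    """True if `source` parses as Python; nothing is imported or executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
        return False
    return True


# Usage (path from this PR):
# from pathlib import Path
# is_valid_python(Path("scripts/generate_qwen3_moe_fp16_logits.py").read_text())
```

Because `ast.parse` never executes the module, this check can run in CI without torch or transformers installed.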

🤖 Generated with Claude Code

…ript

Per the qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.1 authors the one-time
HF FP16 fixture-generation script that M32d.2's parity test consumes.

The script:
- Loads HuggingFace Qwen3-Coder-30B-A3B-Instruct at FP16 (BF16 fallback)
  with `device_map="auto"` so accelerate can split params across
  GPU + CPU + disk on a 24 GB-VRAM lambda-vector RTX 4090
  (one forward pass: ~10-30 min with offload).
- Tokenizes the canonical M32d prompt "What is 2+2?" and runs ONE
  greedy decode step with `use_cache=False`.
- Dumps the full 151936-dim logit vector at the seq-end position
  (the next-token-after-prompt distribution), plus argmax token,
  argmax decoded text, model dtype, prompt, tokens, git_sha,
  transformers version, torch version, generated_utc, vocab_size.

This is a fixture-generator only — it does NOT validate parity. M32d.2
authors `crates/aprender-serve/tests/qwen3_moe_parity.rs` which loads
the JSON fixture and computes cosine similarity vs apr's CPU forward
(target > 0.99 per AC_QW3_MOE_005).

Operator-confirm gate before running (one-time):
- ~60 GB disk in ~/.cache/huggingface for the FP16 weights download.
- ~10-30 min wall time per forward pass with accelerate offload.

This PR is tight: 1 file (185 LOC), no behavior change to the binary,
no contract-rev (M32d.0 already shipped at v1.3.0). M32d.2 follows in
the next slice (parity integration test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) April 29, 2026 10:21
noahgift merged commit 87a2a61 into main Apr 29, 2026
11 checks passed
noahgift deleted the feat/m32d-1-hf-fp16-fixture branch April 29, 2026 10:43
noahgift added a commit that referenced this pull request Apr 29, 2026
…01 cosine gate (#1130)

Per the qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.2 authors the cosine-
similarity parity test that consumes the JSON fixture from M32d.1
(PR #1129) and exercises OwnedQuantizedModel::forward_qwen3_moe end-to-
end on the canonical 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF.

Test: f_qw3_moe_parity_001_cosine_vs_hf_fp16 (#[ignore])
- Skips with eprintln if no cached Qwen3-Coder GGUF or no FP16 fixture
  file present (operator-confirm-gated; FP16 fixture is multi-GB).
- Loads fixture (model_name, prompt, tokens, vocab_size, logits[151936],
  argmax_token) via serde_json.
- Asserts vocab_size == 151936 and logits.len() == 151936 to catch
  fixture drift.
- Loads GGUF via MappedGGUFModel + OwnedQuantizedModel::from_mapped,
  loads all 48 MoE layer descriptors, runs ONE forward pass on the
  fixture's prompt tokens.
- Computes cosine_similarity(apr_logits, hf_fp16_logits).
- Asserts cos_sim > 0.99 per AC_QW3_MOE_005.
- On failure, reports diagnostics in the if_fails order defined by
  FALSIFY-QW3-MOE-FORWARD-004.
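The fixture-handling steps above (skip when the file is absent, then guard against drift) can be sketched in Python. The real test is Rust and uses serde_json; this version only mirrors the behavior described here, and its error type and messages are assumptions.

```python
import json
from pathlib import Path

EXPECTED_VOCAB = 151936  # vocab size the parity test pins


def load_fixture(path):
    """Return the parsed fixture dict, or None if the file is absent.

    A missing fixture means "skip", not "panic".
    """
    p = Path(path)
    if not p.is_file():
        return None
    fixture = json.loads(p.read_text())
    # Drift guards: the declared size and the actual vector length must both match.
    if fixture["vocab_size"] != EXPECTED_VOCAB:
        raise ValueError(f"vocab_size drift: {fixture['vocab_size']}")
    if len(fixture["logits"]) != EXPECTED_VOCAB:
        raise ValueError(f"logits length drift: {len(fixture['logits'])}")
    return fixture
```

Checking both fields catches a fixture whose header was updated but whose logit vector was truncated, or vice versa.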

Three sibling unit tests run in default CI (not #[ignore]):
- fixture_loader_handles_missing_path: load_fixture returns None on
  absent path (no panic).
- cosine_similarity_unit_vectors: parallel/orthogonal/anti-parallel
  unit-vector cases.
- cosine_similarity_handles_zero_vector: zero-vector edge case
  returns 0.0 (no NaN from divide-by-zero).
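The cosine metric and the zero-vector convention exercised by these unit tests can be sketched in a few lines of Python. This is an illustration, not the Rust implementation in qwen3_moe_parity.rs:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # zero-vector convention: avoid NaN from 0/0
    cos = dot / (norm_a * norm_b)
    if math.isfinite(cos):
        # Absorb tiny floating-point overshoot past +/-1; NaN is left as-is
        # so a downstream finiteness check can still reject it.
        cos = max(-1.0, min(1.0, cos))
    return cos


# The unit-vector cases: parallel, orthogonal, anti-parallel, zero.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
print(cosine_similarity([0.0, 0.0], [1.0, 1.0]))   # 0.0
```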

Live results from cargo test -p aprender-serve --test qwen3_moe_parity:
  test result: ok. 3 passed; 0 failed; 1 ignored

This is a tight one-PR slice: 1 new test file (~230 LOC), no behavior
change to any binary, no contract-rev (M32d.0 already shipped at
v1.3.0). M32d.3 (llama-cli argmax sanity) and M32d.4 (DRAFT →
ACTIVE_RUNTIME bump) follow.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 1, 2026
M33 audit-trail bump on companion side. Records:
  * #1127 (M32c.2.2.2.1.4) live regression test on aprender main
  * #1128 #1129 #1130 #1131 (M32d.0/.1/.2/.3) parity scaffolding

No code change beyond this contract mirror. M22 4-step ritual:
mirror push (this commit) → companion pin.lock refresh → companion
spec PR. Contract sha256
f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846
byte-identical with companion side.
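"Byte-identical" here means both sides hash the contract file to the same sha256. A minimal sketch of that comparison, assuming a plain file-hash check (the pin.lock format is not shown in this commit, and the helper names are hypothetical):

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Lowercase hex sha256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def contract_matches_pin(path, pinned_hex: str) -> bool:
    """True if the file's exact bytes hash to the pinned sha256."""
    return sha256_hex(Path(path).read_bytes()) == pinned_hex.lower()
```

Hashing raw bytes (not parsed YAML) is what makes the check sensitive to whitespace or line-ending drift between the mirror and the companion copy.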

Refs: claude-code-parity-apr-v1 § companion_repo.contract_pin
noahgift added a commit that referenced this pull request May 11, 2026
… — closes sweep

Algorithm-level PARTIAL discharge for FALSIFY-QW3-MOE-FORWARD-001
+ 002 + 003 + 004 per `contracts/qwen3-moe-forward-v1.yaml`.
Closes 4/4 sweep on the M32d MoE forward parity contract.

## ✅ Closes 4/4 qwen3-moe-forward sweep

**Thirteen contract families now fully algorithm-bound at PARTIAL:**
- All 11 prior families (dataset/tokenizer/apr-cli-* + apr-vs-gguf-forward-parity-v1)
- `qwen3-moe-forward-v1` (4/4) ← this PR

## What this binds (M32 milestone state machine)

The four gates encode a milestone state machine for the
Qwen3-MoE forward path:

- **001 (M32a-precursor)**: regression sentinel pinning the
  "dense-FFN tensor lookup is reached" pre-M32b error string.
  Pass at this level proves the bug exists; flips polarity once
  M32b lands.
- **002 (M32b)**: arch-aware load wired but forward not yet
  implemented; expects `RealizarError::UnsupportedOperation`
  with `moe_forward_pass`.
- **003 (M32c)**: CPU forward wired; `apr run` exits 0 and
  emits at least one non-whitespace byte (correctness not yet
  asserted).
- **004 (M32d)**: numerical parity vs HuggingFace FP16
  reference; cosine similarity > 0.99 strict.

## Verdict shapes

- 001: substring contains (regression-sentinel).
- 002: substring conjunction (NOT dense-FFN AND HAS unsupported).
- 003: conjunctive (exit 0 AND non-whitespace stdout).
- 004: bounded-threshold (finite + in [-1, 1] + > 0.99 strict).
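The four verdict shapes reduce to small predicates. A Python sketch, assuming each gate receives the command's exit code and captured output; the exact sentinel and marker strings are not quoted in this commit, so they are passed in as parameters rather than guessed, and the function names are hypothetical:

```python
import math


def verdict_001(output: str, sentinel: str) -> bool:
    """Regression sentinel: the pre-M32b dense-FFN error text is present."""
    return sentinel in output


def verdict_002(output: str, dense_ffn_marker: str, unsupported_marker: str) -> bool:
    """Substring conjunction: NOT the dense-FFN error AND HAS the unsupported-op error."""
    return dense_ffn_marker not in output and unsupported_marker in output


def verdict_003(exit_code: int, stdout: bytes) -> bool:
    """Conjunctive: exit 0 AND at least one non-whitespace byte on stdout."""
    return exit_code == 0 and bool(stdout.strip())


def verdict_004(cos_sim: float, threshold: float = 0.99) -> bool:
    """Bounded threshold: finite, inside [-1, 1], and strictly above threshold."""
    return math.isfinite(cos_sim) and -1.0 <= cos_sim <= 1.0 and cos_sim > threshold
```

Note that `verdict_004(0.99)` is False: the contract's `> 0.99` is strict, so a value exactly at the threshold fails.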

## Five-Whys

1. Why bind these now? — Closes 4/4 sweep on a milestone-tracking
   contract; pins the M32d acceptance criterion at algorithm
   level.
2. Why one module? — Bundle precedent.
3. Why distinct verdicts per gate? — Each represents a distinct
   milestone state; substring/conjunctive/threshold shapes match.
4. Why strict `> 0.99` for cosine? — Contract-literal `> 0.99`.
5. Why 19 tests across 4 verdict sections? — Mutation-survey
   coverage per gate.

## Cross-reference

Per memory `2026-04-28 session distillation track complete`:
M32d.0-M32d.3 already shipped (PRs #1128/#1129/#1130/#1131); M32d.4
fixture-gen + actual cosine measurement remain. This verdict
gives the M32d.4 work an algorithm-level acceptance criterion.

## Tests

19 unit tests, all green.