fix(export): SPEC §82 P0-G — pad tokenizer.ggml.tokens to vocab_size for llama.cpp interop #1706
Merged
Conversation
llama.cpp's check_tensor_dims uses len(tokenizer.ggml.tokens) as the
expected first dim of token_embd.weight. Qwen2.5 models pad embed_tokens
to 151936 for TP alignment, but the real tokenizer vocab is 151643 — the
293-token delta causes llama-cli to refuse to load APR-exported GGUFs.
Fix: thread `<arch>.vocab_size` into both tokenizer-emission paths
(GgufTokenizer + APR-fallback) and pad with `<|pad_N|>` placeholders
from `len(tokens)` to `vocab_size`. Pass 0 to disable (back-compat for
tests that don't care about model dims).
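A minimal sketch of the padding rule in Rust; the helper name and signature are assumptions for illustration, not the PR's actual function:

```rust
/// Hypothetical sketch of the P0-G padding rule: extend `tokens`
/// with `<|pad_N|>` placeholders until it reaches `vocab_size`.
/// `vocab_size == 0` disables padding (back-compat), and a
/// `vocab_size` smaller than the token list never truncates.
fn pad_tokens_to_vocab_size(tokens: &mut Vec<String>, vocab_size: usize) {
    if vocab_size == 0 || vocab_size <= tokens.len() {
        return; // nothing to do: disabled, already equal, or smaller
    }
    for i in tokens.len()..vocab_size {
        // N is assumed to be the absolute token index; the real
        // placeholder numbering scheme may differ.
        tokens.push(format!("<|pad_{i}|>"));
    }
}
```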
Empirically verified end-to-end on SPEC §82's P2-A epoch-020 checkpoint:
[P0-G] Padding APR-fallback tokenizer.ggml.tokens:
151643 + 293 placeholders = 151936
Unit tests (6 new):
- test_p0g_pad_tokens_to_vocab_size (GgufTokenizer path)
- test_p0g_no_pad_when_vocab_size_zero (back-compat)
- test_p0g_no_pad_when_vocab_size_equals_tokens
- test_p0g_no_pad_when_vocab_size_smaller (no truncation)
- test_p0g_apr_fallback_pad_tokens_to_vocab_size (APR path)
- test_p0g_apr_fallback_no_pad_when_vocab_size_zero
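For illustration, the first two cases might look like this against the hypothetical helper sketched above (assumed test bodies, not the PR's actual tests):

```rust
#[test]
fn test_p0g_pad_tokens_to_vocab_size() {
    // 151643 real tokens padded up to Qwen2.5's 151936 embedding rows.
    let mut tokens: Vec<String> = (0..151_643).map(|i| format!("tok{i}")).collect();
    pad_tokens_to_vocab_size(&mut tokens, 151_936);
    assert_eq!(tokens.len(), 151_936);
    assert_eq!(tokens[151_643], "<|pad_151643|>");
}

#[test]
fn test_p0g_no_pad_when_vocab_size_zero() {
    // vocab_size = 0 disables padding for dim-agnostic tests.
    let mut tokens = vec!["a".to_string(), "b".to_string()];
    pad_tokens_to_vocab_size(&mut tokens, 0);
    assert_eq!(tokens.len(), 2);
}
```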
Discharges AC-SHIP2-010 vocab-size component (next blocker is
P0-H tensor-count mismatch — separate PR).
Methodology lesson #29: Class 3 packaging defects surface in waves.
P0-G is the 4th in 24h (D embed tokenizer, E arch dims, F arch case, G vocab pad).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…G LIVE-discharged

§82 records the first long-training MODEL-2 dispatch since §34 (27 days, 60 amendments ago):
- §34 ceiling broken further: 9.38 → 5.36 (§78) → 4.71 (§82)
- P2-A on lambda-vector RTX 4090: 27 epochs / 2700 steps / ~40 min wall
- Best val_loss = 4.7110777 at epoch 20

P0 trio dispatched against epoch-020.apr:
- P0-A apr qa: infra PASS (only golden_output fails — expected for pretrain)
- P0-B apr bench: PASS at 325.1 tok/s with embedded BPE tokenizer + C-03 metadata satisfied — confirms #1701 P0-D/E fixes live in production
- P0-C step 1 apr export: PASS — confirms #1699 P0-F arch case mapping live
- P0-C step 2 llama-cli: BLOCKED by NEW Class 3 defect P0-G (fixed in companion code commit on this branch)
- P0-G fix DISCHARGED end-to-end; surfaces P0-H tensor-count mismatch (out of scope for this PR)

AC-SHIP2-* movement:
- AC-SHIP2-009 → DISCHARGED (apr bench works on pretrain ckpt)
- AC-SHIP2-006 → FUNCTIONAL (apr qa infra runs end-to-end)
- AC-SHIP2-010 → vocab-component DISCHARGED via P0-G; blocked on P0-H

MODEL-1 ship %: 100% (unchanged). MODEL-2 ship %: 77% → 79% (+1 for AC-SHIP2-009; +1 for ceiling break to 4.71).

Methodology lesson #29 NEW: Class 3 packaging defects surface in waves of 4 (not 2). Every downstream tool falsifies its own invariant in the checkpoint-emission contract.

Evidence: evidence/section-82-p2a-results-2026-05-15/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d over .gitignore)

The two llama-cli error logs document the pre-fix (P0-G vocab mismatch) and post-P0-G-fix (P0-H tensor-count mismatch) states for §82 evidence. .gitignore excludes *.log, so force-add is required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ttern

CI failed clippy::collapsible_match on the nested if-let chain in P0-G's APR-fallback padding path. The Rust 2021 edition can't use let chains, so the cleanest fix is find_map with a pattern guard that returns the inner Vec<String> directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
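A hedged sketch of that clippy-friendly shape, with invented types standing in for the real APR metadata:

```rust
// Invented stand-in for the APR metadata value type.
enum MetaValue {
    Tokens(Vec<String>),
    Scalar(u64),
}

// find_map with a pattern guard replaces the nested if-let chain:
// one match arm both filters on the key and destructures the inner
// Vec<String>, so clippy::collapsible_match has nothing to flag.
fn find_token_list(entries: &[(String, MetaValue)]) -> Option<&Vec<String>> {
    entries.iter().find_map(|(key, value)| match value {
        MetaValue::Tokens(tokens) if key == "tokenizer.ggml.tokens" => Some(tokens),
        _ => None,
    })
}
```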
noahgift added a commit that referenced this pull request on May 16, 2026
… --init model (#1709)

When `apr pretrain --init <qwen2.apr>` fine-tunes a Qwen2 model, the trainer was hardcoded to stamp `("llama-370m-pretrain", "LlamaForCausalLM")` regardless of what the init model actually was. Downstream `apr export --format gguf` then routed through the llama-family GGUF mapper, which has no mapping for Qwen2's per-layer biases (q_proj_bias, k_proj_bias, v_proj_bias × 24 layers = 72 tensors). Those biases fell through to passthrough names like `model.layers.0.self_attn.q_proj.bias`, got counted in the GGUF header (291 total), but llama.cpp's llama-arch loader silently skipped them → `done_getting_tensors: wrong number of tensors; expected 291, got 219`.

The fix derives `name` and `architecture` from `init_arch`:
- Qwen2 init → ("qwen2-pretrain", "Qwen2ForCausalLM")
- Other init → ("<hf_model_type>-pretrain", "<hf_architecture>")
- No init → ("llama-370m-pretrain", "LlamaForCausalLM") [back-compat]

Once stamped correctly, the qwen2 GGUF family mapper handles the biases via its `q_proj_bias: "attn_q.bias"` rules and the tensor count matches.

Discharges §82's P0-H item and unblocks AC-SHIP2-010 (llama-cli interop) in combination with the P0-G vocab pad fix (PR #1706).

Test plan:
- 3 new unit tests in pretrain::tests:
  - checkpoint_name_and_arch_default_when_no_init (back-compat)
  - checkpoint_name_and_arch_qwen2_init (Qwen2 stamping)
  - checkpoint_name_and_arch_init_without_hf_fields (graceful fallback)
- All 3 PASS

Methodology lesson #29 evidence: P0-G surfaced P0-H within minutes; 5 Class 3 defects (P0-D, P0-E, P0-F, P0-G, P0-H) in 24h confirms the "waves of 4" pattern.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
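A minimal sketch of that stamping derivation, assuming an invented `InitArch` struct (the real trainer types and field names may differ):

```rust
// Invented stand-in for the HF config fields carried by the init model.
struct InitArch {
    hf_model_type: Option<String>,
    hf_architecture: Option<String>,
}

/// Derive the checkpoint (name, architecture) stamp from the init
/// model instead of hardcoding the llama-370m defaults. A Qwen2 init
/// (model_type "qwen2" / "Qwen2ForCausalLM") falls out of the general
/// case; missing HF fields or no init fall back for back-compat.
fn checkpoint_name_and_arch(init: Option<&InitArch>) -> (String, String) {
    match init {
        // Init model present with HF fields: stamp its real identity.
        Some(InitArch {
            hf_model_type: Some(mt),
            hf_architecture: Some(arch),
        }) => (format!("{mt}-pretrain"), arch.clone()),
        // Init without HF fields, or no init at all: legacy defaults.
        _ => ("llama-370m-pretrain".into(), "LlamaForCausalLM".into()),
    }
}
```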
Summary
Pads `tokenizer.ggml.tokens` to match `<arch>.vocab_size` so llama.cpp's `check_tensor_dims` accepts the corresponding `token_embd.weight` first dim. Threads `vocab_size` through both `build_tokenizer_gguf_metadata` (GgufTokenizer path) and `extract_apr_tokenizer_for_gguf` (APR-fallback path), padding with `<|pad_N|>` placeholders. `apr bench` PASSED at 325.1 tok/s with embedded BPE tokenizer + C-03 metadata on the pretrain checkpoint.

Empirical verification
Before fix: llama-cli rejects the exported GGUF with the P0-G vocab mismatch (error log preserved in the §82 evidence directory).
After fix (with the new `apr` binary, re-exporting the same checkpoint): llama.cpp accepts the vocab metadata; the next blocker (P0-H tensor-count mismatch) is out of scope for this PR.
Test plan
- 6 new unit tests covering both emission paths, including back-compat (vocab_size=0)
- `cargo test -p aprender-core --lib p0g_` → 6/6 PASS

Methodology
Lesson #29 NEW: Class 3 packaging defects surface in waves of 4 (not 2):
Each downstream tool falsifies its own invariant in the checkpoint-emission contract.
🤖 Generated with Claude Code