feat(export): SPEC §81 P0-F — HF arch → GGUF lowercase case mapping #1699
Merged
Maps HuggingFace transformers-style architecture strings
(LlamaForCausalLM, Qwen2ForCausalLM, ...) to GGUF / llama.cpp
lowercase family names (llama, qwen2, ...) at the export boundary.
Empirical surfacing (§81 P0-C):
apr export --format gguf epoch-004.apr -o /tmp/e4.gguf # PASSED
llama-cli -m /tmp/e4.gguf -p "def fib(n):" # FAILED:
"unknown model architecture: 'LlamaForCausalLM'"
Fix: new `normalize_arch_for_gguf` helper in gguf_export_config.rs.
Explicit mapping table for 9 known HF names + idempotent passthrough
for GGUF-native strings + lowercase fallback for unknowns.
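For illustration, a minimal sketch of the mapping shape (not the verbatim helper in gguf_export_config.rs; the MoE / Phi3 / GPT2 / Bert class names are assumed from the usual HF naming and may differ from the actual table):

```rust
/// Hypothetical sketch of the HF → GGUF architecture-name mapping;
/// not the verbatim implementation in gguf_export_config.rs.
pub fn normalize_arch_for_gguf(hf_arch: &str) -> String {
    match hf_arch {
        // Explicit table for known HF transformers class names.
        "LlamaForCausalLM" => "llama",
        "MistralForCausalLM" => "llama", // Mistral uses the llama family in GGUF
        "Qwen2ForCausalLM" => "qwen2",
        "Qwen3ForCausalLM" => "qwen3",
        "Qwen2MoeForCausalLM" => "qwen2moe",
        "Qwen3MoeForCausalLM" => "qwen3moe",
        "Phi3ForCausalLM" => "phi3",
        "GPT2LMHeadModel" => "gpt2",
        "BertModel" => "bert",
        // Already-lowercase GGUF-native names pass through unchanged
        // (idempotent); unknown HF names fall back to plain lowercasing.
        other => return other.to_lowercase(),
    }
    .to_string()
}
```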
Verified end-to-end: rebuilt apr binary + re-exported epoch-004.apr.
llama-cli now passes the arch check (advances to the next blocker:
P0-D missing tokenizer merges — separate fix).
Contract: 9 unit tests in export_tests_arch_case_mapping.rs cover:
- LlamaForCausalLM → llama
- Qwen2ForCausalLM / Qwen3ForCausalLM → qwen2 / qwen3
- Qwen2Moe / Qwen3Moe variants → qwen2moe / qwen3moe
- MistralForCausalLM → llama (Mistral uses llama family in GGUF)
- Phi3, GPT2, Bert variants
- Idempotent for already-lowercase inputs
- Lowercase fallback for unknown HF names
- Property test: no uppercase chars in output for known HF inputs
Test count: 9 new tests, all PASS.
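A representative unit test for this contract might look like the following sketch (hypothetical; the real assertions in export_tests_arch_case_mapping.rs are spread across the 9 tests listed above):

```rust
#[test]
fn llama_hf_name_maps_to_lowercase_and_is_idempotent() {
    // Known HF name maps to the GGUF family string.
    assert_eq!(normalize_arch_for_gguf("LlamaForCausalLM"), "llama");
    // Already-lowercase GGUF-native input passes through unchanged.
    assert_eq!(normalize_arch_for_gguf("llama"), "llama");
    // Unknown HF names fall back to plain lowercasing.
    assert_eq!(
        normalize_arch_for_gguf("SomeNewArchForCausalLM"),
        "somenewarchforcausallm"
    );
}
```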
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request (May 15, 2026):
…ints
Adds CudaAprCheckpointFn::with_tokenizer_dir() builder + new
CudaTransformerTrainer::save_apr_with_tokenizer() method that reads
tokenizer.json from --tokenizer dir and embeds:
- tokenizer.vocabulary (151643 entries on §78 Qwen-0.5B fixture)
- tokenizer.merges (151387 entries)
- tokenizer.bos_token_id (parsed from added_tokens for known special
strings: <s>, <|im_start|>, <|begin_of_text|>)
- tokenizer.eos_token_id (</s>, <|im_end|>, <|end_of_text|>,
<|endoftext|>)
apr-cli pretrain.rs now passes --tokenizer through via
.with_tokenizer_dir(&config.tokenizer_dir).
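As a rough sketch of the embedding step, assuming the standard HF tokenizer.json layout (model.vocab as a token → id map, model.merges in the older string form) and using serde_json as a stand-in for whatever the trainer actually parses with:

```rust
use std::path::Path;
use serde_json::Value;

/// Hypothetical sketch: extract the vocabulary and BPE merges from a
/// HF tokenizer.json so they can be embedded as checkpoint metadata.
fn read_tokenizer_json(
    dir: &Path,
) -> Result<(Vec<String>, Vec<String>), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(dir.join("tokenizer.json"))?;
    let json: Value = serde_json::from_str(&raw)?;

    // model.vocab maps token -> id; sort by id to get an ordered token list.
    let mut vocab: Vec<(String, u64)> = json["model"]["vocab"]
        .as_object()
        .map(|m| {
            m.iter()
                .map(|(tok, id)| (tok.clone(), id.as_u64().unwrap_or(0)))
                .collect()
        })
        .unwrap_or_default();
    vocab.sort_by_key(|&(_, id)| id);
    let vocabulary: Vec<String> = vocab.into_iter().map(|(tok, _)| tok).collect();

    // model.merges is an ordered list of merge rules ("token_a token_b").
    let merges: Vec<String> = json["model"]["merges"]
        .as_array()
        .map(|a| a.iter().filter_map(|m| m.as_str().map(str::to_owned)).collect())
        .unwrap_or_default();

    Ok((vocabulary, merges))
}
```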
Empirical verify on §78 fine-tune from Qwen-0.5B init (100 steps):
apr run <checkpoint>.apr "def fib(n):" →
[PMAT-171] Loaded embedded BPE tokenizer: 151643 vocab, 151387
merges, 3 special tokens
apr qa <checkpoint>.apr --skip-throughput ... →
Previously: "Validation failed: APR missing embedded tokenizer"
Now: Gates execute; only golden_output fails (separate issue
— model output quality at val_loss=6.56 with 100 steps).
Closes §81 P0-D (one of three apr pretrain output metadata gaps;
P0-E arch metadata is in PR #1701, P0-F arch case mapping in PR #1699).
After this PR, apr qa runs against MODEL-2 checkpoints WITHOUT the
external --tokenizer requirement — checkpoints are now self-contained
per the AC-SHIP2-005 / .apr format spec.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request (May 15, 2026):
…0 packaging gaps (#1702)

Triple amendment to SPEC-SHIP-TWO-001 capturing the §78 → §80 dispatch
arc that revealed a Class 3 packaging-defect wave in apr pretrain output.
Consolidates the content of PRs #1695 (§79), #1697 (§80), and #1698 (§81),
which were all DIRTY against main due to overlapping spec-header edits.

§79 — External audit + Five-Whys retrospective on MODEL-2 convergence
Synthesizes docs/specifications/two-model-spec-audit.md. Identifies three
compounding root causes for the val_loss=9.75 plateau:
1. Data starvation (0.24% of the Chinchilla-optimal token count)
2. False plateau hypothesis (LR-budget falsification)
3. Infrastructure masking bugs (silent CPU fallback, exhaustion
   placeholder, premature early stop)
Five-Whys for Case A (silent corpus exhaustion), Case B (early stop), and
Case C (val_loss=9.75 plateau). Reconciles audit Recommendations 1-3
against §78's §49-pivot path.

§80 — Prioritized open-follow-up backlog
Ranks all open SHIP-TWO-001 work by ship-% delta ÷ effort. P0 trio
(apr qa / bench / export against epoch-004.apr) + P1 Chinchilla gate +
P1 Python validity + P1 HumanEval + P2-A long train = MODEL-2 theoretical
ceiling of 92% at ~6-10 h of RTX 4090 compute.

§81 — P0 dispatch surfaced 3 systemic packaging-defect gaps
Dispatching §80's P0 trio against §78's epoch-004.apr revealed:
- P0-A apr qa → "APR missing embedded tokenizer"
- P0-B apr bench → "C-03: APR model missing 'hidden_size' metadata"
- P0-C apr export → PASSED, but llama-cli refused with "unknown model
  architecture: 'LlamaForCausalLM'" (GGUF expects lowercase "llama")
Companion code PRs:
- #1699 P0-F → HF→GGUF arch case mapping in apr export
- #1701 P0-D + P0-E → embed tokenizer + write arch metadata in
  apr pretrain output
AC-SHIP2-010 → DISCHARGED (315.5 tok/s on the Qwen-0.5B fine-tune;
3.15× over the 100 tok/s floor).

Methodology lessons added:
#26 NEW: Three-class root-cause taxonomy for ML convergence failures
(data starvation / optimization defects / infrastructure masking).
Diagnose which class is binding before tuning.
#27 NEW: Prioritize by ship-% delta ÷ effort, not alphabetical AC order.
P0 dispatches are 0.1% of the compute cost of P2-A.
#28 NEW: Class 3 defects come in waves. Training works ≠ checkpoint is
usable. Each lifecycle stage needs its own surfacing dispatch.

Ship-% movement:
MODEL-1: 100% (unchanged)
MODEL-2: 75% (unchanged in this PR; +2 pp expected on #1701 merge)

Spec v3.24.0 → v3.27.0. Replaces PRs #1695, #1697, #1698 (all DIRTY
against main).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request (May 15, 2026):
…etadata in apr pretrain output (#1701)

* feat(pretrain): SPEC §81 P0-E — write arch metadata keys to .apr
  checkpoints

CudaTransformerTrainer::save_apr now writes individual transformer config
keys (hidden_size, num_hidden_layers, num_attention_heads, num_kv_heads,
intermediate_size, vocab_size, max_position_embeddings, rope_theta,
rms_norm_eps) as AprWriter well-known metadata fields. These map to
AprV2Metadata typed fields via build_v2_metadata, which realizar's
gguf::config::from_apr requires (C-03 gate).

Before this fix, MODEL-2 checkpoints from `apr pretrain` failed apr bench
with "C-03: APR model missing 'hidden_size' metadata".

End-to-end verify on the §78 fine-tune from Qwen-0.5B init:
apr inspect --json | jq .metadata →
  hidden_size: 896 ✓  num_layers: 24 ✓  num_heads: 14 ✓
  num_kv_heads: 2 ✓  intermediate_size: 4864 ✓  vocab_size: 151936 ✓
apr bench --iterations 5 --max-tokens 128 →
  tokens_per_second: 315.5 ✓ (3.15× over the AC-SHIP2-010 100 tok/s floor)
  passed: true ✓

AC-SHIP2-010 (FALSIFY-SHIP-020) PARTIAL_ALGORITHM_LEVEL → DISCHARGED on a
real fine-tuned MODEL-2 checkpoint.

Implementation: replace the legacy save_model() path (which only wrote
model_name + architecture + format + version) with a direct AprWriter call
that adds the arch dim keys. Reuses io::save::infer_all_tensor_shapes for
2D weight shape handling (now pub(crate)).

MODEL-2 ship %: 75% → 77%. Closes §81 P0-E (one of three apr pretrain
output metadata gaps; P0-D embed-tokenizer and P0-F arch-case-mapping are
separate PRs).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(pretrain): SPEC §81 P0-D — embed tokenizer.json into APR
  checkpoints

Adds CudaAprCheckpointFn::with_tokenizer_dir() builder + new
CudaTransformerTrainer::save_apr_with_tokenizer() method that reads
tokenizer.json from the --tokenizer dir and embeds:
- tokenizer.vocabulary (151643 entries on the §78 Qwen-0.5B fixture)
- tokenizer.merges (151387 entries)
- tokenizer.bos_token_id (parsed from added_tokens for known special
  strings: <s>, <|im_start|>, <|begin_of_text|>)
- tokenizer.eos_token_id (</s>, <|im_end|>, <|end_of_text|>,
  <|endoftext|>)

apr-cli pretrain.rs now passes --tokenizer through via
.with_tokenizer_dir(&config.tokenizer_dir).

Empirical verify on the §78 fine-tune from Qwen-0.5B init (100 steps):
apr run <checkpoint>.apr "def fib(n):" →
  [PMAT-171] Loaded embedded BPE tokenizer: 151643 vocab, 151387 merges,
  3 special tokens
apr qa <checkpoint>.apr --skip-throughput ... →
  Previously: "Validation failed: APR missing embedded tokenizer"
  Now: gates execute; only golden_output fails (separate issue — model
  output quality at val_loss=6.56 with 100 steps).

Closes §81 P0-D (one of three apr pretrain output metadata gaps; P0-E arch
metadata is in PR #1701, P0-F arch case mapping in PR #1699). After this
PR, apr qa runs against MODEL-2 checkpoints WITHOUT the external
--tokenizer requirement — checkpoints are now self-contained per the
AC-SHIP2-005 / .apr format spec.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
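For reference, a hypothetical sketch of the config keys the P0-E change writes, shown as a plain struct instead of the project's AprWriter calls (the max_position_embeddings, rope_theta, and rms_norm_eps values below are assumed Qwen2.5-0.5B defaults, not taken from the PR; the other values are the dims reported above):

```rust
/// Hypothetical sketch of the per-checkpoint transformer config written
/// as well-known metadata keys; not the project's actual AprWriter API.
struct ArchMetadata {
    hidden_size: u32,
    num_hidden_layers: u32,
    num_attention_heads: u32,
    num_kv_heads: u32,
    intermediate_size: u32,
    vocab_size: u32,
    max_position_embeddings: u32,
    rope_theta: f32,
    rms_norm_eps: f32,
}

fn qwen_0_5b_metadata() -> ArchMetadata {
    ArchMetadata {
        hidden_size: 896,
        num_hidden_layers: 24,
        num_attention_heads: 14,
        num_kv_heads: 2,
        intermediate_size: 4864,
        vocab_size: 151_936,
        max_position_embeddings: 32_768, // assumed default, not from the PR
        rope_theta: 1_000_000.0,         // assumed default, not from the PR
        rms_norm_eps: 1e-6,              // assumed default, not from the PR
    }
}
```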
noahgift added a commit that referenced this pull request (May 16, 2026):
…for llama.cpp interop (#1706)

* fix(export): SPEC §82 P0-G — pad tokenizer.ggml.tokens to vocab_size

llama.cpp's check_tensor_dims uses len(tokenizer.ggml.tokens) as the
expected first dim of token_embd.weight. Qwen2.5 models pad embed_tokens
to 151936 for TP alignment but the real tokenizer vocab is 151643 — the
293 delta causes llama-cli to refuse to load APR-exported GGUFs.

Fix: thread `<arch>.vocab_size` into both tokenizer-emission paths
(GgufTokenizer + APR fallback) and pad with `<|pad_N|>` placeholders from
`len(tokens)` to `vocab_size`. Pass 0 to disable (back-compat for tests
that don't care about model dims).

Empirically verified end-to-end on SPEC §82's P2-A epoch-020 checkpoint:
  [P0-G] Padding APR-fallback tokenizer.ggml.tokens: 151643 + 293
  placeholders = 151936

Unit tests (6 new):
- test_p0g_pad_tokens_to_vocab_size (GgufTokenizer path)
- test_p0g_no_pad_when_vocab_size_zero (back-compat)
- test_p0g_no_pad_when_vocab_size_equals_tokens
- test_p0g_no_pad_when_vocab_size_smaller (no truncation)
- test_p0g_apr_fallback_pad_tokens_to_vocab_size (APR path)
- test_p0g_apr_fallback_no_pad_when_vocab_size_zero

Discharges the AC-SHIP2-010 vocab-size component (next blocker is P0-H
tensor-count mismatch — separate PR).

Methodology lesson #29: Class 3 packaging defects surface in waves. P0-G
is the 4th in 24 h (D embed tokenizer, E arch dims, F arch case, G vocab
pad).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): SPEC §82 — P2-A 5000-step EARLY_STOP val_loss=4.7111; P0-G
  LIVE-discharged

§82 records the first long-training MODEL-2 dispatch since §34 (27 days,
60 amendments ago):
- §34 ceiling broken further: 9.38 → 5.36 (§78) → 4.71 (§82)
- P2-A on lambda-vector RTX 4090: 27 epochs / 2700 steps / ~40 min wall
- Best val_loss = 4.7110777 at epoch 20

P0 trio dispatched against epoch-020.apr:
- P0-A apr qa: infra PASS (only golden_output fails — expected for
  pretrain)
- P0-B apr bench: PASS at 325.1 tok/s with embedded BPE tokenizer + C-03
  metadata satisfied — confirms #1701 P0-D/E fixes live in production
- P0-C step 1 apr export: PASS — confirms #1699 P0-F arch case mapping
  live
- P0-C step 2 llama-cli: BLOCKED by NEW Class 3 defect P0-G (fixed in the
  companion code commit on this branch)
- P0-G fix DISCHARGED end-to-end; surfaces P0-H tensor-count mismatch
  (out of scope for this PR)

AC-SHIP2-* movement:
- AC-SHIP2-009 → DISCHARGED (apr bench works on a pretrain checkpoint)
- AC-SHIP2-006 → FUNCTIONAL (apr qa infra runs end-to-end)
- AC-SHIP2-010 → vocab component DISCHARGED via P0-G; blocked on P0-H

MODEL-1 ship %: 100% (unchanged).
MODEL-2 ship %: 77% → 79% (+1 for AC-SHIP2-009; +1 for the ceiling break
to 4.71).

Methodology lesson #29 NEW: Class 3 packaging defects surface in waves of
4 (not 2). Every downstream tool falsifies its own invariant in the
checkpoint-emission contract.

Evidence: evidence/section-82-p2a-results-2026-05-15/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): SPEC §82 P0-G/P0-H — add llama-cli failure logs (force-add
  over .gitignore)

The two llama-cli error logs document the pre-fix (P0-G vocab mismatch)
and post-P0-G-fix (P0-H tensor-count mismatch) states for §82 evidence.
.gitignore excludes *.log, so force-add is required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(lint): collapsible_match — flatten nested if-let into find_map
  pattern

CI failed clippy::collapsible_match on the nested if-let chain in P0-G's
APR-fallback padding path. Rust 2021 edition can't use let chains, so the
cleanest fix is to use find_map with a pattern guard that returns the
inner Vec<String> directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
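A rough sketch of the P0-G padding rule (hypothetical free function; the real fix threads vocab_size through both GGUF tokenizer-emission paths rather than exposing a helper like this):

```rust
/// Hypothetical sketch of the P0-G rule: pad tokenizer.ggml.tokens with
/// placeholder tokens up to the model's vocab_size, so llama.cpp's
/// check_tensor_dims matches token_embd.weight's first dimension.
fn pad_tokens_to_vocab_size(mut tokens: Vec<String>, vocab_size: usize) -> Vec<String> {
    // vocab_size == 0 disables padding (back-compat); a smaller or equal
    // vocab_size means no padding, and tokens are never truncated.
    if vocab_size > tokens.len() {
        for i in tokens.len()..vocab_size {
            // Placeholder numbering is an assumption; the PR only says "<|pad_N|>".
            tokens.push(format!("<|pad_{i}|>"));
        }
    }
    tokens
}
```

On the epoch-020 numbers above, this would append 151936 - 151643 = 293 placeholders.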
Maps HF LlamaForCausalLM → GGUF llama at the apr export --format gguf
boundary. Empirically verified end-to-end: llama-cli now passes the arch
check (advances to the next blocker, P0-D: missing tokenizer merges —
separate fix). 9 unit tests cover all known HF mappings, idempotent
passthrough, and the lowercase fallback. MODEL-2 ship %: unchanged (P0-D
still blocks llama-cli load). Closes one of three §81 blockers.