fix(export): infer GGUF metadata (num_heads/etc.) so apr->gguf works on metadata-light .apr (PMAT-920) #2212
Merged
…on metadata-light .apr (PMAT-920)
`apr export --format gguf model.apr` hard-failed with a C-07 "num_heads required for GGUF export" error on any .apr whose metadata block lacked explicit num_heads / hidden_size / vocab_size / intermediate_size, e.g. a model produced by training or `apr convert` without a fully-populated header. Users could not export such .apr files to GGUF for llama.cpp / ollama at all, even though the tensor shapes carry these dimensions unambiguously.
Fix: in the APR->GGUF raw passthrough path (export_apr_to_gguf_raw), add infer_missing_gguf_dims_from_shapes, which reuses the existing shape-inference engine (infer_model_config_from_tensors) to derive the missing GGUF-required dimensions from tensor shapes before the C-07 error fires:
- num_heads / num_kv_heads from q/k/v projection shapes (head_dim)
- hidden_size / vocab_size from the embedding shape
- intermediate_size from the FFN gate/up shapes
Explicit APR metadata always wins; inference only fills None fields. Shapes are interpreted row-major (LAYOUT-001), consistent with import. (The non-raw export_to_gguf path already infers via resolve_gguf_config; this closes the gap in the raw passthrough path.)
Falsifier FALSIFY-EXPORT-INFER-METADATA-001 (ft_apr_gguf_export_infers_heads_when_metadata_light): a metadata-light .apr (num_heads=None, shapes imply 2 heads / 1 kv head / hidden 128) now exports to GGUF AND the produced GGUF re-reads with the correct attention.head_count / head_count_kv / embedding_length. RED on the unfixed exporter (C-07 hard-fail), GREEN on the fix; mutation-verified by reverting the inference call (falsifier goes RED).
Contract apr-export-num-layers-v1 bumped to 1.1.0: new equation attn_dim_inference, a proof obligation, and the single-line falsifier ref. pv validate + pv lint contracts/ pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r guess (PMAT-920)
The first cut of PMAT-920 inferred num_heads for apr->gguf export via
infer_head_counts, which GUESSES head_dim from a hardcoded
[64, 128, 96, 80] list and takes the first divisor of q_dim. That
SILENTLY mis-stamps real models: Qwen2-1.5B (q_dim=1536, head_dim=128,
12 heads) gets 1536/64 = 24 heads written into a valid-looking GGUF with
no error. A wrong-but-valid head count is worse than the original honest
hard-fail.
Root cause: num_heads is NOT inferable from shapes alone —
q_dim = num_heads x head_dim has no unique factorization without
head_dim (1536 = 12x128 = 24x64 = 16x96, all valid).
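
The ambiguity can be made concrete with a small sketch. It re-creates the first-divisor heuristic as described above and lists every factorization of q_dim that the candidate list permits. The names `guess_num_heads_first_divisor` and `candidate_factorizations` are hypothetical and written only for illustration; this is not the project's `infer_head_counts`.

```rust
/// Candidate head sizes, in the order quoted in this commit message.
const GUESSED_HEAD_DIMS: [usize; 4] = [64, 128, 96, 80];

/// Hypothetical re-creation of the guess: the first listed head_dim
/// that divides q_dim is accepted, and num_heads = q_dim / head_dim.
fn guess_num_heads_first_divisor(q_dim: usize) -> Option<usize> {
    GUESSED_HEAD_DIMS
        .iter()
        .find(|&&hd| q_dim % hd == 0)
        .map(|&hd| q_dim / hd)
}

/// Every (num_heads, head_dim) pair from the candidate list that is
/// consistent with q_dim. More than one entry means the shape alone
/// cannot identify the head count.
fn candidate_factorizations(q_dim: usize) -> Vec<(usize, usize)> {
    GUESSED_HEAD_DIMS
        .iter()
        .filter(|&&hd| q_dim % hd == 0)
        .map(|&hd| (q_dim / hd, hd))
        .collect()
}
```

For q_dim = 1536 the guess returns 24 heads, while the candidate list contains (24, 64), (12, 128) and (16, 96); only an explicit head_dim can select the right one.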
Fix (export_apr_to_gguf_raw / infer_missing_gguf_dims_from_shapes):
- num_heads / num_kv_heads: derived ONLY from an EXPLICIT head_dim in
the APR header (AprV2Metadata.head_dim), num_heads = q_dim/head_dim,
num_kv_heads = kv_dim/head_dim — EXACT and sound
(fill_head_counts_from_explicit_head_dim). Explicit num_heads always
wins. The [64,128,96,80] guess is no longer consulted on the export
path (infer_head_counts stays for the import fallback only).
- When head_dim AND num_heads are BOTH absent → actionable hard-fail
(missing_num_heads_err) naming the missing dim and a working remedy
(stamp head_dim/num_heads, or re-convert from the source config). No
GGUF is written.
- hidden_size / vocab_size / intermediate_size stay inferred from
embedding/FFN shapes (these ARE unambiguous).
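
A minimal sketch of the resolution order listed above: explicit count first, then exact division by an explicit head_dim, otherwise an error. The function `resolve_count` and its error wording are hypothetical and are not the project's `fill_head_counts_from_explicit_head_dim` or `missing_num_heads_err`. The divisibility check is an added assumption that the commit message does not mention.

```rust
/// Resolve a head count (num_heads or num_kv_heads) without guessing.
///
/// `explicit` is the count stored in the header, if any; `head_dim` is the
/// explicit head_dim from the header, if any; `proj_dim` is the projection
/// width read from the tensor shape (q_dim or kv_dim).
fn resolve_count(
    explicit: Option<usize>,
    head_dim: Option<usize>,
    proj_dim: usize,
    name: &str,
) -> Result<usize, String> {
    // An explicit count always wins over anything derived.
    if let Some(n) = explicit {
        return Ok(n);
    }
    match head_dim {
        // Exact derivation: only accepted when the division has no remainder
        // (assumption: a remainder indicates an inconsistent header).
        Some(hd) if hd > 0 && proj_dim % hd == 0 => Ok(proj_dim / hd),
        Some(hd) => Err(format!(
            "{name}: head_dim {hd} does not evenly divide projection dim {proj_dim}"
        )),
        // Neither value is present: fail with a remedy instead of guessing.
        None => Err(format!(
            "{name} is missing and head_dim is absent; stamp head_dim or {name} \
             in the APR header, or re-convert from the source config"
        )),
    }
}
```

With head_dim = 128 and q_dim = 256 this yields 2 heads, matching the first falsifier; with both values absent it returns an error that names the missing field.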
Falsifiers (both directions, mutation-verified):
FALSIFY-EXPORT-INFER-METADATA-001
(ft_apr_gguf_export_uses_explicit_head_dim_exactly_not_guess):
metadata-light .apr with explicit head_dim=128, q_dim=256 → GGUF
head_count == 2 EXACTLY (the old guess would say 256/64 = 4).
FALSIFY-EXPORT-INFER-METADATA-002
(ft_apr_gguf_export_hard_fails_when_head_dim_and_num_heads_absent):
head_dim AND num_heads absent → Err naming num_heads + remedy, NO
GGUF written. Reverting to the guess flips BOTH falsifiers RED.
Contract apr-export-num-layers-v1 bumped to 1.2.0: drops the
"q_dim/head_dim always inferable" overclaim, states num_heads requires
explicit head_dim (else honest error), adds both falsifier refs
(single-line). pv validate + pv lint contracts/ pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug (EV#4 — user-facing export correctness)
`apr export --format gguf model.apr -o out.gguf` hard-fails with a C-07 `num_heads required for GGUF export (missing in APR metadata)` error on any `.apr` whose metadata block lacks explicit `num_heads` (and other GGUF-required dims). A user who trained or `apr convert`ed a `.apr` without a fully-populated header could not export it to GGUF for llama.cpp / ollama at all, even though the tensor shapes carry those dimensions unambiguously.
Fix
In the APR→GGUF raw passthrough path (`export_apr_to_gguf_raw`), add `infer_missing_gguf_dims_from_shapes`, which reuses the existing shape-inference engine (`infer_model_config_from_tensors`) to derive the missing GGUF-required dimensions from tensor shapes before the C-07 error fires:

- `num_heads` / `num_kv_heads` from q/k/v projection shapes (via `head_dim`)
- `hidden_size` / `vocab_size` from the embedding shape
- `intermediate_size` from the FFN gate/up shapes

Explicit APR metadata always wins; inference only fills `None` fields. Shapes are interpreted row-major (LAYOUT-001), consistent with import.

(The non-raw `export_to_gguf` path already inferred via `resolve_gguf_config`; this closes the gap in the raw passthrough path, which still hard-failed.)
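
The "explicit metadata wins, inference only fills `None`" rule can be sketched as follows. The `Dims` struct and both functions are hypothetical and do not reproduce the project's `AprV2Metadata` or `infer_model_config_from_tensors`. The sketch also assumes row-major shapes of `[vocab_size, hidden_size]` for the embedding and `[intermediate_size, hidden_size]` for the FFN gate; that layout is an assumption for illustration.

```rust
/// Hypothetical subset of the GGUF-required dimensions.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Dims {
    hidden_size: Option<usize>,
    vocab_size: Option<usize>,
    intermediate_size: Option<usize>,
}

/// Read dimensions straight from tensor shapes.
/// Assumed row-major layouts: embedding = [vocab_size, hidden_size],
/// FFN gate = [intermediate_size, hidden_size].
fn infer_from_shapes(embedding: [usize; 2], ffn_gate: [usize; 2]) -> Dims {
    Dims {
        hidden_size: Some(embedding[1]),
        vocab_size: Some(embedding[0]),
        intermediate_size: Some(ffn_gate[0]),
    }
}

/// Merge rule: an explicit value is never overwritten; an inferred value
/// is used only where the explicit field is None.
fn fill_missing(explicit: Dims, inferred: Dims) -> Dims {
    Dims {
        hidden_size: explicit.hidden_size.or(inferred.hidden_size),
        vocab_size: explicit.vocab_size.or(inferred.vocab_size),
        intermediate_size: explicit.intermediate_size.or(inferred.intermediate_size),
    }
}
```

`Option::or` keeps the left-hand value whenever it is `Some`, so a populated header passes through unchanged and only the gaps are filled from shapes.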
Falsifier (RED→GREEN, mutation-verified)
`FALSIFY-EXPORT-INFER-METADATA-001` (`ft_apr_gguf_export_infers_heads_when_metadata_light`): a metadata-light `.apr` (`num_heads=None`; shapes imply 2 heads / 1 kv head / hidden 128) now exports to GGUF and the produced GGUF re-reads with the correct `attention.head_count` (2) / `head_count_kv` (1) / `embedding_length` (128).
Contract
`contracts/apr-export-num-layers-v1.yaml` bumped to 1.1.0: new equation `attn_dim_inference`, a proof obligation, and the single-line falsifier ref. `pv validate` + `pv lint contracts/` pass (0 errors).
Tests
- `cargo test -p aprender-core --lib`: 14000 pass
- `cargo test -p apr-cli --lib`: 5993 pass
- `export_apr_to_gguf_raw` tests still green

🤖 Generated with Claude Code