Skip to content

Sync master with upstream release b9012#505

Merged
jan-service-account merged 2 commits into
devfrom
update-dev-from-master-2026-05-04-01-05
May 4, 2026
Merged

Sync master with upstream release b9012#505
jan-service-account merged 2 commits into
devfrom
update-dev-from-master-2026-05-04-01-05

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b9012) from ggml-org/llama.cpp

jmrobles and others added 2 commits May 3, 2026 18:22
…2611)

Llama-architecture q_proj/k_proj weights need an axis-0 row permutation
to match GGML's RoPE convention. The BF16 path applies this in
LlamaModel.modify_tensors via LlamaModel.permute, but the NVFP4 path
bypasses modify_tensors and writes weights directly through
ModelBase._repack_nvfp4. Without the permutation, attention heads end
up scrambled at inference and the model produces gibberish.

This change overrides _repack_nvfp4 on LlamaModel and applies the same
permutation to both the nibble-packed weight and the per-block scale
before delegating to ModelBase._repack_nvfp4 via super(). Reuses the
existing LlamaModel.permute static helper and respects the existing
undo_permute flag, so subclasses (Mistral, Granite, Llama4, etc.)
inherit the fix automatically.

Verified on TinyLlama-1.1B reproducer: perplexity drops from 4419
(gibberish) to 43.9, matching the BF16-dequantized baseline (44.0).
Also verified end-to-end on ALIA-40b-instruct-2601 (BSC, Llama
architecture) with multilingual generation in Spanish/Catalan/Basque/
Galician all coherent with the fix applied.

Co-authored-by: Chema <chema@montevive.ai>
* [BUGFIX] Mistral format apply_scale support.

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix misunderstood boolean parameters

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@jan-service-account jan-service-account merged commit b88a55d into dev May 4, 2026
4 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-05-04-01-05 branch May 4, 2026 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants