Skip to content

Sync master with upstream release b9049#508

Merged
jan-service-account merged 12 commits into
devfrom
update-dev-from-master-2026-05-07-01-03
May 7, 2026
Merged

Sync master with upstream release b9049#508
jan-service-account merged 12 commits into
devfrom
update-dev-from-master-2026-05-07-01-03

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b9049) from ggml-org/llama.cpp

fl0rianr and others added 12 commits May 5, 2026 22:12
…--fit (ggml-org#22688)

* ggml : report estimated OpenCL memory for --fit

Signed-off-by: Florian Reinle <f.reinle@otec.de>

* ggml : estimated OpenCL memory backend integrated

Signed-off-by: Florian Reinle <f.reinle@otec.de>

---------

Signed-off-by: Florian Reinle <f.reinle@otec.de>
…2597)

* add filter_tensors classmethod

* remove language_model

* fix parts validation
…gml-org#22719)

* refactor: Remove Google favicon utility

* fix: MCP Server favicon

* refactor: Cleanup

* refactor: MCP Server Information

* fix: Fix MCP Settings UI

* refactor: Cleanup
* convert : ignore non-language tensors for Gemma4Model

This commit adds a check to make sure only text language tensors are
handled in filter_tensors.

The motivation is that currently when trying to convert a Gemma4 model
the following error occurs:
```console
(venv) $ ./convert-gemma.sh
INFO:hf-to-gguf:Loading model: gemma-4-E2B-it
INFO:hf-to-gguf:Model architecture: Gemma4ForConditionalGeneration
INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,                 torch.float32 --> F32, shape = {256}
Traceback (most recent call last):
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 13752, in <module>
    main()
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 13746, in main
    model_instance.write()
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 945, in write
    self.prepare_tensors()
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 805, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 7925, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 7290, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 579, in modify_tensors
    new_name = self.map_tensor_name(name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 572, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.embed_vision.embedding_projection.weight'
```

* add forgotten embed_vision and embed_audio

* improve

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* feat: migrate to PEP 621 and add uv support

* fix: remove upper bound on protobuf

* remove poetry.lock and uv.lock

* fix/add torch dependency version and markers

* fix dev-dependency deprecation warning

* gguf-py : update python version requirement to 3.10

---------

Co-authored-by: David Huggins-Daines <dhd@dhd.ecolingui.ca>
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
…gml-org#22101)

* mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)

Conformer encoder with Shaw relative position encoding,
QFormer projector, log-mel spectrogram with frame stacking.

Encoder uses GLU gating, folded batch norm, and SSM depthwise
conv. QFormer compresses encoder output via windowed
cross-attention (window=15, queries=3) into the LLM embedding
space.

Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank,
dynamic range compression, 2x frame stacking (80->160 mel).

GGUF converter handles batch norm folding at export time,
fused K/V split, and Conv1d weight reshaping.

Tested against HF transformers reference: token-for-token match
on 30s/60s audio clips with greedy decoding.

* mtmd: rename gs_ prefixed tensors to generic/architecture names

* mtmd: use tensor_mapping.py for all granite_speech tensors

* convert: fold GraniteSpeechTextModel into GraniteModel

* mtmd: replace n_layer hack with explicit has_standard_layers flag

* mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech

* mtmd: align KEY_A_ define spacing

* convert: register GraniteModel for GraniteSpeechForConditionalGeneration

* convert: fix ty type-check for GraniteSpeechMmprojModel registration

* mtmd: align TN_ define spacing

* mtmd: use generic layer loop for granite speech tensor loading

* mtmd: merge qformer_proj_layer into clip_layer

* mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs

* mtmd: granite_speech add comment explaining why build_attn is not used

* mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata

* gguf: add spacing between granite_speech tensor mapping blocks

* mtmd: make generic audio layer_norm_eps read optional

* mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps

* mtmd: align defines and struct fields in clip-impl.h and clip-model.h

* mtmd: fix alignment and ordering issues across granite speech files

* convert: granite_speech use filter_tensors instead of modify_tensors for skipping
* gguf-py : bump version to 0.19.0

* bump poetry

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common: do not fit to unknown device memory

Signed-off-by: Florian Reinle <f.reinle@otec.de>

* common: preserve host fallback for non-GPU fit devices

Signed-off-by: Florian Reinle <f.reinle@otec.de>

* common: keep unknown GPU fit memory at zero

Signed-off-by: Florian Reinle <f.reinle@otec.de>

---------

Signed-off-by: Florian Reinle <f.reinle@otec.de>
* model: don't crash on unsupported architecture

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Support MiniCPM-V 4.6 in new branch

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* fix code bug

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* fix pre-commit

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* fix convert

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* rename clip_graph_minicpmv4_6

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* use new TYPE_MINICPMV4_6

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* use build_attn to allow flash attention support

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* no use legacy code, restored here.

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* use the existing tensors name

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* unused ctx->model.hparams.minicpmv_version

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* use n_merge for slice alignment

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* borrow wa_layer_indexes for vit_merger insertion point

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* fix code style

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* use filter_tensors and add model.vision_tower

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* fix chkhsh

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* fix type check

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

---------

Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@jan-service-account jan-service-account merged commit 1c5c739 into dev May 7, 2026
4 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-05-07-01-03 branch May 7, 2026 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.