Sync master with upstream release b7016 #320

jan-service-account · 2025-11-11T00:36:12Z

Updates dev branch with latest release (b7016) from ggml-org/llama.cpp

[no ci]

* vulkan : implement upscale with bicubic interpolation
* cuda : implement upscale with bicubic interpolation
* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests
* adapt OpenCL backend to not support the OP in that case so tests don't fail
* print scale mode & flags in test-backend-ops

[no ci]


…t_q6_K_… (ggml-org#15277)

* add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K
* Surround SVE function with compiler directive
* fix compile switch
* fix coding style
* ggml : fix indent

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* feat(memory): Only fail partial erasure of recurrent tail

  The recurrent state is always assumed to be the state as of the last update from the final token in the sequence. When doing a partial erasure, if the range does not include the final token, the erasure can be considered a success since any memory used for the sequence prior to the final token (which is no memory) has been successfully removed.

  There is one potential case that this doesn't address which is the pruning of cache to remove sensitive data from the context. This wouldn't work for attention cache partial removal (in the middle) either since the KV state is linearly-dependent and states in later sequence positions would still be based on the state from the sensitive data, even if that data is no longer cached, so I don't think this is relevant, but it is worth noting that the semantics of this change for a partial erasure in the middle of the cache are essentially "my context is already compressed" and not "all trace of the removed tokens has been removed."

  ggml-org#16768
  Branch: HybridContextShift-16768
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(main): Check the output of seq_rm for prefix matching

  This prefix matching is explicitly attempting to remove the tokens at the end of the sequence that don't match. This is the operation that can't be performed on a recurrent cache due to the state being updated in place, so if this removal fails, we need to clear the whole cache.

  ggml-org#16768
  Branch: HybridContextShift-16768
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(memory): Fix condition for partial erasure failure if p0 > pos

  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
  Co-authored-by: compilade <git@compilade.net>

* style: Fix extra parens

  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix(main.cpp): Set n_matching_session_tokens to 0 on cache clear

  ggml-org#16768
  Branch: HybridContextShift-16768
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
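The erasure rule and the prefix-matching fallback described in these commit messages can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyRecurrentSeq` and `trim_to_prefix` are hypothetical names invented for illustration, the model tracks a single sequence whose only stored state is the one at the final position, and none of it is the actual llama.cpp implementation or API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical toy model (not llama.cpp code): a recurrent cache keeps only the
// state as of the final token of the sequence. pos == -1 means "no state stored".
struct ToyRecurrentSeq {
    int32_t pos = -1;

    // Try to erase positions in [p0, p1). A negative p0 means "from the start",
    // a negative p1 means "to the end". Returns false if the erasure is impossible.
    bool seq_rm(int32_t p0, int32_t p1) {
        if (p0 < 0) p0 = 0;
        if (p1 < 0) p1 = std::numeric_limits<int32_t>::max();

        if (pos < 0) {
            return true; // nothing stored, nothing to erase
        }
        // A partial range that includes the final token would require rolling the
        // in-place state back, which is not possible: report failure.
        if (0 < p0 && p0 <= pos && pos < p1) {
            return false;
        }
        // The range covers the whole sequence: drop the state entirely.
        if (p0 <= pos && pos < p1) {
            pos = -1;
        }
        // Otherwise the range lies strictly before (or after) the final token,
        // where no memory is held, so the erasure trivially succeeds.
        return true;
    }

    void clear() { pos = -1; }
};

// Hypothetical prefix-matching helper: keep the first n_matching tokens of a
// cached session. If the tail cannot be trimmed, clear the cache and report
// that zero tokens match, so the caller re-evaluates the whole prompt.
int32_t trim_to_prefix(ToyRecurrentSeq & seq, int32_t n_matching) {
    if (!seq.seq_rm(n_matching, -1)) {
        seq.clear();
        return 0;
    }
    return n_matching;
}
```

With a 10-token sequence (`pos == 9`), erasing `[2, 5)` succeeds and leaves the state untouched, erasing `[5, end)` fails, and `trim_to_prefix(seq, 4)` falls back to clearing the cache and returning 0, mirroring the "set n_matching_session_tokens to 0 on cache clear" fix.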


ggerganov and others added 13 commits November 10, 2025 10:44

benches : add eval results (ggml-org#17139)

15274c0

[no ci]

editorconfig : ignore benches/ (ggml-org#17140)

9898b57

[no ci]

mtmd: fix patch_size initialized to random value in audio models (ggm…

4b13a68

…l-org#17128)

* mtmd: fix patch_size initialized to random value in audio models
* add default hparams

batched-bench : add "separate text gen" mode (ggml-org#17103)

f914544

metal : enable tensor API for A19 (ggml-org#17087)

c27efd2

vulkan: fix validation issue introduced by ggml-org#16868 (ggml-org#1…

85234a4

…7145)

vulkan: check glslc executable string (ggml-org#17144)

f117be1

ggml-cpu : inspect -march and -mcpu to found the CPU (ggml-org#16333)

967eb4b

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

metal : cap threadgroups size of set_rows (ggml-org#17146)

13730c1

cpu: skip NOPs to avoid barriers (ggml-org#17133)

395e286

* cpu: skip NOPs to avoid barriers
* cpu: use ggml_op_is_empty

jan-service-account merged commit 07e4b51 into dev Nov 11, 2025
1 check passed

jan-service-account deleted the update-dev-from-master-2025-11-11-00-36 branch November 11, 2025 00:37
