Sync master with upstream release b5033 #41

jan-service-account · 2025-04-03T00:08:05Z

Updates dev branch with latest release (b5033) from ggml-org/llama.cpp

…gml-org#12694)

* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2

* [CANN]get_rows and dup optimization.

Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]GET_ROWS and CPY/DUP optimization

Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

---------

Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>

* llama : refactor kv cache guard ggml-ci
* cont : fix comment [no ci]
* llama : fix kv_cache restore logic ggml-ci
* context : simplify kv cache updates ggml-ci
* cont : better name [no ci]
* llama : fix llama_decode return code when could not find KV slot ggml-ci
* context : change log err -> warn [no ci]
* kv-cache : add comment + warning

…gml-org#12559)

When adjacent batches of Q share the same batches of K/V, batch them into the same workgroup. For example, when:

dst(128,32,1,1) = FA(q(128,1,32,1), k(128,16640,8,1), v(128,16640,8,1))

previously we would run 32 workgroups computing 1 result each, now we will run 8 workgroups computing 4 results each.

This doesn't directly translate to better performance (at least when you have >=32 SMs), but in a subsequent change I'll enable split_k which will scale much better with 4x fewer workgroups.
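The workgroup counts in the example above follow from the ratio of Q batches to K/V batches (32 / 8 = 4). A minimal Python sketch of that arithmetic follows; the function name `workgroup_layout` and its parameters are hypothetical and are not taken from the Vulkan backend, whose actual dispatch logic may differ.

```python
def workgroup_layout(n_q_batches, n_kv_batches, group_shared_kv):
    """Return (workgroups, results_per_workgroup).

    Hypothetical helper for illustration only: it reproduces the
    counts quoted in the commit message, not the real shader dispatch.
    """
    if n_q_batches % n_kv_batches != 0:
        raise ValueError("Q batches must be a multiple of K/V batches")
    # Number of adjacent Q batches that read the same K/V batch.
    ratio = n_q_batches // n_kv_batches
    results_per_workgroup = ratio if group_shared_kv else 1
    return n_q_batches // results_per_workgroup, results_per_workgroup


# Shapes from the example: q has 32 batches, k and v have 8.
print(workgroup_layout(32, 8, group_shared_kv=False))
print(workgroup_layout(32, 8, group_shared_kv=True))
```

With the shapes from the commit message, the first call gives 32 workgroups with 1 result each and the second gives 8 workgroups with 4 results each.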

When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.
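As a rough illustration of the split_k idea, the sketch below splits the K/V sequence for a single query into slices, computes a partial softmax-weighted sum per slice, and merges the partial results. This is a generic pure-Python sketch of the technique under stated assumptions, not the Vulkan shader: all function names are hypothetical, the usual 1/sqrt(d) score scaling is omitted, and each slice would run as its own workgroup on a GPU rather than in a loop.

```python
import math


def attend_slice(q, keys, values):
    """Partial attention over one K/V slice.

    Returns (max_score, sum_of_exp, unnormalized_weighted_sum).
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    acc = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, vi in enumerate(v):
            acc[i] += w * vi
    return m, sum(weights), acc


def merge_partials(partials):
    """Combine per-slice results by rescaling to a common maximum."""
    m_all = max(m for m, _, _ in partials)
    total = 0.0
    out = [0.0] * len(partials[0][2])
    for m, s, acc in partials:
        scale = math.exp(m - m_all)
        total += s * scale
        for i, a in enumerate(acc):
            out[i] += a * scale
    return [o / total for o in out]


def attention_split_k(q, keys, values, split_k):
    """Attention for one query with the K/V length cut into split_k slices."""
    n = len(keys)
    size = (n + split_k - 1) // split_k  # ceiling division
    partials = [
        attend_slice(q, keys[i:i + size], values[i:i + size])
        for i in range(0, n, size)
    ]
    return merge_partials(partials)
```

Because the merge rescales every slice to the same maximum score, `attention_split_k(q, keys, values, 4)` should agree with `attention_split_k(q, keys, values, 1)` up to floating-point rounding, while exposing four times as many independent units of work.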

ngxson and others added 11 commits April 1, 2025 23:44

common : remove json.hpp from common.cpp (ggml-org#12697)

42eb248

* common : remove json.hpp from common.cpp
* fix comment

vocab : BailingMoE : change possessive quantifiers to greedy (ggml-org#12677)

83a88bd

llama : add option to override model tensor buffers (ggml-org#11397)

e0e912f

* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes

model : print tensor size during load (ggml-org#12711)

833e2b7

* model : print tensor size during load
* cont : fix units MB -> MiB

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

Vulkan: Fix mmq int dot float cache size (ggml-org#12722)

92e3006

cmake: remove caching from vulkan coopmat checks (ggml-org#12719)

6f3bd38

github-actions bot added the Vulkan, testing, examples, ggml, and server labels Apr 3, 2025

jan-service-account merged commit 6e30a6c into dev Apr 3, 2025
10 checks passed

jan-service-account deleted the update-dev-from-master-2025-04-03-00-08 branch April 3, 2025 00:18

10 participants
