Sync master with upstream release b4944 #24
Merged: jan-service-account merged 12 commits into dev from update-dev-from-master-2025-03-24-00-08 on Mar 24, 2025.
Conversation
…-org#9976)
* [SYCL] Fix build on Windows when ccache enabled (ggml-org#9954)
* takes effect only on Windows and forces the compiler to icl

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
Vulkan: RTE rounding for cpy to quant
* remove trailing whitespace
* avoid duplicating pipeline_cpy_f32_quant
* fix copy-paste issue
* remove duplicated code

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
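"RTE" here is round-to-nearest-even. A minimal CPU-side sketch of what that rounding means for a copy-to-quant path; quantize_rte is a hypothetical helper for illustration, not the shader code:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical helper, not the actual shader: quantize a float to int8
// using round-to-even (RTE), the rounding this commit enforces in the
// Vulkan cpy-to-quant path.
static int8_t quantize_rte(float x, float scale) {
    // nearbyintf follows the current FP rounding mode; the C default
    // (FE_TONEAREST) is round-to-nearest, ties-to-even.
    float q = nearbyintf(x / scale);
    if (q >  127.0f) q =  127.0f;
    if (q < -128.0f) q = -128.0f;
    return (int8_t) q;
}

int main() {
    // Ties go to the nearest even integer: 0.5 -> 0 and 1.5 -> 2,
    // where round-half-away-from-zero would give 1 and 2.
    printf("%d %d\n", quantize_rte(0.5f, 1.0f), quantize_rte(1.5f, 1.0f));
}
```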
tests: add mul_mat perf/functional tests for the p021/nc Vulkan shaders
* vulkan: optimize the mul_mat_vec p021 and nc shaders. These shaders are used in attention calculations, and when the KV cache grows large they start to dominate the run time. For the nc shader (called with a large 'k' dimension), use unrolling and vector loads. For the p021 shader (called with a large 'm' and a small 'k'), take advantage of grouped query attention to reuse loads from the A matrix for the whole group, and reduce the number of workgroups (tiny dispatches carry too much overhead). Using subgroupAdd in the p021 shader also helps, so use it conditionally.
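The p021 win comes from grouped query attention: several query heads share one KV head, so each value loaded from the shared matrix can serve the whole group. A rough CPU analogue of that reuse, with illustrative names and layout rather than the shader's:

```cpp
#include <cstddef>
#include <vector>

// Illustrative CPU analogue of the p021 reuse, not the Vulkan shader:
// each element of the shared KV matrix is loaded once and multiplied
// against every query head in the GQA group.
void matvec_gqa_group(const std::vector<float> & kv,             // [rows x k], row-major, shared by the group
                      const std::vector<std::vector<float>> & q, // one length-k vector per query head
                      std::vector<std::vector<float>> & out,     // one length-rows output per head
                      size_t rows, size_t k) {
    for (size_t r = 0; r < rows; ++r) {
        for (size_t i = 0; i < k; ++i) {
            const float a = kv[r*k + i];      // loaded once...
            for (size_t h = 0; h < q.size(); ++h) {
                out[h][r] += a * q[h][i];     // ...reused for every head in the group
            }
        }
    }
}
```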
musa: refine compute capability
* address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
ggml : fix quantized cpy op
* tests : add cpy tests for all types
* tests : add BF16 copy tests
* tests : fix loop for same-type copy
* tests : add option to permute the dst tensor
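A sketch of the resulting test matrix: every (src, dst) type pair, same-type copies included, each with and without a permuted destination. The type list and run_cpy_case are stand-ins for illustration, not the actual test-backend-ops code:

```cpp
#include <cstdio>
#include <vector>

// Stand-in for ggml_type; the real tests enumerate ggml's type enum.
enum class Type { F32, F16, BF16, Q8_0 };

// Hypothetical driver mirroring the described matrix: all type pairs,
// same-type copies included, each with and without a permuted dst.
static void run_cpy_case(Type src, Type dst, bool permute_dst) {
    printf("cpy %d -> %d (permute_dst=%d)\n", (int) src, (int) dst, (int) permute_dst);
}

int main() {
    const std::vector<Type> types = { Type::F32, Type::F16, Type::BF16, Type::Q8_0 };
    for (Type src : types)
        for (Type dst : types)
            for (bool permute : { false, true })
                run_cpy_case(src, dst, permute);
}
```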
…-org#12506)
* llama : gemma3 : use the output tensor if it exists in the model weights
* also add it to llm_tensor_names
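This follows the usual tied-weights pattern: use a dedicated output head when the checkpoint ships one, otherwise fall back to the token-embedding matrix. A minimal sketch under that assumption, with illustrative names rather than the exact llama.cpp loader code:

```cpp
// Illustrative only; the real logic lives in llama.cpp's model loader.
struct ggml_tensor; // opaque here

struct model_io {
    ggml_tensor * output;   // may be null when the checkpoint has no output head
    ggml_tensor * tok_embd; // token embeddings, always present
};

// Prefer the dedicated output tensor; fall back to tied embeddings.
static ggml_tensor * output_weights(const model_io & m) {
    return m.output ? m.output : m.tok_embd;
}
```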
MacPorts section added
…g#12246)
Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat, as well as the other to_json methods.
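A sketch of the shape of that change using nlohmann::json (which the llama.cpp server uses); the chunk layout and the "__verbose" field name below are assumptions for illustration, not the exact server code:

```cpp
#include <cstdio>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical sketch of the streamed-chunk path, not the server code:
// when verbose mode is on, attach the full internal result to the chunk,
// matching what the non-streaming to_json_oaicompat_chat already does.
static json to_chat_stream_chunk(const json & delta, bool verbose, const json & internal) {
    json chunk = {
        { "object",  "chat.completion.chunk" },
        { "choices", json::array({ json{ { "delta", delta } } }) },
    };
    if (verbose) {
        chunk["__verbose"] = internal; // assumed field name, mirroring the other to_json methods
    }
    return chunk;
}

int main() {
    const json chunk = to_chat_stream_chunk(json{{"content", "hi"}}, true,
                                            json{{"tokens_predicted", 1}});
    printf("%s\n", chunk.dump(2).c_str());
}
```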
Updates the dev branch with the latest release (b4944) from ggml-org/llama.cpp.