Sync master with upstream release b5452 #97

jan-service-account · 2025-05-22T00:08:43Z

Updates dev branch with latest release (b5452) from ggml-org/llama.cpp

…ITY op to accelerate D2D memory copy (ggml-org#13647) * musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* model : disable SWA for Phi models ggml-ci * model : update warning message * model : print warning only if n_swa > 0 * model : fix typo

* kv-cache : simplify the interface ggml-ci * context : revert llama_batch_allocr position change ggml-ci

* server : fix first message identification When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message. Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> * server : Fix checks for first role message for stream=True Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> --------- Signed-off-by: Dorin Geman <dorin.geman@docker.com> Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

* Add the endpoints /api/tags and /api/chat Add the endpoints /api/tags and /api/chat, and improved the model metadata response * Remove trailing whitespaces * Removed code that is not needed for copilot to work.

* ggml : add ggml_gelu_na (not approximated) * fix naming order * rename na --> erf * apply review suggesions * revert naming order

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* switch retrieval to llama_encode * enable --no-warmup for retrieval

ggml-ci

* opencl: fix couple crashes * fix kernel launches failed on devices which do not support non-uniform work-groups. When non-uniform work-groups are not supported, set `local_work_size` to NULL (= let driver choose the work-group sizes). This patch does not cover everything - just the cases tested by test-backend-ops. * fix sub-buffer creation failed due to `cl_buffer_region::origin` not being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`. * OpenCL: query non-uniform WG sizes only on OpenCL 3.0+

* opencl: Add support for multiple devices ... but limited to one platform. A platform with a GPU will be preferred. Additionally: * Filter out devices that lack capabilities needed by the backend implementation (half support, OpenCL 2.0+, etc). * Make ggml_backend_opencl_reg() thread-safe. * fixup: fix an error in sync_with_other_backends ... when there is only one OpenCL device available.

yeahdongcn and others added 14 commits May 21, 2025 09:58

model : disable SWA for Phi models (ggml-org#13676)

b44890d

* model : disable SWA for Phi models ggml-ci * model : update warning message * model : print warning only if n_swa > 0 * model : fix typo

kv-cache : simplify the interface (ggml-org#13660)

797f2ac

* kv-cache : simplify the interface ggml-ci * context : revert llama_batch_allocr position change ggml-ci

server : Add the endpoints /api/tags and /api/chat (ggml-org#13659)

0d5c742

* Add the endpoints /api/tags and /api/chat Add the endpoints /api/tags and /api/chat, and improved the model metadata response * Remove trailing whitespaces * Removed code that is not needed for copilot to work.

ggml : add ggml_gelu_erf() (ggml-org#13667)

cf4cb59

* ggml : add ggml_gelu_na (not approximated) * fix naming order * rename na --> erf * apply review suggesions * revert naming order

gguf-py : display the invalid gguf type (ggml-org#13687)

eb0f5c2

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

examples : switch retrieval to llama_encode (ggml-org#13685)

2aa777d

* switch retrieval to llama_encode * enable --no-warmup for retrieval

convert : add qwen2vl support for unsloth merges (ggml-org#13686)

c76532e

server : improve error reporting (ggml-org#13680)

5fbfe38

hparams : support models for which all layers use SWA (ggml-org#13682)

8e186ef

ggml-ci

releases : build CPU backend separately (windows) (ggml-org#13642)

d643bb2

jan-service-account merged commit edc2f9b into dev May 22, 2025
15 checks passed

jan-service-account deleted the update-dev-from-master-2025-05-22-00-08 branch May 22, 2025 00:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sync master with upstream release b5452 #97

Sync master with upstream release b5452 #97

Uh oh!

jan-service-account commented May 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

Sync master with upstream release b5452 #97

Sync master with upstream release b5452 #97

Uh oh!

Conversation

jan-service-account commented May 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants