Conversation

@jan-service-account

Updates dev branch with latest release (b5191) from ggml-org/llama.cpp

rgerganov and others added 5 commits April 25, 2025 10:08
…org#12943)

RPC_CMD_SET_TENSOR always returns an empty response, and we send this command 4
times per token. We can improve token generation (TG) speed if we don't wait
for this empty response.

The performance impact of this change depends on the network latency.
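The idea behind the change can be sketched in a few lines. This is a minimal, hypothetical illustration (not the actual llama.cpp RPC client, whose protocol is binary over a socket): since the response to SET_TENSOR is always empty, the client can enqueue the command and return immediately instead of blocking for one network round trip per call.

```python
# Hypothetical sketch of the fire-and-forget change for a one-way RPC
# command whose response is always empty. The class names and queue-based
# "transport" are illustrative assumptions, not llama.cpp code.
import queue


class RpcClient:
    def __init__(self):
        self.outbox = queue.Queue()  # stands in for the network socket
        self.round_trips = 0         # blocking waits on the server

    def send_with_ack(self, cmd, payload):
        # old behaviour: block until the server returns the (empty) response
        self.outbox.put((cmd, payload))
        self.round_trips += 1        # one full round trip of latency per call
        return b""

    def send_no_ack(self, cmd, payload):
        # new behaviour: fire and forget, never wait on the empty response
        self.outbox.put((cmd, payload))


client = RpcClient()
for _ in range(4):  # SET_TENSOR is sent 4 times per token
    client.send_no_ack("SET_TENSOR", b"\x00" * 16)
```

With the old path, each token would pay four network round trips of latency just to receive empty acknowledgements; with the new path it pays none, which is why the speedup scales with network latency.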
* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO
* Force FP32 compute in cuBLAS GEMM

* Revert "Force FP32 compute in cuBLAS GEMM"

This reverts commit 6efd872.

* Force F32 compute in GLM4 ffn down

* Edit comment to clarify issue
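The commits above trade a broad fix (forcing FP32 compute in every cuBLAS GEMM) for a targeted one (F32 only in the GLM4 FFN down projection). The underlying numeric issue can be reproduced with a minimal Python sketch, using the `struct` module's `'e'` format to round values through IEEE 754 half precision; the specific partial sums are made-up illustrative values, not GLM4 activations:

```python
# Sketch: why accumulating a down-projection in half precision can blow up.
# f16's largest finite value is ~65504, so partial sums past that overflow.
import struct


def f16(x):
    # round-trip a Python float through IEEE 754 half precision ('e' format);
    # finite values beyond the f16 range overflow, which we map to infinity
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return float("inf")


# illustrative partial products whose sum exceeds the f16 range
partials = [40000.0, 40000.0]

acc_f16 = 0.0
for p in partials:
    acc_f16 = f16(acc_f16 + p)  # accumulate in half precision -> inf

acc_f32 = sum(partials)         # accumulate in wider precision -> 80000.0
```

Each individual partial fits comfortably in f16, but the running sum does not, so the half-precision accumulator saturates to infinity while the wider accumulator stays finite. Forcing F32 compute for just the layer where this occurs avoids the NaN/inf outputs without paying the FP32 cost in every GEMM.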

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@jan-service-account jan-service-account merged commit 5fa0519 into dev Apr 26, 2025
9 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-04-26-00-08 branch April 26, 2025 00:17

7 participants