Sync master with upstream release b8233 #446
Merged
jan-service-account merged 14 commits into dev on Mar 8, 2026
Conversation
* CUDA: use shared mem for ssm_conv
* fuse silu + ssm_conv
* fuse unary + mul
* enable for fp16
* formatting

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
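For intuition, a minimal scalar sketch of the silu + ssm_conv fusion idea follows; the 1-D shape and function names are assumptions for illustration, not the actual CUDA kernel. Fusing means the activation is applied on the fly inside the convolution loop, so the intermediate silu(x) tensor is never written to memory.

```c
#include <math.h>

// SiLU activation: x * sigmoid(x).
static inline float silu(float x) { return x / (1.0f + expf(-x)); }

// Hypothetical scalar stand-in for the fused kernel: each input element
// is activated as it is read, avoiding a separate silu pass and the
// global-memory round trip for its output.
static void silu_ssm_conv_1d(const float * x, const float * w,
                             float * y, int n, int k) {
    for (int i = 0; i + k <= n; i++) {
        float acc = 0.0f;
        for (int j = 0; j < k; j++) {
            acc += silu(x[i + j]) * w[j];
        }
        y[i] = acc;
    }
}
```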
This patch addresses an Internal Compiler Error (segmentation fault)
observed with GCC 15 by replacing the intrinsic + cast sequence with
code that first concatenates the data and then calls the intrinsic.
This bypasses the buggy compiler path while preserving identical
instruction selection.
Performance Verification:
Assembly analysis on RHEL 9 (GCC 15.1.1) confirms that both the original
code and this fix generate the identical Power10 prefixed load instruction:
`plxv 40, 2(14)`
This ensures zero performance regression while unblocking builds on
newer toolchains.
Reproduced on:
- Alpine Linux + GCC 15.2.0-r2
- RHEL 9 + GCC 15.1.1 (gcc-toolset-15)
Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
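As a rough illustration of the workaround shape (a sketch only: the actual intrinsic, types, and helper name in the patch may differ, and the patch's concatenation of the data may combine the vector contents differently), staging the bytes before the intrinsic call looks like this:

```c
#include <altivec.h>
#include <string.h>

// Before (sketch): cast fused into the intrinsic expression; GCC 15
// can ICE on this pattern.
//   vector unsigned char v = (vector unsigned char) vec_xl(0, (const float *) src);

// After (sketch): stage the data into a plainly typed buffer first,
// then call the intrinsic on that buffer. Per the assembly analysis
// above, the optimizer still emits the same prefixed load (plxv).
static inline vector unsigned char load_bytes(const void * src) {
    unsigned char tmp[16];
    memcpy(tmp, src, sizeof(tmp));
    return vec_xl(0, tmp);
}
```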
…ml-org#20157)

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cuda: add mem check for fusion
* Replace NaNs with -FLT_MAX
* fix typo

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
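The NaN replacement matters because NaN propagates through reductions such as the max in softmax. A minimal sketch of the idea, with a hypothetical function name and assuming a logits-style buffer:

```c
#include <float.h>
#include <math.h>

// Sanitize a buffer in place: entries holding NaN would poison a
// subsequent softmax (expf(NaN - max) is NaN). Substituting -FLT_MAX
// makes them act like "minus infinity": they lose every max
// comparison, and expf underflows their contribution to zero.
static void replace_nan_with_neg_flt_max(float * x, int n) {
    for (int i = 0; i < n; i++) {
        if (isnan(x[i])) {
            x[i] = -FLT_MAX;
        }
    }
}
```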
…0120)

* server : preserve anthropic thinking blocks in conversion (ggml-org#20090)
* server : add tests for anthropic thinking block conversion

Co-authored-by: root <root@llamacpp.home>
* hexagon: add ssm_conv op
* hexagon: hvx kernel is functional
* hexagon: improvements to ssm-conv hvx kernel
* hexagon: added dma to ssm-conv hvx kernel
* hexagon: ssm-conv dynamically compute gather scratchpad
* hex-ssm-conv: add local context and fix various issues (spad indexing, etc)

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
* Add memsets and other fixes for IQ quants
* Make memset unconditional, change Laux back to L
* Move another memset
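A toy sketch of the "make memset unconditional" point; L is a stand-in name borrowed from the commit text, and the real quantization search is elided. Clearing the scratch on every path, rather than only on the branch that later fills it, guarantees no path reads stale entries:

```c
#include <string.h>

// Hypothetical block-quantization skeleton: always clear the index
// scratch L before the search, so early-exit paths never read garbage.
static void quantize_block(const float * x, unsigned char * L, int n) {
    memset(L, 0, (size_t) n);      // unconditional: runs on every path
    for (int i = 0; i < n; i++) {
        if (x[i] != 0.0f) {
            L[i] = x[i] > 0.0f ? 2 : 1;  // toy stand-in for the real search
        }
    }
}
```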
* Allow reshuffled arguments in tagged argument parser format tool calls.
* Remove the shuffle; just keep the optional parsers in any order
* Remove unnecessary import
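A small sketch of accepting optional tagged arguments in any order (names hypothetical; the real logic lives in the server's tool-call parsing): keep retrying every optional parser until a full pass consumes nothing, so no fixed ordering is imposed.

```c
// Each parser inspects the cursor, returning 1 if it recognized and
// consumed its tag, 0 otherwise.
typedef int (*opt_parser)(const char ** cursor);

// Try every optional parser until none makes progress; this accepts
// the tags in any order without enumerating permutations.
static void parse_optional(const char ** cursor, opt_parser * ps, int n) {
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < n; i++) {
            if (ps[i](cursor)) {
                progress = 1;
            }
        }
    }
}
```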
* Relax atomicity constraint for nicer, more pleasant True Streaming parsing
* Whitespace
* Remove redundant atomics
* ggml: add GATED_DELTA_NET op
* remove the transpose
* add KDA
* add qwen35 dense
* llama : check for fused gated delta net backend support

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
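For reference, the gated delta rule from the DeltaNet literature, which the new op presumably implements (a sketch; the exact formulation and conventions in the kernel may differ):

```latex
% S_t: state matrix; q_t, k_t: query/key; v_t: value;
% alpha_t in (0,1]: decay gate; beta_t: write strength.
S_t = \alpha_t \, S_{t-1} \left( I - \beta_t \, k_t k_t^\top \right)
    + \beta_t \, v_t k_t^\top,
\qquad
o_t = S_t \, q_t
```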
Updates the dev branch with the latest upstream release (b8233) from ggml-org/llama.cpp.