Sync master with upstream release b8233 #446

Merged
jan-service-account merged 14 commits into dev from
update-dev-from-master-2026-03-08-00-48
Mar 8, 2026

@jan-service-account

Updates dev branch with latest release (b8233) from ggml-org/llama.cpp

am17an and others added 14 commits March 6, 2026 23:09
* CUDA: use shared mem for ssm_conv

* fuse silu + ssm_conv

* fuse unary + mul

* enable for fp16

* formatting

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
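
For readers unfamiliar with the "fuse unary + mul" idea in the commit above, here is a minimal CPU-side sketch in C++. It is not the CUDA kernel from the commit: the function names `silu_then_mul` and `fused_silu_mul` are hypothetical, and SiLU is assumed as the unary op purely for illustration. The point is only that the fused form makes one pass and never materializes the intermediate buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SiLU activation: x * sigmoid(x).
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Unfused (hypothetical name): two passes, with an intermediate buffer
// that is written by the first loop and read back by the second.
std::vector<float> silu_then_mul(const std::vector<float>& x,
                                 const std::vector<float>& g) {
    assert(x.size() == g.size());
    std::vector<float> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = silu(x[i]);
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = tmp[i] * g[i];
    return out;
}

// Fused (hypothetical name): one pass, no intermediate buffer.
std::vector<float> fused_silu_mul(const std::vector<float>& x,
                                  const std::vector<float>& g) {
    assert(x.size() == g.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = silu(x[i]) * g[i];
    return out;
}
```

On a GPU the same idea saves a kernel launch and a round trip through global memory for the intermediate tensor; whether that matches the exact fusion pattern in the commit is an assumption.
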
This patch addresses an Internal Compiler Error (segmentation fault)
observed with GCC 15 by replacing the intrinsic + cast with a cast
on the data first, followed by the intrinsic call. This bypasses the
buggy compiler path while maintaining identical instruction selection.

Performance Verification:
Assembly analysis on RHEL 9 (GCC 15.1.1) confirms that both the original
code and this fix generate the identical Power10 prefixed load instruction:
    `plxv 40, 2(14)`

This ensures zero performance regression while unblocking builds on
newer toolchains.

Reproduced on:
- Alpine Linux + GCC 15.2.0-r2
- RHEL 9  + GCC 15.1.1 (gcc-toolset-15)

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
* ggml-cuda: add mem check for fusion

* Replace NaNs with -FLT_MAX

* fix typo

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
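
The "Replace NaNs with -FLT_MAX" commit title does not say where the replacement happens or why. One plausible motivation, assumed here, is that a NaN poisons max/argmax-style comparisons, because every ordered comparison against NaN is false. The sketch below shows that effect on the CPU in C++; `nan_to_lowest` and `argmax_sanitized` are hypothetical helpers, not functions from the commit.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: map NaN to the most negative finite float so it
// can never win a "greater than" comparison.
inline float nan_to_lowest(float x) { return std::isnan(x) ? -FLT_MAX : x; }

// Hypothetical helper: index of the largest element after sanitizing.
// Without sanitizing, a NaN in slot 0 would stay "best" forever, since
// (x > NaN) is false for every x.
std::size_t argmax_sanitized(const std::vector<float>& v) {
    assert(!v.empty());
    std::size_t best = 0;
    float best_val = nan_to_lowest(v[0]);
    for (std::size_t i = 1; i < v.size(); ++i) {
        const float x = nan_to_lowest(v[i]);
        if (x > best_val) {
            best_val = x;
            best = i;
        }
    }
    return best;
}
```
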
…0120)

* server : preserve anthropic thinking blocks in conversion (ggml-org#20090)

* server : add tests for anthropic thinking block conversion

---------

Co-authored-by: root <root@llamacpp.home>
* hexagon: add ssm_conv op

* hexagon: hvx kernel is functional

* hexagon: improvements to ssm-conv hvx kernel

* hexagon: added dma to ssm-conv hvx kernel

* hexagon: ssm-conv dynamically compute gather scratchpad

* hex-ssm-conv: add local context and fix various issues (spad indexing, etc)

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
)

* Autoparser - full single commit squish

* Final pre-merge changes: minor fixes, Kimi 2.5 model parser
* Add memsets and other fixes for IQ quants

* Make memset unconditional, change Laux back to L

* Move another memset
* Allow reshuffled arguments in tagged argument parser format tool calls.

* Remove shuffle just keep the optional parsers in any order

* Remove unnecessary import
* Relax atomicity constraint for nicer, more pleasant, True Streaming parsing

* Whitespace

* Remove redundant atomics
* ggml: add GATED_DELTA_NET op

* remove the transpose

* add KDA

* add qwen35 dense

* llama : check for fused gated delta net backend support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
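
As background for the GATED_DELTA_NET commit above, the following C++ sketch shows a single time step of the gated delta rule as it is usually written: decay the state by a gate `alpha`, then correct it toward the new value with strength `beta`, i.e. S <- alpha*S + beta*(v - alpha*S*k)*k^T, followed by the readout o = S*q. This is a scalar reference under stated assumptions, not the ggml op: the `DeltaState` type, the row-major layout, the single-head shape, and the scalar gate are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical state container: S is a d_v x d_k matrix, row-major.
struct DeltaState {
    std::size_t d_v, d_k;
    std::vector<float> s;
    DeltaState(std::size_t dv, std::size_t dk)
        : d_v(dv), d_k(dk), s(dv * dk, 0.0f) {}
};

// One step of the gated delta rule (hypothetical function name).
// alpha: scalar decay gate, beta: scalar write strength.
std::vector<float> gated_delta_step(DeltaState& st,
                                    const std::vector<float>& q,
                                    const std::vector<float>& k,
                                    const std::vector<float>& v,
                                    float alpha, float beta) {
    assert(q.size() == st.d_k && k.size() == st.d_k && v.size() == st.d_v);
    std::vector<float> out(st.d_v, 0.0f);
    for (std::size_t i = 0; i < st.d_v; ++i) {
        float* row = &st.s[i * st.d_k];
        // Decay the row and compute the value currently stored under key k.
        float pred = 0.0f;
        for (std::size_t j = 0; j < st.d_k; ++j) {
            row[j] *= alpha;
            pred += row[j] * k[j];
        }
        // Delta-rule correction toward the new value, then readout with q.
        const float delta = beta * (v[i] - pred);
        for (std::size_t j = 0; j < st.d_k; ++j) {
            row[j] += delta * k[j];
            out[i] += row[j] * q[j];
        }
    }
    return out;
}
```

With `alpha = 1`, `beta = 1`, and a unit-norm key, writing a second value under the same key overwrites the first instead of adding to it, which is the property that distinguishes the delta rule from plain linear attention.
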
@jan-service-account jan-service-account merged commit f53f2fb into dev Mar 8, 2026
1 check passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-03-08-00-48 branch March 8, 2026 00:49

10 participants