Skip to content

Sync master with upstream release b9082#510

Merged
jan-service-account merged 16 commits into
devfrom
update-dev-from-master-2026-05-09-01-03
May 9, 2026
Merged

Sync master with upstream release b9082#510
jan-service-account merged 16 commits into
devfrom
update-dev-from-master-2026-05-09-01-03

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b9082) from ggml-org/llama.cpp

samuraieng and others added 16 commits May 7, 2026 23:10
…rg#22818)

* convert : fix RuntimeError when stripping FP8 KV-cache scales

In ModelBase._generate_nvfp4_tensors the final cleanup loop iterates
self.model_tensors.keys() and calls del on the same dict, which raises
RuntimeError: dictionary changed size during iteration when a ModelOpt
NVFP4 model also has FP8 KV-cache scales (e.g. mmangkad/Qwen3.6-35B-A3B-NVFP4
and any modelopt config with kv_cache_quant_algo: FP8).

Wrap the keys view in list() so the deletions happen on a snapshot.

* re-add another accidentally removed list

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Q4_0 MoE CLC pass sanity check

* release program

* opencl: fix whitespace

* opencl: remove unused cl_program

* opencl: break #if block to make it more clear

* opencl: adjust format

---------

Co-authored-by: Li He <lih@qti.qualcomm.com>
…ture (ggml-org#22803)

* refactor: Settings keys as constant object keys

* chore: Run `npm audit fix`

* refactor: Settings Sections UI

* feat: Refactor Settings structure and implement import/export logic

* feat: Introduce ROUTES constant and RouterService

* refactor: Consolidate settings definitions into registry

* refactor: Update settings page routing structure

* chore: Migrate hardcoded URLs to use ROUTES and RouterService

* feat: Enhance model selection logic for settings and chat

* chore: Update webui static build

* refactor: Address PR review comments

* fix: Remove unneeded setting

* fix: Re-add missing settings

* fix: Add missing `/slots` proxy for webui dev mode

* chore: Dev-mode logs

* fix: Data binding

* fix: Steering for non-agentic flow
* cuda: fuse snake activation (mul, sin, sqr, mul, add)

Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The
matcher recognizes the naive 5 op decomposition emitted by audio
decoders (BigVGAN, Vocos) for snake activation
y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise
kernel.

Add test_snake_fuse comparing CPU naive vs CUDA fused across
F32 / F16 / BF16.

* cuda: address review feedback from @am17an

Use ggml_cuda_cast for F32/F16/BF16 conversions and rename
kernel_snake to snake_kernel to match upstream conventions.

* cuda: snake fusion fastdiv on T_len, Suggested-by: @am17an

* Update tests/test-backend-ops.cpp

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cuda: snake fusion check add->type matches x->type

Address review feedback from @am17an

* cuda: snake fusion check add->type matches x->type

Moved for readability (equivalent)
Address review feedback from @am17an

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
…ml-org#22683)

* server: (router) expose child model info from router's /v1/models

* update docs
* server: support Vertex AI compatible API

* a bit safer

* support other AIP_* env var

* various fixes

* if AIP_MODE is unset, do nothing

* fix test case

* fix windows build
* Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes

Signed-off-by: ynankani <ynankani@nvidia.com>

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Address review comments

Signed-off-by: ynankani <ynankani@nvidia.com>

* fix CRLF

Signed-off-by: ynankani <ynankani@nvidia.com>

* Lint error fix

Signed-off-by: ynankani <ynankani@nvidia.com>

---------

Signed-off-by: ynankani <ynankani@nvidia.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* L2_NORM Updates

* Addressed PR Comments

* ggml-hexagon: add L2_NORM HVX kernel for Hexagon backend

* hex-unary: remove supported_unary_nc since the outer loop is the same for all unary ops

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
@jan-service-account jan-service-account merged commit 945a610 into dev May 9, 2026
14 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-05-09-01-03 branch May 9, 2026 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.