Sync master with upstream release b9082 by jan-service-account · Pull Request #510 · janhq/llama.cpp

jan-service-account · 2026-05-09T01:03:42Z

Updates dev branch with latest release (b9082) from ggml-org/llama.cpp

…rg#22818)

* convert : fix RuntimeError when stripping FP8 KV-cache scales

  In ModelBase._generate_nvfp4_tensors the final cleanup loop iterates self.model_tensors.keys() and calls del on the same dict, which raises RuntimeError: dictionary changed size during iteration when a ModelOpt NVFP4 model also has FP8 KV-cache scales (e.g. mmangkad/Qwen3.6-35B-A3B-NVFP4 and any modelopt config with kv_cache_quant_algo: FP8). Wrap the keys view in list() so the deletions happen on a snapshot.

* re-add another accidentally removed list

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
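The failure mode in the convert fix above can be reproduced in plain Python. This is a minimal sketch, assuming a plain dict keyed by tensor name: the tensor names and the is_kv_cache_scale predicate are hypothetical and are not taken from the converter. It contrasts deleting while iterating the live keys view with iterating a list() snapshot, which is the change the commit describes.

```python
# Sketch only: tensor names and the filtering rule are hypothetical,
# not the converter's actual logic.


def is_kv_cache_scale(name: str) -> bool:
    """Hypothetical predicate for FP8 KV-cache scale tensors."""
    return name.endswith((".k_scale", ".v_scale"))


def strip_buggy(tensors: dict) -> None:
    # Iterates the live keys view; the first del changes the dict's size,
    # so the iterator raises RuntimeError on its next step.
    for name in tensors.keys():
        if is_kv_cache_scale(name):
            del tensors[name]


def strip_fixed(tensors: dict) -> None:
    # list() snapshots the keys up front, so deleting from the dict is safe.
    for name in list(tensors.keys()):
        if is_kv_cache_scale(name):
            del tensors[name]


def make_tensors() -> dict:
    # Hypothetical stand-in for self.model_tensors.
    return {
        "layers.0.attn.q_proj.weight": 0,
        "layers.0.attn.k_proj.k_scale": 1,
        "layers.0.attn.v_proj.v_scale": 2,
        "layers.0.mlp.up_proj.weight": 3,
    }


if __name__ == "__main__":
    try:
        strip_buggy(make_tensors())
    except RuntimeError as exc:
        print("buggy:", exc)  # dictionary changed size during iteration

    tensors = make_tensors()
    strip_fixed(tensors)
    print("fixed:", sorted(tensors))
```

Snapshotting the keys costs one extra list of names, which is negligible next to the tensors themselves. Rebuilding the dict with a comprehension would also work, but it would replace the object rather than mutate it in place.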

* Q4_0 MoE CLC pass sanity check
* release program
* opencl: fix whitespace
* opencl: remove unused cl_program
* opencl: break #if block to make it more clear
* opencl: adjust format

---------

Co-authored-by: Li He <lih@qti.qualcomm.com>

…ture (ggml-org#22803)

* refactor: Settings keys as constant object keys
* chore: Run `npm audit fix`
* refactor: Settings Sections UI
* feat: Refactor Settings structure and implement import/export logic
* feat: Introduce ROUTES constant and RouterService
* refactor: Consolidate settings definitions into registry
* refactor: Update settings page routing structure
* chore: Migrate hardcoded URLs to use ROUTES and RouterService
* feat: Enhance model selection logic for settings and chat
* chore: Update webui static build
* refactor: Address PR review comments
* fix: Remove unneeded setting
* fix: Re-add missing settings
* fix: Add missing `/slots` proxy for webui dev mode
* chore: Dev-mode logs
* fix: Data binding
* fix: Steering for non-agentic flow

* cuda: fuse snake activation (mul, sin, sqr, mul, add)

  Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The matcher recognizes the naive 5 op decomposition emitted by audio decoders (BigVGAN, Vocos) for snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel. Add test_snake_fuse comparing CPU naive vs CUDA fused across F32 / F16 / BF16.

* cuda: address review feedback from @am17an

  Use ggml_cuda_cast for F32/F16/BF16 conversions and rename kernel_snake to snake_kernel to match upstream conventions.

* cuda: snake fusion fastdiv on T_len

  Suggested-by: @am17an

* Update tests/test-backend-ops.cpp

  Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cuda: snake fusion check add->type matches x->type

  Address review feedback from @am17an

* cuda: snake fusion check add->type matches x->type

  Moved for readability (equivalent)

  Address review feedback from @am17an

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
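The snake fusion above replaces five elementwise passes with one. The sketch below illustrates that equivalence in plain Python for y = x + sin(a*x)^2 * inv_b; it is not the CUDA kernel. As an assumption, a and inv_b are treated as per-channel values over a flattened [channels][t_len] layout, so the channel index is i // t_len. The commit mentions a fastdiv on T_len, but the exact indexing in the real kernel is not shown here.

```python
import math
from typing import List


def snake_naive(x: List[float], a: List[float], inv_b: List[float], t_len: int) -> List[float]:
    """Five separate elementwise passes: mul, sin, sqr, mul, add."""
    ch = [i // t_len for i in range(len(x))]           # assumed channel-major layout
    ax = [a[c] * v for c, v in zip(ch, x)]             # mul: a * x
    s = [math.sin(v) for v in ax]                      # sin
    sq = [v * v for v in s]                            # sqr
    scaled = [v * inv_b[c] for c, v in zip(ch, sq)]    # mul: * inv_b
    return [xv + v for xv, v in zip(x, scaled)]        # add: x + ...


def snake_fused(x: List[float], a: List[float], inv_b: List[float], t_len: int) -> List[float]:
    """Single pass computing y = x + sin(a*x)^2 * inv_b per element."""
    out = []
    for i, v in enumerate(x):
        c = i // t_len  # per-element channel lookup (assumed layout)
        s = math.sin(a[c] * v)
        out.append(v + s * s * inv_b[c])
    return out


if __name__ == "__main__":
    x = [0.0, 0.5, 1.0, -1.0]
    a = [1.0, 2.0]
    inv_b = [0.5, 1.0]
    print(snake_naive(x, a, inv_b, 2))
    print(snake_fused(x, a, inv_b, 2))
```

The naive version materializes four intermediate buffers, while the fused loop reads x once and writes y once. Avoiding that extra memory traffic is the usual motivation for this kind of fusion.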

…ml-org#22683)

* server: (router) expose child model info from router's /v1/models
* update docs

* server: support Vertex AI compatible API
* a bit safer
* support other AIP_* env var
* various fixes
* if AIP_MODE is unset, do nothing
* fix test case
* fix windows build

…2840)

* Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes

  Signed-off-by: ynankani <ynankani@nvidia.com>

* Apply suggestions from code review

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Address review comments

  Signed-off-by: ynankani <ynankani@nvidia.com>

* fix CRLF

  Signed-off-by: ynankani <ynankani@nvidia.com>

* Lint error fix

  Signed-off-by: ynankani <ynankani@nvidia.com>

---------

Signed-off-by: ynankani <ynankani@nvidia.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

…gml-org#22827)

* L2_NORM Updates
* Addressed PR Comments
* ggml-hexagon: add L2_NORM HVX kernel for Hexagon backend
* hex-unary: remove supported_unary_nc since the outer loop is the same for all unary ops

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>

samuraieng and others added 16 commits May 7, 2026 23:10

model: Support sarashina2.2-vision-3b model (ggml-org#22103)

44dbe8c

sycl : fix script error (#22795)

6a2a251

ggml: update SCHED_DEBUG output to use ggml_op_desc() (ggml-org#22825)

3e941b8

vulkan: fix spv shadowing (ggml-org#22760)

6d57a49

CUDA: lower-case PCI bus id, standardize for ggml (ggml-org#22820)

a8fd165

server: (router) expose child model info from router's /v1/models (gg…

9dcf835

server: support Vertex AI compatible API (ggml-org#22545)

29debb3

webui: fix LLM title generation for agentic conversations (ggml-org#2…

5d6f18a

common : revert reasoning budget +inf logit bias (ggml-org#22740)

f9cd456

common : do not wrap raw strings in schema parser for tagged parsers (g…

4995604

jan-service-account merged commit 945a610 into dev May 9, 2026
14 checks passed

jan-service-account deleted the update-dev-from-master-2026-05-09-01-03 branch May 9, 2026 01:31

jan-service-account merged 16 commits into dev from update-dev-from-master-2026-05-09-01-03

jan-service-account commented May 9, 2026

15 participants
