Sync master with upstream release b8580 #470

Merged
jan-service-account merged 5 commits into dev from update-dev-from-master-2026-03-30-00-54
Mar 30, 2026

@jan-service-account

Updates dev branch with latest release (b8580) from ggml-org/llama.cpp

arthw and others added 5 commits March 29, 2026 09:02
…ggml-org#21093)

* use half of the cores to build, to avoid an OS hang

* reduce the amount of output text to shorten test time

* avoid returning 0
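
A minimal sketch of how a "half the cores" build setting can be computed without ever producing 0. The helper name `build_jobs` is hypothetical, and reading "avoid to return 0" as a guard on this job count is an assumption; the commit itself is not shown here and may implement this differently (for example in a shell script).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the commit): number of parallel build jobs
// for a machine with `detected_cores` cores. It halves the core count and
// clamps the result to at least 1, so a 1-core machine, or a failed core
// detection that reports 0, never yields 0 jobs.
static unsigned build_jobs(unsigned detected_cores) {
    const unsigned jobs = std::max(1u, detected_cores / 2);
    assert(jobs >= 1);
    return jobs;
}
```

A caller could pass `std::thread::hardware_concurrency()`, which is allowed to return 0 when the core count cannot be determined, and hand the result to the build tool's parallel-jobs option.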
* hex-fa: add simple dma cache for Mask

I noticed that we were refetching the mask rows over and over.
This simple cache avoids that.

* hex-dma: unset in-order desc bit which caused significant perf regression

We don't rely on true in-order processing of the DMA descriptors anywhere.
Turns out this mode caused a significant regression of around 3-4 TPS during token gen.

* hex-rope: update comment to clarify that we don't need in-order DMA completions
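
The mask cache idea from the hex-fa commit above can be sketched on the host as a small direct-mapped cache. This is an illustration only: `MaskRowCache`, its slot layout, and the use of `memcpy` as a stand-in for a DMA transfer are all assumptions, not the Hexagon backend's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical direct-mapped cache of mask rows (names are illustrative).
// Each slot holds one row; a row is copied from the source buffer only when
// the slot does not already contain it.
struct MaskRowCache {
    size_t               row_bytes;
    std::vector<int64_t> tags;      // row index held by each slot, -1 if empty
    std::vector<uint8_t> storage;   // n_slots * row_bytes of local storage
    size_t               n_fetches = 0;

    MaskRowCache(size_t n_slots, size_t row_bytes_)
        : row_bytes(row_bytes_),
          tags(n_slots, int64_t{-1}),
          storage(n_slots * row_bytes_) {
        assert(n_slots > 0 && row_bytes_ > 0);
    }

    // Returns a pointer to the cached copy of `row`, fetching it on a miss.
    const uint8_t * get(const uint8_t * src, int64_t row) {
        assert(src != nullptr && row >= 0);
        const size_t slot = static_cast<size_t>(row) % tags.size();
        uint8_t *    dst  = storage.data() + slot * row_bytes;
        if (tags[slot] != row) {
            // Stand-in for the DMA transfer into local memory.
            std::memcpy(dst, src + static_cast<size_t>(row) * row_bytes, row_bytes);
            tags[slot] = row;
            ++n_fetches;
        }
        return dst;
    }
};
```

Repeated calls to `get` for the same row reuse the local copy, so `n_fetches` grows only on misses. That is the effect the commit message describes: mask rows are no longer refetched over and over.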
* Optimize MOE GEMV kernel for BS > 1.

The previous MOE kernel for BS > 1 launched too many thread blocks, a grid of (nrows_x, nchannels_dst, ncols_dst), with very little work per block: a block of (32, 4) computed the inner dot product for a single row.

The new `mul_mat_vec_q_moe` kernel is dedicated to the MoE multi-token case, with grid (ceil(nrows_x/rpb), nchannels_dst) and block (warp_size, ncols_dst). Each warp handles two rows independently with warp-level reduction only (no shared memory sync).

This change doesn't increase compilation time, as a single template instance is needed per type. It also simplifies the original GEMV kernel and gets rid of the `is_multi_token_id` specialization.

* Remove em-dashes

* Cherry-pick changes from @am17an PR ggml-org#20885 to enable small_k optimization only for cases where it benefits

Increase max batch size for MMVQ kernels for MUL_MAT_ID to 8

* Make the max batch size for MOE GEMV kernel configurable based on GPU arch and datatype
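
The launch geometry described above for the old and new MoE GEMV kernels can be compared with a small host-side calculation. This is a sketch under stated assumptions: `Dim3` is a stand-in for CUDA's `dim3`, the helper names are hypothetical, and `rpb` (rows per block) is left as a parameter because its actual value in the kernel is not given in the commit message.

```cpp
#include <cassert>
#include <cstdint>

struct Dim3 { uint32_t x, y, z; };           // stand-in for CUDA dim3
struct LaunchConfig { Dim3 grid; Dim3 block; };

static uint32_t ceil_div(uint32_t a, uint32_t b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Previous layout per the commit message: one block per (row, channel, column),
// with a (32, 4) block working on a single row.
static LaunchConfig old_moe_launch(uint32_t nrows_x, uint32_t nchannels_dst, uint32_t ncols_dst) {
    return { {nrows_x, nchannels_dst, ncols_dst}, {32, 4, 1} };
}

// New layout per the commit message: grid (ceil(nrows_x/rpb), nchannels_dst),
// block (warp_size, ncols_dst).
static LaunchConfig new_moe_launch(uint32_t nrows_x, uint32_t nchannels_dst, uint32_t ncols_dst,
                                   uint32_t rpb, uint32_t warp_size) {
    return { {ceil_div(nrows_x, rpb), nchannels_dst, 1}, {warp_size, ncols_dst, 1} };
}

// Total number of thread blocks in a launch.
static uint64_t n_blocks(const LaunchConfig & c) {
    return static_cast<uint64_t>(c.grid.x) * c.grid.y * c.grid.z;
}
```

With illustrative values nrows_x = 4096, nchannels_dst = 8, ncols_dst = 4, and an assumed rpb = 2 and warp_size = 32, the old layout launches 131072 blocks and the new one 16384, each doing more work. These numbers only illustrate the arithmetic; they are not measurements from the PR.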

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
@jan-service-account jan-service-account merged commit 03eccdc into dev Mar 30, 2026
3 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-03-30-00-54 branch March 30, 2026 01:24

6 participants