sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path by aicss-genai · Pull Request #22152 · ggml-org/llama.cpp

aicss-genai · 2026-04-20T07:04:28Z

Overview

Authors

Extends the reorder-quantized codepath to Q5_K (new) and adds a reorder
MMVQ kernel for Q8_0.

Adds block_q_t<GGML_TYPE_Q5_K> specialization with layout [qs (QK_K/2 per block)] [qh (QK_K/8 per block)] [scales] [dm] and matching get_block_offset / get_d_offset.
Adds reorder_qw_q5_k (weight reorder), reorder_mul_mat_vec_q5_k_q8_1_sycl (MMVQ kernel), dequantize_row_q5_K_sycl_reorder and the reorder variant of dequantize_block_q5_K.
Wires Q5_K into ggml_sycl_supports_reorder_mul_mat_sycl, ggml_sycl_supports_reorder_mmvq, and the reorder_qw dispatch.
Adds reorder_mul_mat_vec_q8_0_q8_1_sycl and inlines reorder_vec_dot_q_sycl<Q8_0>::operator() (removes the small vec_dot_q8_0_q8_1_impl helper).
Adds dequantize_q8_0_reorder and dequantize_block_q8_0_reorder helpers used by the Q8_0 reorder MMVQ path.
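To make the reordered Q5_K layout described above concrete, here is a minimal sketch of how per-block byte offsets could be derived when all `qs` come first, followed by all `qh`, all `scales`, and all `dm`. The struct and function names (`Q5KReorderOffsets`, `q5k_reorder_offsets`) are hypothetical and are not the PR's `get_block_offset` / `get_d_offset` signatures. The constants assume ggml's usual K-quant values (QK_K = 256, 12 scale bytes per block, and `dm` as two fp16 values).

```cpp
#include <cassert>
#include <cstddef>

// Assumed ggml K-quant constants (not taken from this PR's diff).
constexpr std::size_t QK_K         = 256; // weights per super-block
constexpr std::size_t K_SCALE_SIZE = 12;  // packed 6-bit scales/mins per block
constexpr std::size_t DM_SIZE      = 4;   // d and dmin, two fp16 values

// Hypothetical helper type: byte offsets of one block's fields
// inside a reordered buffer holding `nblocks` Q5_K blocks.
struct Q5KReorderOffsets {
    std::size_t qs;     // low 4 bits of each quant, QK_K/2 bytes per block
    std::size_t qh;     // high bit of each quant, QK_K/8 bytes per block
    std::size_t scales; // K_SCALE_SIZE bytes per block
    std::size_t dm;     // DM_SIZE bytes per block
};

// Layout assumed here: [all qs][all qh][all scales][all dm].
constexpr Q5KReorderOffsets q5k_reorder_offsets(std::size_t ib, std::size_t nblocks) {
    const std::size_t qs_region = nblocks * (QK_K / 2);
    const std::size_t qh_region = nblocks * (QK_K / 8);
    const std::size_t sc_region = nblocks * K_SCALE_SIZE;
    return {
        ib * (QK_K / 2),
        qs_region + ib * (QK_K / 8),
        qs_region + qh_region + ib * K_SCALE_SIZE,
        qs_region + qh_region + sc_region + ib * DM_SIZE,
    };
}

// Compile-time sanity check: the last block's dm field ends exactly at
// nblocks * (128 + 32 + 12 + 4) = nblocks * 176 bytes.
static_assert(q5k_reorder_offsets(3, 4).dm + DM_SIZE == 4 * 176,
              "reordered buffer size should match the packed block size");
```

Under these assumptions, keeping each field in its own contiguous region means neighbouring work-items read neighbouring `qs` bytes, which is the usual motivation for a reorder layout.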

Uses the existing g_ggml_sycl_use_async_mem_op flag (default off in master); no dependency on #22066's async-toggle change.

Additional information

Split from #22066 per reviewer request for independent review.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes. This work was partially produced with an agentic engineering approach: agents surface issues and explore experiments while engineers identify and reject candidates using domain knowledge. Human feedback involved.

Signed-off-by: Chun Tao <chun.tao@intel.com>

ggml-gh-bot · 2026-04-20T07:08:44Z

Hi @aicss-genai, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 7 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

NeoZhangJianyu · 2026-04-22T13:44:15Z

Could you share the test data for the Q5_K reorder and the Q8_0 optimization?

Thank you!


malsbat · 2026-05-07T23:03:25Z

Test data is captured in the table below (pp = prompt processing, tg = token generation; throughput in tokens per second). The baseline commit is deab41e.

Model	Task	Tokens	baseline (tok/s)	pr-5 (tok/s)	Speedup
Phi-3.5-mini-3.8B	pp	512	1810.30 ±25.09	1834.59 ±19.02	1.01x
Phi-3.5-mini-3.8B	pp	1024	1823.18 ±1.56	1841.92 ±0.40	1.01x
Phi-3.5-mini-3.8B	pp	2048	1671.65 ±0.77	1689.04 ±3.16	1.01x
Phi-3.5-mini-3.8B	pp	4096	1355.25 ±0.34	1369.27 ±1.75	1.01x
Phi-3.5-mini-3.8B	pp	8192	983.48 ±0.28	990.10 ±2.14	1.01x
Phi-3.5-mini-3.8B	tg	128	111.80 ±0.24	134.41 ±0.28	1.20x
Phi-3.5-mini-3.8B	tg	256	111.17 ±0.48	133.59 ±0.77	1.20x
Phi-3.5-mini-3.8B	tg	512	108.54 ±0.46	129.61 ±0.69	1.19x
Phi-3.5-mini-3.8B	tg	1024	104.17 ±0.40	123.26 ±0.60	1.18x
Qwen3.5-9B-Q4	pp	512	990.97 ±8.01	991.70 ±10.11	1.00x
Qwen3.5-9B-Q4	pp	1024	1030.35 ±0.17	1031.83 ±0.19	1.00x
Qwen3.5-9B-Q4	pp	2048	1046.67 ±1.55	1051.00 ±0.36	1.00x
Qwen3.5-9B-Q4	pp	4096	1004.57 ±0.52	1010.68 ±0.49	1.01x
Qwen3.5-9B-Q4	pp	8192	934.43 ±0.63	937.02 ±0.22	1.00x
Qwen3.5-9B-Q4	tg	128	57.92 ±0.22	64.38 ±0.13	1.11x
Qwen3.5-9B-Q4	tg	256	57.75 ±0.08	64.31 ±0.03	1.11x
Qwen3.5-9B-Q4	tg	512	57.56 ±0.07	64.32 ±0.01	1.12x
Qwen3.5-9B-Q4	tg	1024	57.46 ±0.03	64.11 ±0.10	1.12x

Full list of models tested: DeepSeek-R1-Qwen-32B-Q4, Gemma-2-9B, Llama-3.1-8B-Q8, Llama-3.2-3B, Mistral-Nemo-12B, Mistral-Small-24B, Phi-3.5-mini-3.8B, Qwen2.5-14B-Q4, Qwen2.5-14B-Q8, Qwen2.5-32B-Q4, Qwen2.5-32B-Q6, Qwen2.5-7B, Qwen3.5-9B-Q4, Qwen3-8B. No positive or negative change on other models.
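For reference, the Speedup column appears to be the PR throughput divided by the baseline throughput, rounded to two decimals. A minimal sketch of that arithmetic, with hypothetical helper names (`speedup`, `round2`) and inputs copied from the table rows above:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of PR throughput to baseline throughput (tok/s).
inline double speedup(double baseline_tps, double pr_tps) {
    return pr_tps / baseline_tps;
}

// Round to two decimal places, matching the precision shown in the table.
inline double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}

// Examples using mean values from the table (the ± spread is ignored here):
//   Phi-3.5-mini-3.8B, tg 128: round2(speedup(111.80, 134.41))   gives 1.20
//   Qwen3.5-9B-Q4,     tg 128: round2(speedup(57.92, 64.38))     gives 1.11
//   Phi-3.5-mini-3.8B, pp 512: round2(speedup(1810.30, 1834.59)) gives 1.01
```

The pp rows stay near 1.00x to 1.01x while the tg rows gain 1.11x to 1.20x, consistent with the change targeting the MMVQ (matrix-vector) path.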

arthw

Good job!
We have finally addressed our shortcomings with the Q5 data type.

* sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path

Signed-off-by: Chun Tao <chun.tao@intel.com>

* Remove duplicate definitions

---------

Signed-off-by: Chun Tao <chun.tao@intel.com>
Co-authored-by: Chun Tao <chun.tao@intel.com>
Co-authored-by: Todd Malsbary <todd.malsbary@intel.com>

ctao456 and others added 2 commits April 19, 2026 23:37

sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path

ac54489

Signed-off-by: Chun Tao <chun.tao@intel.com>

Merge branch 'ggml-org:master' into aicss-genai/sycl-bmg-upstream-pr-5

ce43550

aicss-genai requested a review from a team as a code owner April 20, 2026 07:04

github-actions Bot added the ggml and SYCL labels Apr 20, 2026

aicss-genai and others added 2 commits April 27, 2026 22:06

Remove duplicate definitions

0b4007f

Merge remote-tracking branch 'upstream/master' into aicss-genai/sycl-…

4af9e04

…bmg-upstream-pr-5

arthw approved these changes May 8, 2026


arthw added the merge ready label May 9, 2026

ggerganov merged commit 6048993 into ggml-org:master May 9, 2026
80 of 81 checks passed

ggerganov merged 4 commits into ggml-org:master from aicss-genai:aicss-genai/sycl-bmg-upstream-pr-5

6 participants
