sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path #22152
ggerganov merged 4 commits into ggml-org:master
Signed-off-by: Chun Tao <chun.tao@intel.com>
Hi @aicss-genai, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Could you share the test data for the Q5_K reorder and the Q8_0 optimization? Thank you!
Test data is captured in the table below. The baseline commit is deab41e.
Full list of models tested: DeepSeek-R1-Qwen-32B-Q4, Gemma-2-9B, Llama-3.1-8B-Q8, Llama-3.2-3B, Mistral-Nemo-12B, Mistral-Small-24B, Phi-3.5-mini-3.8B, Qwen2.5-14B-Q4, Qwen2.5-14B-Q8, Qwen2.5-32B-Q4, Qwen2.5-32B-Q6, Qwen2.5-7B, Qwen3.5-9B-Q4, Qwen3-8B. No positive or negative change on other models.
arthw left a comment
Good job!
We have finally addressed our shortcomings with the Q5 data type.

Overview
Authors
Extends the reorder-quantized codepath to Q5_K (new) and adds a reorder MMVQ kernel for Q8_0.

- block_q_t<GGML_TYPE_Q5_K> specialization with layout [qs (QK_K/2 per block)] [qh (QK_K/8 per block)] [scales] [dm] and matching get_block_offset / get_d_offset.
- reorder_qw_q5_k (weight reorder), reorder_mul_mat_vec_q5_k_q8_1_sycl (MMVQ kernel), dequantize_row_q5_K_sycl_reorder and the reorder variant of dequantize_block_q5_K.
- ggml_sycl_supports_reorder_mul_mat_sycl, ggml_sycl_supports_reorder_mmvq, and the reorder_qw dispatch.
- reorder_mul_mat_vec_q8_0_q8_1_sycl and inlines reorder_vec_dot_q_sycl<Q8_0>::operator() (removes the small vec_dot_q8_0_q8_1_impl helper).
- dequantize_q8_0_reorder and dequantize_block_q8_0_reorder helpers used by the Q8_0 reorder MMVQ path.
- Uses the existing g_ggml_sycl_use_async_mem_op flag (default off in master); no dependency on #22066's async-toggle change.

Additional information
Split from #22066 per reviewer request for independent review.
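For readers unfamiliar with the reordered layout named in the overview ([qs] [qh] [scales] [dm]), the following is a minimal sketch of how per-block byte offsets could be computed. It is not the PR's code: the names Q5KReorderOffsets, q5k_reorder_offsets and q5k_reorder_total_bytes are hypothetical, the region sizes are assumed from ggml's block_q5_K definition (QK_K = 256, 12 scale bytes, and a pair of fp16 values for d/dmin), and it assumes each region is stored contiguously for all blocks before the next region begins.

```cpp
#include <cassert>
#include <cstddef>

// Sizes assumed from ggml's block_q5_K (not taken from this PR's diff).
constexpr std::size_t QK_K         = 256;       // weights per super-block
constexpr std::size_t K_SCALE_SIZE = 12;        // packed 6-bit scales/mins
constexpr std::size_t QS_BYTES     = QK_K / 2;  // low 4 bits of each weight
constexpr std::size_t QH_BYTES     = QK_K / 8;  // high (5th) bit of each weight
constexpr std::size_t DM_BYTES     = 4;         // two fp16 values: d and dmin

// The four regions together must match the size of one regular Q5_K block.
static_assert(QS_BYTES + QH_BYTES + K_SCALE_SIZE + DM_BYTES == 176,
              "unexpected Q5_K block size");

// Hypothetical helper type: byte offsets of one block's fields in the
// reordered buffer [all qs][all qh][all scales][all dm].
struct Q5KReorderOffsets {
    std::size_t qs;
    std::size_t qh;
    std::size_t scales;
    std::size_t dm;
};

// Offsets for block index `block` in a buffer holding `nblocks` blocks.
constexpr Q5KReorderOffsets q5k_reorder_offsets(std::size_t block,
                                                std::size_t nblocks) {
    const std::size_t qh_base     = nblocks * QS_BYTES;
    const std::size_t scales_base = qh_base + nblocks * QH_BYTES;
    const std::size_t dm_base     = scales_base + nblocks * K_SCALE_SIZE;
    return { block * QS_BYTES,
             qh_base + block * QH_BYTES,
             scales_base + block * K_SCALE_SIZE,
             dm_base + block * DM_BYTES };
}

// Total size of the reordered buffer; identical to the non-reordered size.
constexpr std::size_t q5k_reorder_total_bytes(std::size_t nblocks) {
    return nblocks * (QS_BYTES + QH_BYTES + K_SCALE_SIZE + DM_BYTES);
}
```

Under these assumptions the reorder changes only where each field lives, not how many bytes the tensor occupies, so a kernel can read all qs bytes of neighbouring blocks from one contiguous region. Whether the PR's get_block_offset / get_d_offset compute offsets per tensor or per row is not stated in the description above.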
Requirements