[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes #22035
Merged
ggerganov merged 2 commits into ggml-org:master on Apr 20, 2026
Conversation
The reorder `mul_mat_vec_q` dispatchers for Q4_0, Q8_0, Q4_K, and Q6_K asserted that `block_num_y` was a multiple of 16 subgroups. Models with a vocab size not divisible by 16 (for example HY-MT at 120818) aborted on model load when the output projection tripped the assert.

I replaced the assert with padding: `block_num_y` now rounds up to a whole number of subgroup-sized workgroups. The kernel already has the row bounds check (`if (row >= nrows) return;`), so the extra padded threads early-exit cleanly. Row values are uniform across a subgroup, so the collective reduce stays safe. For aligned vocab sizes the padded `block_num_y` equals the old value, so the kernel launch is identical and there is no regression.

Thanks to @arthw for flagging the relationship to ggml-org#21527. Fixes ggml-org#22020. AI assisted coding, tested on Intel B70 hardware.
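The padding described above can be sketched in plain host-side C++. This is an illustrative reconstruction, not the actual dispatcher: the value of `GGML_SYCL_MMV_Y` and the `padded_block_num_y` helper are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins; the real dispatcher in ggml-sycl/mmvq.cpp
// derives these from the tensor shape and device. Assumed values.
constexpr size_t GGML_SYCL_MMV_Y = 1;
constexpr size_t num_subgroups   = 16; // subgroups per workgroup on this target

// Integer ceiling division, as used by the SYCL backend's ceil_div.
constexpr size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// Before the patch: GGML_ASSERT(block_num_y % num_subgroups == 0)
// aborted for unaligned vocab sizes. After: round the grid up instead;
// the kernel's `if (row >= nrows) return;` guard skips the padded rows.
size_t padded_block_num_y(size_t nrows) {
    const size_t block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
    return ceil_div(block_num_y, num_subgroups) * num_subgroups;
}
```

With the HY-MT vocab size of 120818, the padded grid rounds up to 120832 (the next multiple of 16); for an already-aligned size the result is unchanged, which is the no-regression argument in the description.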
Contributor
NeoZhangJianyu
left a comment
Thank you for the quick response!
    GGML_ASSERT(ncols % QK4_0 == 0);
    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
    // Round up to a whole number of subgroup-sized workgroups; out-of-range rows are skipped inside the kernel.
    constexpr size_t num_subgroups = 16;
Contributor
The hardcoded `16` should be replaced by `WARP_SIZE`.
There are more occurrences of `16` that should be replaced.
Replaces the hardcoded 16 with WARP_SIZE in the four reorder_mul_mat_vec launch helpers (Q4_0, Q8_0, Q4_K, Q6_K). Compile-time no-op on the Intel target where WARP_SIZE is 16, but makes the relationship to subgroup size explicit. Per review by @NeoZhangJianyu on ggml-org#22035. Assisted by Claude.
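A minimal sketch of what the follow-up commit changes, assuming (as the thread states) that `WARP_SIZE` is 16 on this Intel target; the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Assumed value: the thread says WARP_SIZE is 16 on the Intel target.
constexpr size_t WARP_SIZE = 16;

constexpr size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// After the review: the magic number 16 becomes WARP_SIZE, so the padding
// explicitly tracks the subgroup size. Compile-time no-op where WARP_SIZE == 16.
size_t padded_block_num_y(size_t block_num_y) {
    constexpr size_t num_subgroups = WARP_SIZE; // was: constexpr size_t num_subgroups = 16;
    return ceil_div(block_num_y, num_subgroups) * num_subgroups;
}
```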
Contributor
Author
Thanks @NeoZhangJianyu — applied across all 4 reorder sites (Q4_0, Q8_0, Q4_K, Q6_K) in 3eab160. Tested clean on B70.
arthw
approved these changes
Apr 17, 2026
Contributor
@ggerganov Thank you!
Member
@arthw Whenever you have reviewed and approved a PR and are waiting for a second approval, please add the "merge ready" label to it. This is better than pinging me, as it makes it easier to keep track of what is ready for merging.
Summary
Fixes #22020. The four SYCL reorder mul_mat_vec_q dispatchers (Q4_0, Q8_0, Q4_K, Q6_K) asserted that block_num_y was a multiple of 16 subgroups. Any model whose vocab size is not divisible by 16 aborted on load when the output projection hit the assert. The original report was HY-MT 1.5 1.8B (vocab 120818) on an Arc B570.
Fix
I replaced the hard assert with launch-grid padding. `block_num_y` now rounds up to a whole number of subgroup-sized workgroups, and the kernel's existing `if (row >= nrows) return;` guard skips the padded rows. The row value is uniform across a subgroup (it does not depend on `get_local_linear_id`), so `sycl::reduce_over_group` stays safe.

For aligned-vocab models, `ceil_div(nrows, 16) * 16 == nrows`, so `block_num_y` is unchanged and the kernel launch is identical to the pre-patch code. The diff is 8 insertions and 8 deletions across four sites in `ggml/src/ggml-sycl/mmvq.cpp`. No other files touched.
Tests
Hardware: Intel Arc Pro B70 (Xe2 / bmg_g21), oneAPI 2025.3.
Before the patch, loading the model hit the assert at `mmvq.cpp:687` during warmup, matching the report.

Thanks to @arthw for pointing out the link to #21527 on the issue thread.
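The safety argument in the Fix section (padded threads early-exit before touching memory) can be simulated on the host. This sketch is plain C++, not real SYCL, and the function name is hypothetical; it walks a padded range of row indices and shows that out-of-range rows do no work, mirroring the kernel's `if (row >= nrows) return;` guard.

```cpp
#include <cassert>
#include <vector>

// Host-side simulation of the padded launch: iterate over padded_rows
// "threads", and let those whose row index is past nrows exit early,
// exactly as the kernel guard does. Only in-range rows produce output.
std::vector<int> touched_rows(int nrows, int padded_rows) {
    std::vector<int> out;
    for (int row = 0; row < padded_rows; ++row) {
        if (row >= nrows) continue; // kernel: early return, no out-of-bounds access
        out.push_back(row);
    }
    return out;
}
```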
AI assisted coding, tested on Intel B70 hardware.