[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes #22035
Merged
ggerganov merged 2 commits into ggml-org:master on Apr 20, 2026
Conversation
The reorder `mul_mat_vec_q` dispatchers for Q4_0, Q8_0, Q4_K, and Q6_K asserted that `block_num_y` was a multiple of 16 subgroups. Models with a vocab size not divisible by 16 (for example HY-MT at 120818) aborted on model load when the output projection tripped the assert.

I replaced the assert with padding: `block_num_y` now rounds up to a whole number of subgroup-sized workgroups. The kernel already has the row bounds check (`if (row >= nrows) return;`), so the extra padded threads early-exit cleanly. Row values are uniform across a subgroup, so the collective reduce stays safe. For aligned vocab sizes the padded `block_num_y` equals the old value, so the kernel launch is identical and there is no regression.

Thanks to @arthw for flagging the relationship to ggml-org#21527. Fixes ggml-org#22020. AI assisted coding, tested on Intel B70 hardware.
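The padding described above can be sketched in plain host-side C++. This is an illustrative reconstruction, not the actual dispatcher: the value of `GGML_SYCL_MMV_Y` and the `padded_block_num_y` helper are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins; the real dispatcher in ggml-sycl/mmvq.cpp
// derives these from the tensor shape and device. Assumed values.
constexpr size_t GGML_SYCL_MMV_Y = 1;
constexpr size_t num_subgroups   = 16; // subgroups per workgroup on this target

// Integer ceiling division, as used by the SYCL backend's ceil_div.
constexpr size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// Before the patch: GGML_ASSERT(block_num_y % num_subgroups == 0)
// aborted for unaligned vocab sizes. After: round the grid up instead;
// the kernel's `if (row >= nrows) return;` guard skips the padded rows.
size_t padded_block_num_y(size_t nrows) {
    const size_t block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
    return ceil_div(block_num_y, num_subgroups) * num_subgroups;
}
```

With the HY-MT vocab size of 120818, the padded grid rounds up to 120832 (the next multiple of 16); for an already-aligned size the result is unchanged, which is the no-regression argument in the description.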
Contributor
NeoZhangJianyu
left a comment
Thank you for the quick response!
    GGML_ASSERT(ncols % QK4_0 == 0);
    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
    // Round up to a whole number of subgroup-sized workgroups; out-of-range rows are skipped inside the kernel.
    constexpr size_t num_subgroups = 16;
Contributor
The hardcoded `16` should be replaced by `WARP_SIZE`.
There are more occurrences of `16` that should be replaced.
Replaces the hardcoded 16 with WARP_SIZE in the four reorder_mul_mat_vec launch helpers (Q4_0, Q8_0, Q4_K, Q6_K). Compile-time no-op on the Intel target where WARP_SIZE is 16, but makes the relationship to subgroup size explicit. Per review by @NeoZhangJianyu on ggml-org#22035. Assisted by Claude.
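A minimal sketch of what the follow-up commit changes, assuming (as the thread states) that `WARP_SIZE` is 16 on this Intel target; the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Assumed value: the thread says WARP_SIZE is 16 on the Intel target.
constexpr size_t WARP_SIZE = 16;

constexpr size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// After the review: the magic number 16 becomes WARP_SIZE, so the padding
// explicitly tracks the subgroup size. Compile-time no-op where WARP_SIZE == 16.
size_t padded_block_num_y(size_t block_num_y) {
    constexpr size_t num_subgroups = WARP_SIZE; // was: constexpr size_t num_subgroups = 16;
    return ceil_div(block_num_y, num_subgroups) * num_subgroups;
}
```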
Contributor
Author
Thanks @NeoZhangJianyu — applied across all 4 reorder sites (Q4_0, Q8_0, Q4_K, Q6_K) in 3eab160. Tested clean on B70.
arthw
approved these changes
Apr 17, 2026
Contributor
@ggerganov Thank you!
Member
@arthw Whenever you have reviewed and approved a PR and are waiting for a second approval, please add the "merge ready" label to it. This is better than pinging me, as it makes it easier to keep track of what is ready for merging.
Summary
Fixes #22020. The four SYCL reorder mul_mat_vec_q dispatchers (Q4_0, Q8_0, Q4_K, Q6_K) asserted that block_num_y was a multiple of 16 subgroups. Any model whose vocab size is not divisible by 16 aborted on load when the output projection hit the assert. The original report was HY-MT 1.5 1.8B (vocab 120818) on an Arc B570.
Fix
I replaced the hard assert with launch-grid padding. `block_num_y` now rounds up to a whole number of subgroup-sized workgroups, and the kernel's existing `if (row >= nrows) return;` guard skips the padded rows. The row value is uniform across a subgroup (it does not depend on `get_local_linear_id`), so `sycl::reduce_over_group` stays safe.

For aligned-vocab models, `ceil_div(nrows, 16) * 16 == nrows`, so `block_num_y` is unchanged and the kernel launch is identical to the pre-patch code. The diff is 8 insertions and 8 deletions across four sites in `ggml/src/ggml-sycl/mmvq.cpp`. No other files touched.
Tests
Hardware: Intel Arc Pro B70 (Xe2 / bmg_g21), oneAPI 2025.3.
Before the patch, loading the model hit the assert at `mmvq.cpp:687` during warmup, matching the report.

Thanks to @arthw for pointing out the link to #21527 on the issue thread.
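The safety argument in the Fix section (padded threads early-exit before touching memory) can be simulated on the host. This sketch is plain C++, not real SYCL, and the function name is hypothetical; it walks a padded range of row indices and shows that out-of-range rows do no work, mirroring the kernel's `if (row >= nrows) return;` guard.

```cpp
#include <cassert>
#include <vector>

// Host-side simulation of the padded launch: iterate over padded_rows
// "threads", and let those whose row index is past nrows exit early,
// exactly as the kernel guard does. Only in-range rows produce output.
std::vector<int> touched_rows(int nrows, int padded_rows) {
    std::vector<int> out;
    for (int row = 0; row < padded_rows; ++row) {
        if (row >= nrows) continue; // kernel: early return, no out-of-bounds access
        out.push_back(row);
    }
    return out;
}
```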
AI assisted coding, tested on Intel B70 hardware.