Add M-tile loop with dispatch capping for Intel Xe2/3-LPG #28250
Merged
jchen10 (Contributor) commented on Apr 28, 2026
- Wrap 8x16x16 MatMulNBits(SubgroupMatrix) kernel body in M-tile loop using uniforms.m_tiles_per_wg for tile assignment per workgroup
- Cap dispatch_y on Xe2/3-LPG when M > 2k, with occupancy factor 16x
- Non-Intel or small-M paths pass m_tiles_per_wg=1 (no behavior change)
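The dispatch capping described above can be sketched on the host side as follows. This is a minimal illustration, not the PR's code. `ComputeMTileDispatch`, `tile_m`, and `max_dispatch_y` are hypothetical names, and the `M > 2k` threshold is assumed to mean 2048. The cap is taken as an input because the description does not say how it is derived; one plausible form is a device's concurrent workgroup capacity multiplied by the 16x occupancy factor.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Hypothetical result type; the PR's actual host-side names are not shown.
struct MTileDispatch {
  uint32_t dispatch_y;      // workgroups launched along Y (the M dimension)
  uint32_t m_tiles_per_wg;  // value passed to the shader as uniforms.m_tiles_per_wg
};

// Ceiling division for unsigned integers (b must be non-zero).
constexpr uint32_t CeilDiv(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Assumptions: tile_m is the number of M rows one workgroup covers per tile,
// "M > 2k" means M > 2048, and max_dispatch_y is a device-specific cap
// (e.g. concurrent workgroups times an occupancy factor of 16).
MTileDispatch ComputeMTileDispatch(uint32_t m, uint32_t tile_m,
                                   bool is_intel_xe2_xe3_lpg,
                                   uint32_t max_dispatch_y) {
  assert(tile_m > 0 && max_dispatch_y > 0);
  const uint32_t total_m_tiles = CeilDiv(m, tile_m);
  constexpr uint32_t kLargeMThreshold = 2048;  // assumed meaning of "2k"
  if (!is_intel_xe2_xe3_lpg || m <= kLargeMThreshold ||
      total_m_tiles <= max_dispatch_y) {
    // Default path: one workgroup per M-tile, i.e. no behavior change.
    return {total_m_tiles, 1};
  }
  // Spread the M-tiles over at most max_dispatch_y workgroups.
  const uint32_t m_tiles_per_wg = CeilDiv(total_m_tiles, max_dispatch_y);
  // Recompute dispatch_y so that no trailing workgroup is launched with zero tiles.
  return {CeilDiv(total_m_tiles, m_tiles_per_wg), m_tiles_per_wg};
}
```

For example, with `M = 8192`, `tile_m = 8`, and a cap of 256, the 1024 M-tiles become 256 workgroups that each process 4 tiles. With `M = 5000` and a cap of 100, the 625 tiles need 7 tiles per workgroup, and recomputing `dispatch_y` gives 90 workgroups instead of launching 100 with the last 10 idle.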
Pull request overview
This PR updates the WebGPU SubgroupMatrix MatMulNBits 8x16x16 path to reduce dispatch overhead on large-M Intel Xe2/Xe3-LPG devices by having each workgroup process multiple M-tiles sequentially, driven by a new m_tiles_per_wg uniform and a capped dispatch_y.
Changes:
- Wrap the 8x16x16 WGSL kernel body in an outer M-tile loop controlled by `uniforms.m_tiles_per_wg`.
- Add `m_tiles_per_wg` to the program's uniform interface and pass it from the CPU side.
- Cap `dispatch_y` for large `M` on Intel Xe2/Xe3-LPG and derive `m_tiles_per_wg` accordingly.
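To check that the outer loop still visits every M-tile exactly once after `dispatch_y` is capped, the workgroup-to-tile mapping can be modeled on the CPU. This sketch assumes a contiguous assignment (`tile = wg_y * m_tiles_per_wg + t`, with `wg_y` standing in for `workgroup_id.y`); the actual WGSL template may index tiles differently. `CountTileVisits` and `CoversEachTileOnce` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using std::uint32_t;

// Hypothetical host-side model of the shader's outer M-tile loop.
// Assumption: each workgroup owns a contiguous run of m_tiles_per_wg tiles.
std::vector<uint32_t> CountTileVisits(uint32_t dispatch_y, uint32_t m_tiles_per_wg,
                                      uint32_t total_m_tiles) {
  std::vector<uint32_t> visits(total_m_tiles, 0);
  for (uint32_t wg_y = 0; wg_y < dispatch_y; ++wg_y) {
    for (uint32_t t = 0; t < m_tiles_per_wg; ++t) {
      const uint32_t tile = wg_y * m_tiles_per_wg + t;
      // The last workgroup may run past the end when the tile count
      // is not a multiple of m_tiles_per_wg.
      if (tile >= total_m_tiles) break;
      ++visits[tile];
    }
  }
  return visits;
}

// True when every M-tile is processed exactly once (no gaps, no duplicates).
bool CoversEachTileOnce(uint32_t dispatch_y, uint32_t m_tiles_per_wg,
                        uint32_t total_m_tiles) {
  for (uint32_t v : CountTileVisits(dispatch_y, m_tiles_per_wg, total_m_tiles)) {
    if (v != 1) return false;
  }
  return true;
}
```

In this model the early exit depends only on the workgroup index and the uniform, so every invocation in a workgroup takes the same path. That property matters in the real kernel, since collective subgroup-matrix operations generally expect uniform control flow.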
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/contrib_ops/webgpu/quantization/subgroup_matrix_matmul_nbits_8x16x16.wgsl.template | Adds an outer M-tile loop and resets accumulators per tile using m_tiles_per_wg. |
| onnxruntime/contrib_ops/webgpu/quantization/subgroup_matrix_matmul_nbits.h | Extends the uniform variable list with m_tiles_per_wg. |
| onnxruntime/contrib_ops/webgpu/quantization/subgroup_matrix_matmul_nbits.cc | Computes capped dispatch_y on Intel Xe2/Xe3-LPG and passes m_tiles_per_wg to the shader. |
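The template change notes that accumulators are reset for each tile. The reason is easiest to see in a scalar CPU model. A workgroup that handles several M-tiles reuses one accumulator buffer, so the buffer must be zeroed before each tile, or partial sums from the previous tile leak into the next one. This is a plain float reference under assumed names (`TiledMatMul`, `tile_m`) with an assumed contiguous tile mapping. It is not the quantized subgroup-matrix kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using std::size_t;
using Matrix = std::vector<float>;  // row-major storage

// Reference: C[M x N] = A[M x K] * B[K x N].
Matrix NaiveMatMul(const Matrix& a, const Matrix& b, size_t m, size_t n, size_t k) {
  Matrix c(m * n, 0.0f);
  for (size_t i = 0; i < m; ++i)
    for (size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (size_t p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
  return c;
}

// Hypothetical model: each of dispatch_y workgroups loops over m_tiles_per_wg
// M-tiles of tile_m rows, reusing a single accumulator buffer.
Matrix TiledMatMul(const Matrix& a, const Matrix& b, size_t m, size_t n, size_t k,
                   size_t tile_m, size_t dispatch_y, size_t m_tiles_per_wg) {
  Matrix c(m * n, 0.0f);
  std::vector<float> acc(tile_m * n);
  const size_t total_tiles = (m + tile_m - 1) / tile_m;
  for (size_t wg_y = 0; wg_y < dispatch_y; ++wg_y) {
    for (size_t t = 0; t < m_tiles_per_wg; ++t) {
      const size_t tile = wg_y * m_tiles_per_wg + t;
      if (tile >= total_tiles) break;
      std::fill(acc.begin(), acc.end(), 0.0f);  // reset accumulators per tile
      const size_t row0 = tile * tile_m;
      for (size_t r = 0; r < tile_m && row0 + r < m; ++r)
        for (size_t j = 0; j < n; ++j)
          for (size_t p = 0; p < k; ++p)
            acc[r * n + j] += a[(row0 + r) * k + p] * b[p * n + j];
      // Store only rows that exist; the last tile may be partial.
      for (size_t r = 0; r < tile_m && row0 + r < m; ++r)
        for (size_t j = 0; j < n; ++j) c[(row0 + r) * n + j] = acc[r * n + j];
    }
  }
  return c;
}
```

Removing the `std::fill` call makes every tile after a workgroup's first one accumulate on top of the previous tile's sums, which is the bug the per-tile reset prevents.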
guschmue approved these changes on May 6, 2026