Remove gidx input from MatMulNBits graph surgery #2278

Merged
xiaoyu-work merged 1 commit into microsoft:main from CodeLinaro:mlperf_llms
Dec 18, 2025
@rM-planet
Contributor

Describe your changes

Adds a graph surgery that removes the group index (g_idx) input from MatMulNBits nodes, but only when the group indices are sorted.

## Motivation:
The MLAS MatMulNBits kernel expects g_idx to be omitted when the node was quantized with the default column-wise grouping for block-wise quantization. If g_idx is passed, the kernel takes the unpacked compute path, i.e. it dequantizes everything to fp32 and runs a floating-point matmul, which significantly degrades runtime performance.
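
The check-then-remove idea can be sketched in plain Python. This is an illustration only, not the code merged in this PR: the helper names `is_sorted` and `drop_g_idx` are hypothetical, "sorted" is taken to mean non-decreasing, and the sketch assumes g_idx sits at input position 4 of a MatMulNBits node (after A, B, scales and zero_points, with an optional bias after it).

```python
# Assumed input order for MatMulNBits: A, B, scales, zero_points, g_idx, bias.
G_IDX_INPUT = 4


def is_sorted(g_idx):
    """Return True if the group indices are in non-decreasing order."""
    return all(a <= b for a, b in zip(g_idx, g_idx[1:]))


def drop_g_idx(input_names, g_idx_values):
    """Return the node's input names with g_idx cleared when it is sorted.

    ONNX marks an omitted optional input with an empty string, so g_idx is
    blanked rather than deleted; this keeps a later input (bias) in its slot.
    """
    names = list(input_names)
    if len(names) <= G_IDX_INPUT or not names[G_IDX_INPUT]:
        return names  # the node has no g_idx input to remove
    if not is_sorted(g_idx_values):
        return names  # unsorted indices carry real information: keep g_idx
    names[G_IDX_INPUT] = ""
    while names and names[-1] == "":
        names.pop()  # trim trailing empty optional inputs
    return names


if __name__ == "__main__":
    print(drop_g_idx(["A", "B", "scales", "zp", "g_idx"], [0, 0, 1, 1]))
    print(drop_g_idx(["A", "B", "scales", "zp", "g_idx", "bias"], [0, 0, 1, 1]))
    print(drop_g_idx(["A", "B", "scales", "zp", "g_idx"], [1, 0, 1, 0]))
```

In a real pass the same decision would be applied to each MatMulNBits node in the graph, reading the g_idx values from the corresponding initializer. An unused initializer could then be pruned.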

## Impact:
Significant performance improvement for the phi4 14b model with no drop in accuracy.

@rM-planet rM-planet force-pushed the mlperf_llms branch 2 times, most recently from d63f16a to 60ee7a4 Compare December 9, 2025 21:36
@rM-planet
Contributor Author

@devang-ml @jambayk @gtonpe Could you please review this change?

@devang-ml
Collaborator

Could you please add a unit test? Thanks!

@gtonpe
Contributor

gtonpe commented Dec 16, 2025

When are the pending Olive CI tests expected to complete?

@xiaoyu-work xiaoyu-work merged commit 7c5b3b8 into microsoft:main Dec 18, 2025
15 checks passed
4 participants