[MLAS][KleidiAI] Catlaw01/sgemm epilogue neon opt #27609
hariharans29 merged 4 commits into microsoft:main from
|
@microsoft-github-policy-service agree company="Arm" |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
Pull request overview
Updates the KleidiAI-backed SGEMM post-processing (alpha/beta “epilogue”) in MLAS to fix batched correctness in an early-exit path and to improve ARM NEON performance for contiguous outputs.
Changes:
- Fix batched `alpha == 0 || K == 0` fast path to apply `beta` reduction for every batch entry.
- Add a contiguous-only NEON-vectorized alpha/beta epilogue path with scalar fallback for small/non-contiguous cases.
- Route contiguous 2D tiles through the 1D contiguous path to reuse vectorization.
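The epilogue described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names, the separate `ab` buffer holding the raw product, and the convention that `beta == 0` skips reading `C` are all assumptions made for the example. NEON intrinsics are used only when the compiler targets ARM; elsewhere the scalar loop handles everything.

```cpp
#include <cstddef>
#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical helper (not the MLAS function): computes
// c[i] = alpha * ab[i] + beta * c[i] over n contiguous floats,
// where `ab` is assumed to hold the raw A*B product.
inline void ApplyAlphaBetaContiguousSketch(float* c, const float* ab, std::size_t n,
                                           float alpha, float beta) {
    std::size_t i = 0;
#if SKETCH_HAS_NEON
    const float32x4_t valpha = vdupq_n_f32(alpha);
    const float32x4_t vbeta = vdupq_n_f32(beta);
    // Vector body: four floats per iteration.
    for (; i + 4 <= n; i += 4) {
        float32x4_t acc = vmulq_f32(vld1q_f32(ab + i), valpha);
        if (beta != 0.0f) {
            acc = vmlaq_f32(acc, vld1q_f32(c + i), vbeta);  // acc += c * beta
        }
        vst1q_f32(c + i, acc);
    }
#endif
    // Scalar fallback: the tail, or the whole range when NEON is unavailable.
    for (; i < n; ++i) {
        c[i] = (beta == 0.0f) ? alpha * ab[i] : alpha * ab[i] + beta * c[i];
    }
}

// Hypothetical 2D wrapper: a tile whose rows are packed back to back
// (leading dimension equals the row width) is treated as one 1D run.
inline void ApplyAlphaBeta2DSketch(float* c, std::size_t ldc,
                                   const float* ab, std::size_t ldab,
                                   std::size_t m, std::size_t n,
                                   float alpha, float beta) {
    if (ldc == n && ldab == n) {
        ApplyAlphaBetaContiguousSketch(c, ab, m * n, alpha, beta);
        return;
    }
    // Non-contiguous tile: plain scalar loop over rows and columns.
    for (std::size_t r = 0; r < m; ++r) {
        for (std::size_t j = 0; j < n; ++j) {
            float& out = c[r * ldc + j];
            const float prod = alpha * ab[r * ldab + j];
            out = (beta == 0.0f) ? prod : prod + beta * out;
        }
    }
}
```

Collapsing a contiguous tile into a single run of `m * n` elements means only one scalar tail is processed for the whole tile instead of one per row, which is presumably the reuse the third bullet refers to.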
becb89f to 7b594fc
|
Please include this - #27618 when it goes through eventually |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
Can you please fix the CI issues? |
Signed-off-by: Cathal Lawlor <cathal.lawlor@arm.com>
Signed-off-by: Cathal Lawlor <cathal.lawlor@arm.com>
Fix batched fast-path handling in KleidiAI SGemm by avoiding alpha checks on batch-0 only:
- handle K==0 as a per-batch beta-only path
- only take alpha==0 fast path when all batch entries have alpha==0

Add non-long SGemm regression coverage for BatchSize>1 with mixed alpha/beta combinations, including a batched K==0 case.

Update ApplyAlphaBeta2D comments to match current contiguous-tile control flow.

Signed-off-by: Cathal Lawlor <cathal.lawlor@arm.com>
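The fast-path rule in this commit message can be illustrated with a short sketch. It is an assumption-laden example rather than the actual MLAS code: `BatchEntrySketch`, `ScaleByBetaSketch`, and `TryBetaOnlyFastPathSketch` are hypothetical names, and the real per-batch parameter struct carries more fields. The point is that the decision looks at every batch entry instead of batch 0 only, and that `beta` is applied to each entry's `C` separately.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-batch parameters (the real MLAS struct differs).
struct BatchEntrySketch {
    float alpha;
    float beta;
    float* C;
    std::size_t ldc;
};

// Scale an M x N tile of C by beta. With beta == 0 the tile is
// overwritten with zeros without reading the old contents.
inline void ScaleByBetaSketch(float* C, std::size_t ldc,
                              std::size_t M, std::size_t N, float beta) {
    for (std::size_t r = 0; r < M; ++r) {
        for (std::size_t j = 0; j < N; ++j) {
            float& out = C[r * ldc + j];
            out = (beta == 0.0f) ? 0.0f : beta * out;
        }
    }
}

// Returns true when the whole batch was handled by the beta-only path,
// false when the caller still has to run the full GEMM.
inline bool TryBetaOnlyFastPathSketch(const std::vector<BatchEntrySketch>& batch,
                                      std::size_t M, std::size_t N, std::size_t K) {
    // The alpha == 0 shortcut is valid only if it holds for every entry.
    bool all_alpha_zero = true;
    for (const auto& e : batch) {
        if (e.alpha != 0.0f) {
            all_alpha_zero = false;
            break;
        }
    }
    // K == 0 makes the product empty regardless of alpha.
    if (K != 0 && !all_alpha_zero) {
        return false;
    }
    // Apply each entry's own beta to its own C.
    for (const auto& e : batch) {
        ScaleByBetaSketch(e.C, e.ldc, M, N, e.beta);
    }
    return true;
}
```

A batch with mixed `alpha` values and `K != 0` returns `false` and leaves every `C` untouched, so the caller falls through to the regular compute path.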
Signed-off-by: Cathal Lawlor <cathal.lawlor@arm.com>
2b591f7 to f1d1717
|
Pushed new commits to fix issues introduced when I merged main into the branch |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
Description
This change updates the KleidiAI SGEMM post-processing path in onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp with two parts:
Motivation and Context
This change addresses correctness and performance in the SGEMM post-processing stage: