[Feat]: Add Multithreading support for kleidiai groupwise GEMM kernels #144074
Dr. CI (hud.pytorch.org/pr/144074): ✅ No failures as of commit 85a3252 with merge base 0431d47.
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Change-Id: I2cb782e8a8414adbee6bfe317bee5bb040f4f982
Looks good to me; I left some nit comments. Thanks.
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Change-Id: I307c21fbe0fad0dd9793f39cef55167d553c091b
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Mitigation for #145273: reverting #134124 and #144074.
Pull Request resolved: #145392
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai
The KleidiAI groupwise GEMM kernel was not 2D-blocked. This change adds support for 2D blocking of the GEMM kernel, so that the workload can be split efficiently across multiple threads and the kernel runs faster.
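To illustrate what 2D blocking means for a groupwise GEMM, here is a minimal Python sketch. It is not the KleidiAI or PyTorch implementation. It assumes integer (int4-style) weights stored as an N x K matrix with one float scale per group of `group_size` consecutive K elements, plain float activations, and a caller-chosen `thread_rows x thread_cols` thread grid. The helper names `split_range`, `gemm_tile`, and `groupwise_gemm_2d` are hypothetical. The M x N output is cut into disjoint tiles along both M and N, and each tile is computed by one worker.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(length, parts):
    """Split [0, length) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def gemm_tile(a, w_q, scales, group_size, out, m_range, n_range):
    """Compute out[i][j] for one (M, N) tile of a groupwise-quantized GEMM.

    Hypothetical layout: a is M x K floats, w_q is N x K integers, and
    scales is N x (K // group_size), one scale per (row of w_q, group).
    """
    k = len(a[0])
    for i in range(*m_range):
        for j in range(*n_range):
            acc = 0.0
            for g in range(k // group_size):
                lo, hi = g * group_size, (g + 1) * group_size
                # Dot product within one group, then apply that group's scale once.
                partial = sum(a[i][kk] * w_q[j][kk] for kk in range(lo, hi))
                acc += partial * scales[j][g]
            out[i][j] = acc


def groupwise_gemm_2d(a, w_q, scales, group_size, thread_rows, thread_cols):
    """Split the output into a thread_rows x thread_cols grid of tiles, one task per tile."""
    m, n, k = len(a), len(w_q), len(a[0])
    assert k % group_size == 0, "this sketch assumes K is a multiple of group_size"
    out = [[0.0] * n for _ in range(m)]
    row_blocks = split_range(m, thread_rows)  # blocking along M
    col_blocks = split_range(n, thread_cols)  # blocking along N
    with ThreadPoolExecutor(max_workers=thread_rows * thread_cols) as pool:
        futures = [
            pool.submit(gemm_tile, a, w_q, scales, group_size, out, mr, nr)
            for mr in row_blocks
            for nr in col_blocks
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

Because the tiles are disjoint, each worker writes to its own output elements and no locking is needed. Python's GIL means this sketch will not actually get faster with more threads; it only shows the partitioning, which a native kernel would map onto truly parallel workers. Splitting along both M and N (rather than only one dimension) gives more, smaller work units, which helps keep all threads busy when one dimension alone is too small to divide evenly.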
Performance improvements:
7B model prefill throughput improved from 145 tokens/s to 175 tokens/s (roughly 21% faster).
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @BoyuanFeng