Supporting Intel AMX instructions in quantized GEMM #14042
```cpp
};

constexpr size_t MLAS_GEMM_U8S8_KERNEL_AMX::PackedK;
constexpr MLAS_GEMM_QUANT_STRIDES MLAS_GEMM_U8S8_KERNEL_AMX::Strides;
```
Hmm, this pattern appears a lot in older MLAS code. Let me open a separate PR for it.
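For context, the two quoted lines are namespace-scope definitions of `static constexpr` data members, an idiom that was required before C++17 whenever such a member was ODR-used (for example, bound to a reference). Since C++17, `static constexpr` data members are implicitly `inline`, so these out-of-class definitions are redundant and deprecated, though still accepted. The sketch below illustrates the rule; `ExampleKernel` and `RoundUpK` are hypothetical names standing in for a kernel type like `MLAS_GEMM_U8S8_KERNEL_AMX`, not MLAS code.

```cpp
#include <cstddef>

// Hypothetical stand-in for a kernel type such as MLAS_GEMM_U8S8_KERNEL_AMX.
struct ExampleKernel {
    // Since C++17 this in-class declaration is also a definition
    // (static constexpr data members are implicitly inline).
    static constexpr std::size_t PackedK = 4;
};

// Before C++17, ODR-using PackedK required a namespace-scope definition like
// the commented-out line below. In C++17 and later it is redundant and
// deprecated, but still compiles if present:
// constexpr std::size_t ExampleKernel::PackedK;

// Round k up to a multiple of PackedK. Binding `step` to a reference is an
// ODR-use, which links fine under C++17 without the out-of-class definition.
inline std::size_t RoundUpK(std::size_t k) {
    const std::size_t& step = ExampleKernel::PackedK;
    return (k + step - 1) / step * step;
}
```

Provided the build targets C++17 or later, removing such out-of-class definitions does not change behavior.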
Branch updated from 90e2141 to 20d013f.
yufenglee left a comment
Description
This PR uses Intel AMX int8 instructions to accelerate quantized GEMM in MLAS.
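The core AMX int8 operation for an unsigned-by-signed (U8S8) GEMM is `TDPBUSD`: for every output element it multiplies `uint8` values of tile A by `int8` values of tile B, sums each group of four products, and accumulates the result into an `int32` tile C. Tile B must be stored with four consecutive K values of each column adjacent, which is one reason B is worth prepacking. The following is a scalar reference model of those semantics, not the MLAS kernel and not necessarily the packing MLAS uses; `PackBVnni` and `TileDpbusdModel` are hypothetical names. Real tiles are limited to 16 rows of 64 bytes, a limit the model does not enforce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a row-major K x N int8 matrix B into the 4-byte-interleaved ("VNNI")
// layout that TDPBUSD expects: packed shape is ceil(K/4) rows x (N*4) bytes,
// with B[k][n] stored at packed[k/4][n*4 + k%4]. Missing K values are
// zero-padded, which adds nothing to the dot products.
std::vector<int8_t> PackBVnni(const std::vector<int8_t>& b, size_t K, size_t N) {
    assert(b.size() == K * N);
    const size_t Kp = (K + 3) / 4;  // number of packed rows
    std::vector<int8_t> packed(Kp * N * 4, 0);
    for (size_t k = 0; k < K; ++k)
        for (size_t n = 0; n < N; ++n)
            packed[(k / 4) * N * 4 + n * 4 + (k % 4)] = b[k * N + n];
    return packed;
}

// Scalar model of TDPBUSD: C[m][n] += sum over kk, i of
// uint8 A[m][4*kk + i] * int8 packedB[kk][4*n + i], accumulated in int32.
// A is row-major M x (Kp*4) (already padded to a multiple of 4 columns).
void TileDpbusdModel(const std::vector<uint8_t>& a,
                     const std::vector<int8_t>& packedB,
                     std::vector<int32_t>& c, size_t M, size_t Kp, size_t N) {
    for (size_t m = 0; m < M; ++m)
        for (size_t kk = 0; kk < Kp; ++kk)
            for (size_t n = 0; n < N; ++n) {
                int32_t sum = 0;
                for (size_t i = 0; i < 4; ++i)
                    sum += int32_t(a[m * Kp * 4 + kk * 4 + i]) *
                           int32_t(packedB[kk * N * 4 + n * 4 + i]);
                c[m * N + n] += sum;
            }
}
```

For example, with `A = {1, 2, 3, 4}` (1 x 4) and a 4 x 2 matrix B whose columns are `{1, 2, 3, 4}` and `{-1, -2, -3, -4}`, the model produces `C = {30, -30}`. Because B is a constant weight in inference, packing it once ahead of time removes this reshuffle from every GEMM call.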
Motivation and Context
AMX instructions accelerate quantized GEMM significantly:
Latency with prepacked B (ns)