@chenfucn
Contributor

Description

Use Intel AMX int8 instructions to accelerate quantized GEMM in MLAS.
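As background for the u8×s8 kernel (the `U8S8` in `MLAS_GEMM_U8S8_KERNEL_AMX`), the AMX int8 instruction for this case is `TDPBUSD`. It takes groups of four unsigned bytes from a row of the A tile and four signed bytes from the B tile, sums their products, and accumulates the result into the int32 C tile. B must be laid out with four consecutive K values interleaved per column (often called the VNNI layout), which is one reason to prepack B. The sketch below is a scalar emulation, not the MLAS kernel. It assumes full-size tiles (16 rows × 64 bytes for A and B, 16×16 int32 for C), and `PackBVnni` and `TileDpbusd` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Full-size AMX int8 tile shapes (assumption: maximum tile configuration).
constexpr int kTileM = 16;  // rows of A and C
constexpr int kTileN = 16;  // int32 columns of C
constexpr int kTileK = 64;  // bytes of K per A row (16 groups of 4)

// Hypothetical helper: repack a row-major K x N signed-byte matrix B into the
// VNNI layout expected by TDPBUSD. Packed row r holds, for each column n, the
// four consecutive K values B[4r..4r+3][n], i.e. (K/4) rows of N*4 bytes.
std::vector<int8_t> PackBVnni(const std::vector<int8_t>& b, int k, int n) {
    assert(k % 4 == 0);  // caller is assumed to pad K to a multiple of 4
    std::vector<int8_t> packed(static_cast<std::size_t>(k) * n);
    for (int r = 0; r < k / 4; ++r)
        for (int col = 0; col < n; ++col)
            for (int i = 0; i < 4; ++i)
                packed[(r * n + col) * 4 + i] = b[(4 * r + i) * n + col];
    return packed;
}

// Scalar emulation of one full-tile TDPBUSD:
// C[m][n] += sum_k A[m][k] (unsigned) * B[k][n] (signed), B read from VNNI layout.
void TileDpbusd(int32_t* c, const uint8_t* a, const int8_t* b_vnni) {
    for (int m = 0; m < kTileM; ++m)
        for (int n = 0; n < kTileN; ++n) {
            int32_t acc = c[m * kTileN + n];
            for (int r = 0; r < kTileK / 4; ++r)      // one 4-byte group per step
                for (int i = 0; i < 4; ++i)
                    acc += int32_t(a[m * kTileK + 4 * r + i]) *
                           int32_t(b_vnni[(r * kTileN + n) * 4 + i]);
            c[m * kTileN + n] = acc;
        }
}
```

Checking `TileDpbusd` against a plain triple-loop GEMM on the unpacked B is a quick way to confirm the packing order. On hardware, a single `TDPBUSD` performs all 16×16×64 multiply-accumulates of this function.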

Motivation and Context

AMX instructions accelerate quantized GEMM significantly: across the configurations below, AMX latency is roughly 2.2x to 4.2x lower than AVX512-VNNI.

Latency with prepacked B (ns, lower is better)

| M | N | K | Batch | Threads | AVX512-VNNI (ns) | AMX (ns) |
|---:|---:|---:|---:|---:|---:|---:|
| 384 | 1024 | 1024 | 1 | 4 | 1057511 | 285393 |
| 384 | 1024 | 3072 | 1 | 4 | 2643929 | 700397 |
| 384 | 1024 | 4096 | 1 | 4 | 3784750 | 890701 |
| 384 | 4096 | 1024 | 1 | 4 | 2378139 | 887251 |
| 384 | 1024 | 1024 | 1 | 16 | 307137 | 138481 |
| 384 | 1024 | 3072 | 1 | 16 | 855730 | 295027 |
| 384 | 1024 | 4096 | 1 | 16 | 1126878 | 317395 |
| 384 | 4096 | 1024 | 1 | 16 | 781963 | 237014 |
| 1536 | 1024 | 1024 | 1 | 16 | 538864 | 181459 |
| 1536 | 1024 | 3072 | 1 | 16 | 1681002 | 561600 |
| 1536 | 1024 | 4096 | 1 | 16 | 2158127 | 717470 |
| 1536 | 4096 | 1024 | 1 | 16 | 2428622 | 896140 |
| 3072 | 1024 | 1024 | 1 | 16 | 1058029 | 357031 |
| 3072 | 1024 | 3072 | 1 | 16 | 3138504 | 1095857 |
| 3072 | 1024 | 4096 | 1 | 16 | 4155640 | 1386183 |
| 3072 | 4096 | 1024 | 1 | 16 | 4679030 | 1778624 |
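To make the comparison concrete, the speedup for each row can be recomputed as AVX512-VNNI latency divided by AMX latency. The snippet below only transcribes the table's numbers; `BenchRow`, `Speedup`, and `Summarize` are illustrative names, not part of MLAS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>

// One row of the benchmark table; latencies in nanoseconds (Batch is 1 everywhere).
struct BenchRow {
    int m, n, k, threads;
    double avx512vnni_ns;
    double amx_ns;
};

// Values transcribed from the table.
constexpr BenchRow kRows[] = {
    {384, 1024, 1024, 4, 1057511, 285393},    {384, 1024, 3072, 4, 2643929, 700397},
    {384, 1024, 4096, 4, 3784750, 890701},    {384, 4096, 1024, 4, 2378139, 887251},
    {384, 1024, 1024, 16, 307137, 138481},    {384, 1024, 3072, 16, 855730, 295027},
    {384, 1024, 4096, 16, 1126878, 317395},   {384, 4096, 1024, 16, 781963, 237014},
    {1536, 1024, 1024, 16, 538864, 181459},   {1536, 1024, 3072, 16, 1681002, 561600},
    {1536, 1024, 4096, 16, 2158127, 717470},  {1536, 4096, 1024, 16, 2428622, 896140},
    {3072, 1024, 1024, 16, 1058029, 357031},  {3072, 1024, 3072, 16, 3138504, 1095857},
    {3072, 1024, 4096, 16, 4155640, 1386183}, {3072, 4096, 1024, 16, 4679030, 1778624},
};

// Speedup of AMX over AVX512-VNNI (> 1 means AMX is faster).
constexpr double Speedup(const BenchRow& r) { return r.avx512vnni_ns / r.amx_ns; }

// Print each row's speedup and report the smallest and largest via out-params.
void Summarize(double* min_speedup, double* max_speedup) {
    *min_speedup = Speedup(kRows[0]);
    *max_speedup = *min_speedup;
    for (const BenchRow& r : kRows) {
        assert(r.amx_ns > 0);  // guard the division
        double s = Speedup(r);
        std::printf("M:%d/N:%d/K:%d/Threads:%d  %.2fx\n", r.m, r.n, r.k, r.threads, s);
        *min_speedup = std::min(*min_speedup, s);
        *max_speedup = std::max(*max_speedup, s);
    }
}
```

Over these 16 rows the ratio ranges from about 2.2x (M=384, N=1024, K=1024, 16 threads) to about 4.2x (M=384, N=1024, K=4096, 4 threads).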

@chenfucn requested a review from a team as a code owner on December 21, 2022 18:37
jchen351 previously approved these changes Dec 21, 2022
```cpp
};

// Out-of-class definitions of the kernel's static constexpr data members.
constexpr size_t MLAS_GEMM_U8S8_KERNEL_AMX::PackedK;
constexpr MLAS_GEMM_QUANT_STRIDES MLAS_GEMM_U8S8_KERNEL_AMX::Strides;
```
Contributor

Contributor Author

Hmm, this pattern shows up a lot in the older MLAS code. I'll open a separate PR for it.
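For context on the code snippet in this thread: the two lines after `};` are out-of-class definitions of `static constexpr` data members. Before C++17 such a definition was required whenever the member was odr-used (for example, bound to a `const&`). Since C++17 a `static constexpr` data member is implicitly `inline`, so the in-class declaration is already a definition and the namespace-scope redeclaration is redundant (and deprecated). A minimal illustration, assuming C++17 and a hypothetical `KernelTraits` struct whose `PackedK` value is made up for the example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a kernel class with compile-time constants.
struct KernelTraits {
    static constexpr std::size_t PackedK = 4;  // value chosen only for this example
};

// Pre-C++17 code needed the following out-of-class definition, because std::max
// below takes its arguments by const reference (an odr-use). In C++17 the member
// is implicitly inline, so the line is redundant and can stay deleted:
// constexpr std::size_t KernelTraits::PackedK;

// Round k up to a multiple of KernelTraits::PackedK.
std::size_t PadK(std::size_t k) {
    // std::max binds PackedK to a const std::size_t&, which odr-uses the member.
    std::size_t step = std::max(KernelTraits::PackedK, std::size_t{1});
    return (k + step - 1) / step * step;
}
```

Compiled as C++17 this links without the commented-out definition; under C++14 an unoptimized build may fail to link with an undefined reference to `KernelTraits::PackedK`.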

jchen351 previously approved these changes Dec 29, 2022
Member

@yufenglee left a comment

:shipit:

