[BesTLA] Add u8s8 kernel for AVX512BW #265

Merged
merged 18 commits into from
May 24, 2024

Conversation

luoyu-intel
Contributor

Type of Change

Add AVX512BW versions of the u8s8 GEMM and u8s8 GEMV kernels.
Add Skylake and Cannon Lake to the list of optimized platforms.
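
For readers unfamiliar with u8s8 arithmetic on CPUs that have AVX512BW but no VNNI, here is a minimal scalar sketch of the instruction pair commonly used for it: `vpmaddubsw` (`_mm512_maddubs_epi16`) followed by `vpmaddwd` (`_mm512_madd_epi16`) against a vector of ones. This is an assumption about the general technique, not a copy of the kernel in this PR. The helper names `maddubs_lane`, `madd_ones_lane`, `dot_u8s8_emulated`, and `dot_u8s8_exact` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of one 16-bit lane of vpmaddubsw: two unsigned-by-signed
// byte products are added, and the sum saturates to the int16 range.
inline int16_t maddubs_lane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
  int32_t sum = int32_t(a0) * int32_t(b0) + int32_t(a1) * int32_t(b1);
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return static_cast<int16_t>(sum);
}

// Scalar model of one 32-bit lane of vpmaddwd with a multiplier of 1:
// two adjacent int16 values are widened to int32 and added.
inline int32_t madd_ones_lane(int16_t lo, int16_t hi) {
  return int32_t(lo) + int32_t(hi);
}

// Hypothetical helper: u8s8 dot product built from the two lane models.
// k must be a multiple of 4, matching the 4-byte groups of one int32 lane.
inline int32_t dot_u8s8_emulated(const uint8_t* a, const int8_t* b,
                                 std::size_t k) {
  assert(k % 4 == 0);
  int32_t acc = 0;
  for (std::size_t i = 0; i < k; i += 4) {
    int16_t p0 = maddubs_lane(a[i], b[i], a[i + 1], b[i + 1]);
    int16_t p1 = maddubs_lane(a[i + 2], b[i + 2], a[i + 3], b[i + 3]);
    acc += madd_ones_lane(p0, p1);
  }
  return acc;
}

// Exact reference without the intermediate int16 step.
inline int32_t dot_u8s8_exact(const uint8_t* a, const int8_t* b,
                              std::size_t k) {
  int32_t acc = 0;
  for (std::size_t i = 0; i < k; ++i) acc += int32_t(a[i]) * int32_t(b[i]);
  return acc;
}
```

The two functions agree unless a pair sum leaves the int16 range. With all activations at 255 and all weights at 127, the emulated path saturates while the exact path does not, which is why kernels of this style often restrict the value range of one operand.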

@luoyu-intel
Contributor Author

luoyu-intel commented May 24, 2024

Mistral-7B on a Skylake server CPU.

AVX512F, compute_dtype=fp32:

model_print_timings: prompt eval time =  6530.54 ms /  1184 tokens (    5.52 ms per token)
model_print_timings:        eval time =   620.60 ms /    15 runs   (   41.37 ms per token)
model_print_timings:       total time =  7167.66 ms
========== eval time log of each prediction ==========
prediction   0, time: 6530.54ms
prediction   1, time: 43.94ms
prediction   2, time: 41.62ms
prediction   3, time: 41.30ms
prediction   4, time: 41.10ms

AVX512BW, compute_dtype=int8:

model_print_timings: prompt eval time =  4492.60 ms /  1184 tokens (    3.79 ms per token)
model_print_timings:        eval time =   614.60 ms /    15 runs   (   40.97 ms per token)
model_print_timings:       total time =  5121.86 ms
========== eval time log of each prediction ==========
prediction   0, time: 4492.60ms
prediction   1, time: 41.53ms
prediction   2, time: 41.03ms
prediction   3, time: 40.90ms
prediction   4, time: 41.01ms
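
To make the comparison between the two logs explicit, the ratios can be computed as in the sketch below. The `Timing` struct and the `ms_per_item` and `speedup` helpers are hypothetical names used only for illustration. Only the numbers come from the logs above.

```cpp
#include <cassert>

// Hypothetical container for one line of model_print_timings.
struct Timing {
  double total_ms;  // total time reported on the line
  int count;        // number of tokens or runs on the line
};

// Average time per token (or per run) for one timing line.
inline double ms_per_item(Timing t) {
  assert(t.count > 0);
  return t.total_ms / t.count;
}

// Ratio of baseline time to candidate time; values above 1 mean faster.
inline double speedup(double baseline_ms, double candidate_ms) {
  assert(candidate_ms > 0.0);
  return baseline_ms / candidate_ms;
}

// Values copied from the two logs above.
const Timing kPromptFp32{6530.54, 1184};
const Timing kPromptInt8{4492.60, 1184};
const Timing kEvalFp32{620.60, 15};
const Timing kEvalInt8{614.60, 15};
```

With these inputs, `speedup(kPromptFp32.total_ms, kPromptInt8.total_ms)` is about 1.45, while the same ratio for the eval lines is about 1.01. The gain from the int8 path therefore shows up almost entirely in prompt processing.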

@luoyu-intel
Contributor Author

From bestla_benchmark:
sgemm: 2.7 TFLOPS
igemm: 4.5 TOPS
int4 (group=128) igemm: 4.2 TOPS
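
For context on how throughput figures like these are usually derived, a GEMM of shape M x N x K is conventionally counted as 2*M*N*K operations (one multiply and one add per inner-product term), divided by the elapsed time. The sketch below assumes that convention. How bestla_benchmark itself counts operations is not shown in this PR, and `gemm_ops_per_second` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: operations per second for an M x N x K GEMM,
// counting each multiply-accumulate as two operations.
inline double gemm_ops_per_second(std::int64_t m, std::int64_t n,
                                  std::int64_t k, double seconds) {
  assert(seconds > 0.0);
  return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
         static_cast<double>(k) / seconds;
}

// Ratio of two throughput figures, e.g. igemm over sgemm.
inline double throughput_ratio(double candidate, double baseline) {
  assert(baseline > 0.0);
  return candidate / baseline;
}
```

Applied to the figures reported here, `throughput_ratio(4.5, 2.7)` is about 1.67 for igemm over sgemm, and `throughput_ratio(4.2, 2.7)` is about 1.56 for the int4 group=128 case.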

@luoyu-intel luoyu-intel merged commit 4e42e39 into main May 24, 2024
15 checks passed
2 participants