metal : add Q8_0 support #2763
@lshzh-ww I'll probably try to also add |
I do have a template for all matrix-vector multiplication kernels. However, it requires careful tuning of the So, if you feel that it's urgent, please go ahead. However, it may not be worth spending too much time optimizing the kernel. Alternatively, we can wait a few more days to provide support for |
No rush - will wait for the new kernels then. Thanks!
* metal : add dequantize_q8_0 kernel
* metal : add mul_mat_q8_0_f32 kernel
* metal : add Q8_0 mul_mm kernel
Is this for M1/M2 only, and not AMD GPUs? I can't run on the GPU with my AMD card on macOS (it supports MPS, btw).
close #2508
Add Q8_0 support for Metal.
I haven't tested if this is the most optimal way to implement it regarding the mat x vec kernel, so there might be room for optimizations in the future.
M2 Ultra
build: 1202e06 (1049)
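For readers unfamiliar with the format, here is a minimal CPU-side sketch of what a Q8_0 dequantize step and a mat x vec inner product compute. It is not the Metal code from this PR. It assumes the usual Q8_0 layout of 32 signed 8-bit values sharing one scale per block, and it stores the scale as a plain float for self-containment (ggml keeps it in half precision). The names `BlockQ8_0`, `dequantize_q8_0_ref` and `vec_dot_q8_0_f32_ref` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed block size for Q8_0: 32 quantized values per block.
constexpr int QK8_0 = 32;

// Simplified, hypothetical block layout: one scale plus 32 int8 quants.
// (Assumption: the real format stores the scale as fp16; float is used here.)
struct BlockQ8_0 {
    float       d;          // per-block scale
    std::int8_t qs[QK8_0];  // quantized values
};

// Reference dequantization: y[i] = d * qs[i] for every value in every block.
std::vector<float> dequantize_q8_0_ref(const std::vector<BlockQ8_0>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * QK8_0);
    for (const BlockQ8_0& b : blocks) {
        for (int i = 0; i < QK8_0; ++i) {
            out.push_back(b.d * static_cast<float>(b.qs[i]));
        }
    }
    return out;
}

// Reference dot product of one quantized matrix row with an f32 vector.
// The scale is applied once per block instead of once per element.
float vec_dot_q8_0_f32_ref(const std::vector<BlockQ8_0>& row,
                           const std::vector<float>& x) {
    assert(x.size() == row.size() * QK8_0);
    float sum = 0.0f;
    for (std::size_t ib = 0; ib < row.size(); ++ib) {
        float block_sum = 0.0f;
        for (int i = 0; i < QK8_0; ++i) {
            block_sum += static_cast<float>(row[ib].qs[i]) * x[ib * QK8_0 + i];
        }
        sum += row[ib].d * block_sum;
    }
    return sum;
}
```

A mat x vec product is then one such dot product per matrix row. A GPU kernel would typically split the rows and blocks across threads, which is where the tuning mentioned in the conversation comes in.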