
metal : add Q8_0 support #2763

Merged: 3 commits into master from metal-add-q8_0 on Aug 24, 2023

Conversation

@ggerganov (Owner) commented on Aug 24, 2023:

Closes #2508.

Add Q8_0 support for Metal

I haven't tested whether this is the optimal way to implement the mat x vec kernel, so there may be room for optimization in the future.
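For context, Q8_0 stores weights in blocks of 32 signed 8-bit quants with a single per-block scale, so dequantization is a plain scale-and-multiply. A minimal C sketch of the operation the new `dequantize_q8_0` kernel performs (simplified: ggml stores the scale as fp16, and the real kernel runs this per-thread on the GPU):

```c
#include <stdint.h>

#define QK8_0 32

// Q8_0 block: one scale plus 32 signed 8-bit quants.
// ggml stores the scale as fp16; a plain float is used here for simplicity.
typedef struct {
    float  d;           // block scale
    int8_t qs[QK8_0];   // quantized values
} block_q8_0;

// Dequantize one block: y[i] = d * qs[i].
static void dequantize_q8_0(const block_q8_0 * x, float * y) {
    for (int i = 0; i < QK8_0; ++i) {
        y[i] = x->d * (float) x->qs[i];
    }
}
```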

Benchmarks on M2 Ultra:

| model | backend | n_gpu_layers | test | t/s |
| --- | --- | --- | --- | --- |
| LLaMA v2 7B mostly F16 | Metal | 1 | pp 512 | 664.05 ± 0.22 |
| LLaMA v2 7B mostly Q4_0 | Metal | 1 | pp 512 | 632.16 ± 0.47 |
| LLaMA v2 7B mostly Q4_1 | Metal | 1 | pp 512 | 634.40 ± 0.40 |
| LLaMA v2 7B mostly Q8_0 | Metal | 1 | pp 512 | 630.26 ± 0.11 |
| LLaMA v2 7B mostly Q2_K | Metal | 1 | pp 512 | 580.58 ± 0.22 |
| LLaMA v2 7B mostly Q3_K - Medium | Metal | 1 | pp 512 | 580.74 ± 0.26 |
| LLaMA v2 7B mostly Q4_K - Medium | Metal | 1 | pp 512 | 587.62 ± 0.19 |
| LLaMA v2 7B mostly Q5_K - Medium | Metal | 1 | pp 512 | 560.96 ± 0.15 |
| LLaMA v2 7B mostly Q6_K | Metal | 1 | pp 512 | 561.99 ± 0.15 |
| LLaMA v2 7B mostly F16 | Metal | 1 | tg 128 | 29.38 ± 0.11 |
| LLaMA v2 7B mostly Q4_0 | Metal | 1 | tg 128 | 86.17 ± 0.05 |
| LLaMA v2 7B mostly Q4_1 | Metal | 1 | tg 128 | 81.30 ± 0.08 |
| LLaMA v2 7B mostly Q8_0 | Metal | 1 | tg 128 | 61.16 ± 0.05 |
| LLaMA v2 7B mostly Q2_K | Metal | 1 | tg 128 | 74.89 ± 0.05 |
| LLaMA v2 7B mostly Q3_K - Medium | Metal | 1 | tg 128 | 76.22 ± 0.06 |
| LLaMA v2 7B mostly Q4_K - Medium | Metal | 1 | tg 128 | 79.64 ± 0.08 |
| LLaMA v2 7B mostly Q5_K - Medium | Metal | 1 | tg 128 | 68.91 ± 0.04 |
| LLaMA v2 7B mostly Q6_K | Metal | 1 | tg 128 | 68.46 ± 0.07 |

build: 1202e06 (1049)

@ggerganov marked this pull request as ready for review on August 24, 2023 at 12:51.
@ggerganov merged commit d67777c into master on Aug 24, 2023 (3 checks passed).
@ggerganov deleted the metal-add-q8_0 branch on August 24, 2023 at 13:20.
@ggerganov (Owner, Author) commented:
@lshzh-ww I'll probably try to also add Q5_0 and Q5_1 later today - just don't want to overlap in case you have started doing it.

@lshzh-ww (Collaborator) commented:
I do have a template for all matrix-vector multiplication kernels. However, it requires careful tuning of the dequantize_q_n functions to achieve maximum performance for both matrix-vector and matrix-matrix multiplication. So far I have only finished reimplementing dequantize_q2_k and dequantize_q3_k; for those, the new template achieves better, or at least similar, performance compared to the master branch. I may submit the PR this weekend or early next week.

So if you feel it's urgent, please go ahead, though it may not be worth spending too much time optimizing the kernel. Alternatively, we can wait a few more days and add Q5_0 and Q5_1 support for Metal then.
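To illustrate the template idea described above: one shared matrix-vector loop is parameterized by a per-format dequantize function, so only the dequantizers need per-format tuning. A plain-C sketch of the shared structure (the names `mat_vec_q` and `dequantize_fn` are hypothetical, and the real Metal template distributes rows across threadgroups rather than looping serially):

```c
#include <stddef.h>

// A per-format dequantizer expands one quantized block into block_size floats.
typedef void (*dequantize_fn)(const void * block, float * out);

// Generic mat x vec: y = A * x, with A stored row-major as quantized blocks.
// Serial loop for clarity only; the GPU version parallelizes over rows.
static void mat_vec_q(const void * A, const float * x, float * y,
                      size_t rows, size_t cols,
                      size_t block_size, size_t block_bytes,
                      dequantize_fn dequantize) {
    float tmp[256]; // scratch for one dequantized block (assumes block_size <= 256)
    const unsigned char * a = (const unsigned char *) A;
    for (size_t r = 0; r < rows; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; c += block_size) {
            dequantize(a, tmp);                 // format-specific step
            for (size_t i = 0; i < block_size; ++i) {
                sum += tmp[i] * x[c + i];       // shared dot-product step
            }
            a += block_bytes;
        }
        y[r] = sum;
    }
}
```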

@ggerganov (Owner, Author) commented:
No rush - will wait for the new kernels then. Thanks!

akawrykow pushed a commit to akawrykow/llama.cpp referencing this pull request on Aug 29, 2023:

* metal : add dequantize_q8_0 kernel
* metal : add mul_mat_q8_0_f32 kernel
* metal : add Q8_0 mul_mm kernel
@sukualam commented on Sep 4, 2023:

Is this for M1/M2 only, not AMD GPUs? I can't run with GPU acceleration using my AMD card on macOS (it supports MPS, by the way).

Sam2much96 pushed a commit to Sam2much96/llama.cpp referencing this pull request on Sep 11, 2023 (the same three commits as above).
Successfully merging this pull request closed the following issue: "Running with Metal for llama-2-13b-chat.ggmlv3.q8_0.bin with -ngl throw unimplemented error" (#2508).