@lhez
Collaborator

@lhez lhez commented Sep 2, 2025

This PR adds initial q8_0 support.
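For readers unfamiliar with the format, here is a minimal CPU-side sketch of q8_0 block quantization. It assumes the usual ggml layout of 32 signed 8-bit values sharing one scale per block. The names `block_q8_0_sketch`, `quantize_block`, `dequantize_block`, and `roundtrip_max_error` are hypothetical, and the scale is stored as `float` here rather than fp16 to keep the example self-contained. It is not the OpenCL kernel code from this PR.

```cpp
#include <cmath>
#include <cstdint>

constexpr int QK8_0 = 32; // number of values per q8_0 block

// Simplified block: one scale plus 32 int8 quants.
// Assumption: float scale instead of the fp16 scale used by ggml.
struct block_q8_0_sketch {
    float  d;          // per-block scale
    int8_t qs[QK8_0];  // quantized values
};

// Quantize 32 floats: scale so the largest magnitude maps to 127.
block_q8_0_sketch quantize_block(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < QK8_0; ++i) {
        amax = std::fmax(amax, std::fabs(x[i]));
    }
    block_q8_0_sketch b{};
    b.d = amax / 127.0f;
    const float inv_d = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < QK8_0; ++i) {
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * inv_d));
    }
    return b;
}

// Dequantize: each value is simply scale * quant.
void dequantize_block(const block_q8_0_sketch & b, float * y) {
    for (int i = 0; i < QK8_0; ++i) {
        y[i] = b.d * static_cast<float>(b.qs[i]);
    }
}

// Helper for experimentation: largest absolute error after a round trip.
float roundtrip_max_error(const float * x) {
    const block_q8_0_sketch b = quantize_block(x);
    float y[QK8_0];
    dequantize_block(b, y);
    float err = 0.0f;
    for (int i = 0; i < QK8_0; ++i) {
        err = std::fmax(err, std::fabs(x[i] - y[i]));
    }
    return err;
}
```

With rounding to the nearest integer, the round-trip error per value is bounded by about half the block scale.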

@github-actions github-actions bot added the ggml and OpenCL labels Sep 2, 2025
@lhez lhez marked this pull request as ready for review September 5, 2025 05:53
extra = new ggml_tensor_extra_cl_q8_0();
} else {
extra = temp_tensor_extras_q8_0.back();
temp_tensor_extras_q4_0.pop_back();
Collaborator Author

Should be temp_tensor_extras_q8_0.pop_back().
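To illustrate why the mismatch matters, here is a minimal sketch of the reuse pattern in the snippet under review, with `back()` and `pop_back()` applied to the same vector. The types `extra_q8_0` and `extra_pool_q8_0` and the member `free_list` are hypothetical stand-ins (for `ggml_tensor_extra_cl_q8_0` and `temp_tensor_extras_q8_0`), not the backend's actual definitions.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for ggml_tensor_extra_cl_q8_0.
struct extra_q8_0 {
    int payload = 0;
};

// Hypothetical pool; free_list plays the role of temp_tensor_extras_q8_0.
struct extra_pool_q8_0 {
    std::vector<std::unique_ptr<extra_q8_0>> owned;     // owns every extra ever created
    std::vector<extra_q8_0 *>                free_list; // extras available for reuse

    extra_q8_0 * alloc() {
        extra_q8_0 * extra = nullptr;
        if (free_list.empty()) {
            owned.push_back(std::make_unique<extra_q8_0>());
            extra = owned.back().get();
        } else {
            extra = free_list.back();
            // Pop from the same vector that back() read from. Popping a
            // different pool (e.g. the q4_0 one) would leave this entry in
            // place, so the next alloc() would hand out the same extra again.
            free_list.pop_back();
        }
        return extra;
    }

    void release(extra_q8_0 * extra) {
        free_list.push_back(extra);
    }
};

// Two consecutive allocations must never alias each other.
bool allocations_are_distinct() {
    extra_pool_q8_0 pool;
    extra_q8_0 * a = pool.alloc();
    pool.release(a);
    extra_q8_0 * b = pool.alloc(); // reuses a
    extra_q8_0 * c = pool.alloc(); // free list is empty again, so this is new
    return a == b && b != c;
}
```

If `alloc()` popped a different vector instead, `c` would alias `b`, and the unrelated pool would also lose an entry (or hit undefined behavior if it were empty).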

@lhez lhez force-pushed the q8_0-mv branch 2 times, most recently from 9e07767 to 066b21d Compare September 16, 2025 03:23
@lhez
Collaborator Author

lhez commented Sep 21, 2025

@max-krasnyansky @rmatif I am going to merge this to move things forward. We can iterate on this if needed.

@rmatif
Collaborator

rmatif commented Sep 21, 2025

> @max-krasnyansky @rmatif I am going to merge this to move things forward. We can iterate on this if needed.

Sorry for the delay. I’ve been pretty busy. I’ll look into this (and the others) today.

@rmatif
Collaborator

rmatif commented Sep 21, 2025

ggml_opencl: default device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q8_0 | 1.22 GiB | 1.24 B | OpenCL | 99 | 0 | pp512 | 58.38 ± 0.76 |
| llama 1B Q8_0 | 1.22 GiB | 1.24 B | OpenCL | 99 | 0 | tg128 | 38.80 ± 0.20 |
| llama 1B Q8_0 | 1.22 GiB | 1.24 B | OpenCL | 99 | 1 | pp512 | 47.74 ± 11.05 |
| llama 1B Q8_0 | 1.22 GiB | 1.24 B | OpenCL | 99 | 1 | tg128 | 34.16 ± 2.31 |

@rmatif
Collaborator

rmatif commented Sep 21, 2025

@lhez I'll let you merge the PR according to the new guidelines

@lhez
Collaborator Author

lhez commented Sep 21, 2025

@rmatif Thank you for taking a look!

@lhez lhez merged commit c4510dc into ggml-org:master Sep 21, 2025
94 of 96 checks passed
struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025