Question About use_adreno_kernels Threshold for Q4 MatMul on Adreno 750 #17733

forforever73 · 2025-12-03T11:56:46Z


@lhez Sorry to take up your time. I’m running a new model on an Adreno 750 GPU and noticed that, for Q4 weights, the optimized kernel CL_mul_mat_Ab_Bi_8x4 seems to be used only when use_adreno_kernels() returns true. However, my model has several matmul shapes like:
A: [256, 1280, 1, 1]
B: [256, 512, 1, 1]
→ Output: [1280, 512, 1, 1]
For these shapes the matmul falls back to kernel_mul_mat_q4_0_f32_1d_8x_flat, which is about 10× slower on Adreno 750. I experimented by lowering the internal threshold to:
int64_t threshold_ne0 = 256;
After lowering the threshold, the Adreno kernels are used, performance improves dramatically, and the model’s perplexity (PPL) shows no meaningful change.
So what was the original reasoning behind the use_adreno_kernels() threshold? If I reduce the threshold to 256, is there any potential risk I should be aware of?
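
To make the question concrete, here is a minimal sketch of how a shape gate of this kind behaves. It is not the actual source of use_adreno_kernels(): the Shape4 struct, the passes_shape_gate() helper, the threshold_ne1 parameter, the 2D-only condition, and the value 512 used as the "higher" threshold are all assumptions for illustration. Only the shape [256, 1280, 1, 1] and the lowered value 256 come from the post.

```cpp
#include <cstdint>

// Hypothetical stand-in for a ggml tensor shape; ne[0] is the innermost dimension.
struct Shape4 {
    int64_t ne[4];
};

// Sketch of a shape gate: the optimized path is allowed only when the first two
// dimensions reach their thresholds and the tensor is 2D (ne[2] == ne[3] == 1).
// This illustrates the idea of a threshold check; it is not the real function.
constexpr bool passes_shape_gate(const Shape4 & s, int64_t threshold_ne0, int64_t threshold_ne1) {
    return s.ne[0] >= threshold_ne0 && s.ne[1] >= threshold_ne1 &&
           s.ne[2] == 1 && s.ne[3] == 1;
}

// Weight shape taken from the question.
constexpr Shape4 weight_a = {{256, 1280, 1, 1}};

// 512 is an assumed higher threshold, used here only to show the failing case.
static_assert(!passes_shape_gate(weight_a, 512, 512), "ne[0] = 256 is below 512: fallback kernel");
// With the threshold lowered to 256, the same shape is accepted.
static_assert(passes_shape_gate(weight_a, 256, 256), "ne[0] = 256 meets 256: optimized kernel");
```

Under these assumptions, a weight whose ne[0] equals 256 is rejected by any threshold above 256 and accepted once the threshold is lowered to 256, which matches the behavior described above.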

lhez · 2025-12-08T06:09:28Z

Collaborator

@forforever73 Apologies for the delay. kernel_mul_mat_Ab_Bi_8x4 should be general enough to handle smaller sizes. When the kernel was initially written, our main target was 7B models. In addition, most models we work with don't have such dimensions, so we put a threshold here. (Is it possible for you to share which model this is, or point us to a similar open source model?) This kernel requires both the weight and activation matrices to be transposed. It has also caused some issues for us in the past (e.g., initially it wasn't able to properly handle 0.5B models). Feel free to use this kernel for smaller ne00 values and let us know if you see any issues.
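
As a rough illustration of the transposition requirement mentioned above, the sketch below transposes a row-major float matrix on the CPU. It is only meant to show the extra data-layout step this implies; the backend performs its transposes on the GPU and on quantized weights, and the transpose() helper here is hypothetical, not part of the backend.

```cpp
#include <cstddef>
#include <vector>

// Transpose a row-major (rows x cols) matrix into a row-major (cols x rows) matrix.
// Plain CPU sketch on floats; the real backend works on GPU buffers and Q4 data.
std::vector<float> transpose(const std::vector<float> & src, std::size_t rows, std::size_t cols) {
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Element (r, c) of the source becomes element (c, r) of the result.
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}
```

For example, transposing the 2 x 3 matrix {1, 2, 3, 4, 5, 6} gives the 3 x 2 matrix {1, 4, 2, 5, 3, 6}.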

1 reply

forforever73 Dec 9, 2025
Author

Thank you very much for the detailed reply!

Our team is working on a new model that uses PLE (Per-Layer Embeddings), which introduces weight shapes like [256, 1280, 1, 1]. So far, after lowering the threshold to threshold_ne0 = 256, we haven’t observed any issues with either accuracy or performance.

We’ll continue testing and will definitely report back if we encounter any problems.
Thanks again for your help!
