Skip to content

Conversation

@lhez
Copy link
Collaborator

@lhez lhez commented Nov 19, 2025

Fixes #17351

The kernel introduced in #17181 is specialized for f16f32 mm in attention, in particular for KQ and KQV. Ordinary f16f32 mm would go to this kernel under the original condition causing incorrect results like in #17351. This PR refines the condition (similarly to what vulkan backend does) so that this kernel will only be used for attention.

@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning OpenCL Issues specific to the OpenCL backend labels Nov 19, 2025
@forforever73
Copy link

@lhez Great, the output is correct now. However, I'm still a bit confused about the condition you use to determine KQV. Most implementations of build_attn call ggml_cont on v, which prevents kqv from entering the ggml_cl_mul_mat_kq_kqv_adreno kernel. After enabling the GGML_OPENCL_PROFILING=ON build option, I confirmed that this is indeed the case.
image

@lhez lhez marked this pull request as ready for review November 20, 2025 18:55
@lhez
Copy link
Collaborator Author

lhez commented Nov 21, 2025

@lhez Great, the output is correct now. However, I'm still a bit confused about the condition you use to determine KQV. Most implementations of build_attn call ggml_cont on v, which prevents kqv from entering the ggml_cl_mul_mat_kq_kqv_adreno kernel.

@forforever73 I think what you are referring to is for mmproj models and you are right - v is made contiguous with ggml_cont (here) for mmproj. In this case, kqv will use the ordinary f16f32 mm.

However, this is not the case for language models. For language models, v usually is not made contiguous and remains a view into the v cache. In this case, v is a non-contiguous tensor and kq is contiguous (output of softmax).

If you dump the kernel timing, e.g., qwen2.5-1.5b, you will see something like this,

image

So, for language models there is usually no cont and we still need to identify for kqv.

@max-krasnyansky max-krasnyansky merged commit 8e9ddba into ggml-org:master Nov 21, 2025
133 of 134 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning OpenCL Issues specific to the OpenCL backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eval bug: Incorrect VLM image description on Android after commit 4db5641

3 participants