opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno #18970

lhez · 2026-01-20T19:03:09Z

The specialized kqv kernel for Adreno wraps dst in an image1d_buffer_t so that it can use the texture processor. However, image1d_buffer_t has a dimension limit, and that limit can be reached when the KV cache holds enough tokens.

For example, giving gpt-oss-20b a prompt of about 4k tokens using llama-completion with -ub 1024 hits the limit and results in an assertion failure.
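The size check described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper name `fits_image1d_buffer` and the `elems_per_pixel` parameter are hypothetical, and the sketch assumes the limit is expressed as a maximum pixel count, which on a real device is reported by `clGetDeviceInfo` with `CL_DEVICE_IMAGE_MAX_BUFFER_SIZE`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): returns true when a dst tensor of
// n_elements values can be wrapped in an image1d_buffer_t that packs
// elems_per_pixel values into each pixel (e.g. 4 for an RGBA image format).
// max_pixels stands in for the device limit, which a real backend would
// query via clGetDeviceInfo(..., CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, ...).
inline bool fits_image1d_buffer(size_t n_elements,
                                size_t elems_per_pixel,
                                size_t max_pixels) {
    assert(elems_per_pixel > 0);
    // Round up: a partially filled pixel still occupies one pixel.
    const size_t n_pixels = (n_elements + elems_per_pixel - 1) / elems_per_pixel;
    return n_pixels <= max_pixels;
}
```

Because dst grows with the number of tokens in the KV cache, a check of this shape passes for short contexts and starts failing once the context is long enough, which is the point at which a fallback path is needed.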

This PR uses the general floating-point mm kernels by first copying src0 and src1 to contiguous buffers if necessary. I believe the Vulkan backend uses the same approach. A check is also added to ensure that the arguments are within the image1d_buffer_t limit before the specialized kqv kernel is used; when they are not, the general mm path serves as the fallback.
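To illustrate what copying a non-contiguous tensor to contiguous means, here is a minimal host-side sketch. It is not the PR's implementation, which performs the copy on the GPU; `TensorView`, `is_contiguous`, and `copy_to_contiguous` are hypothetical names, and the `ne`/`nb` fields only mimic the ggml convention of element counts and byte strides per dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for a ggml-style tensor view (hypothetical type):
// ne[i] is the number of elements along dimension i,
// nb[i] is the stride in bytes along dimension i.
struct TensorView {
    const uint8_t * data;
    size_t ne[4];
    size_t nb[4];
    size_t elem_size;
};

// A view is contiguous when its strides match a densely packed layout.
// Dimensions of size 1 are ignored because their stride is never used.
inline bool is_contiguous(const TensorView & t) {
    size_t expected = t.elem_size;
    for (int i = 0; i < 4; ++i) {
        if (t.ne[i] != 1 && t.nb[i] != expected) {
            return false;
        }
        expected *= t.ne[i];
    }
    return true;
}

// Gather every element through its byte strides into a packed buffer,
// with dimension 0 varying fastest in the output.
inline std::vector<uint8_t> copy_to_contiguous(const TensorView & t) {
    std::vector<uint8_t> out(t.ne[0] * t.ne[1] * t.ne[2] * t.ne[3] * t.elem_size);
    uint8_t * dst = out.data();
    for (size_t i3 = 0; i3 < t.ne[3]; ++i3)
    for (size_t i2 = 0; i2 < t.ne[2]; ++i2)
    for (size_t i1 = 0; i1 < t.ne[1]; ++i1)
    for (size_t i0 = 0; i0 < t.ne[0]; ++i0) {
        const uint8_t * src = t.data + i0 * t.nb[0] + i1 * t.nb[1]
                                     + i2 * t.nb[2] + i3 * t.nb[3];
        std::memcpy(dst, src, t.elem_size);
        dst += t.elem_size;
    }
    return out;
}
```

A caller would copy only when `is_contiguous` returns false and then pass the packed buffer to the general mm kernel. The second commit in this PR restricts the copy to f32 and f16 tensors, which this sketch does not model.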

max-krasnyansky

Looks good.
Didn't get a chance to try it on my setup yet. I'm testing some Hexagon changes and will give this a go along with that.

lhez added 5 commits January 19, 2026 23:06

opencl: add copy_to_contiguous and utilize mm kernels

6d0a567

opencl: only copy to cont for f32 and f16 tensors

b773905

opencl: use cont mm for fallback when dst is large

861c981

opencl: use nb local to copy-to-cont

ca8a506

opencl: use local offset as well

f04a782

loci-dev mentioned this pull request Jan 20, 2026

UPSTREAM PR #18970: opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno auroralabs-loci/llama.cpp#986

Open

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (Issues specific to the OpenCL backend) labels Jan 20, 2026

lhez marked this pull request as ready for review January 21, 2026 19:07

lhez requested a review from max-krasnyansky as a code owner January 21, 2026 19:07

max-krasnyansky approved these changes Jan 22, 2026

max-krasnyansky merged commit 9c96465 into ggml-org:master Jan 22, 2026
147 of 149 checks passed
