vulkan: support larger argsort #17313

jeffbolznv · 2025-11-17T06:38:54Z

This extends the original bitonic sorting shader: the temporary values now live in global memory, and when more than 1024 threads are needed it runs multiple workgroups and synchronizes them through a pipeline barrier.
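For readers unfamiliar with the structure, here is a rough CPU-side sketch of a bitonic argsort network. It is not the shader from this PR. The function name, the one-element-per-thread mapping, and the rule "a pass with compare distance >= workgroup size needs a cross-workgroup sync" are assumptions made for illustration only.

```python
import math


def bitonic_argsort(values, workgroup_size=1024):
    """Ascending argsort via a bitonic network (illustrative sketch only).

    Returns (indices, cross_workgroup_passes). The second value counts passes
    whose compare distance is >= workgroup_size, i.e. passes where, under the
    assumed one-element-per-thread mapping, the two elements being compared
    belong to different workgroups.
    """
    n = len(values)
    # Bitonic networks need a power-of-two length; pad with +inf keys.
    padded = 1 if n <= 1 else 1 << (n - 1).bit_length()
    idx = list(range(padded))

    def key(i):
        # Tie-break on the index so keys are distinct and padding sorts last.
        return (values[i], i) if i < n else (math.inf, i)

    cross_workgroup_passes = 0
    k = 2
    while k <= padded:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:           # compare distance for this pass
            if j >= workgroup_size:
                cross_workgroup_passes += 1
            for i in range(padded):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (key(idx[i]) > key(idx[partner])) == ascending:
                        idx[i], idx[partner] = idx[partner], idx[i]
            j //= 2
        k *= 2
    return idx[:n], cross_workgroup_passes
```

In this sketch, only the passes counted in `cross_workgroup_passes` would need synchronization beyond a single workgroup; the remaining passes touch elements within one workgroup's range.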

To improve the memory access pattern, a copy of the float value is kept with the index value. I've applied this same change to the original shared memory version of the shader, which is still used when ncols <= 1024.
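The memory access point can be illustrated with a small CPU-side comparison. This is only a sketch, not the shader code: the function names are hypothetical, and a simple odd-even transposition sort stands in for the bitonic network to keep the example short. The index-only version has to fetch each value through its index on every compare, while the paired version reads the value stored next to the index.

```python
def argsort_index_only(values):
    """Sort indices only; every compare gathers values through the index."""
    idx = list(range(len(values)))
    gathered_reads = 0
    for rnd in range(len(idx)):
        for i in range(rnd % 2, len(idx) - 1, 2):
            a, b = values[idx[i]], values[idx[i + 1]]  # indirect (scattered) reads
            gathered_reads += 2
            if a > b:
                idx[i], idx[i + 1] = idx[i + 1], idx[i]
    return idx, gathered_reads


def argsort_paired(values):
    """Sort (value, index) pairs; the value travels with its index."""
    pairs = [(v, i) for i, v in enumerate(values)]
    gathered_reads = 0  # stays 0: no lookups through the index are needed
    for rnd in range(len(pairs)):
        for i in range(rnd % 2, len(pairs) - 1, 2):
            if pairs[i][0] > pairs[i + 1][0]:  # value is read in place
                pairs[i], pairs[i + 1] = pairs[i + 1], pairs[i]
    return [i for _, i in pairs], gathered_reads
```

Both functions produce the same ordering; the trade-off in the paired version is the extra storage for the copied value in exchange for avoiding the indirect reads.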

Performance seems pretty good relative to the CUDA backend, but somewhat worse than CUB for the largest sizes with multiple rows.

0cc4m

LGTM

jeffbolznv requested review from 0cc4m and slaren as code owners November 17, 2025 06:38

github-actions bot added the testing, Vulkan, and ggml labels Nov 17, 2025

DajanaV mentioned this pull request Nov 17, 2025

UPSTREAM PR #17313: vulkan: support larger argsort auroralabs-loci/llama.cpp#231

Open

jeffbolznv force-pushed the argsort_large branch from a6f38e1 to b7f5a07 Compare November 18, 2025 22:44

jeffbolznv force-pushed the argsort_large branch from b7f5a07 to 69a5306 Compare November 18, 2025 23:18

jeffbolznv added 3 commits November 18, 2025 21:12

Reduce the number of shader variants. Use smaller workgroups when doing a single pass, for a modest perf boost

4cf884b

reduce loop overhead

68bd013

run multiple cols per invocation, to reduce barrier overhead

a19c81b

0cc4m approved these changes Nov 19, 2025

0cc4m merged commit 1fa4551 into ggml-org:master Nov 19, 2025
74 checks passed

jeffbolznv mentioned this pull request Nov 19, 2025

ggml : add ggml_top_k #17365

Open

5 tasks
