
Conversation

@jeffbolznv (Collaborator)

The top_k shader reduces workgroup-size elements down to k elements in each iteration. When k is large, a larger workgroup size is chosen so the reduction converges in fewer iterations. This helps with the test cases added here: https://github.com/ggml-org/llama.cpp/pull/17004/files#diff-2749fdb8974ec96afa18444a9d546409318b0a862709139b677eee468c479578R8045
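The selection logic can be sketched roughly as follows. This is a minimal illustration, not the actual dispatch code; the function name, the baseline of 128, and the 4*k growth target are all assumptions made up for this sketch:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: pick a workgroup size for the top_k shader.
// Each pass reduces `workgroup size` candidates down to k, so a
// workgroup comfortably larger than k converges in fewer passes.
static uint32_t topk_pick_workgroup_size(uint32_t k, uint32_t max_workgroup_size) {
    uint32_t wg = 128;                         // illustrative baseline size
    while (wg < 4 * k && wg < max_workgroup_size) {
        wg *= 2;                               // grow while k is large relative to wg
    }
    return std::min(wg, max_workgroup_size);   // never exceed the device limit
}
```

For k=1 this keeps the small baseline workgroup, while for k=400 it grows to the device limit, matching the intuition that large k benefits from a larger workgroup.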

I also saw that for very small tensors, test-backend-ops has an overflow that makes it compute incorrect results (note the negative run count in the "before" numbers below). This change performs the std::min operation in 64-bit arithmetic before converting the result to 32-bit.
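The failure mode can be illustrated like this. This is not the actual test-backend-ops code; the function names and the byte-budget framing are assumptions for the sketch, but it shows how a run count derived from a large quotient wraps negative when std::min is taken in 32-bit, and stays correct when the min is done in 64-bit first:

```cpp
#include <algorithm>
#include <cstdint>

// BUG sketch: the quotient may exceed INT32_MAX for tiny per-run sizes,
// so narrowing it to 32-bit *before* the min wraps to a negative count.
int32_t runs_32bit_min(int64_t budget_bytes, int64_t bytes_per_run, int32_t cap) {
    return std::min((int32_t)(budget_bytes / bytes_per_run), cap);
}

// FIX sketch: clamp in 64-bit first; the clamped value is guaranteed
// to fit, so the final conversion to 32-bit is safe.
int32_t runs_64bit_min(int64_t budget_bytes, int64_t bytes_per_run, int64_t cap) {
    return (int32_t)std::min(budget_bytes / bytes_per_run, cap);
}
```

With a quotient of several billion, the 32-bit version yields a garbage negative run count like the one in the "before" output, while the 64-bit version returns the cap.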

before
  TOP_K(type=f32,ne=[2,1,1,1],k=1):                 -1431647801 runs -    -0.00 us/run -        0 kB/run - -382276.15 GB/s
  TOP_K(type=f32,ne=[1,1,1,1],k=1):                    23544 runs -    42.48 us/run -        0 kB/run -    0.00 GB/s
...
  TOP_K(type=f32,ne=[400,1,1,1],k=400):               548864 runs -     1.83 us/run -        3 kB/run -    1.63 GB/s
  TOP_K(type=f32,ne=[1000,1,1,1],k=400):              311296 runs -     3.22 us/run -        5 kB/run -    1.62 GB/s
  TOP_K(type=f32,ne=[65000,1,1,1],k=400):              16384 runs -    63.84 us/run -      255 kB/run -    3.82 GB/s
  TOP_K(type=f32,ne=[200000,1,1,1],k=400):             16384 runs -    83.12 us/run -      782 kB/run -    8.98 GB/s
  TOP_K(type=f32,ne=[400,16,1,1],k=400):              516096 runs -     1.95 us/run -       50 kB/run -   24.45 GB/s
  TOP_K(type=f32,ne=[1000,16,1,1],k=400):             303104 runs -     3.33 us/run -       87 kB/run -   25.05 GB/s
  TOP_K(type=f32,ne=[65000,16,1,1],k=400):             16384 runs -    97.10 us/run -     4087 kB/run -   40.15 GB/s
  TOP_K(type=f32,ne=[200000,16,1,1],k=400):             5358 runs -   211.52 us/run -    12525 kB/run -   56.47 GB/s

after
  TOP_K(type=f32,ne=[2,1,1,1],k=1):                   679936 runs -     1.48 us/run -        0 kB/run -    0.01 GB/s
  TOP_K(type=f32,ne=[1,1,1,1],k=1):                   679936 runs -     1.47 us/run -        0 kB/run -    0.01 GB/s
...
  TOP_K(type=f32,ne=[400,1,1,1],k=400):               548864 runs -     1.83 us/run -        3 kB/run -    1.63 GB/s
  TOP_K(type=f32,ne=[1000,1,1,1],k=400):              311296 runs -     3.22 us/run -        5 kB/run -    1.62 GB/s
  TOP_K(type=f32,ne=[65000,1,1,1],k=400):              49152 runs -    23.47 us/run -      255 kB/run -   10.38 GB/s
  TOP_K(type=f32,ne=[200000,1,1,1],k=400):             32768 runs -    31.72 us/run -      782 kB/run -   23.54 GB/s
  TOP_K(type=f32,ne=[400,16,1,1],k=400):              524288 runs -     1.92 us/run -       50 kB/run -   24.83 GB/s
  TOP_K(type=f32,ne=[1000,16,1,1],k=400):             303104 runs -     3.32 us/run -       87 kB/run -   25.17 GB/s
  TOP_K(type=f32,ne=[65000,16,1,1],k=400):             24576 runs -    45.12 us/run -     4087 kB/run -   86.39 GB/s
  TOP_K(type=f32,ne=[200000,16,1,1],k=400):            10716 runs -   108.55 us/run -    12525 kB/run -  110.04 GB/s

@github-actions bot added the testing (Everything test related), Vulkan (Issues specific to the Vulkan backend), and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 28, 2025
@0cc4m 0cc4m merged commit 59d8d4e into ggml-org:master Nov 29, 2025
72 of 74 checks passed
