ggml : add ggml_top_k #17365

ggerganov · 2025-11-18T14:41:21Z

Add a dedicated top-k op so that backend implementations can optimize it more efficiently. The old implementation is renamed to ggml_argsort_top_k.

TODO:

Allow unsorted output (ggml : add ggml_top_k #17365 (comment)) (see: c63ecde)
Do not rely on op_params
CUDA (will be added in sampling : add support for backend sampling #17004)
Metal

Next PRs:

Vulkan
etc.

am17an · 2025-11-19T10:32:27Z

Does this operator expect the top-K elements to be sorted?

ORippler · 2025-11-19T10:39:09Z

> Does this operator expect the top-K elements to be sorted?

I feel it should not be sorted, as algorithmically we are performing a selection, and depending on the algorithm the outcome of this selection is unordered:

https://leimao.github.io/blog/CPU-TopK-Algorithm/
https://nvidia.github.io/cccl/cub/api/structcub_1_1DeviceTopK.html#overview

Should one wish to sort, one could easily do GGML_OP_TOP_K -> GGML_OP_ARGSORT
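To make the selection-versus-sort distinction concrete, here is a minimal CPU sketch in plain C++. The helpers top_k_unordered and sort_descending are hypothetical names used for illustration only and are not ggml functions; the sketch assumes a single contiguous row of floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of ggml): returns the indices of the k largest
// values in `row`. The order of the returned indices is unspecified.
std::vector<int> top_k_unordered(const std::vector<float> & row, size_t k) {
    assert(k <= row.size());
    std::vector<int> idx(row.size());
    std::iota(idx.begin(), idx.end(), 0);
    // Partition the indices so that the k largest values end up in idx[0..k).
    // std::nth_element gives no ordering guarantee inside that range.
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                     [&](int a, int b) { return row[a] > row[b]; });
    idx.resize(k);
    return idx;
}

// Optional second step, analogous to chaining a sort after the selection:
// order the selected indices by descending value.
void sort_descending(const std::vector<float> & row, std::vector<int> & idx) {
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return row[a] > row[b]; });
}
```

Callers that only need membership (for example, masking everything outside the top k) can skip the second step and avoid the extra sort.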

ggerganov · 2025-11-19T10:40:43Z

> Does this operator expect the top-K elements to be sorted?

In principle it does not have to expect the elements to be sorted. However, the current implementation sorts them in descending order so that correctness can be verified with test-backend-ops. If we allow arbitrary order, I am not sure how we would verify correctness.

ORippler · 2025-11-19T10:44:54Z

> If we allow arbitrary order I am not sure how we would verify correctness.

By treating them as sets rather than lists? We could use std::unordered_set for this

am17an · 2025-11-19T10:46:44Z

> > If we allow arbitrary order I am not sure how we would verify correctness.

> By treating them as sets rather than lists? We could use std::unordered_set for this

Currently, test-backend-ops relies on NMSE of outputs rather than cardinality checks, but I guess that can be changed.

slaren · 2025-11-19T10:49:20Z

It would be ok to add an overridable error function to test_case. Leave NMSE as the default and override it in test_top_k to compare as a set.
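A rough sketch of what such an overridable error function could look like follows. The types test_case_sketch and test_top_k_sketch are simplified, hypothetical stand-ins and do not reproduce the real classes in test-backend-ops.cpp; outputs are modeled as one row of floats, and the default metric assumes a non-zero reference.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for the test harness base class.
struct test_case_sketch {
    virtual ~test_case_sketch() = default;

    // Default metric: normalized mean squared error of b relative to a.
    // Assumes a is not all zeros.
    virtual double err(const std::vector<float> & a, const std::vector<float> & b) const {
        assert(a.size() == b.size());
        double mse_a_b = 0.0;
        double mse_a_0 = 0.0;
        for (size_t i = 0; i < a.size(); i++) {
            const double d = (double) a[i] - (double) b[i];
            mse_a_b += d * d;
            mse_a_0 += (double) a[i] * (double) a[i];
        }
        return mse_a_b / mse_a_0;
    }
};

// Hypothetical top-k test: the two outputs match if they hold the same
// elements, regardless of the order in which they appear.
struct test_top_k_sketch : test_case_sketch {
    double err(const std::vector<float> & a, const std::vector<float> & b) const override {
        const std::unordered_set<float> sa(a.begin(), a.end());
        const std::unordered_set<float> sb(b.begin(), b.end());
        return sa == sb ? 0.0 : 1.0;
    }
};
```

With this split, every other test keeps the NMSE default, and only the top-k test pays for building the sets.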

jeffbolznv · 2025-11-19T14:43:02Z

What are common tensor shapes and values of k we should optimize for?

Does this operation support non-contiguous rows?

ggerganov · 2025-11-19T15:13:17Z

@jeffbolznv This will be used in #17004 to do top-k sampling efficiently on the GPU. The typical shapes are:

large src[0]->ne[0] (i.e. up to vocab size)
small k (usually 1, 10, 40)

Support for non-contiguous rows is not necessary for now - will add asserts for that.
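For the shape profile described above (a long row and a small k), one common CPU approach is a bounded min-heap that keeps only k candidates while scanning the row once. The sketch below is an illustration under that assumption, not the ggml implementation; top_k_heap is a hypothetical name, and the row is assumed to be contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical CPU reference (not the ggml implementation): indices of the k
// largest values of a contiguous row of n floats, in descending value order.
// Cost is O(n log k), which suits a large n with a small k.
std::vector<int> top_k_heap(const float * row, size_t n, size_t k) {
    assert(k <= n);
    using entry = std::pair<float, int>; // (value, index)
    // Min-heap: the smallest of the retained candidates sits on top.
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    for (size_t i = 0; i < n; i++) {
        if (heap.size() < k) {
            heap.emplace(row[i], (int) i);
        } else if (k > 0 && row[i] > heap.top().first) {
            // The new value beats the weakest retained candidate: swap it in.
            heap.pop();
            heap.emplace(row[i], (int) i);
        }
    }
    // Popping yields ascending values, so fill the output from the back.
    std::vector<int> out(heap.size());
    for (size_t j = out.size(); j-- > 0; ) {
        out[j] = heap.top().second;
        heap.pop();
    }
    return out;
}
```

Taking a raw pointer and a length mirrors the contiguous-row assumption stated in the comment; a strided variant would need an extra stride parameter.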

jeffbolznv · 2025-11-19T20:43:47Z

OK, understood. When you get a chance, please rebase, I'll implement something based on #17313.

CISC · 2025-11-20T09:57:54Z

src/llama-graph.cpp

        ggml_tensor * selection_groups = ggml_reshape_3d(ctx0, selection_probs, n_exp_per_group, hparams.n_expert_groups, n_tokens); // [n_exp_per_group, n_expert_groups, n_tokens]

-        ggml_tensor * group_scores = ggml_top_k(ctx0, selection_groups, 2); // [2, n_expert_groups, n_tokens]
+        ggml_tensor * group_scores = ggml_argsort_top_k(ctx0, selection_groups, 2); // [2, n_expert_groups, n_tokens]


I guess these are temporary until support is in place in all backends? Add a TODO?

ggerganov · 2025-11-20

Not 100% sure yet - keeping the expert order deterministic might be necessary. And using ggml_top_k here would likely not make a big difference performance-wise since the arrays are very small.

CISC · 2025-11-20

Completely unnecessary for the expert group selection at least.

am17an · 2025-11-20

Please don't change it anywhere else, as it will also break fusion in all backends for topk-moe.

ggerganov · 2025-11-20T10:08:19Z

> since we store ascending int numbers in our array,

The values are also shuffled:

llama.cpp/tests/test-backend-ops.cpp

Lines 5034 to 5042 in c63ecde

    // initialize with unique values to avoid ties
    for (int64_t r = 0; r < ggml_nrows(t); r++) {
        std::vector<float> data(t->ne[0]);
        for (int i = 0; i < t->ne[0]; i++) {
            data[i] = i;
        }
        std::shuffle(data.begin(), data.end(), rng);
        ggml_backend_tensor_set(t, data.data(), r * t->nb[1], t->ne[0] * sizeof(float));
    }

So top 1 could be any number.
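The effect of the shuffle can be reproduced outside the test harness with a standalone sketch. The helpers make_shuffled_row and argmax are hypothetical names; the code mirrors only the initialization pattern quoted above and does not touch any ggml tensor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: one row holding the unique values 0..n-1 in a random
// order, following the initialization pattern quoted above.
std::vector<float> make_shuffled_row(size_t n, std::mt19937 & rng) {
    std::vector<float> data(n);
    std::iota(data.begin(), data.end(), 0.0f);
    std::shuffle(data.begin(), data.end(), rng);
    return data;
}

// Position of the largest value. Because the values are unique there are no
// ties, and after the shuffle this position can be any index in the row.
size_t argmax(const std::vector<float> & data) {
    assert(!data.empty());
    return static_cast<size_t>(std::max_element(data.begin(), data.end()) - data.begin());
}
```

Because every value is unique, the expected top-k set for a row is unambiguous even though its position in the row is random.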

jeffbolznv · 2025-11-21T00:02:42Z

Vulkan support is ready in #17418.

ggerganov force-pushed the gg/ggml-top-k branch from 56ab2ca to 4d75c05 Compare November 18, 2025 16:06

github-actions bot added the testing, ggml, and Apple Metal labels Nov 18, 2025

ggerganov force-pushed the gg/ggml-top-k branch from 4d75c05 to 5d8ce1c Compare November 19, 2025 10:31

ggerganov added 3 commits November 20, 2025 10:29

ggml : add ggml_top_k

20f1050

cont : add ggml_argsort_top_k

a283069

metal : add top_k support

4dea5dd

ggerganov force-pushed the gg/ggml-top-k branch from 5d8ce1c to 4dea5dd Compare November 20, 2025 08:29

ggerganov added 2 commits November 20, 2025 10:46

ggml : cleanup

b46acfe

tests : add virtual err() function for test_case

c63ecde

ggerganov marked this pull request as ready for review November 20, 2025 09:37

ggerganov requested review from CISC and slaren as code owners November 20, 2025 09:37

CISC approved these changes Nov 20, 2025

This comment was marked as resolved.

ggml : add comments

db4570a

jeffbolznv mentioned this pull request Nov 21, 2025

vulkan: Implement top-k #17418

Draft
