Skip to content

Conversation

@ggerganov
Copy link
Member

@ggerganov ggerganov commented Nov 18, 2025

Add a dedicated top-k op so that it can be more efficiently optimized by backend implementations. The old implementation is renamed to ggml_argsort_top_k.

TODO:

Next PRs:

  • Vulkan
  • etc.

@github-actions github-actions bot added testing Everything test related ggml changes relating to the ggml tensor library for machine learning Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Nov 18, 2025
@am17an
Copy link
Collaborator

am17an commented Nov 19, 2025

Does this operator expect the top-K elements to be sorted?

@ORippler
Copy link
Contributor

ORippler commented Nov 19, 2025

Does this operator expect the top-K elements to be sorted?

I feel it should not be sorted, as algorithmically we are performing a selection, and depending on the algorithm the outcome of this selection is unordered:

https://leimao.github.io/blog/CPU-TopK-Algorithm/
https://nvidia.github.io/cccl/cub/api/structcub_1_1DeviceTopK.html#overview

Should one wish to sort, one could easily do GGML_OP_TOP_K -> GGML_OP_ARGSORT

@ggerganov
Copy link
Member Author

Does this operator expect the top-K elements to be sorted?

In principle it does not have to expect the elements to be sorted. However the current implementation sorts them in descending order in order to be able to verify correctness with test-backend-ops. If we allow arbitrary order I am not sure how we would verify correctness.

@ORippler
Copy link
Contributor

If we allow arbitrary order I am not sure how we would verify correctness.

By treating them as sets rather than lists? We could use std::unordered_set for this

@am17an
Copy link
Collaborator

am17an commented Nov 19, 2025

If we allow arbitrary order I am not sure how we would verify correctness.

By treating them as sets rather than lists? We could use std::unordered_set for this

Currently, test-backend-ops relies on NMSE of outputs rather than cardinality checks, but I guess that can be changed.

@slaren
Copy link
Member

slaren commented Nov 19, 2025

It would be ok to add an overrideable error function to test_case. Leave NMSE as the default and override it in test_top_k to compare as a set.

@jeffbolznv
Copy link
Collaborator

What are common tensor shapes and values of k we should optimize for?

Does this operation support non-contiguous rows?

@ggerganov
Copy link
Member Author

@jeffbolznv This will be used in #17004 to do top-k sampling efficiently on the GPU. The typical shapes are:

  • large src[0]->ne[0] (i.e. up to vocab size)
  • small k (usually 1, 10, 40)

Support for non-contiguous rows is not necessary for now - will add asserts for that.

@jeffbolznv
Copy link
Collaborator

OK, understood. When you get a chance, please rebase, I'll implement something based on #17313.

@ggerganov ggerganov marked this pull request as ready for review November 20, 2025 09:37
ggml_tensor * selection_groups = ggml_reshape_3d(ctx0, selection_probs, n_exp_per_group, hparams.n_expert_groups, n_tokens); // [n_exp_per_group, n_expert_groups, n_tokens]

ggml_tensor * group_scores = ggml_top_k(ctx0, selection_groups, 2); // [2, n_expert_groups, n_tokens]
ggml_tensor * group_scores = ggml_argsort_top_k(ctx0, selection_groups, 2); // [2, n_expert_groups, n_tokens]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these are temporary until all backends support are in place? Add a TODO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure yet - keeping the expert order deterministic might be necessary. And using ggml_top_k here would likely not make a big difference performance wise since the arrays are very small.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely unnecessary for the expert group selection at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change it anywhere else as it will also break fusion in all backends for topk-moe

ORippler

This comment was marked as resolved.

@ggerganov
Copy link
Member Author

since we store ascending int numbers in our array,

The values are also shuffled:

// initialize with unique values to avoid ties
for (int64_t r = 0; r < ggml_nrows(t); r++) {
std::vector<float> data(t->ne[0]);
for (int i = 0; i < t->ne[0]; i++) {
data[i] = i;
}
std::shuffle(data.begin(), data.end(), rng);
ggml_backend_tensor_set(t, data.data(), r * t->nb[1], t->ne[0] * sizeof(float));
}

So top 1 could be any number.

@jeffbolznv
Copy link
Collaborator

Vulkan support is ready in #17418.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apple Metal https://en.wikipedia.org/wiki/Metal_(API) ggml changes relating to the ggml tensor library for machine learning testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants