vulkan: improve topk perf for large k, fix overflow in unit tests #17582
In each iteration, the top_k shader reduces every group of workgroup-size elements to k elements. When k is large, choose a larger workgroup size so the reduction converges in fewer iterations. This helps with the test cases added here: https://github.com/ggml-org/llama.cpp/pull/17004/files#diff-2749fdb8974ec96afa18444a9d546409318b0a862709139b677eee468c479578R8045
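To illustrate why a larger workgroup size helps, here is a rough host-side model of the pass count. It is only a sketch, not the shader: `count_passes` and its parameters are hypothetical names, and it assumes each pass splits the remaining elements into chunks of `wg_size` and keeps at most `k` from each chunk.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model (not the actual shader): counts how many reduction
// passes are needed to shrink n candidates down to k, assuming every chunk
// of wg_size elements is reduced to at most k elements per pass.
// Returns -1 when wg_size <= k, since a pass would then make no progress.
int count_passes(uint64_t n, uint64_t wg_size, uint64_t k) {
    if (wg_size <= k) {
        return -1;
    }
    int passes = 0;
    while (n > k) {
        const uint64_t full_groups = n / wg_size;  // chunks that are completely filled
        const uint64_t remainder   = n % wg_size;  // elements in the last, partial chunk
        // Each full chunk keeps k elements; the partial chunk keeps at most k.
        n = full_groups * k + std::min(remainder, k);
        ++passes;
    }
    return passes;
}
```

Under this model, with n = 1024 and k = 8, a workgroup size of 16 takes 7 passes, 64 takes 3, and 1024 takes 1. The closer k is to the workgroup size, the less each pass removes, which is why large k benefits from a larger workgroup.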
I also saw that for very small tensors, test-backend-ops has an overflow that makes it compute incorrect results. This change performs the std::min in 64-bit before converting the result to 32-bit.
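The general pattern behind this kind of fix can be sketched as follows. This is illustrative only and is not the expression in test-backend-ops: the function names, operand types, and values are assumptions chosen to show how narrowing before `std::min` can give a different answer than narrowing after.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical example: narrowing to 32 bits BEFORE the min.
// If count does not fit in 32 bits, the cast wraps modulo 2^32 and the
// wrapped value can win the comparison.
uint32_t clamp_narrow_first(uint64_t count, uint32_t cap) {
    return std::min(static_cast<uint32_t>(count), cap);
}

// Hypothetical example: min in 64 bits, THEN narrow.
// The result is never larger than cap, so the final cast cannot overflow.
uint32_t clamp_wide_first(uint64_t count, uint32_t cap) {
    return static_cast<uint32_t>(std::min<uint64_t>(count, cap));
}
```

For example, with `count = (1ull << 32) + 5` and `cap = 100`, the narrow-first version returns 5 while the wide-first version returns 100. For values that fit in 32 bits, both versions agree.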