-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of select-top-k WARP_SORT implementation #606
Improve performance of select-top-k WARP_SORT implementation #606
Conversation
…to improve its logic
a45a2f1
to
dba9c57
Compare
NB from previous discussions: I considered merging
|
631ed31
to
42470d5
Compare
NB: more tests are coming in #618 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Artem, thanks for the PR! It looks good, just a found a few small things.
Question, is the following statement true? With these modifications, capacity < warpsize is allowed. Since the capacity is chosen automatically according to the k value, the existing test cover these changes.
Please add to the PR description that the warp_sort::load functions are removed to simplify the API.
Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>
Yes, that is true. Also a few more tests are coming in #618 (which expands the api a little by allowing to pass nullptr as input indices). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Artem for addressing the issues! The PR looks good to me.
@gpucibot merge |
A few simplifications and tricks to improve the performance of the kernel:
capacity < WarpSize
sort
operations forfiltered
versionwarp_sort::load
to simplify the api and implementation