Commit
Improve small sort performance on CUDA
Currently, `bitonicSortKVInPlace` is written to sort one array per block of threads. If the sorted dimension is very small (fewer than 128 elements), this results in low thread occupancy.

Instead, this changes `bitonicSortKVInPlace` to operate with a 2D block. Sorting happens along the x dimension, and the y dimension is a fixed-size batch.

Pull Request resolved: #79627
Approved by: https://github.com/ngimel
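The 2D-block layout described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the kernel name `batchedBitonicSort`, the host wrapper `sortSlices`, and the constants `SORT_SIZE` and `BATCH` are all hypothetical, and the sketch sorts plain `float` keys rather than key-value pairs. Threads along x cooperate on one slice, and each y index handles a different slice of the batch.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sizes chosen for illustration only.
constexpr int SORT_SIZE = 32;  // elements per slice; must be a power of two
constexpr int BATCH = 8;       // slices sorted by one block (the y dimension)

// Expects blockDim = (SORT_SIZE / 2, BATCH): each thread handles one
// compare-exchange pair per step, and each threadIdx.y owns one slice.
__global__ void batchedBitonicSort(float* data, int numSlices) {
  __shared__ float smem[BATCH][SORT_SIZE];

  const int slice = blockIdx.x * BATCH + threadIdx.y;
  // Threads past the last slice stay in the loop so that every thread in
  // the block reaches each __syncthreads(); they just skip the memory work.
  const bool active = slice < numSlices;
  float* row = smem[threadIdx.y];
  const int half = SORT_SIZE / 2;

  if (active) {
    row[threadIdx.x] = data[slice * SORT_SIZE + threadIdx.x];
    row[threadIdx.x + half] = data[slice * SORT_SIZE + threadIdx.x + half];
  }
  __syncthreads();

  for (int size = 2; size <= SORT_SIZE; size <<= 1) {
    for (int stride = size >> 1; stride > 0; stride >>= 1) {
      // Lower index of this thread's pair; its partner is pos + stride.
      const int pos = 2 * threadIdx.x - (threadIdx.x & (stride - 1));
      // Sub-sequences alternate direction until the final merge pass.
      const bool ascending = (pos & size) == 0;
      if (active) {
        const float a = row[pos];
        const float b = row[pos + stride];
        if ((a > b) == ascending) {
          row[pos] = b;
          row[pos + stride] = a;
        }
      }
      __syncthreads();
    }
  }

  if (active) {
    data[slice * SORT_SIZE + threadIdx.x] = row[threadIdx.x];
    data[slice * SORT_SIZE + threadIdx.x + half] = row[threadIdx.x + half];
  }
}

// Sorts each SORT_SIZE-element slice of `host` in ascending order.
// Returns false if any CUDA call fails.
bool sortSlices(std::vector<float>& host, int numSlices) {
  assert(host.size() == static_cast<size_t>(numSlices) * SORT_SIZE);
  if (numSlices == 0) return true;

  float* dev = nullptr;
  const size_t bytes = host.size() * sizeof(float);
  if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

  bool ok = cudaMemcpy(dev, host.data(), bytes,
                       cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    // 16 x 8 = 128 threads per block; one slice per block would use only 16.
    const dim3 block(SORT_SIZE / 2, BATCH);
    const dim3 grid((numSlices + BATCH - 1) / BATCH);
    batchedBitonicSort<<<grid, block>>>(dev, numSlices);
    ok = cudaGetLastError() == cudaSuccess &&
         cudaMemcpy(host.data(), dev, bytes,
                    cudaMemcpyDeviceToHost) == cudaSuccess;
  }
  cudaFree(dev);
  return ok;
}
```

With these assumed sizes, calling `sortSlices(values, numSlices)` on a vector of `numSlices * 32` floats launches blocks of 128 threads instead of 16, which is the occupancy effect the commit message describes. The slice count does not need to be a multiple of `BATCH`; the `active` guard covers the partially filled last block.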
1 parent 9244547, commit 61305cd
Showing 3 changed files with 106 additions and 92 deletions.