r-devulap
Member

Relative to scalar argsort, SIMD argsort is slower on already-sorted arrays. An easy fix is to detect sorted input and exit early. The check adds little overhead for random arrays but slightly regresses uniform/constant arrays. That trade-off is perhaps worth it, since SIMD argsort is still a fair bit faster than scalar argsort for uniform/constant arrays. Benchmarks on AVX2:

Benchmark                                                                        Time         CPU    Time Old    Time New     CPU Old     CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------
[simdargsort/constant_10k/uint64_t vs. simdargsort/constant_10k/uint64_t]     +0.1902     +0.1902       14544       17310       14544       17310
[simdargsort/sorted_10k/uint64_t vs. simdargsort/sorted_10k/uint64_t]         -0.9408     -0.9408      292847       17329      292836       17329
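
For readers who want the idea in code, here is a minimal sketch of the early-exit check described above. It is not the code from this PR: `argsort_with_early_exit` is a hypothetical name, and `std::sort` with an index comparator stands in for the vectorized argsort kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not the library's API): returns the indices that
// would sort `keys` in ascending order.
template <typename T>
std::vector<std::size_t> argsort_with_early_exit(const std::vector<T>& keys)
{
    // Start from the identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // O(n) pre-check: if the keys are already in non-decreasing order,
    // the identity permutation is the answer and no sorting is needed.
    if (std::is_sorted(keys.begin(), keys.end())) {
        return idx;
    }

    // Assumption: stand-in for the SIMD argsort kernel.
    std::sort(idx.begin(), idx.end(),
              [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return idx;
}
```

In this sketch a constant array also passes the non-decreasing check, so it pays for one full scan before returning. That would be consistent with the constant and sorted cases landing at nearly the same time in the numbers above, though the actual check in the PR may differ.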

Contributor

@sterrettm2 sterrettm2 left a comment

Looks good to me!

@sterrettm2 sterrettm2 merged commit c306ac5 into numpy:main Apr 15, 2025
28 of 33 checks passed