[stdlib] benchmark sort scalar list #3022
I ran the benchmark on Apple M3 and saw that … To reduce overhead, I added a function … that dispatches on the list length:

```mojo
if len <= 5:
    _delegate_small_sort[Scalar[type], _less_than_equal](buff, len)
    return
if len < 32:
    _insertion_sort[Scalar[type], _less_than_equal](buff, 0, len)
    return
else:
    _quicksort[Scalar[type], _less_than_equal](buff, len)
```

This makes … Then I rewrote … Then I rewrote …:

```mojo
if start == end:
    return end
var left = start
var right = end - 2
# The last element of the range is the pivot.
var pivot_value = array[end - 1]
while True:
    # Skip elements that already belong left of the pivot. The pivot at
    # end - 1 acts as a sentinel here only if cmp_fn(pivot_value, pivot_value) is False.
    while cmp_fn(array[left], pivot_value):
        left += 1
    # Skip elements that already belong right of the pivot.
    while left < right and not cmp_fn(array[right], pivot_value):
        right -= 1
    if left >= right:
        # The pointers met: move the pivot into its final position.
        swap(array[left], array[end - 1])
        return left
    swap(array[left], array[right])
    left += 1
    right -= 1
```

This makes …

```mojo
var pivot = int(random.random_si64(start, end - 1))
var pivot_value = array[pivot]
```

which makes …

Conclusion …
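The dispatch, partition and random-pivot changes described in the comment above can be sketched in plain Python, whose syntax maps closely onto the Mojo snippets. This is an illustrative sketch under stated assumptions, not the stdlib implementation: `sort_list`, `quicksort`, `partition` and `insertion_sort` are hypothetical names, `<` stands in for a strict `cmp_fn`, the `len <= 5` small-sort path is folded into insertion sort, and the random pivot is swapped to `end - 1` so the two-pointer loop can keep using that slot.

```python
import random

# Hypothetical sketch: size-based dispatch plus a two-pointer partition
# with a random pivot. "<" plays the role of a strict cmp_fn.
INSERTION_SORT_THRESHOLD = 32  # ranges shorter than this use insertion sort


def insertion_sort(array, start, end):
    """Sort array[start:end] in place."""
    for i in range(start + 1, end):
        value = array[i]
        j = i
        # Shift larger elements one slot right until value's slot is found.
        while j > start and value < array[j - 1]:
            array[j] = array[j - 1]
            j -= 1
        array[j] = value


def partition(array, start, end):
    """Partition array[start:end] and return the pivot's final index."""
    if start == end:
        return end
    # Pick a random pivot and park it at end - 1, where the loop expects it.
    pivot = random.randint(start, end - 1)
    array[pivot], array[end - 1] = array[end - 1], array[pivot]
    pivot_value = array[end - 1]
    left, right = start, end - 2
    while True:
        # Strict "<" makes this scan stop at end - 1 at the latest.
        while array[left] < pivot_value:
            left += 1
        while left < right and not array[right] < pivot_value:
            right -= 1
        if left >= right:
            # Everything before left is < pivot; move the pivot into place.
            array[left], array[end - 1] = array[end - 1], array[left]
            return left
        array[left], array[right] = array[right], array[left]
        left += 1
        right -= 1


def quicksort(array, start, end):
    """Quicksort array[start:end], finishing short ranges with insertion sort."""
    while end - start >= INSERTION_SORT_THRESHOLD:
        p = partition(array, start, end)
        # Recurse into the smaller side and loop on the larger one.
        if p - start < end - p - 1:
            quicksort(array, start, p)
            start = p + 1
        else:
            quicksort(array, p + 1, end)
            end = p
    insertion_sort(array, start, end)


def sort_list(array):
    """Dispatch on length, mirroring the comment (len <= 5 folded in)."""
    if len(array) < INSERTION_SORT_THRESHOLD:
        insertion_sort(array, 0, len(array))
    else:
        quicksort(array, 0, len(array))
```

Recursing into the smaller side and looping on the larger one keeps the recursion depth logarithmic even when many equal keys push every element to one side of the pivot.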
It would be great if someone could run the benchmark on Intel/AMD and other architectures.
BTW, I just noticed that the …
This is likely because of #3045
It would also be great if you could make every test in one chunk run on the same set of data. Currently, you are generating a new set of lists for every iteration.
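A minimal Python sketch of that suggestion, assuming the harness can generate its inputs once up front: every iteration sorts fresh copies of the same pre-generated lists, so all runs see identical input and list generation stays out of the timed region. `make_datasets` and `bench` are hypothetical helpers for illustration, not part of Mojo's `benchmark` module.

```python
import random
import time


def make_datasets(size, count, seed=42):
    """Generate `count` random lists of `size` ints once, with a fixed seed."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1_000_000) for _ in range(size)] for _ in range(count)]


def bench(sort_fn, datasets, iterations):
    """Average time for sort_fn over the same datasets on every iteration."""
    total = 0.0
    for _ in range(iterations):
        # Copy outside the timer so every run sorts identical unsorted input.
        copies = [list(d) for d in datasets]
        begin = time.perf_counter()
        for data in copies:
            sort_fn(data)
        total += time.perf_counter() - begin
    return total / iterations
```

For example, `bench(list.sort, make_datasets(size=5, count=1000), iterations=10)` times Python's built-in in-place sort on the same 1,000 five-element lists in each iteration.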
Thanks for this and the other suggestions. I implemented them in one go; please have a look.
!sync
Hey @mzaks, do you mind fixing the merge conflict? We are ready to import this PR :) EDIT: It's a quick fix. I did it :)
@mzaks Can you reformat …
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
…dback by @Dan13llljws Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Co-authored-by: Joe Loser <joeloser@fastmail.com> Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Co-authored-by: Wangshu Jiang <59179986+Dan13llljws@users.noreply.github.com> Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
@Dan13llljws all problems should be resolved now.
!sync
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions.
[External] [stdlib] benchmark sort scalar list

This PR adds a benchmark for sorting a list of scalars. The benchmarks are performed on the public `sort` function and the internal `_insertion_sort` and `_small_sort` functions.

The benchmark is intended to:

1. Evaluate results on tiny lists (size 2 to 5) and observe how much faster the `_small_sort` function is compared to `_insertion_sort`.
2. Evaluate results on tiny lists (size 2 to 5) and observe how much overhead the `sort` function adds compared to the internal functions.
3. Evaluate results on small lists (size 10 to 100) and observe how much overhead the `sort` function adds compared to calling the internal insertion sort function directly.

My observations running the benchmarks on an Apple M1:

1. `_small_sort` is sometimes ~10% faster.
2. The `sort` function is 5-6x slower than the internal functions.
3. The `sort` function is 3-100x slower than the internal insertion sort.

Suggestions:

- We should run the benchmark on other architectures to identify whether my observations are consistent.
- Consider removing the `_small_sort` function if it really provides almost no benefit compared to insertion sort.
- Expose insertion sort as a top-level function, as it has some nice characteristics which users might want to utilise directly.
- Consider simplifying the `_quicksort` function and maybe provide a parameter which defines the upper bound of the list size to be sorted with insertion sort.
- Consider adding a similar parameter to the `sort` function.

Co-authored-by: Maxim Zaks <maxim.zaks@gmail.com>
Closes #3022
MODULAR_ORIG_COMMIT_REV_ID: 86e412f3310544c3198b7be355cc3c527b584cc5
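The first benchmark goal compares `_small_sort` against `_insertion_sort` on lists of 2 to 5 elements. A common way to sort such tiny inputs is a fixed sorting network; the Python sketch below assumes that approach purely for illustration (`NETWORKS` and `small_sort` are hypothetical, not the stdlib's code). It applies standard size-optimal compare-exchange sequences for 2 to 5 elements.

```python
# Hypothetical illustration: compare-exchange pairs (i, j) with i < j,
# applied in order. These are standard size-optimal sorting networks.
NETWORKS = {
    2: [(0, 1)],
    3: [(0, 1), (1, 2), (0, 1)],
    4: [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)],
    5: [(0, 3), (1, 4), (0, 2), (1, 3), (0, 1), (2, 4), (1, 2), (3, 4), (2, 3)],
}


def small_sort(array):
    """Sort a list of length 0 to 5 in place with a fixed comparison sequence."""
    n = len(array)
    if n < 2:
        return
    for i, j in NETWORKS[n]:
        # Compare-exchange: put the smaller value at i, the larger at j.
        if array[j] < array[i]:
            array[i], array[j] = array[j], array[i]
```

A network always performs the same comparisons (9 for five elements), while insertion sort needs between 4 and 10 comparisons on five elements depending on the input order, so which one wins in a benchmark can depend on how the input is arranged.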
Landed in 3822acd! Thank you for your contribution 🎉