-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/exp/slices: use dual-pivot quicksort instead of pdqsort #52789
Comments
There is no API change here, so taking it out of the proposal process. |
The For this reason, pdqsort loses many opportunities for optimization, such as removing Run the Benchmark in https://github.com/zhangyunhao116/pdqsort (original generic version), it contains more benchmark scenarios, new is the dual-pivot quicksort. (AMD 3700x amd64/linux)
For compatibility reasons, we can't use different algorithms in different versions, for example Updated the benchmark results, because the new algorithm is not on the default branch of go-exp(https://github.com/PeterRK/go-exp/tree/rk/slices), the previous result is wrong. Based on the new benchmark, in most of the common patterns, the dual-pivot quicksort causes performance regression. In the random case, the dual-pivot quicksort is better. Not sure if this is a performance enhancement overall. cc @PeterRK |
Is it necessary to keep stability of unstable sort? |
|
I don't think I intended my statement to be that absolute. When we were developing the first version of sort in exp/slices, it helped to keep the sorts exactly the same. Especially when they were both code-generated from the same source. Longer term, we can consider differences if there is a motivating reason for doing so. |
Thanks! I misunderstood something, and I am wondering that do we need to keep the |
Related test for this case.
|
We see that pdqsort's effort on optimizing some common patterns, but we also see those patterns usually are not the bottleneck of a real job. Dual-pivot quicksort works better with the slowest unordered pattern. I am not sure it's faster in average. But improving the slowest case is worthy. I ran benchmark on cloud machine, because most of my go programs run on them. More benchmark results on other platforms are needed to figure out which algorithm to choose. |
Agree! The performance is depends. It would be great if we could find a better way to improve the performance in random cases, could you please post some references for this variant? |
The most famous variant of dual-pivot quicksort may be that in OpenJDK. However, the java version looks a little too complex. I like simple implement and make one. It's perhaps not the best. |
Facts form benchmark:
Do you think it's resonable to use dual-pivot quicksort instead of pattern-defeating quicksort? |
I'm happy to explore faster sort algorithms. I agree that faster worst case time is probably more important than faster already-sorted time. But I don't see any fundamental reason why we need to make that tradeoff - the ideas behind pdqsort and dual-pivot quicksort can probably be combined. |
I have tried to add pattern-defeating technique into dual-pivot quicksort. But it’s not easy to implement such technique in a 3-way algorithm. |
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts-4 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted-4 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed-4 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed-4 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings-4 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs-4 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable-4 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts-4 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted-4 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed-4 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed-4 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings-4 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs-4 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable-4 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789 Change-Id: I1e572923ee90c7d4ec6792c7dba9a443134f9e61
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts-4 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted-4 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed-4 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed-4 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings-4 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs-4 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable-4 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts-4 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted-4 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed-4 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed-4 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings-4 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs-4 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable-4 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789
Change https://go.dev/cl/406594 mentions this issue: |
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts-4 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted-4 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed-4 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed-4 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings-4 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs-4 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable-4 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts-4 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted-4 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed-4 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed-4 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings-4 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs-4 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable-4 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. Benchmark results on Xeon-8372C name old time/op new time/op delta SlicesSortInts 7.01ms ± 0% 5.84ms ± 0% -16.64% (p=0.000 n=10+10) SlicesSortInts_Sorted 80.1µs ± 1% 69.1µs ± 0% -13.76% (p=0.000 n=8+10) SlicesSortInts_Reversed 116µs ± 1% 108µs ± 0% -6.72% (p=0.000 n=10+10) SlicesSortInts_Mixed 5.36ms ± 0% 4.31ms ± 0% -19.53% (p=0.000 n=10+9) SlicesSortStrings 22.5ms ± 0% 20.9ms ± 1% -7.12% (p=0.000 n=10+10) SortFuncStructs 13.4ms ± 1% 12.5ms ± 1% -7.00% (p=0.000 n=10+10) SortFuncStructs_Stable 31.1ms ± 1% 30.7ms ± 1% -1.25% (p=0.000 n=10+10) For golang/go#52789
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating technique is added to make this variant. And use double reversion instead of blockswap for symmerge. For golang/go#52789
Change https://go.dev/cl/406834 mentions this issue: |
Benchmark result on Ampere Altra (ARM64)
|
Give up. |
New benchmark result with go 1.20rc2
Xeon-8374C
EPYC-7K83
Pdqsort works well, but I found dual-pivot quicksort may run faster generally. Here is my implement, a variant of dual-pivot quicksort. Pdqsort wins a lot in percentage when the input is already in order, but absolute gaps are small. It's reasonable to optimize the slow cases first.
Benchmark result on Xeon-8372C
Benchmark result on EPYC-7K83
From benchmark result, we can see that pattern-defeating quicksort may have better performance than dual-pivot quicksort in average when more than 75% of workload are in special patterns. I don't it's common to have so much input with special patterns in a real job.
Unlike pdqsort, dual-pivot quicksort is not a new algorithm. It has been used by OpenJDK about 10 years ago. We can find many papers about it. Dual-pivot quicksort brings significantly more swaps, but swap operation is much cheaper now with generics.
The text was updated successfully, but these errors were encountered: