-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sort: use pdqsort #50154
Comments
Change https://golang.org/cl/371574 mentions this issue: |
Without running the same set of benchmarks on the same data with the new and old code, it's hard to see the improvement. |
Updated, add |
Reran the stdpdqsort benchmarks to get a delta. EC2 t2.micro (1 vCPU, 1 GiB mem), goos: linux, goarch: amd64
|
I think the only reason this could be considered as a proposal is that it will change the sorting of equal elements when using |
Since |
We've changed the sort results in the past. There was a release that greatly reduced the number of comparisons but changed sort orders. So changing again if we have significant optimizations to apply should be fine (modulo reviews, code being fine, etc). |
Would you mind saying a few words about what you tried for this, and any theories as to why it had poor performance in Go? For example, was the Go compiler not generating similar cmov or setcc instructions as a C++ or Rust compiler, or maybe it was something else? (My rough understanding is that BlockQuicksort was a very useful optimization in the original pdqsort. Maybe someone will have an idea about how to write it in a way that is friendly to the current Go compiler, or maybe there is a future compiler optimization that might make it more workable for Go, or ...). Thank you for your work on this -- it looks to be a very nice improvement! |
Yes, the Go compiler looks like not generating CMOV as the Rust compiler does. It would be nice if you could find a better way to write this code :)
This code will generate something like
Other reasons maybe cache miss, or the BlockQuickSort needs more stack for each call(BLOCK*2). The benchmark is based on a global pre-allocated block, but the performance is still not good. A more important reason is that cc @thepudds |
- Across all benchmarks, pdqsort is never significantly slower than the previous algorithm. - In common patterns, pdqsort is often faster (i.e. 10x faster in sorted slices). The pdqsort is described at https://arxiv.org/pdf/2106.05123.pdf This CL is inspired by both C++ implementation and Rust implementation. - C++ implementation: https://github.com/orlp/pdqsort - Rust implementation: https://docs.rs/pdqsort/latest/pdqsort/ For #50154 name old time/op new time/op delta SearchWrappers-16 72.8ns ± 3% 75.1ns ± 2% +3.25% (p=0.000 n=20+10) SortString1K-16 85.2µs ± 3% 86.2µs ± 5% ~ (p=0.247 n=19+10) SortString1K_Slice-16 84.6µs ± 4% 86.1µs ± 4% ~ (p=0.120 n=20+10) StableString1K-16 112µs ± 5% 112µs ± 5% ~ (p=0.604 n=19+10) SortInt1K-16 44.8µs ± 3% 43.2µs ± 2% -3.68% (p=0.000 n=20+10) SortInt1K_Sorted-16 28.2µs ± 3% 3.3µs ± 3% -88.16% (p=0.000 n=19+10) SortInt1K_Reversed-16 29.4µs ± 3% 4.8µs ± 2% -83.59% (p=0.000 n=20+10) SortInt1K_Mod8-16 25.1µs ± 2% 20.0µs ± 2% -20.35% (p=0.000 n=18+10) StableInt1K-16 51.3µs ± 3% 50.9µs ± 2% ~ (p=0.562 n=20+9) StableInt1K_Slice-16 49.5µs ± 2% 50.7µs ± 4% +2.55% (p=0.009 n=19+10) SortInt64K-16 4.73ms ± 3% 4.49ms ± 4% -5.08% (p=0.000 n=20+10) SortInt64K_Slice-16 4.51ms ± 3% 4.35ms ± 1% -3.42% (p=0.000 n=20+8) StableInt64K-16 4.85ms ± 2% 4.82ms ± 2% ~ (p=0.267 n=20+10) Sort1e2-16 27.9µs ± 1% 28.1µs ± 2% ~ (p=0.198 n=20+10) Stable1e2-16 56.6µs ± 2% 55.0µs ± 2% -2.88% (p=0.000 n=20+10) Sort1e4-16 5.51ms ± 1% 5.36ms ± 1% -2.58% (p=0.000 n=19+9) Stable1e4-16 17.8ms ± 1% 17.3ms ± 1% -2.40% (p=0.000 n=20+10) Sort1e6-16 833ms ± 1% 807ms ± 1% -3.02% (p=0.000 n=20+10) Stable1e6-16 3.49s ± 2% 3.44s ± 1% -1.41% (p=0.001 n=20+10) Change-Id: Iecded047d237b9330b5a4101001a5fdc2f50646a Reviewed-on: https://go-review.googlesource.com/c/go/+/371574 Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eli Bendersky <eliben@google.com>
Change https://go.dev/cl/399315 mentions this issue: |
Sync with CL 371574. - pdqsort paper: https://arxiv.org/pdf/2106.05123.pdf - C++ implementation: https://github.com/orlp/pdqsort - Rust implementation: https://docs.rs/pdqsort/latest/pdqsort/ For golang/go#50154 name old time/op new time/op delta SortInts-16 10.8ms ± 3% 10.8ms ± 3% ~ (p=0.461 n=20+20) SlicesSortInts-16 6.14ms ± 3% 6.29ms ± 3% +2.43% (p=0.000 n=20+19) SlicesSortInts_Sorted-16 1.78ms ± 4% 0.09ms ± 2% -95.01% (p=0.000 n=18+20) SlicesSortInts_Reversed-16 1.82ms ± 5% 0.13ms ± 2% -92.67% (p=0.000 n=20+20) SortStrings-16 23.0ms ± 2% 23.1ms ± 1% ~ (p=0.253 n=20+20) SlicesSortStrings-16 19.1ms ± 2% 19.0ms ± 2% ~ (p=0.270 n=20+19) SortStructs-16 15.7ms ± 4% 15.7ms ± 3% ~ (p=0.989 n=20+20) SortFuncStructs-16 13.6ms ± 1% 12.7ms ± 3% -6.39% (p=0.000 n=19+20) Change-Id: Ia563fe1c70d6d2ac098a97e1f870040d038cf61c Reviewed-on: https://go-review.googlesource.com/c/exp/+/399315 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eli Bendersky <eliben@google.com>
All changes have been submitted, so closing this issue. Thanks all! |
Good job! Long time to be merged. Cheers~ |
Abstract
We suggest using pdqsort in the
sort
package. The new algorithm is mainly based on pattern-defeating quicksort by Orson Peters. Both Rust and C++ Boost have used pdqsort in their library.sort
package).Implementations
sort.Interface
. Same as the Gerrit CL, but contains more tests and benchmarks.Ordered
types). Since Go1.18 is not released, this implementation is just a demo for now.Algorithm details
The new algorithm is mainly based on pdqsort, but with these modifications:
Known issues
sort
package.Benchmarks
sort
package. (go test sort -count=20 -run=NOTEST -bench=.
)Benchmark from stdpdqsort. (
go test -run=NOTEST -bench=. -cpu=1 -benchtime=1000x -count=10 -timeout=60m > release.txt && benchstat release.txt
)Random:
data[i] = rand.Int()
Sorted:
data[i] = i
NearlySorted:
Reversed:
data[i] = len(data) - i
NearlyReversed:
Mod8:
data[i] = i % 8
The text was updated successfully, but these errors were encountered: