Implement a faster sort algorithm #38192

Merged
merged 2 commits on Dec 9, 2016

Conversation

@stjepang
Contributor

stjepang commented Dec 6, 2016

Hi everyone, this is my first PR.

I've made some changes to the standard sort algorithm, starting out with a few tweaks here and there, but in the end this endeavour became a complete rewrite of it.

Summary

Changes:

  • Improved performance, especially on partially sorted inputs.
  • Performs fewer comparisons on both random and partially sorted inputs.
  • Decreased the size of temporary memory: the new sort allocates 4x less.

Benchmark (out1 = old sort, out2 = new sort; lower ns/iter is better):

 name                                        out1 ns/iter          out2 ns/iter          diff ns/iter   diff %
 slice::bench::sort_large_ascending          85,323 (937 MB/s)     8,970 (8918 MB/s)          -76,353  -89.49%
 slice::bench::sort_large_big_ascending      2,135,297 (599 MB/s)  355,955 (3595 MB/s)     -1,779,342  -83.33%
 slice::bench::sort_large_big_descending     2,266,402 (564 MB/s)  416,479 (3073 MB/s)     -1,849,923  -81.62%
 slice::bench::sort_large_big_random         3,053,031 (419 MB/s)  1,921,389 (666 MB/s)    -1,131,642  -37.07%
 slice::bench::sort_large_descending         313,181 (255 MB/s)    14,725 (5432 MB/s)        -298,456  -95.30%
 slice::bench::sort_large_mostly_ascending   287,706 (278 MB/s)    243,204 (328 MB/s)         -44,502  -15.47%
 slice::bench::sort_large_mostly_descending  415,078 (192 MB/s)    271,028 (295 MB/s)        -144,050  -34.70%
 slice::bench::sort_large_random             545,872 (146 MB/s)    521,559 (153 MB/s)         -24,313   -4.45%
 slice::bench::sort_large_random_expensive   30,321,770 (2 MB/s)   23,533,735 (3 MB/s)     -6,788,035  -22.39%
 slice::bench::sort_medium_ascending         616 (1298 MB/s)       155 (5161 MB/s)               -461  -74.84%
 slice::bench::sort_medium_descending        1,952 (409 MB/s)      202 (3960 MB/s)             -1,750  -89.65%
 slice::bench::sort_medium_random            3,646 (219 MB/s)      3,421 (233 MB/s)              -225   -6.17%
 slice::bench::sort_small_ascending          39 (2051 MB/s)        34 (2352 MB/s)                  -5  -12.82%
 slice::bench::sort_small_big_ascending      96 (13333 MB/s)       96 (13333 MB/s)                  0    0.00%
 slice::bench::sort_small_big_descending     248 (5161 MB/s)       243 (5267 MB/s)                 -5   -2.02%
 slice::bench::sort_small_big_random         501 (2554 MB/s)       490 (2612 MB/s)                -11   -2.20%
 slice::bench::sort_small_descending         95 (842 MB/s)         63 (1269 MB/s)                 -32  -33.68%
 slice::bench::sort_small_random             372 (215 MB/s)        354 (225 MB/s)                 -18   -4.84%

Background

First, let me just do a quick brain dump to discuss what I learned along the way.

The official documentation says that the standard sort in Rust is a stable sort. This constraint is thus set in stone and immediately rules out many popular sorting algorithms. Essentially, the only algorithms we might even take into consideration are:

  1. Merge sort
  2. Block sort (famous implementations are WikiSort and GrailSort)
  3. TimSort

Actually, all of those are just merge sort flavors. :) The current standard sort in Rust is a simple iterative merge sort. It has three problems. First, it's slow on partially sorted inputs (even though #29675 helped quite a bit). Second, it always makes around log(n) iterations copying the entire array between buffers, no matter what. Third, it allocates huge amounts of temporary memory (a buffer of size 2*n, where n is the size of input).
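For illustration, the shape of such an iterative (bottom-up) merge sort might look like the hedged sketch below; this is not the actual libcollections code, just a minimal reconstruction showing the two costs described above: a full pass over the data for every doubling of the block width, and an auxiliary buffer.

```rust
// Illustrative bottom-up merge sort: log(n) full passes, each merging
// adjacent sorted blocks of length `width` into an auxiliary buffer.
fn bottom_up_merge_sort<T: Ord + Clone>(v: &mut Vec<T>) {
    let n = v.len();
    let mut buf = v.clone(); // auxiliary buffer of size n
    let mut width = 1;
    while width < n {
        // One full pass over the whole array per doubling of `width`.
        let mut start = 0;
        while start < n {
            let mid = (start + width).min(n);
            let end = (start + 2 * width).min(n);
            merge(&v[start..mid], &v[mid..end], &mut buf[start..end]);
            start = end;
        }
        std::mem::swap(v, &mut buf);
        width *= 2;
    }
}

fn merge<T: Ord + Clone>(a: &[T], b: &[T], out: &mut [T]) {
    let (mut i, mut j) = (0, 0);
    for slot in out.iter_mut() {
        // `<=` keeps the merge stable: ties are taken from the left run.
        if j == b.len() || (i < a.len() && a[i] <= b[j]) {
            *slot = a[i].clone();
            i += 1;
        } else {
            *slot = b[j].clone();
            j += 1;
        }
    }
}
```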

The problem of auxiliary memory allocation is a tough one. Ideally, it would be best for our sort to allocate O(1) additional memory. This is what block sort (and its variants) does. However, it's often very complicated (look at this) and even then performs rather poorly. The author of WikiSort claims good performance, but that must be taken with a grain of salt. It performs well in comparison to std::stable_sort in C++. It can even beat std::sort on partially sorted inputs, but on random inputs it's always far worse. My rule of thumb is: high performance, low memory overhead, stability; choose two.

TimSort is another option. It allocates a buffer of size n/2, which is not great, but acceptable. It performs extremely well on partially sorted inputs. However, it seems pretty much all implementations suck on random inputs. I benchmarked implementations in Rust, C++, and D. The results were a bit disappointing. It seems the bad performance is due to complex galloping procedures in hot loops. Galloping noticeably improves performance on partially sorted inputs, but worsens it on random ones.

The new algorithm

Choosing the best algorithm is not easy. Plain merge sort is bad on partially sorted inputs. TimSort is bad on random inputs and block sort is even worse. However, if we take the main ideas from TimSort (intelligent merging strategy of sorted runs) and drop galloping, then we'll have great performance on random inputs and it won't be bad on partially sorted inputs either.

That is exactly what this new algorithm does. I can't call it TimSort, since it steals just a few of its ideas. A complete TimSort would be a much more complex and elaborate implementation. If in the future we figure out how to incorporate more of its ideas into this implementation without crippling performance on random inputs, it will be very easy to extend. I also made several other minor improvements, like reworking insertion sort to make it faster.
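The run-detection half of this idea can be sketched as follows (an assumed, simplified form for illustration, not the PR's actual code): scan for maximal non-descending runs, reversing strictly descending ones so they become usable runs; the real algorithm then merges the collected runs with a TimSort-style strategy rather than merging blindly.

```rust
// Split `v` into maximal sorted runs, returning (start, end) bounds.
// Descending runs must be *strictly* descending: reversing a run that
// contains equal elements would break stability.
fn collect_runs<T: Ord>(v: &mut [T]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start = 0;
    while start < v.len() {
        let mut end = start + 1;
        if end < v.len() && v[end] < v[end - 1] {
            // Strictly descending run: extend it, then reverse in place.
            while end < v.len() && v[end] < v[end - 1] {
                end += 1;
            }
            v[start..end].reverse();
        } else {
            // Non-descending run (ties included, preserving stability).
            while end < v.len() && v[end] >= v[end - 1] {
                end += 1;
            }
        }
        runs.push((start, end));
        start = end;
    }
    runs
}
```

A partially sorted input thus collapses into very few runs, which is where the big wins on the `mostly_ascending`/`mostly_descending` benchmarks come from.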

There are also new, more thorough benchmarks and panic safety tests.

The final code is not terribly complex and has less unsafe code than I anticipated, but there's still plenty of it that should be carefully reviewed. I did my best at documenting non-obvious code.

I'd like to notify several people of this PR, since they might be interested and have useful insights:

  1. @huonw because he wrote the original merge sort.
  2. @alexcrichton because he was involved in multiple discussions of it.
  3. @veddan because he wrote introsort in Rust.
  4. @notriddle because he wrote TimSort in Rust.
  5. @bluss because he had an attempt at writing WikiSort in Rust.
  6. @gnzlbg, @rkruppe, and @mark-i-m because they were involved in discussion #36318.

P.S. quickersort describes itself as being universally faster than the standard sort, which is true. However, if this PR gets merged, things might change a bit. ;)

@nagisa

Contributor

nagisa commented Dec 6, 2016

Since it is quite a major rewrite of the sorting routine, and I see that we have no test guaranteeing stability, I would very much like a test that ensures this invariant.

Something along the lines of

let x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3];
let y = // array/vector/slice of references to each element of x. References compare the underlying value
y.shuffle();
let z = y.clone();
y.sort();
// assert that the addresses in y and z are ordered in the same order for each block of values.
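A runnable version of the sketch above, under one assumed adaptation: tagging each element with its original index instead of comparing addresses. After a stable sort by key, the tags must remain ascending within every block of equal keys.

```rust
// True iff pairs are sorted by key, with original indices ascending
// inside each block of equal keys (i.e. the sort was stable).
fn is_stably_sorted(pairs: &[(u32, usize)]) -> bool {
    pairs.windows(2).all(|w| {
        w[0].0 < w[1].0 || (w[0].0 == w[1].0 && w[0].1 < w[1].1)
    })
}

fn check_stability(keys: &[u32]) -> bool {
    // Tag each key with its original position.
    let mut pairs: Vec<(u32, usize)> =
        keys.iter().cloned().zip(0..).collect();
    // Sort by key only; a stable sort must keep the tags in order.
    pairs.sort_by_key(|&(k, _)| k);
    is_stably_sorted(&pairs)
}
```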
@stjepang

Contributor

stjepang commented Dec 6, 2016

Actually, we do have a test guaranteeing stability. It's called test_sort_stability. However, it tests sorting on short arrays only, essentially verifying the correctness of just the insertion sort. I'll bump the lengths up a bit.

@bluss

bluss requested changes on Dec 6, 2016 (edited)

I loved reading this. The requested changes are mostly about comments and the guard struct names, but also a check of the zero-sized-types case, to make sure it doesn't panic.

len: left.len + right.len,
};
runs.remove(r + 1);
}
}

@bluss

bluss Dec 6, 2016

Contributor

There should be exactly one Run in the runs vector here, is that right? Maybe a debug assertion for that (so that tests exercise it)?

@stjepang

stjepang Dec 7, 2016

Contributor

Added a debug assertion.
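The invariant being asserted might look like the following hypothetical sketch (the names Run, start, and len mirror the snippet above, not necessarily the final code): once the merge loop has collapsed everything, exactly one run should remain, covering the whole slice.

```rust
// Hypothetical shape of the invariant from the discussion above.
#[derive(Debug, PartialEq)]
struct Run {
    start: usize,
    len: usize,
}

fn fully_collapsed(runs: &[Run], total_len: usize) -> bool {
    runs.len() == 1 && runs[0].start == 0 && runs[0].len == total_len
}

fn check_collapsed(runs: &[Run], total_len: usize) {
    // A debug assertion, so tests exercise it while release builds
    // pay nothing for it.
    debug_assert!(fully_collapsed(runs, total_len));
}
```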

///
/// The invariants ensure that the total running time is `O(n log n)` worst-case.
fn merge_sort<T, F>(v: &mut [T], mut compare: F)
where F: FnMut(&T, &T) -> Ordering

@gereeter

gereeter Dec 6, 2016

Contributor

As far as I can tell, this implementation of merge sort only ever compares the result of compare to Greater. It might be more efficient to just pass around a function greater: FnMut(&T, &T) -> bool, since I think that some types can implement binary comparisons slightly more efficiently than trinary comparisons.

@stjepang

stjepang Dec 6, 2016

Contributor

How would we expose the ability to pass a function greater instead of compare in the public API, if at all?

@sfackler

sfackler Dec 7, 2016

Member

It would probably only matter for sort, where we can call a.gt(b) rather than a.cmp(b) == Greater.
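To illustrate the point: for well-behaved types, PartialOrd's gt and Ord::cmp agreeing is exactly what lets a sort that only ever asks "is a greater than b?" take a boolean comparator. A small hedged sketch (the boolean-comparator signature is the one gereeter suggested, not an existing API):

```rust
use std::cmp::Ordering;

// The two ways of asking the same question must agree.
fn gt_matches_cmp<T: Ord>(a: &T, b: &T) -> bool {
    a.gt(b) == (a.cmp(b) == Ordering::Greater)
}

// The binary comparator shape suggested above: FnMut(&T, &T) -> bool.
fn is_greater<T, F: FnMut(&T, &T) -> bool>(a: &T, b: &T, gt: &mut F) -> bool {
    gt(a, b)
}
```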

@EpicatSupercell

EpicatSupercell Dec 14, 2016

How about adding greater (and the others: less, less_eq, ...) to the std::cmp::Ord trait with a default implementation using cmp? That would allow anyone to write specializations for any type. It's the same as what we do for std::iter::Iterator.

@bluss

bluss Dec 14, 2016

Contributor

gt is already that (PartialOrd and Ord must be in sync).

@EpicatSupercell

EpicatSupercell Dec 14, 2016

Oh, missed it because I foolishly looked at Ord.

#![feature(trusted_len)]
#![feature(unicode)]
#![feature(unique)]
#![feature(slice_get_slice)]
#![feature(untagged_unions)]

@Stebalien

Stebalien Dec 6, 2016

Contributor

Maybe I'm missing something obvious but I don't see where you use this feature (or why you changed these).

@stjepang

stjepang Dec 6, 2016

Contributor

Three changes to the feature list are in this file:

  1. Removed step_by. The old sort used it, but the new one doesn't.
  2. Moved slice_get_slice a few lines up to keep the feature list sorted alphabetically.
  3. Added untagged_unions. This feature is used in the function insert_head, above struct NoDrop<T>.

@Stebalien

Stebalien Dec 7, 2016

Contributor

GitHub wasn't loading all the diffs... sigh (sorry). I was excited to see a non-FFI use case for unions and got a bit confused when searching didn't turn up any.

@stjepang

Contributor

stjepang commented Dec 7, 2016

I addressed all requests for changes and improved comments in the code.

@bluss

bluss approved these changes Dec 7, 2016

@mark-i-m

Contributor

mark-i-m commented Dec 7, 2016

Re: #38192 (comment), sorting a collection of ZSTs

For what it's worth, I think the only correct answer is to make it a no-op. ZSTs are types with exactly one value. Thus, given a collection x of values of type T, for some ZST T, and two arbitrary values in the collection x[i] and x[j], we know that x[i] == x[j], so the only correct behavior (for a stable sort) is to preserve the exact order of the input collection, right?
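The argument above can be demonstrated with visible tags (a hedged illustration, simulating "distinguishable ZSTs" by attaching position tags to all-equal keys): when every element compares equal, as all values of a ZST do, a stable sort is forced to leave the order untouched, so a no-op is the only correct behavior.

```rust
// All keys are equal; tags record the original order. A stable sort
// by key must therefore be the identity permutation.
fn stable_sort_all_equal_is_identity(n: u32) -> bool {
    let mut v: Vec<(u8, u32)> = (0..n).map(|t| (0u8, t)).collect();
    let before = v.clone();
    v.sort_by_key(|&(key, _)| key); // stable sort on all-equal keys
    v == before
}
```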

@gnzlbg

Contributor

gnzlbg commented Dec 7, 2016

Is there a case for which this implementation performs worse than the old one?

@stjepang

Contributor

stjepang commented Dec 7, 2016

@gnzlbg I don't think so. Let's discuss a few cases:

  1. Small arrays (below the insertion sort threshold). I tried sorting big & small structures on ascending & descending & random inputs. The sorts are always either tied, or the new sort wins.

  2. Medium sized arrays (just above the insertion sort threshold). One might think that additional allocation caused by runs is troubling, but turns out it isn't. I tried poking at various cases. Again, the new sort wins in all of them, except a few in which they are tied.

  3. Large arrays. The new sort destroys the old one on partially sorted arrays, so such cases are no contest. The old sort is strongest on random inputs, and even then it's consistently behind the new sort. There is a test called sort_large_random_expensive, which sorts a random array using a very costly comparison function. I've examined the total number of comparisons required by the old vs the new algorithm, and it turns out the new one performs slightly fewer comparisons. It's no wonder it wins by a slight edge in that test, too.

Sorting depends on structure size, type of input, comparison function, your platform, etc. I can't guarantee the new sort always wins, but haven't found a case where it wouldn't. If it ever loses, I'm sure that would be a race lost by a very small margin.

@bluss bluss added the relnotes label Dec 7, 2016

@gnzlbg

Contributor

gnzlbg commented Dec 7, 2016

@stjepang In practice, most data to be sorted has some structure and is rarely truly random. I think it would be worth it to check the benchmarking patterns of Boost.Sort, libc++ algorithm benchmarks, Morwenn/cpp-sort, ... (the licenses are permissive). For example, libc++-std::sort is benchmarked using the following patterns:

This makes libc++'s std::sort slower than libstdc++'s for truly random data, but much better when the data isn't truly random. Besides these patterns, the following patterns are also common:

See also this Boost.Sort issue.

IMO the new implementation doesn't need to be better than the old one in all of these cases, but I would still like to be sure that it doesn't blow up in any of them. libc++'s std::sort turned out to be O(N^2) for some particular patterns (see LLVM's Bugzilla for the pattern and the fix); we should try hard to keep something like that from happening here.

Pinging @Morwenn since he might be interested on this and might know other people who can give constructive feedback.

@notriddle

Contributor

notriddle commented Dec 7, 2016

@gnzlbg That was already done, comparing this PR's modified TimSort with the current MergeSort and an introsort implementation: https://gist.github.com/stjepang/b9f0c3eaa0e1f1280b61b963dae19a30

@bluss

Contributor

bluss commented Dec 7, 2016

OK, can you explain what the two performance listings mean, and which numbers are for which implementation? Important information is missing, such as whether larger is better.

@notriddle

Contributor

notriddle commented Dec 7, 2016

Here's the program that generated those patterns: https://github.com/notriddle/quickersort/blob/master/examples/perf_txt.rs#L63

Also, here's how the old slice::sort did on the same benchmark: https://github.com/notriddle/quickersort/blob/master/perf.txt

@Morwenn

Morwenn commented Dec 7, 2016

Another trick to speed up sorting algorithms is to switch to an unstable sorting algorithm when you can statically infer that the stability (or lack thereof) of the sorting algorithm isn't an observable effect. I don't know about Rust, but if you've got a plain array of simple integers and you sort it with a trivial comparator, if I'm not mistaken there's no way to tell the difference between a stable and an unstable sort.
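Morwenn's observation can be checked directly (using sort_unstable, which landed in Rust after this PR, purely for demonstration): for plain integers with the default comparator, equal elements are indistinguishable, so stable and unstable sorts produce identical results.

```rust
// With primitive integers and the trivial comparator, there is no
// observable difference between a stable and an unstable sort.
fn stable_vs_unstable(mut a: Vec<i32>) -> bool {
    let mut b = a.clone();
    a.sort();          // stable
    b.sort_unstable(); // unstable
    a == b
}
```

This is the observation behind the specialization idea discussed below: for such types, an implementation is free to dispatch to a faster unstable algorithm without violating the documented stability guarantee.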

@Stebalien

Contributor

Stebalien commented Dec 7, 2016

@Morwenn Rust could do this through specialization and an unstable-sort-safe marker trait.

@notriddle

Contributor

notriddle commented Dec 7, 2016

A brief description of what perf.txt is showing you

  • size is exactly what it sounds like. At size = 10, we're comparing the insertion sort implementations of quickersort and rust-lang/rust#38192

  • m is the "constant factor" used by the list generator and transformation.

  • pattern describes what kind of list should be generated at first. For examples where size = 20 and m = 5:

    • sawtooth: A repeating, incrementing list with period m: [ 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 ]
    • rand: A series of random numbers (note that m is not used): [ 174485772, 204021123, 71605603, 334131482, 785758972, 574747816, 150801346, 844973720, 876360420, 210798088, 688904552, 975251835, 835778151, 935999844, 786148954, 779096211, 338255767, 826878933, 563001734, 315096245 ]
    • stagger: A classic, terrible pseudo-RNG with a short period: [ 0, 6, 12, 18, 4, 10, 16, 2, 8, 14, 0, 6, 12, 18, 4, 10, 16, 2, 8, 14 ]
    • plateau: A list that starts out incrementing but becomes repeating at m: [ 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5 ]
    • shuffle: The result of applying a single pass of "bridge" shuffling to two incrementing lists (note that this is a lopsided shuffle unless m is 2): [ 0, 0, 2, 4, 4, 4, 6, 8, 12, 8, 14, 8, 12, 16, 14, 16, 18, 18, 20, 20 ]
  • After generating the pattern, we then generate a variant by applying a transformation to the list:

    • ident: Don't do any transformation. Note that rand / ident completely ignores m.
    • reverse: Turn it backwards. Note that this is semantically a no-op for rand, and ignores m.
    • reverse_front: Turn the first len / 2 items backwards. Note that this is semantically a no-op for rand, and ignores m.
    • reverse_back: Turn the last len / 2 items backwards. Note that this is semantically a no-op for rand, and ignores m.
    • sorted: Sort the list. Note that this is a no-op for plateau.
    • dither: Take all the items modulus m, thus increasing the number of duplicates. For example, this will turn plateau into [ 0, 1, 2, 3, 4, 0, 0, 0, 0, ... ].
  • After the two-step process of generating the list, the list is copied, and the copies are sorted with both sorting algorithms. The time is recorded, and the "throughput" is computed using the formula size / ( time / trial_count ). Larger throughput is better.

  • Finally, the ratio of throughputs is taken. A larger ratio means quickersort did better than standard sort, while a ratio smaller than 1 means standard sort did better.
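A few of the generators described above can be sketched as follows; this is my reconstruction from the prose and its size = 20, m = 5 examples, not notriddle's actual perf_txt.rs code.

```rust
// sawtooth: repeating, incrementing list with period m.
fn sawtooth(size: usize, m: usize) -> Vec<usize> {
    (0..size).map(|i| i % m).collect()
}

// plateau: increments until m, then repeats the value m.
fn plateau(size: usize, m: usize) -> Vec<usize> {
    (0..size).map(|i| i.min(m)).collect()
}

// dither transformation: take every item modulo m.
fn dither(v: &[usize], m: usize) -> Vec<usize> {
    v.iter().map(|&x| x % m).collect()
}
```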

@gnzlbg

Contributor

gnzlbg commented Dec 7, 2016

@notriddle Where are the benchmarks for the patterns: organ pipe and sawtooth? I cannot find them.

EDIT: the benchmarks are there (at least for sawtooth; I still can't find organ pipe), but the results for those don't seem to have been posted in the OP?

@notriddle

Contributor

notriddle commented Dec 7, 2016

Sawtooth is literally the first one shown.

Organ pipe is actually not there. I'd be happy to add it, though.

@bluss

Contributor

bluss commented Dec 7, 2016

Thanks for the explanation.

@stjepang

Contributor

stjepang commented Dec 7, 2016

@gnzlbg Some of the patterns displayed in the picture you posted are already benchmarked as part of this PR. I get what you're saying about sorting structured data, and agree with you.

The new implementation is better than the old one on all of those benchmarks. That is indisputable even without any further benchmarking, really. And it won't blow up by design either. :) It's O(n log n) worst-case and finds natural runs in the input before sorting, so there's not much to worry about.

It is certainly possible to harness more speed on structured data at the expense of speed on random data (and code complexity, too!). However, note that speed on structured data is already very good (look at the benchmarks we have).

Also, keep in mind that we really don't have much choice with stable sorts. TimSort is basically the only reasonably fast and versatile stable sort, and it was designed to be fast on structured data in the first place. Our choices boil down to: how far do we go with it, and how do we adapt TimSort (designed for Python) to Rust (a vastly different language)?
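The natural-run detection mentioned above can be sketched roughly like this (an illustrative sketch only, not the actual standard-library code; `find_run` is a made-up name):

```rust
// Hypothetical sketch: find the length of a natural run at the start of a
// slice. A strictly descending prefix is reversed in place so that every
// detected run ends up ascending; strict descent keeps the sort stable.
fn find_run(v: &mut [i32]) -> usize {
    let len = v.len();
    if len < 2 {
        return len;
    }
    let mut end = 2;
    if v[1] < v[0] {
        // Strictly descending run: extend it, then reverse to make it ascending.
        while end < len && v[end] < v[end - 1] {
            end += 1;
        }
        v[..end].reverse();
    } else {
        // Non-descending run: extend as long as elements do not decrease.
        while end < len && v[end] >= v[end - 1] {
            end += 1;
        }
    }
    end
}

fn main() {
    let mut a = [5, 4, 3, 1, 2];
    let run = find_run(&mut a);
    assert_eq!(run, 4);                // the descending prefix [5, 4, 3, 1]...
    assert_eq!(a, [1, 3, 4, 5, 2]);    // ...is reversed in place
}
```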


Contributor

bluss commented Dec 7, 2016

The PR is a great improvement on several fronts (🎉 @stjepang!!) and I believe it to be correct and memory safe so it's r=me when the fixup commits have been squashed together like the author indicated they wanted to do.

There's a lot of interesting things that can follow here. We can create a meta issue to track the different ideas for those that are interested.

Implement a faster sort algorithm
This is a complete rewrite of the standard sort algorithm. The new algorithm
is a simplified variant of TimSort. In summary, the changes are:

* Improved performance, especially on partially sorted inputs.
* Performs fewer comparisons on both random and partially sorted inputs.
* Decreased the size of temporary memory: the new sort allocates 4x less.

Contributor

stjepang commented Dec 7, 2016

The fixup commits are squashed now.


Contributor

bluss commented Dec 7, 2016

@bors r+


Contributor

bors commented Dec 7, 2016

📌 Commit c8d73ea has been approved by bluss


Contributor

bors commented Dec 8, 2016

⌛️ Testing commit c8d73ea with merge 7d4c55a...

bors added a commit that referenced this pull request Dec 8, 2016

Auto merge of #38192 - stjepang:faster-sort-algorithm, r=bluss
Implement a faster sort algorithm

Hi everyone, this is my first PR.

I've made some changes to the standard sort algorithm, starting out with a few tweaks here and there, but in the end this endeavour became a complete rewrite of it.

#### Summary

Changes:

* Improved performance, especially on partially sorted inputs.
* Performs fewer comparisons on both random and partially sorted inputs.
* Decreased the size of temporary memory: the new sort allocates 4x less.

Benchmark:

```
 name                                        out1 ns/iter          out2 ns/iter          diff ns/iter   diff %
 slice::bench::sort_large_ascending          85,323 (937 MB/s)     8,970 (8918 MB/s)          -76,353  -89.49%
 slice::bench::sort_large_big_ascending      2,135,297 (599 MB/s)  355,955 (3595 MB/s)     -1,779,342  -83.33%
 slice::bench::sort_large_big_descending     2,266,402 (564 MB/s)  416,479 (3073 MB/s)     -1,849,923  -81.62%
 slice::bench::sort_large_big_random         3,053,031 (419 MB/s)  1,921,389 (666 MB/s)    -1,131,642  -37.07%
 slice::bench::sort_large_descending         313,181 (255 MB/s)    14,725 (5432 MB/s)        -298,456  -95.30%
 slice::bench::sort_large_mostly_ascending   287,706 (278 MB/s)    243,204 (328 MB/s)         -44,502  -15.47%
 slice::bench::sort_large_mostly_descending  415,078 (192 MB/s)    271,028 (295 MB/s)        -144,050  -34.70%
 slice::bench::sort_large_random             545,872 (146 MB/s)    521,559 (153 MB/s)         -24,313   -4.45%
 slice::bench::sort_large_random_expensive   30,321,770 (2 MB/s)   23,533,735 (3 MB/s)     -6,788,035  -22.39%
 slice::bench::sort_medium_ascending         616 (1298 MB/s)       155 (5161 MB/s)               -461  -74.84%
 slice::bench::sort_medium_descending        1,952 (409 MB/s)      202 (3960 MB/s)             -1,750  -89.65%
 slice::bench::sort_medium_random            3,646 (219 MB/s)      3,421 (233 MB/s)              -225   -6.17%
 slice::bench::sort_small_ascending          39 (2051 MB/s)        34 (2352 MB/s)                  -5  -12.82%
 slice::bench::sort_small_big_ascending      96 (13333 MB/s)       96 (13333 MB/s)                  0    0.00%
 slice::bench::sort_small_big_descending     248 (5161 MB/s)       243 (5267 MB/s)                 -5   -2.02%
 slice::bench::sort_small_big_random         501 (2554 MB/s)       490 (2612 MB/s)                -11   -2.20%
 slice::bench::sort_small_descending         95 (842 MB/s)         63 (1269 MB/s)                 -32  -33.68%
 slice::bench::sort_small_random             372 (215 MB/s)        354 (225 MB/s)                 -18   -4.84%
```

#### Background

First, let me just do a quick brain dump to discuss what I learned along the way.

The official documentation says that the standard sort in Rust is a stable sort. This constraint is thus set in stone and immediately rules out many popular sorting algorithms. Essentially, the only algorithms we might even take into consideration are:

1. [Merge sort](https://en.wikipedia.org/wiki/Merge_sort)
2. [Block sort](https://en.wikipedia.org/wiki/Block_sort) (famous implementations are [WikiSort](https://github.com/BonzaiThePenguin/WikiSort) and [GrailSort](https://github.com/Mrrl/GrailSort))
3. [TimSort](https://en.wikipedia.org/wiki/Timsort)

Actually, all of those are just merge sort flavors. :) The current standard sort in Rust is a simple iterative merge sort. It has three problems. First, it's slow on partially sorted inputs (even though #29675 helped quite a bit). Second, it always makes around `log(n)` iterations copying the entire array between buffers, no matter what. Third, it allocates huge amounts of temporary memory (a buffer of size `2*n`, where `n` is the size of input).

The problem of auxiliary memory allocation is a tough one. Ideally, it would be best for our sort to allocate `O(1)` additional memory. This is what block sort (and its variants) does. However, it's often very complicated (look at [this](https://github.com/BonzaiThePenguin/WikiSort/blob/master/WikiSort.cpp)) and even then performs rather poorly. The author of WikiSort claims good performance, but that must be taken with a grain of salt. It performs well in comparison to `std::stable_sort` in C++. It can even beat `std::sort` on partially sorted inputs, but on random inputs it's always far worse. My rule of thumb is: high performance, low memory overhead, stability - choose two.

TimSort is another option. It allocates a buffer of size `n/2`, which is not great, but acceptable. It performs extremely well on partially sorted inputs. However, it seems pretty much all implementations suck on random inputs. I benchmarked implementations in [Rust](https://github.com/notriddle/rust-timsort), [C++](https://github.com/gfx/cpp-TimSort), and [D](https://github.com/dlang/phobos/blob/fd518eb310a9494cccf28c54892542b052c49669/std/algorithm/sorting.d#L2062). The results were a bit disappointing. It seems bad performance is due to complex galloping procedures in hot loops. Galloping noticeably improves performance on partially sorted inputs, but worsens it on random ones.

#### The new algorithm

Choosing the best algorithm is not easy. Plain merge sort is bad on partially sorted inputs. TimSort is bad on random inputs and block sort is even worse. However, if we take the main ideas from TimSort (intelligent merging strategy of sorted runs) and drop galloping, then we'll have great performance on random inputs and it won't be bad on partially sorted inputs either.

That is exactly what this new algorithm does. I can't call it TimSort, since it steals just a few of its ideas. Complete TimSort would be a much more complex and elaborate implementation. If we figure out in the future how to incorporate more of its ideas into this implementation without crippling performance on random inputs, it will be very easy to extend. I also made several other minor improvements, like reworking insertion sort to make it faster.

There are also new, more thorough benchmarks and panic safety tests.

The final code is not terribly complex and has less unsafe code than I anticipated, but there's still plenty of it that should be carefully reviewed. I did my best at documenting non-obvious code.

I'd like to notify several people of this PR, since they might be interested and have useful insights:

1. @huonw because he wrote the [original merge sort](#11064).
2. @alexcrichton because he was involved in multiple discussions of it.
3. @veddan because he wrote [introsort](https://github.com/veddan/rust-introsort) in Rust.
4. @notriddle because he wrote [TimSort](https://github.com/notriddle/rust-timsort) in Rust.
5. @bluss because he had an attempt at writing WikiSort in Rust.
6. @gnzlbg, @rkruppe, and @mark-i-m because they were involved in discussion #36318.

**P.S.** [quickersort](https://github.com/notriddle/quickersort) describes itself as being universally [faster](https://github.com/notriddle/quickersort/blob/master/perf.txt) than the standard sort, which is true. However, if this PR gets merged, things might [change](https://gist.github.com/stjepang/b9f0c3eaa0e1f1280b61b963dae19a30) a bit. ;)
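The "merging without galloping" idea described above amounts to a plain element-at-a-time stable merge of two adjacent sorted runs. A minimal sketch under simplifying assumptions (illustrative only; the real implementation works with raw pointers, copies only the shorter run, and guards against panics in the comparison function):

```rust
// Stable merge of two adjacent sorted runs v[..mid] and v[mid..] through a
// temporary buffer, with no galloping: just a simple comparison loop.
fn merge(v: &mut [i32], mid: usize) {
    let tmp: Vec<i32> = v[..mid].to_vec(); // copy the left run out of the way
    let (mut i, mut j, mut k) = (0, mid, 0);
    while i < tmp.len() && j < v.len() {
        // `<=` takes equal elements from the left run first, which is what
        // makes the merge (and hence the whole sort) stable.
        if tmp[i] <= v[j] {
            v[k] = tmp[i];
            i += 1;
        } else {
            v[k] = v[j];
            j += 1;
        }
        k += 1;
    }
    // Copy any leftover elements of the left run; the right run's tail is
    // already in its final position.
    while i < tmp.len() {
        v[k] = tmp[i];
        i += 1;
        k += 1;
    }
}

fn main() {
    let mut v = [1, 4, 7, 2, 3, 9];
    merge(&mut v, 3);
    assert_eq!(v, [1, 2, 3, 4, 7, 9]);
}
```

Note that the write index `k` can never catch up with the read index `j` while the left run is buffered, which is why merging in place over `v` is safe here.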

Contributor

notriddle commented Dec 8, 2016

Why did the Travis build fail?


Contributor

stjepang commented Dec 8, 2016

@notriddle Due to an unrelated problem. Looks like all builds have been failing for the past day or two.
https://travis-ci.org/rust-lang/rust/builds


Contributor

bors commented Dec 8, 2016

💔 Test failed - auto-linux-64-x-android-t

Inline nested fn collapse
Since merge_sort is generic and collapse isn't, calls to
collapse won't be inlined. Therefore, we must stick an
`#[inline]` above `fn collapse`.
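The point of this commit can be illustrated with a toy example (names and bodies are placeholders, not the actual std code): a non-generic function is normally not a candidate for cross-crate inlining, because its body isn't emitted into the crate metadata; `#[inline]` makes the body available so calls from generic functions instantiated in downstream crates can still be inlined.

```rust
// Placeholder standing in for the run-stack check in the real `collapse`.
// Without `#[inline]`, a call from a generic caller compiled in another
// crate would stay an opaque function call.
#[inline]
fn collapse(len: usize) -> bool {
    len > 1
}

// Generic functions are instantiated (and optimized) in the calling crate,
// so they benefit from being able to inline their non-generic helpers.
fn merge_sort<T: Ord>(v: &mut [T]) {
    if collapse(v.len()) {
        v.sort(); // stand-in for the real merging logic
    }
}

fn main() {
    let mut v = vec![3, 1, 2];
    merge_sort(&mut v);
    assert_eq!(v, vec![1, 2, 3]);
}
```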

Contributor

bluss commented Dec 8, 2016

@bors r+


Contributor

bors commented Dec 8, 2016

📌 Commit c0e150a has been approved by bluss


Contributor

bors commented Dec 9, 2016

⌛️ Testing commit c0e150a with merge dedd985...

bors added a commit that referenced this pull request Dec 9, 2016

Auto merge of #38192 - stjepang:faster-sort-algorithm, r=bluss
Implement a faster sort algorithm


@bors bors merged commit c0e150a into rust-lang:master Dec 9, 2016

1 of 2 checks passed

* continuous-integration/travis-ci/pr: The Travis CI build failed
* homu: Test successful

@stjepang stjepang deleted the stjepang:faster-sort-algorithm branch Dec 9, 2016

/// satisfied, for every `i` in `0 .. runs.len() - 2`:
///
/// 1. `runs[i].len > runs[i + 1].len`
/// 2. `runs[i].len > runs[i + 1].len + runs[i + 2].len`

@tbelaire

tbelaire Dec 14, 2016

Contributor

Uh, I think there's a typo in your invariant?
From the timsort post:

What turned out to be a good compromise maintains two invariants on the
stack entries, where A, B and C are the lengths of the three rightmost not-yet
merged slices:

  1. A > B+C
  2. B > C

But here you have

  1. A > B
  2. A > B + C

which seems incorrect.


@stjepang

stjepang Dec 15, 2016

Contributor

Good catch, there's a minor off-by-one error in the comment. However, probably not for the reason you have in mind :)

Take a look at `fn collapse` - it's equivalent to the corrected and formally verified `mergeCollapse` presented in the paper (link). Let's trust that one instead of the original TimSort post from 2002.

I suggest rewriting the comment like this:

/// satisfied:
///
/// 1. for every `i` in `1..runs.len()`: `runs[i - 1].len > runs[i].len`
/// 2. for every `i` in `2..runs.len()`: `runs[i - 2].len > runs[i - 1].len + runs[i].len`

These invariants are exactly what `fn collapse` is enforcing.
What do you think - does that look good to you?
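The corrected invariants can be encoded as a small predicate (a hedged sketch, not the actual `fn collapse` - the real function also decides *which* pair of runs to merge; `needs_merge` is a hypothetical name): given run lengths ordered from bottom to top of the stack, check whether the top violates either invariant.

```rust
// Does the top of the run stack violate the (corrected) invariants?
// Invariant 1: runs[i - 1].len > runs[i].len
// Invariant 2: runs[i - 2].len > runs[i - 1].len + runs[i].len
// Following the verified mergeCollapse, invariant 2 is also checked one
// position deeper in the stack.
fn needs_merge(lens: &[usize]) -> bool {
    let n = lens.len();
    let inv1 = n >= 2 && lens[n - 2] <= lens[n - 1];
    let inv2 = n >= 3 && lens[n - 3] <= lens[n - 2] + lens[n - 1];
    let inv3 = n >= 4 && lens[n - 4] <= lens[n - 3] + lens[n - 2];
    inv1 || inv2 || inv3
}

fn main() {
    assert!(!needs_merge(&[30, 20, 8])); // 30 > 20 + 8 and 20 > 8: both hold
    assert!(needs_merge(&[15, 10, 8]));  // 15 <= 10 + 8: invariant 2 violated
    assert!(needs_merge(&[30, 8, 10]));  // 8 <= 10: invariant 1 violated
}
```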


TomOnTime commented Dec 15, 2016

I'm not usually this particular about English grammar, but the sentence "Performs less comparisons on both random and partially sorted inputs." should be changed s/less/fewer/g. Less refers to non-countable items (I'm less happy today than yesterday). Fewer refers to countable items ("Mary has fewer apples than John.").


Contributor

mark-i-m commented Dec 15, 2016

Lol, grammar makes me fewer happy :P


Contributor

stjepang commented Dec 15, 2016

Heh, fixed :) Feel free to point out any grammar mistakes in comments within the code.


Member

alexcrichton commented Mar 17, 2017

For those interested, the pdqsort implementation is here: #40601
