New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement feature sort_unstable #40601

Merged
merged 9 commits into from Mar 21, 2017

Conversation

Projects
None yet
5 participants
@stjepang
Copy link
Contributor

stjepang commented Mar 17, 2017

Tracking issue for the feature: #40585

This is essentially integration of pdqsort into libcore.

There's plenty of unsafe blocks to review. The heart of pdqsort is fn partition_in_blocks and is probably the most challenging function to understand. It requires some patience, but let me know if you find it too difficult - comments could always be improved.

Changes

  • Added sort_unstable feature.
  • Tweaked insertion sort constants for stable sort. Sorting integers is now up to 5% slower, but sorting big elements is much faster (in particular, sort_large_big_random is 35% faster). The old constants were highly optimized for sorting integers, so overall the configuration is more balanced now. A minor regression in case of integers is forgivable as we recently had performance improvements (#39538) that completely make up for it.
  • Removed some uninteresting sort benchmarks.
  • Added a new sort benchmark for string sorting.

Benchmarks

The following table compares stable and unstable sorting:

name                                 stable ns/iter        unstable ns/iter     diff ns/iter   diff %
slice::sort_large_ascending          7,240 (11049 MB/s)    7,380 (10840 MB/s)            140    1.93%
slice::sort_large_big_random         1,454,138 (880 MB/s)  910,269 (1406 MB/s)      -543,869  -37.40%
slice::sort_large_descending         13,450 (5947 MB/s)    10,895 (7342 MB/s)         -2,555  -19.00%
slice::sort_large_mostly_ascending   204,041 (392 MB/s)    88,639 (902 MB/s)        -115,402  -56.56%
slice::sort_large_mostly_descending  217,109 (368 MB/s)    99,009 (808 MB/s)        -118,100  -54.40%
slice::sort_large_random             477,257 (167 MB/s)    346,028 (231 MB/s)       -131,229  -27.50%
slice::sort_large_random_expensive   21,670,537 (3 MB/s)   22,710,238 (3 MB/s)     1,039,701    4.80%
slice::sort_large_strings            6,284,499 (38 MB/s)   6,410,896 (37 MB/s)       126,397    2.01%
slice::sort_medium_random            3,515 (227 MB/s)      3,327 (240 MB/s)             -188   -5.35%
slice::sort_small_ascending          42 (1904 MB/s)        41 (1951 MB/s)                 -1   -2.38%
slice::sort_small_big_random         503 (2544 MB/s)       514 (2490 MB/s)                11    2.19%
slice::sort_small_descending         72 (1111 MB/s)        69 (1159 MB/s)                 -3   -4.17%
slice::sort_small_random             369 (216 MB/s)        367 (217 MB/s)                 -2   -0.54%

Interesting cases:

  • Expensive comparison function and string sorting - it's a really close race, but timsort performs a slightly smaller number of comparisons. This is a natural difference of bottom-up merging versus top-down partitioning.
  • large_descending - unstable sort is slightly faster. Both just check whether the slice is descending and if so, they reverse it. I believe the slight difference in performance is because unstable sort traces the slice forwards, while stable sort goes backwards.

r? @alexcrichton

/// ```
#[stable(feature = "rust1", since = "1.0.0")]
#[inline]
pub fn sort_by<F>(&mut self, mut compare: F)

This comment has been minimized.

@stjepang

stjepang Mar 17, 2017

Contributor

This function is not added or anything. It was just reordered with sort_by_key.
Previous order: sort, sort_by_key, sort_by.
Current order: sort, sort_by, sort_by_key.

/// ```
///
/// [pdqsort]: https://github.com/orlp/pdqsort
// FIXME #40585: Mention `sort_unstable` in the documentation for `sort`.

This comment has been minimized.

@stjepang

stjepang Mar 17, 2017

Contributor

FIXME because we shouldn't mention unstable functions in the documentation for stable functions.... right?
I assume we'll fix this as soon as the feature gets stabilized.

This comment has been minimized.

@alexcrichton

alexcrichton Mar 17, 2017

Member

Yeah that seems reasonable to me, punting on this for now until the function is stabilized.

// to perform better than merge sort. For bigger types `T`, the threshold is smaller.
//
// Short runs are extended using insertion sort to span at least `min_run` elements, in order
// to improve performance.

This comment has been minimized.

@stjepang

stjepang Mar 17, 2017

Contributor

I don't believe it's worthwhile differentiating between big and small Ts - it doesn't have a large effect.

Instead, it's better to focus on some middle ground that works well all around: big/small Ts and expensive/cheap comparison functions as well. I took a look at a std::stable_sort implementation in C++ and it has similar constants.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 17, 2017

☔️ The latest upstream changes (presumably #40598) made this pull request unmergeable. Please resolve the merge conflicts.

@stjepang stjepang force-pushed the stjepang:sort-unstable branch from f421527 to 50eb24b Mar 17, 2017

@stjepang

This comment has been minimized.

Copy link
Contributor

stjepang commented Mar 17, 2017

I have troubles making Travis tests pass - the problem is in src/libcollections/lib.rs, line #![feature(sort_unstable)].

The compiler warns about unused future, and warnings are configured to be denied.
I tried changing to #![cfg_attr(not(test), feature(sort_unstable))], but then doctests complain that the feature is not enabled.

Whatever I do, the compiler complains one way or another... Any idea how to fix that? :/

@alexcrichton
Copy link
Member

alexcrichton left a comment

Awesome implementation, thanks for the PR!

I didn't review the sorting itself in too close detail, but everything I looked at seemed quite solid.

@@ -52,6 +52,7 @@
#![feature(shared)]
#![feature(slice_get_slice)]
#![feature(slice_patterns)]
#![feature(sort_unstable)]

This comment has been minimized.

@alexcrichton

alexcrichton Mar 17, 2017

Member

I think Travis is failing on this line during tests: https://travis-ci.org/rust-lang/rust/jobs/212204751

/// ```
///
/// [pdqsort]: https://github.com/orlp/pdqsort
// FIXME #40585: Mention `sort_unstable` in the documentation for `sort`.

This comment has been minimized.

@alexcrichton

alexcrichton Mar 17, 2017

Member

Yeah that seems reasonable to me, punting on this for now until the function is stabilized.

/// # Examples
///
/// ```
/// let mut v = [-5, 4, 1, -3, 2];

This comment has been minimized.

@alexcrichton

alexcrichton Mar 17, 2017

Member

I think you'll need to add #![feature] to this doctest to get past bors

This comment has been minimized.

@stjepang

stjepang Mar 18, 2017

Contributor

That worked, thanks!

//! Unstable sorting is compatible with libcore because it doesn't allocate memory, unlike our
//! stable sorting implementation.

#![unstable(feature = "sort_unstable", issue = "40585")]

This comment has been minimized.

@alexcrichton

alexcrichton Mar 17, 2017

Member

It should be ok to omit this annotation I believe as it's a private module anyway.

This comment has been minimized.

@stjepang

stjepang Mar 18, 2017

Contributor

Done

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Mar 17, 2017

Whatever I do, the compiler complains one way or another... Any idea how to fix that? :/

Oh your fix for lib.rs is correct I believe, you'll just need to add #![feature(...)] to the doctests themselves I believe.

@stjepang

This comment has been minimized.

Copy link
Contributor

stjepang commented Mar 17, 2017

cc @steveklabnik

Not long ago we discussed what guarantees sort should provide.
This unstable sort is somewhat relevant: you might want to take a look at documentation for sort_unstable, which lives in src/libcollections/slice.rs.

@stjepang

This comment has been minimized.

Copy link
Contributor

stjepang commented Mar 18, 2017

I managed to close the gap between stable and unstable sorting on ascending & descending inputs (see the new benchmarks). :)

One more thing: pdqsort falls back to heapsort after several imbalanced partitions. I'm pretty sure we never hit this case in our tests, so it'd be wise to write tests for heapsort (fn heapsort in sort.rs).
However, I don't know how to call that function from libcoretest without making it public. Suggestions?

@stjepang stjepang force-pushed the stjepang:sort-unstable branch 4 times, most recently from 6764eb0 to 7322ce8 Mar 18, 2017

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 19, 2017

☔️ The latest upstream changes (presumably #40538) made this pull request unmergeable. Please resolve the merge conflicts.

@orlp

This comment has been minimized.

Copy link
Contributor

orlp commented Mar 19, 2017

@stjepang Perhaps you can trigger the worst case using a modification of the technique described in the 1999 paper "A Killer Adversary for Quicksort" by M. D. McIlroy.

I used it a couple years ago to prove that libc++'s sort routine is quadratic.

@orlp

This comment has been minimized.

Copy link
Contributor

orlp commented Mar 19, 2017

Also, for those interested in a deeper explanation in the how/why of pattern-defeating quicksort, a (nearly finished) draft of my paper is available here: https://drive.google.com/file/d/0B1-vl-dPgKm_T0Fxeno1a0lGT0E/view

There are some subtle differences between my C++ implementation and @stjepang's version due to design goals, so some parts might not apply, but most of it does.

@stjepang stjepang force-pushed the stjepang:sort-unstable branch 4 times, most recently from b22c8ca to 36bc0f4 Mar 19, 2017

@arthurprs

This comment has been minimized.

Copy link
Contributor

arthurprs commented Mar 20, 2017

@stjepang I guess the new pub(...) syntax doesn't help here since code/test is in different crates?

@stjepang

This comment has been minimized.

Copy link
Contributor

stjepang commented Mar 21, 2017

@arthurprs No, doesn't help unfortunately. They are totally separate crates.

However, I did a workaround @alexcrichton suggested to me over IRC - expose the heapsort function publicly, but gate it through a dummy feature. Can't say I feel good about this hack, but apparently there's no better option at the moment.

@stjepang stjepang force-pushed the stjepang:sort-unstable branch from c86c8ee to 038d605 Mar 21, 2017

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Mar 21, 2017

@bors: r+

Yeah perhaps one day we'll be able to move tests back into core (maybe selectively this time) but for now this is functionally the same (an internal function to the library)

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 21, 2017

📌 Commit 038d605 has been approved by alexcrichton

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 21, 2017

🔒 Merge conflict

@stjepang stjepang force-pushed the stjepang:sort-unstable branch from 038d605 to a718051 Mar 21, 2017

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Mar 21, 2017

@bors: r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 21, 2017

📌 Commit a718051 has been approved by alexcrichton

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 21, 2017

⌛️ Testing commit a718051 with merge cab4bff...

bors added a commit that referenced this pull request Mar 21, 2017

Auto merge of #40601 - stjepang:sort-unstable, r=alexcrichton
Implement feature sort_unstable

Tracking issue for the feature: #40585

This is essentially integration of [pdqsort](https://github.com/stjepang/pdqsort) into libcore.

There's plenty of unsafe blocks to review. The heart of pdqsort is `fn partition_in_blocks` and is probably the most challenging function to understand. It requires some patience, but let me know if you find it too difficult - comments could always be improved.

#### Changes

* Added `sort_unstable` feature.
* Tweaked insertion sort constants for stable sort. Sorting integers is now up to 5% slower, but sorting big elements is much faster (in particular, `sort_large_big_random` is 35% faster). The old constants were highly optimized for sorting integers, so overall the configuration is more balanced now. A minor regression in case of integers is forgivable as we recently had performance improvements (#39538) that completely make up for it.
* Removed some uninteresting sort benchmarks.
* Added a new sort benchmark for string sorting.

#### Benchmarks

The following table compares stable and unstable sorting:
```
name                                 stable ns/iter        unstable ns/iter     diff ns/iter   diff %
slice::sort_large_ascending          7,240 (11049 MB/s)    7,380 (10840 MB/s)            140    1.93%
slice::sort_large_big_random         1,454,138 (880 MB/s)  910,269 (1406 MB/s)      -543,869  -37.40%
slice::sort_large_descending         13,450 (5947 MB/s)    10,895 (7342 MB/s)         -2,555  -19.00%
slice::sort_large_mostly_ascending   204,041 (392 MB/s)    88,639 (902 MB/s)        -115,402  -56.56%
slice::sort_large_mostly_descending  217,109 (368 MB/s)    99,009 (808 MB/s)        -118,100  -54.40%
slice::sort_large_random             477,257 (167 MB/s)    346,028 (231 MB/s)       -131,229  -27.50%
slice::sort_large_random_expensive   21,670,537 (3 MB/s)   22,710,238 (3 MB/s)     1,039,701    4.80%
slice::sort_large_strings            6,284,499 (38 MB/s)   6,410,896 (37 MB/s)       126,397    2.01%
slice::sort_medium_random            3,515 (227 MB/s)      3,327 (240 MB/s)             -188   -5.35%
slice::sort_small_ascending          42 (1904 MB/s)        41 (1951 MB/s)                 -1   -2.38%
slice::sort_small_big_random         503 (2544 MB/s)       514 (2490 MB/s)                11    2.19%
slice::sort_small_descending         72 (1111 MB/s)        69 (1159 MB/s)                 -3   -4.17%
slice::sort_small_random             369 (216 MB/s)        367 (217 MB/s)                 -2   -0.54%
```

Interesting cases:
* Expensive comparison function and string sorting - it's a really close race, but timsort performs a slightly smaller number of comparisons. This is a natural difference of bottom-up merging versus top-down partitioning.
* `large_descending` - unstable sort is faster, but both sorts should have equivalent performance. Both just check whether the slice is descending and if so, they reverse it. I blame LLVM for the discrepancy.

r? @alexcrichton
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 21, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing cab4bff to master...

@bors bors merged commit a718051 into rust-lang:master Mar 21, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@bors bors referenced this pull request Mar 21, 2017

Merged

Implement Manually Drop #40559

@stjepang stjepang deleted the stjepang:sort-unstable branch Mar 21, 2017

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Mar 27, 2017

Rollup merge of rust-lang#40832 - pftbest:fix_msp430, r=stjepang
libcore: fix compilation on 16bit target (MSP430).

Since PR rust-lang#40601 has been merged, libcore no longer compiles on MSP430.
The reason is this code in `break_patterns`:
```rust
 let mut random = len;
 random ^= random << 13;
 random ^= random >> 17;
 random ^= random << 5;
 random &= modulus - 1;
```
It assumes that `len` is at least a 32 bit integer.
As a workaround replace `break_patterns` with an empty function for 16bit targets.

cc @stjepang
cc @alexcrichton

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Mar 27, 2017

Rollup merge of rust-lang#40832 - pftbest:fix_msp430, r=stjepang
libcore: fix compilation on 16bit target (MSP430).

Since PR rust-lang#40601 has been merged, libcore no longer compiles on MSP430.
The reason is this code in `break_patterns`:
```rust
 let mut random = len;
 random ^= random << 13;
 random ^= random >> 17;
 random ^= random << 5;
 random &= modulus - 1;
```
It assumes that `len` is at least a 32 bit integer.
As a workaround replace `break_patterns` with an empty function for 16bit targets.

cc @stjepang
cc @alexcrichton

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 29, 2017

Rollup merge of rust-lang#40832 - pftbest:fix_msp430, r=stjepang
libcore: fix compilation on 16bit target (MSP430).

Since PR rust-lang#40601 has been merged, libcore no longer compiles on MSP430.
The reason is this code in `break_patterns`:
```rust
 let mut random = len;
 random ^= random << 13;
 random ^= random >> 17;
 random ^= random << 5;
 random &= modulus - 1;
```
It assumes that `len` is at least a 32 bit integer.
As a workaround replace `break_patterns` with an empty function for 16bit targets.

cc @stjepang
cc @alexcrichton

bors added a commit that referenced this pull request Jul 24, 2018

Auto merge of #52672 - ljedrz:unstable_sorts, r=<try>
Prefer sort_unstable*() over sort*()

Since `sort_unstable` is considered typically faster than the regular `sort` ([benchmarks](#40601 (comment))), it might be a good idea to check for potential wins in the compiler.

bors added a commit that referenced this pull request Jul 25, 2018

Auto merge of #52672 - ljedrz:unstable_sorts, r=<try>
Prefer sort_unstable*() over sort*()

Since `sort_unstable` is considered typically faster than the regular `sort` ([benchmarks](#40601 (comment))), it might be a good idea to check for potential wins in the compiler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment