Add unstable sort to libcore #1884
sfackler reviewed Feb 3, 2017
**Q: How much faster can unstable sort be?**
A: Sorting 64-bit integers using [pdqsort][stjepang-pdqsort] (an unstable sort implementation) is **40% faster** than using `slice::sort`. Detailed benchmarks are [here](https://github.com/stjepang/pdqsort#extensive-benchmarks).
sfackler
Feb 3, 2017
Member
Out of curiosity, is this compared against the recent improvements to the existing sort implementation?
The conflation of API stability and order stability is a bit unfortunate, but I agree that trying to name this something other than
sfackler added the T-libs label Feb 3, 2017
As long as it's clearly documented I would think
arthurprs commented Feb 4, 2017
stjepang force-pushed the stjepang:unstable-sort branch from 0f1f5bd to 8b513d0 Feb 5, 2017
This is an excellent RFC, thank you for taking the time to write it up and be so thorough @stjepang! I wonder if we could consider perhaps
Other than that, the only question I'd have is to clarify if you have an idea in mind for the concrete implementation of this API in libcore. Are you thinking pdqsort or perhaps some other implementation?
I think many of the Q&As in the motivation section are off topic, e.g. those about 'stable sort'.
Exactly - the proposed implementation lives in the pdqsort crate.
Makes sense, even though
Let's see what others think - can you please vote on this comment:
IMO |
@stjepang oops thanks for the clarification! I clearly need to read more closely :)
I think it'd be good to also specialise
(Stability of sorting |
I strongly disagree. Please take a look at this surprising benchmark in C++:
In C++, By specializing
Note that by not specializing we're giving users a choice. We're providing two different sorts with consistent, predictable, and reliable performance. If they want speed, or in other words, if they don't care about stability, then a high-performance sort is a few keystrokes away: just add "unstable".
Unstable sort is generally much faster than stable sort, with a few exceptions. This will be clearly explained in the documentation. Users who understand the difference can choose for themselves which benefits they want. The difference will be explained here, under "Current implementation". The whole reason why the default
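To make the stable/unstable choice above concrete, here is a minimal sketch. It assumes the method names proposed in this RFC (`sort_unstable_by_key` alongside the existing `sort_by_key`); the `Entry` type and the helper functions are made up for illustration.

```rust
// Illustrative record type; the names here are invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: u32,
    label: &'static str,
}

fn sample() -> Vec<Entry> {
    vec![
        Entry { key: 2, label: "a" },
        Entry { key: 1, label: "b" },
        Entry { key: 2, label: "c" },
        Entry { key: 1, label: "d" },
    ]
}

/// Stable: entries with equal keys keep their original relative order.
fn sorted_stable(mut v: Vec<Entry>) -> Vec<Entry> {
    v.sort_by_key(|e| e.key);
    v
}

/// Unstable: keys end up ordered, but entries with equal keys may be reordered.
fn sorted_unstable(mut v: Vec<Entry>) -> Vec<Entry> {
    v.sort_unstable_by_key(|e| e.key);
    v
}

fn main() {
    let stable = sorted_stable(sample());
    let labels: Vec<&str> = stable.iter().map(|e| e.label).collect();
    // Stability guarantees "b" before "d" and "a" before "c".
    assert_eq!(labels, ["b", "d", "a", "c"]);

    let unstable = sorted_unstable(sample());
    let keys: Vec<u32> = unstable.iter().map(|e| e.key).collect();
    // Only the order of the keys is guaranteed here, not the order of the labels.
    assert_eq!(keys, [1, 1, 2, 2]);
}
```

Callers who only look at the keys can switch to the unstable variant without observing any difference; callers who rely on the order of ties cannot.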
Thanks everyone for your input! It's pretty clear now that renaming
It seems like this has reached a steady state and there isn't too much controversy over the approach, so I'll FCP to merge! @rfcbot fcp merge
rfcbot commented Feb 15, 2017
Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.
jongiddy commented Feb 16, 2017
Why is libcore the appropriate place to put this, rather than keeping it as a separate crate?
After rewriting
After implementing pdqsort, the feedback was:
There was also a question on StackOverflow complaining that the fact that
These complaints are understandable considering that the default sorts in C, C++, Swift, Go, and D neither allocate nor sacrifice speed. Generally speaking, I'm in favor of keeping libstd and libcore small, but sorting is such a common operation. Today, almost every call to
stjepang force-pushed the stjepang:unstable-sort branch from 0e9ef72 to 2bae25f Feb 18, 2017
cristicbz commented Feb 20, 2017
@stjepang Regarding specialisation: how about adding
Also as far as I can tell your example isn't really about stable vs unstable sort, it's more about particular
The problem with specialization is that it's a half-baked solution. Suppose you specialize
Well, stability is a constraint. Because
No,
Interestingly, in this case the names of these data structures actually do represent particular implementations (they are not called
cristicbz commented Feb 21, 2017
It depends on how exactly you do this, I was imagining something like an What do you think?
I agree! I think maybe I wasn't clear. It's a safe bet that
The counterpoint I was making to "specialising-removes-choice" is that the particular pathological cases are not generally applicable to stable-vs-unstable, but depend on the particular implementations of
If you hit one of these pathological cases, switching to
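The premise behind the specialisation idea can be sketched as follows: for primitive integers, equal elements are indistinguishable, so a stable and an unstable sort necessarily produce the same output and differ only in speed and memory use. The function name `sorts_agree` is hypothetical, and `sort_unstable` is the method name proposed in this RFC.

```rust
/// Returns true when the stable and the unstable sort produce the same result.
/// For plain integers this always holds, because two equal integers cannot be
/// told apart after sorting.
fn sorts_agree(input: &[i64]) -> bool {
    let mut stable = input.to_vec();
    let mut unstable = input.to_vec();
    stable.sort();
    unstable.sort_unstable();
    stable == unstable
}

fn main() {
    assert!(sorts_agree(&[5, 1, 4, 1, 5, 9, 2, 6, 5, 3]));
    assert!(sorts_agree(&[]));
}
```

This shows only that the outputs match; it says nothing about which implementation is faster on a given input, which is the part of the discussion that remains contested.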
You made some very good points, but I'm still not in favor of specialization :)
Absolutely, I agree. You might want to take a look at rust-lang/rust#38524. So, in this issue a discussion was brought up about what kind of guarantees we should or shouldn't make about the sort algorithm. It's a really difficult call. This culminated with @aturon's conclusion:
I think our current discussion is similar to that one. There are good arguments both ways, but ultimately I think we shouldn't make guarantees that are too strong (let's use the stable/unstable dichotomy rather than timsort/pdqsort), while at the same time documenting particular implementations and making them clear and reliable (but not guaranteeing them). Although specialization improves performance, it makes it less reliable and more surprising.
Again, let me quote @aturon here:
Whatever guarantees we make, people will rely on observable behavior. Let's keep it simple and allow users to rely on those implementations if they wish. Let's not try to outsmart them with specialization. That's my take. Hope it makes some sense.
OFFTOPIC
Often I find myself doing multiple stable sorts in a loop over items of the same type and I always wondered if it would be possible to "split"
That way I could:
Would it make sense to pursue this in a different follow-up RFC?
arthurprs commented Mar 7, 2017
@gnzlbg really good points, worth discussing in a separate thread.
The sorting machinery you are proposing makes sense, but I believe it's way outside the scope of this RFC, and also outside the focus of libcore and libstd. Unless all that can be covered under an elegant API, it's better to provide it as a separate crate. If this RFC gets implemented, we'll have six (!!) methods for sorting. This is already a large number, although I believe a necessary number nonetheless - hence the RFC. But having even more of them would be really getting out of hand, don't you think? :)
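For reference, the six methods mentioned above would be the three existing stable ones plus the three proposed in this RFC. The sketch below exercises each once; `sorted_with` is a hypothetical helper that exists only to keep the example short.

```rust
/// Hypothetical helper: copies `input`, applies one sorting method, returns the result.
fn sorted_with(input: &[i32], sort: impl FnOnce(&mut [i32])) -> Vec<i32> {
    let mut v = input.to_vec();
    sort(v.as_mut_slice());
    v
}

fn main() {
    // All absolute values are distinct, so the key-based results are unambiguous.
    let input = [3, -1, 2, -4];

    // Existing stable family.
    assert_eq!(sorted_with(&input, |v| v.sort()), [-4, -1, 2, 3]);
    assert_eq!(sorted_with(&input, |v| v.sort_by(|a, b| b.cmp(a))), [3, 2, -1, -4]);
    assert_eq!(sorted_with(&input, |v| v.sort_by_key(|x| x.abs())), [-1, 2, 3, -4]);

    // Unstable family, using the names proposed in this RFC.
    assert_eq!(sorted_with(&input, |v| v.sort_unstable()), [-4, -1, 2, 3]);
    assert_eq!(sorted_with(&input, |v| v.sort_unstable_by(|a, b| b.cmp(a))), [3, 2, -1, -4]);
    assert_eq!(sorted_with(&input, |v| v.sort_unstable_by_key(|x| x.abs())), [-1, 2, 3, -4]);
}
```

The two families mirror each other one-to-one, which is why the count lands at six rather than four or five.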
You are arguing here that we should provide an unstable sorting algorithm that never allocates in libcore, but somehow are arguing as well that providing a stable sorting algorithm that never allocates is out-of-scope for libcore? Cleaning up the libstd implementation of
But this increases the API of libstd with at most two functions that arguably should already be there, and this saves users from having to reimplement stable sort for embedded applications (which is arguably a non-trivial task). I don't know, I see a net win :)
If I understood correctly, you are proposing to add at least 3 new functions, right?
Well, kind of. :) It's a matter of balancing the costs and benefits. A stable sort in libcore would (as proposed) introduce yet another
Unstable sort is not only sometimes nice to have, but the correct choice for the vast majority of sorting needs. Precisely because it is so sorely needed, introducing a new triple of methods is forgivable. I consider those methods "essential". High cost (3 functions), high benefit (almost all use cases).
I'd like to reiterate that the methods you are proposing overall make sense. They would certainly fill a gap in libcore and make it more complete. It's just that they introduce a lot of functions for niche use cases - that's the only reason why I'm hesitant to accept the idea. If you could design a cleaner/smaller API, I'd be totally up for it.
Perhaps Rust should get some blame here - if we had default and optional arguments, maybe we could do something along the lines of:
```rust
// Hypothetical syntax: Rust does not have default argument values.
fn sort(&mut self, stable: bool = true, buffer: Option<&mut [u8]> = None) where Self::Item: Ord {
    // ...
}
```
Does that make sense? What does the libs team think?
Also an important thing to note is that libcore will be able to have a stable sort if #1909 gets accepted.
@clarcharr Are you suggesting that stable sorting in libcore could allocate the buffer on the stack rather than on the heap? If so, I'm afraid that's not a viable option - it could easily overflow the stack.
@stjepang: it could overflow the stack for large slices, but it'd still be worthwhile to have an upper limit and document it as panicking on `no_std`. We can still just do `[Default::default(); 4096]` on stable now and that size covers a majority of use cases.
@clarcharr Hmm, there are two problems with that:
We need something more robust. Admittedly, designing a clean and uncompromising sort interface is challenging. :)
The minimal viable set is to add two functions to libcore:
Anything else can be built efficiently on top of that, and doing so is trivial (at least when compared to the cost of implementing your own stable sorting algorithm). I think that also having a:
would be nice, but since it is a two-liner, we don't need to add it to libcore if there are real concerns about API bloat. I actually don't really care if these are available in
The issue IMO is that when you need them, reimplementing a stable sort algorithm that is both performant and correct is a titanic task. We can spare our users this task by providing the building blocks for the hard parts in libcore. Whether you then want to stable sort with a
In particular, this code is already in
@clarcharr
Note that the default buffer length for stable sort in
I think that we would at least need an extra parameter to tell sort whether it can heap allocate or not... It also puts all the options in everybody's face. I don't know, I think that just refactoring
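The idea of a non-allocating stable sort that takes its scratch space from the caller could look roughly like the sketch below. This is not the API proposed in the thread (those signatures did not survive in the text above) and not how libstd implements its stable sort; `stable_sort_by_key_with_buffer` and `merge_runs` are hypothetical names, and the sketch restricts itself to `T: Copy` and a plain bottom-up merge sort to stay short.

```rust
/// Hypothetical helper, not a libcore API: a stable bottom-up merge sort that
/// uses caller-provided scratch space instead of allocating.
/// Panics if `buf` is shorter than `v`.
fn stable_sort_by_key_with_buffer<T, K, F>(v: &mut [T], buf: &mut [T], key: F)
where
    T: Copy,
    K: Ord,
    F: Fn(&T) -> K,
{
    let n = v.len();
    assert!(buf.len() >= n, "scratch buffer too small");
    let buf = &mut buf[..n];
    let mut width = 1;
    while width < n {
        let mut start = 0;
        while start < n {
            let mid = (start + width).min(n);
            let end = (start + 2 * width).min(n);
            merge_runs(&v[start..end], mid - start, &mut buf[start..end], &key);
            start = end;
        }
        // Copy the merged runs back so the next pass reads from `v` again.
        v.copy_from_slice(&buf[..]);
        width *= 2;
    }
}

/// Merges the sorted runs `src[..mid]` and `src[mid..]` into `dst`.
/// Taking from the left run on ties is what keeps the sort stable.
fn merge_runs<T, K, F>(src: &[T], mid: usize, dst: &mut [T], key: &F)
where
    T: Copy,
    K: Ord,
    F: Fn(&T) -> K,
{
    let (mut i, mut j) = (0, mid);
    for slot in dst.iter_mut() {
        if i < mid && (j >= src.len() || key(&src[i]) <= key(&src[j])) {
            *slot = src[i];
            i += 1;
        } else {
            *slot = src[j];
            j += 1;
        }
    }
}

fn main() {
    let mut data = [(2u8, 'a'), (1, 'b'), (2, 'c'), (1, 'd'), (0, 'e')];
    // The scratch space lives on the stack here; a caller with an allocator
    // could pass a heap buffer instead.
    let mut scratch = [(0u8, ' '); 8];
    stable_sort_by_key_with_buffer(&mut data, &mut scratch, |pair| pair.0);
    assert_eq!(data, [(0, 'e'), (1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]);
}
```

Leaving the buffer to the caller sidesteps the question of whether the sort may heap allocate: the same function works with a fixed stack array or a heap buffer, at the cost of one more parameter in the signature.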
I opened an internals thread for this; the discussion here is derailing, and we should just merge this RFC and discuss future extensions somewhere else: https://internals.rust-lang.org/t/pre-pre-rfc-stable-sorting-building-blocks-in-libcore/4928
Please remove or change this text. It is self-contradictory and I believe incorrect. There are tradeoffs between stable and unstable sorting. This is a hostile way to begin an RFC. I'm fine with the technical content of this RFC. Thanks @stjepang.
rfcbot commented Mar 8, 2017
rfcbot added the final-comment-period label Mar 8, 2017
cristicbz commented Mar 8, 2017
@brson I think that's not contradictory. That's often the case with conservative defaults: they're not useful most of the time, but when they are, the problem they prevent would be expensive or hard to track down; that still makes it the right default.
stjepang force-pushed the stjepang:unstable-sort branch from 242f3d2 to d0df880 Mar 8, 2017
@brson Sorry, I should've worded that sentence better. I gave the motivation section another try so hopefully it's clearer and more balanced now. What I wanted to say is: stability as a property is rarely desired in practice, and some other characteristics are desired more often. So far I only gave anecdotal evidence for that statement, but if someone is seeking empirical evidence, here's an easy way to find some: simply search C++ code on GitHub for
stjepang force-pushed the stjepang:unstable-sort branch 3 times, most recently from 8237147 to 35189fd Mar 8, 2017
stjepang force-pushed the stjepang:unstable-sort branch from 35189fd to 7d2e940 Mar 8, 2017
It's been a week now since the FCP started (not sure where fcpbot is) and nothing major has come up, so I'm going to merge! Thanks again for the RFC @stjepang!
alexcrichton referenced this pull request Mar 16, 2017
Closed: Tracking issue for unstable sort in libcore #40585
alexcrichton merged commit d25e483 into rust-lang:master Mar 16, 2017
Tracking issue: rust-lang/rust#40585
stjepang commented Feb 3, 2017 (edited)
Every other systems programming language has a fast, non-allocating, unstable sort in its standard library. Rust still doesn't. This is a proposal to add one to libcore.
The proposed implementation is very fast, faster than `std::slice::sort` and `std::sort` in C++ by a wide margin.
This topic was discussed before in issue #790.
Rendered