-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libc++ std::sort has O(n^2) worst case, standard mandates O(N log(N)) #21211
Comments
assigned to @mclow |
Hi Orson, Please mail this patch to the cfe-commits list for review. Please make sure the patch is sent as an attachment to the e-mail (not inline text), and please make sure the repeat the background information you've included here in the e-mail itself. For more information on the development policy and workflow, please see: http://llvm.org/docs/DeveloperPolicy.html Thanks! |
Hi Hal, I've sent the patch to the mailing list, where it is waiting for moderator approval since I'm not a member of the list. Greetings, Orson Peters |
An update. EricWF is working on a set of benchmarks for std::sort and other algorithms/containers. He expects to finish this quite soon, and I'll be using that to make sure this doesn't introduce a performance regression - and will be adding this to the test suite. So I haven't forgotten about it. |
What's the status on this? |
I think std::sort() is still broken in trunk as of today. std::sort() should not call the compare function on the same element: it cannot be O(N log N) if it does. Related bug: https://llvm.org/bugs/show_bug.cgi?id=28551 Reduced testcase: int main(int argc, char** argv) { // Elements in v are unique. std::sort(v.begin(), v.end(), return 0; $ clang++ -std=c++11 -stdlib=libstdc++ a.cc -o a.out |
*** Bug llvm/llvm-bugzilla-archive#28551 has been marked as a duplicate of this bug. *** |
I would like to point out that our std::sort implementation is between 2x and 5x faster on already sorted inputs than libstdc++ is. Some benchmark results: // libc++ //
|
Eric Fiselier, can you run that same benchmark with the 'block' branch of my pattern-defeating quicksort algorithm? https://github.com/orlp/pdqsort/tree/block I've been waiting for a while now on this benchmark suite to start discussion of integrating the algorithm to libc++, I wasn't aware you finished the benchmark. |
Could you please measure the number of instructions executed, data reads, and data writes, instead of the number of nano seconds of execution? I think those are a more stable metrics than the execution time that may vary with the state of the machine, daemons, noise, etc. $ valgrind --tool=cachegrind ./a.out |
I double Orson's request about using pdqsort as internal algorithm for std::sort. You can see here, how pdqsort is fast (https://github.com/orlp/pdqsort). |
Fixed with https://reviews.llvm.org/D113413, should land in llvm 14 If you have stability sorting guarantees that you cannot fix and need migration, use -D_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY or -D_LIBCPP_DEBUG=1 from https://reviews.llvm.org/D96946 to randomize the input range before sorting |
mentioned in issue llvm/llvm-bugzilla-archive#28551 |
Extended Description
The C++ standard mandates that
std::sort
has complexity O(N log(N)), but libc++'s implementation takes O(N^2) in the worst case.As proof I've attached a program that constructs a worst case input for several sizes. It also instruments the number of comparisons used to sort these worst cases and prints the results. The technique used is described in the 1999 paper "A Killer Adversary for Quicksort" by M. D. McIlroy.
Compiling and running:
$ clang++ -O2 -m64 -march=native -std=c++11 -stdlib=libc++ main.cpp -nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc && ./a.out
N: comparisons
100: 2479
200: 10229
400: 40729
800: 161729
1600: 448698
3200: 1413898
6400: 5264298
This is clearly quadratic. To illustrate this defect more, these are the comparison counts given by compiling using libstdc++:
$ clang++ -O2 -m64 -march=native -std=c++11 main.cpp && ./a.out
N: comparisons
100: 1742
200: 4260
400: 10035
800: 22784
1600: 51049
3200: 112386
6400: 244850
Inspecting the source of shows the cause of the issue: there is no introsort implemented, only quicksort (albeit with non-trivial optimizations, but nothing preventing the worst case). I've written a patch that augments the current implementation to make it work as an introspective sort, switching to heapsort if the recursion depth exceeds 2*floor(log(N)). This post can only have one attachment, so I'll post it as an attachment to a comment.
Having not contributed to libc++ before I'm not 100% familiar with all coding styles/(un)written rules. My changes are rather minimal though, so I've just followed what style could be found in context. And of course I knowingly submit my code under the libc++ licenses, the MIT license and the UIUC License.
The text was updated successfully, but these errors were encountered: