Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libc++ std::nth_element is quadratic, should be linear #52747

Open
orlp opened this issue Dec 16, 2021 · 10 comments
Open

libc++ std::nth_element is quadratic, should be linear #52747

orlp opened this issue Dec 16, 2021 · 10 comments
Labels
confirmed Verified by a second party libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Comments

@orlp
Copy link

orlp commented Dec 16, 2021

The reproduction program is almost the same as my 7 year old one for std::sort that got fixed only recently:

#include <algorithm>
#include <iostream>
#include <vector>

void make_killer(int size, std::vector<int>& v) {
    int candidate = 0;
    int num_solid = 0;
    int gas = size - 1;

    std::vector<int> tmp(size);
    v.resize(size);

    for (int i = 0; i < size; ++i) {
        tmp[i] = i;
        v[i] = gas;
    }

    std::nth_element(tmp.begin(), tmp.end() - 2, tmp.end(), [&](int x, int y) {
        if (v[x] == gas && v[y] == gas) {
            if (x == candidate) v[x] = num_solid++;
            else v[y] = num_solid++;
        }

        if (v[x] == gas) candidate = x;
        else if (v[y] == gas) candidate = y;

        return v[x] < v[y];
    });
}

int main(int argc, char** argv) {
    std::vector<int> v;
    int comparison_count;

    auto counter = [&](int x, int y) { ++comparison_count; return x < y; };

    std::cout << "N: comparisons\n";
    for (int i = 100; i <= 6400; i *= 2) {
        // to nullify small constants we multiply by 100
        make_killer(i, v);

        comparison_count = 0;
        std::nth_element(v.begin(), v.end() - 2, v.end(), counter);
        std::cout << i << ": " << comparison_count << "\n";
    }

    return 0;
}

The output:

N: comparisons
100: 2741
200: 10491
400: 40991
800: 161991
1600: 643991
3200: 2567991
6400: 10255991

Evidently quadratic, but the standard requires:

Complexity: For the overloads with no ExecutionPolicy, linear on average. For the overloads
with an ExecutionPolicy, O(N) applications of the predicate, and O(N log N) swaps, where N = last - first.

The problem is that pure quickselect is implemented without a fallback for worst cases, like median of medians.

@ldionne
Copy link
Member

ldionne commented Dec 16, 2021

@nilayvaish @Morwenn

Any interest in working on this? I'm just trying to nerdsnipe people, don't feel any obligation. :-)

@ldionne ldionne added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Dec 16, 2021
@nilayvaish
Copy link
Contributor

I'll take a look at this.

@ldionne, I have been meaning to increase my participation in libc++ development. Feel free to send some part of the work my way.

@Morwenn
Copy link

Morwenn commented Dec 16, 2021

I've been stealing the nth_element implementation I use from miniselect, which itself took it from Andrei Alexandrescu's adaptive quicskelect own implementation. Unfortunately I haven't really tried to understand how it works yet, and it's rather massive.

I guess an introselect that falls back to median-of-median or heap selection would be enough for most uses. Median-of-median itself isn't terribly hard to implement in O(1) space, but it's probably better to have it as a doubly-recursive algorithm with quickselect. HeapSelect is probably the easiest solution though since heap operations are already in the standard library.

@danlark1
Copy link
Contributor

danlark1 commented Dec 17, 2021

Author of miniselect is here. I analyzed quite thoroughly all existing nth_element implementations and the one from Alexandrescu seems the best to be standard compliant, fast and predictable at the same time

I strongly recommend not to have median of medians implementation as it has huge constant underneath, see the benchmarks in the repository

The easiest way will be to implement HeapSelect as a fallback, current nth_element is already quite good in terms of speed. See the benchmark section

@danlark1
Copy link
Contributor

By the way, technically current implementation is standard compliant

For the overloads with no ExecutionPolicy, linear on average.

Quickselect works linear on average

@ldionne
Copy link
Member

ldionne commented Dec 17, 2021

@nilayvaish

I'll take a look at this.

@ldionne, I have been meaning to increase my participation in libc++ development. Feel free to send some part of the work my way.

Thank you so much!

@danlark1

The easiest way will be to implement HeapSelect as a fallback, current nth_element is already quite good in terms of speed.

The link you shared says that libstdc++'s implementation is good, but it says about the libc++ implementation:

This algorithm is used in standard library and is not recommended to use if you are looking for performance.

So I think we definitely do want to improve the libc++ implementation, right?

@danlark1
Copy link
Contributor

danlark1 commented Dec 17, 2021

So I think we definitely do want to improve the libc++ implementation, right?

Benchmarks showed that Floyd-Rivest is the best algorithm to use but it requires floating point arithmetic and libc++ implementation is much better on average than libstdc++ right now.

The final solution depends on the goals of that. At google we don't see much perf opportunity for nth_element, for example. Not that much to spend even a month working on that.

If we want to "fix" worst case scenarios, then we can add HeapSelect and forget about it by spending several hours.

If we want to be at top of the industry, we can consider Alexandrescu's algorithm or Floyd-Rivest one.

And as I said, we already are standard compliant. Unlike std::sort, std::nth_element is required to be linear on average.

@orlp
Copy link
Author

orlp commented Dec 17, 2021

By the way, technically current implementation is standard compliant

For the overloads with no ExecutionPolicy, linear on average.

Quickselect works linear on average

I don't agree, even if we go strictly by the standard, if we allow custom comparison functions. Take this example, where I shuffle the input before using nth_element on it. No matter how it's shuffled, we are quadratic.

#include <algorithm>
#include <iostream>
#include <vector>
#include <random>

int quadratic(int size) {
    int num_solid = 0;
    int gas = size + 1;
    int comparison_count = 0;

    std::vector<int> indices(size);
    std::vector<int> values(size);
    for (int i = 0; i < size; ++i) {
        indices[i] = i;
        values[i] = gas;
    }

    std::random_device rd;
    std::mt19937 g(rd());
    std::shuffle(indices.begin(), indices.end(), g); // Enforce uniform input distribution!
    std::nth_element(indices.begin(), indices.end() - 2, indices.end(), [&](int x, int y) {
        // Invariant: gas always compares greater than solid.
        comparison_count += 1;
        if (values[x] == gas && values[y] == gas) {
            // We must solidify one of the two elements.
            // Note: still greater than any previous solids - doesn't violate order.
            values[x] = num_solid++;
            return true;
        } else if (values[x] == gas) {
            return false;
        } else if (values[y] == gas) {
            return true;
        } else {
            return values[x] < values[y];
        }
    });
    return comparison_count;
}

int main(int argc, char** argv) {
    std::cout << "N: comparisons\n";
    for (int i = 100; i <= 6400; i *= 2) {
        std::cout << i << ": " << quadratic(i) << "\n";
    }
    return 0;
}

Output:

N: comparisons
100: 2788
200: 10588
400: 41188
800: 162388
1600: 644788
3200: 2569588
6400: 10259188

Mind you, this comparison function satisfies the requirements of a strict weak ordering and is perfectly valid. I don't think the standard claims to only be linear on average when used with std::less or std::greater.

@danlark1
Copy link
Contributor

Good point and self-referenced comparators definitely make the wording slightly ambiguous

alg.nth.element-3

Effects: After nth_­element the element in the position pointed to by nth is the element that would be in that position if the whole range were sorted with respect to comp and proj, unless nth == last

alg.sorting#general-5

A sequence is sorted with respect to a comp and proj for a comparator and projection comp and proj if for every iterator i pointing to the sequence and every non-negative integer n such that i + n is a valid iterator pointing to an element of the sequence,
bool(invoke(comp, invoke(proj, *(i + n)), invoke(proj, *i)))
is false.

I can come up with the sequence of such invocations (from std::sort) where from some point all of them will be false with your comparator. And if removing these invocations, nth_element is going to return the wrong value.

What "if the whole range were sorted" is ambiguous here.

Does that mean if it was sorted with std::sort or just any order? Or does that mean std::is_sorted returns true?

"On average" here does not make sense as well as the definition of the sorted range where the property starts to depend on the order of checking itself.

We can bring this to discussion at c++ mailing list but generally I try not to be a lawyer and "on average" in the standard meant likely the ability to implement quickselect.

@Morwenn
Copy link

Morwenn commented Dec 17, 2021

I'm reading the introselect Wikipedia article again and surprised to see that it includes strategies that, as far as I can tell, haven't been explored in miniselect:

Introselect works by optimistically starting out with quickselect and only switching to a worst-case linear-time selection algorithm (the Blum-Floyd-Pratt-Rivest-Tarjan median of medians algorithm) if it recurses too many times without making sufficient progress. The switching strategy is the main technical content of the algorithm. Simply limiting the recursion to constant depth is not good enough, since this would make the algorithm switch on all sufficiently large lists. Musser discusses a couple of simple approaches:

Keep track of the list of sizes of the subpartitions processed so far. If at any point k recursive calls have been made without halving the list size, for some small positive k, switch to the worst-case linear algorithm.
Sum the size of all partitions generated so far. If this exceeds the list size times some small positive constant k, switch to the worst-case linear algorithm. This sum is easy to track in a single scalar variable.

Both approaches limit the recursion depth to k ⌈log n⌉ = O(log n) and the total running time to O(n).

Also if I understand your implementation in miniselect, you really implemented a raw median-of-medians which doesn't attempt to pick the pivot any other way than with median-of-medians itself. The techniques proposed in the quote above might be worth exploring as a simple alternative to Alexandrecu's adaptive quickselect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed Verified by a second party libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

No branches or pull requests

7 participants