Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add heapsort fallback in select_nth_unstable #106997

Merged
merged 1 commit into from
Jan 18, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions library/core/src/slice/sort.rs
Original file line number Diff line number Diff line change
Expand Up @@ -831,6 +831,15 @@ fn partition_at_index_loop<'a, T, F>(
) where
F: FnMut(&T, &T) -> bool,
{
// Limit the amount of iterations and fall back to heapsort, similarly to `slice::sort_unstable`.
// This lowers the worst case running time from O(n^2) to O(n log n).
// FIXME: Investigate whether it would be better to use something like Median of Medians
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wikipedia says that median-of-median is somewhat slow, and not used normally, so I don't know if we'd want to start there, but maybe using it as fallback instead of heapsort. But that's a discussion for another time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I meant was to use median of medians as a fallback instead of heapsort to keep the fast average case of quickselect and still ensure O(n) worst case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I agree that fallback to median-of-medians would make sense.

And it sounds like it is used in practice in some contexts, see: https://en.wikipedia.org/wiki/Introselect

// or Fast Deterministic Selection to guarantee O(n) worst case.
let mut limit = usize::BITS - v.len().leading_zeros();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: if you're doing BITS - ctlz, I'd generally suggest using ilog2 instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied the limit calculation from the sorting code, but I can change it if that's preferred. In that case, I'll also change it for sort I guess.


// True if the last partitioning was reasonably balanced.
let mut was_balanced = true;

loop {
// For slices of up to this length it's probably faster to simply sort them.
const MAX_INSERTION: usize = 10;
Expand All @@ -839,6 +848,18 @@ fn partition_at_index_loop<'a, T, F>(
return;
}

if limit == 0 {
heapsort(v, is_less);
return;
}

// If the last partitioning was imbalanced, try breaking patterns in the slice by shuffling
// some elements around. Hopefully we'll choose a better pivot this time.
if !was_balanced {
break_patterns(v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsure: This looks like it makes it randomized? Does it need both randomization and the heapsort fallback? Could we just use one or the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I just copied this from the sort implementation. But the fact that sort does need both the pattern breaking and the heapsort fallback makes it seem to me like they're both needed. Alternatively, we could just remove this check, decrement limit unconditionally and initialize it to 2 * ilog2(len); this is what the "baseline" implementation of introsort/introselect does (or at least the pseudocode on Wikipedia does it this way).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, this is exactly what sort is doing

// If too many bad pivot choices were made, simply fall back to heapsort in order to
// guarantee `O(n * log(n))` worst-case.
if limit == 0 {
heapsort(v, is_less);
return;
}
// If the last partitioning was imbalanced, try breaking patterns in the slice by shuffling
// some elements around. Hopefully we'll choose a better pivot this time.
if !was_balanced {
break_patterns(v);
limit -= 1;
}

Makes sense to do the same thing, I guess.

limit -= 1;
}

// Choose a pivot
let (pivot, _) = choose_pivot(v, is_less);

Expand All @@ -863,6 +884,7 @@ fn partition_at_index_loop<'a, T, F>(
}

let (mid, _) = partition(v, pivot, is_less);
was_balanced = cmp::min(mid, v.len() - mid) >= v.len() / 8;

// Split the slice into `left`, `pivot`, and `right`.
let (left, right) = v.split_at_mut(mid);
Expand Down