Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for slice_partition_at_index #55300

Closed
Mokosha opened this issue Oct 24, 2018 · 33 comments
Closed

Tracking issue for slice_partition_at_index #55300

Mokosha opened this issue Oct 24, 2018 · 33 comments

Comments

@Mokosha
Copy link
Contributor

@Mokosha Mokosha commented Oct 24, 2018

This is the tracking bug for the discussion outlined in rust-lang/rfcs#1470.

I'm working on implementing this in the libstd, and will add it as an unstable feature pending resolution of any comments that may arise in this thread. I plan to add comprehensive testing as well.

@nikic
Copy link
Contributor

@nikic nikic commented Oct 24, 2018

Just for the record, if someone needs this functionality right now, the order_stat crate provides an implementation of the Floyd-Rivest selection algorithm.

Centril added a commit to Centril/rust that referenced this issue Jan 23, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug rust-lang#55300.
bors added a commit that referenced this issue Jan 23, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug #55300.
bors added a commit that referenced this issue Mar 11, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug #55300.
bors added a commit that referenced this issue Apr 2, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug #55300.
@scottmcm
Copy link
Member

@scottmcm scottmcm commented Apr 2, 2019

A thought: since we have sort and sort_unstable, should the method that got implemented be partition_unstable_at_index instead? Though I suppose there's no good stable way to do it...

Centril added a commit to Centril/rust that referenced this issue Apr 2, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug rust-lang#55300.
Centril added a commit to Centril/rust that referenced this issue Apr 3, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug rust-lang#55300.
@umanwizard
Copy link

@umanwizard umanwizard commented Aug 30, 2019

Hi @Mokosha , is this expected to be stabilized eventually?

@Mark-Simulacrum Mark-Simulacrum changed the title Feature Request: analog of C++'s std::nth_element for rust slices. Tracking issue for slice_partition_at_index Sep 2, 2019
@Mokosha
Copy link
Contributor Author

@Mokosha Mokosha commented Sep 12, 2019

@umanwizard I have no idea what the standardization process is for features, so I don't really know how to answer that. Happy to support whatever needs to be done for that though!

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Oct 18, 2019

APIs currently pointing here below. These seem very niche to me. @rust-lang/libs any thoughts?

impl<T> [T] {
    pub fn partition_at_index(&mut self, index: usize) -> (&mut [T], &mut T, &mut [T])
        where T: Ord
    {…}
    pub fn partition_at_index_by<F>(&mut self, index: usize, mut compare: F)
                                    -> (&mut [T], &mut T, &mut [T])
        where F: FnMut(&T, &T) -> Ordering
    {…}
    pub fn partition_at_index_by_key<K, F>(&mut self, index: usize, mut f: F)
                                           -> (&mut [T], &mut T, &mut [T])
        where F: FnMut(&T) -> K, K: Ord
    {…}
}

@alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Oct 21, 2019

The naming here threw me for a spin and once I got around to reading the documentation this does seems something standard-library-ish but I agree it's pretty niche. On naming at_index I think can be replaced with just at (e.g. split_at, not split_at_index). I find the name "partitioning" confusing when it's actually doing a sort and returning the index'th element. The documentation also indicates that it's unstable, but we currently have a convention of "stable unless otherwise specified", so the term *_unstable_* may want to be somewhere in the name.

@Ixrec
Copy link
Contributor

@Ixrec Ixrec commented Oct 22, 2019

I think "doing a sort and ..." is at best a very misleading description. Every element is guaranteed to end up on the correct side of the indexth element, but no (non-trivial) subsequence of the slice is guaranteed to end up in sorted order, much less the whole slice. So even something like "partial sort" would be extremely misleading.

Do you know of any other term for this? I've only ever heard this operation referred to as partition in C++ (they even have a stable_partition), and also as partition in pretty much every pseudocode description of quicksort. The only alternatives that come to mind are synonyms like "segregate" which don't seem any better.

(but I agree with s/at_index/at, and adding an unstable)

@scottmcm
Copy link
Member

@scottmcm scottmcm commented Oct 22, 2019

This isn't C++'s partition, but rather nth_element (https://en.cppreference.com/w/cpp/algorithm/nth_element), right?

If so I think it's an excellent thing to have in the library, especially since the complicated parts of the implementation are things that sort_unstable already needs, to the maintenance cost is low.

(Interestingly, note that this method is actually an answer to an exercise in the book.)

As for naming, I'm not super fond of the current one, but unfortunately don't have any good suggestions.

@Mokosha
Copy link
Contributor Author

@Mokosha Mokosha commented Oct 23, 2019

Yes, this is closer to C++'s nth_element.

With regards to the naming -- there were some alternatives to partition proposed in the RFC, but partition seemed to be more akin to what others expected such a function to be called. I'd caution against using simply at instead of at_index because by having users provide their own comparators, we would need to change at_index_by to be at_by, which could be confusing. I otherwise agree that dropping the _index and adding _unstable_ would be generally good.

With the attention of the libs team, though, provided the procedure outlined here is still followed, what else needs to be done for stabilization?

@glandium
Copy link
Contributor

@glandium glandium commented Jan 24, 2020

FWIW, this was confusing to me at quick glance because in python partition does, on the surface, the same thing as partition_at_index... without the sorting part.

@Digital-Monk
Copy link

@Digital-Monk Digital-Monk commented Jun 4, 2020

I'm fine with partition_at_index, and prefer to keep the _index in there, because otherwise I would tend to think it was partitioning at (by?) a value and returning the index... Sorry, that may be mental corruption caused by my students when I was teaching ComSci, but I've found that being explicit about index-vs-value is useful, especially as you get more powerful libraries.

(Sorry if I necro'ed this and everything is already decided. Just my 2 cents...)

@jesperdj
Copy link

@jesperdj jesperdj commented Jul 17, 2020

These algorithms (what C++ std::nth_element does and what these partition_by_... functions do) are selection algorithms and they are related to partitioning and sorting algorithms such as quicksort.

I agree that these functions in Rust are called partition_by_... instead of C++'s nth_element because they not just find the n'th element in a sequence when ordered using a specific ordering, they also partition the elements into two parts (those 'less than' and those 'greater than or equal to' a particular element).

@jagill
Copy link
Contributor

@jagill jagill commented Sep 12, 2020

I'd like to "push for stabilization", as per the stabilization guide. These functions are very useful for building various types of spatial indices (RTrees, KDTrees, etc), for which you need to repeatedly partition, but not sort. From reading the comments, it seems only the name is still under discussion. Is that true? If so how do we drive that to completion? Are there other concerns?

If there aren't, then I can start the Documentation PR and writing a stabilization report (probably with some oversight and feedback!).

@Amanieu
Copy link
Contributor

@Amanieu Amanieu commented Sep 16, 2020

The main issue that that for consistency with sort/sort_unstable we should add _unstable to the function names since this is an unstable partitioning algorithm. The one downside is that the function names then start to get a bit long (partition_unstable_at_index_by_key).

@Amanieu
Copy link
Contributor

@Amanieu Amanieu commented Sep 18, 2020

OK let's stabilize with a rename to select_nth_unstable.

@rfcbot fcp merge

@rfcbot
Copy link

@rfcbot rfcbot commented Sep 18, 2020

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@jesperdj
Copy link

@jesperdj jesperdj commented Sep 23, 2020

partition would refer to a different operation than what this function currently does.

Currently partition_at_index is equivalent to C++'s nth_element, not partition.

All aboard the select_nth train!

Ok, but...: the name used in C++, nth_element, is not great. This function does more than select the nth element - it also rearranges the elements in the collection so that all elements that are "less than" the nth element are to the left, and all elements that are "greater than" the nth element are to the right of it. In other words: C++ nth_element does in fact partition the collection (just not in the way that partition does). It doesn't just select the nth element, as the name suggests.

So I don't like the name select_nth because that also doesn't fully explain what the function does.

@jagill
Copy link
Contributor

@jagill jagill commented Sep 23, 2020

So I don't like the name select_nth because that also doesn't fully explain what the function does.

Are you in favor of the partition_at_* names? If so, with or without _unstable? Or do you have another suggestion?

I think the unfortunate reality is that there is no good name for this particular function -- each candidate has its flaws. Over the last 2 years no one has found a name that is universally liked, so we may have to go with the least objectionable choice.

@adamreichold
Copy link
Contributor

@adamreichold adamreichold commented Sep 23, 2020

So I don't like the name select_nth because that also doesn't fully explain what the function does.

I agree with this observation, but as @jagill pointed out, no name that would fulfill this criterion has been proposed so far.

select_nth_unstable does not try to, but rather tries to invoke "selection algorithm" as the relevant context to better understand how this algorithm fits into the larger landscape of sorting and order statistics.

@JulianKnodt
Copy link
Contributor

@JulianKnodt JulianKnodt commented Sep 23, 2020

What about something like bisect_at_* or bisect_nth_*? Altho I worry that bisect is not used commonly enough for it to be immediately clear. I personally am fine with any name, but bisect seems to fit what is missing.

@KodrAus
Copy link
Contributor

@KodrAus KodrAus commented Sep 29, 2020

I'm on board with select_nth_unstable, given we haven't found any names that don't come with their own downsides, and the downside for select_nth_unstable is that it's "as bad" as the equivalent used in C++. That seems like a fair tie-breaker to me.

@rfcbot
Copy link

@rfcbot rfcbot commented Sep 29, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@umanwizard
Copy link

@umanwizard umanwizard commented Sep 29, 2020

Was something with top in the name considered? E.g. select_top_n_unstable.

Similar functions (not exactly the same, but close enough) are named topk and top_k in PyTorch and TensorFlow respectively.

@Amanieu
Copy link
Contributor

@Amanieu Amanieu commented Sep 29, 2020

I don't think we can do much better than select_nth_unstable so let's just stick with that.

@jagill
Copy link
Contributor

@jagill jagill commented Oct 7, 2020

@Amanieu I'm happy to make the PR changing the name and stabilizing the issue, but I have a question: If we change the name, this will be breaking for anyone who's been using the current name. It this acceptable, or is there an aliasing step that will preserve behavior for those using the feature flag?

jagill added a commit to jagill/rust that referenced this issue Oct 7, 2020
This stabilizes slice_partition_at_index, including the
rename `partition_at_index*` -> `select_nth_unstable*`.

Closes rust-lang#55300
@scottmcm
Copy link
Member

@scottmcm scottmcm commented Oct 7, 2020

@jagill It's nightly so it's allowed, but to make it a bit easier on nightly users it's generally preferable to to stabilize it under a different feature gate name so the old method can be left there for a month or two before being removing.

jagill added a commit to jagill/rust that referenced this issue Oct 7, 2020
This stabilizes the functionality in slice_partition_at_index,
but under the names `select_nth_unstable*`.  The functions
`partition_at_index*` are left as deprecated, to be removed in
a later release.

Closes rust-lang#55300
jagill added a commit to jagill/rust that referenced this issue Oct 7, 2020
This stabilizes the functionality in slice_partition_at_index,
but under the names `select_nth_unstable*`.  The functions
`partition_at_index*` are left as deprecated, to be removed in
a later release.

Closes rust-lang#55300
@rfcbot
Copy link

@rfcbot rfcbot commented Oct 9, 2020

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Oct 13, 2020
…_index, r=Amanieu

Stabilize slice_partition_at_index

This stabilizes slice_partition_at_index, including renaming `partition_at_index*` -> `select_nth_unstable*`.

Closes rust-lang#55300

r? @Amanieu
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 13, 2020
…ndex, r=Amanieu

Stabilize slice_partition_at_index

This stabilizes slice_partition_at_index, including renaming `partition_at_index*` -> `select_nth_unstable*`.

Closes rust-lang#55300

r? `@Amanieu`
@bors bors closed this in 01ac5a9 Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.