Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for partition_point #73831

Closed
5 tasks done
VillSnow opened this issue Jun 28, 2020 · 31 comments
Closed
5 tasks done

Tracking Issue for partition_point #73831

VillSnow opened this issue Jun 28, 2020 · 31 comments
Labels
A-slice Area: [T] B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@VillSnow
Copy link
Contributor

VillSnow commented Jun 28, 2020

Feature gate: #![feature(partition_point)]

This is a tracking issue for [T]::partition_point.

See also rust-lang/rfcs#2184

Public API

impl<T> [T] {
    pub fn partition_point<P>(&self, mut pred: P) -> usize
    where
        P: FnMut(&T) -> bool;
}

Steps / History

Unresolved Questions

  • Is there a better name? partition_point matches C++, but might be hard to recognize as a binary search.
@VillSnow VillSnow added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jun 28, 2020
@jonas-schievink jonas-schievink added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 28, 2020
@timvermeulen
Copy link
Contributor

In order to deduplicate this code with binary_search_by's implementation, I think we can simply implement partition_point in terms of binary_search_by like so:

use crate::cmp::Ordering::{Less, Greater};

impl<T> [T] {
    pub fn partition_point<P>(&self, mut pred: P) -> usize
    where
        P: FnMut(&T) -> bool,
    {
        self.binary_search_by(|x| if pred(x) { Less } else { Greater })
            .unwrap_or_else(|i| i)
    }
}

@VillSnow
Copy link
Contributor Author

@timvermeulen

I prefer my partition_point code than current binary_search_by code.
I found current binary_search_by has a problem that it usually calls one extra comparison.

I tried

  • previous binary_search_by logic (=logic1),
  • current binary_search_by logic (=logic2), and
  • partition_point logic (=logic3).

In addition, I also tried to

  • remove jump from partition_point (logic4).

I'm not sure succeeded to remove, or possibly already done from current code.

https://github.com/VillSnow/compare-binary-searches/blob/master/src/main.rs

You can find logic1, 3, 4 need 3 comparisons to find from 0..7, while logic2 needs 4 calls.
I think this causes significant cost more than optimization trick if users can pass their own predictions.

Here is a typical result of bench. Find 0 from 0..65536 and find 0 from 7 but inserted sleep(1ms) in comparison.
Logic2 is much slower than others in case of custom predication.

logic [i32; 65536] [Slow; 7]
logic1 30ns 5.4ms
logic2 9ns 7.2ms
logic3 10ns 5.6ms
logic4 10ns 5.6ms

@timvermeulen
Copy link
Contributor

I found current binary_search_by has a problem that it usually calls one extra comparison.

I believe this is intended, it is not optimized for doing the minimum number of comparisons. Make sure to run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs before drawing any conclusions.

And regardless of what the ideal implementation of binary_search_by looks like, partition_point should probably still be implemented in terms of it in the way I outlined above 🙂

@VillSnow
Copy link
Contributor Author

@timvermeulen

I found current binary_search_by has a problem that it usually calls one extra comparison.

I believe this is intended, it is not optimized for doing the minimum number of comparisons. Make sure to run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs before drawing any conclusions.

AFAIK, this bench is written when logic1 is replaced into logic2, not after logic3. The bench tries to find random element from usize slice, not user type which needs large cost to compare.
If it does not optimize for minimum number of comparisons, itself is a problem, I think.
If user need 1ns faster binary search, they can write by themselves or use 3rd party crate, as well as HashMap.

And regardless of what the ideal implementation of binary_search_by looks like, partition_point should probably still be implemented in terms of it in the way I outlined above 🙂

Is it means partition_point should call binary_search_by instead of binary_search_by calls partition_point?
I agree because binary_search_by has more information and the cost would be small.

@KodrAus KodrAus added A-slice Area: [T] Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@scottmcm
Copy link
Member

scottmcm commented Aug 26, 2020

I noticed in passing that this is using FnMut. That seems strange to me, since there's no guarantee which items will be looked at or in which order by this API, so actually mutating in the closure would seem an anti-pattern. Should it maybe use Fn instead? (People could always use interior mutability if they really really really want to update things in the closure, but this would discourage it.)

Though I guess sort_by takes FnMut...

@VillSnow
Copy link
Contributor Author

@scottmcm memoization or exterior (web/DB) API access using mut handle?

@matklad
Copy link
Member

matklad commented Jan 10, 2021

@VillSnow this has been baking in the nightly for a while, I think this is ready for stabilization. Would like to send a stabilization PR?

@VillSnow
Copy link
Contributor Author

If the team think all right, I'm glad.

I still concern about code duplication, and I'm watching this PR #74024.
However, how predicate is called is not documented, so even if the algorithm was changed, it never change the stable branch.

I have no objection to merge this.

@matklad
Copy link
Member

matklad commented Jan 11, 2021

If the team think all right, I'm glad.

Submitting a stabilization PR is a way to figure that out! I am not on the libs team, but this seems stabilization-worthy to me.

I wouldn't worry about impl details -- they can be adjusted after stabilization.

@VillSnow

This comment has been minimized.

@VillSnow

This comment has been minimized.

@matklad

This comment has been minimized.

@VillSnow

This comment has been minimized.

@VillSnow

This comment has been minimized.

@matklad

This comment has been minimized.

@jyn514

This comment has been minimized.

@jyn514

This comment has been minimized.

@m-ou-se
Copy link
Member

m-ou-se commented Jan 23, 2021

Moved from #81012:

@dtolnay's comment (#81012 (comment)):

@rust-lang/libs

This method finds the start of the range of elements where pred=false, assuming all elements for which pred=true come before all elements for which pred=false, via binary search.

impl<T> [T] {
   pub fn partition_point<P>(&self, pred: P) -> usize
    where
        P: FnMut(&T) -> bool;
}

This is a special case of binary_search_by, but reasonably common and not all that clear expressed using binary_search_by.

slice.partition_point(pred)

// equivalent to
slice.binary_search_by(|x| if pred(x) { Less } else { Greater }).unwrap_err()

The polarity choice for whether false or true elements go first is consistent with the stable Iterator::partition (https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.partition), with STL's std::partition_point in C++ (https://en.cppreference.com/w/cpp/algorithm/partition_point), and with the unstable Iterator::is_partitioned (#62544, https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.is_partitioned).

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Jan 23, 2021

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jan 23, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Jan 23, 2021

@rfcbot concern name

I was completly unaware we even had this function, and even suggested a few times we might want to add something like this. I guess because the name doesn't remind me of a binary search. This at least means that we should update the documentation of binary_search to point out the existence of this function, and probably add a doc(alias) to this function too. But we might want to consider another name that more people would recognize as this type of binary search.

@BurntSushi
Copy link
Member

I second @m-ou-se's naming concern. My first reaction was, "what the heck is a partition point?"

I think my preference would be to have "binary search" or at least "search" in the name somewhere. A bit verbose, but maybe binary_search_by_pred? Not sure.

@VillSnow
Copy link
Contributor Author

The first reason why this is partition_point is that it borrows C++'s name.

Otherwise, this example shows what it gets is the partition point.

#![feature(partition_point)]
use std::iter::Iterator;
fn main() {
    let mut s = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4];
    let a = itertools::partition(&mut s, |x| *x > 5);

    for (i, x) in s.iter().enumerate() {
        print!("{}{}", if i == a { "|" } else { " " }, x);
    }
    //  8 7 8 8 8|2 1 1 2 2 4

    let b = s.partition_point(|x| *x > 5);
    assert_eq!(a, b); // pass
}

If the method named binary_search_by_pred, I expect that it returns any index where the predicate returns true, while this methods returns first index where returns false.

partition_point: The idea of partition is uncommon.
binary_search_by: In most of cases very annoying to write predicate.
binary_search_by_pred, and flip predicate: Causes confusion for C++ programmers

@KodrAus
Copy link
Contributor

KodrAus commented Jan 29, 2021

If I remember correctly, we had a long bikeshed over the name and ended up following C++ as a tie-breaker.

@scottmcm
Copy link
Member

scottmcm commented Jan 29, 2021

partition_point: The idea of partition is uncommon.

FWIW, we've had Iterator::partition since 1.0.

Barring something obviously better (which doesn't seem forthcoming), using the C++ name seems reasonable. A documentation update to crosslink this from binary_search definitely sounds good, though.

(Hopefully this method can be resolved without a broader binary search conversation, but if not then I think it should include some sort of plan for equal_range and such, cc rust-lang/rfcs#2184)

@matklad
Copy link
Member

matklad commented Jan 29, 2021

Hopefully this method can be resolved without a broader binary search conversation, but if not then I think it should include some sort of plan for equal_range and such

FWIW, my secret plan is to just submit a pr implementing equal_range after this is stabilized. Unlike lower/upper bound, it doesn't depend on exclusive/inclusive open question, and hopefully can be shipped without to much deliberation :)

@m-ou-se
Copy link
Member

m-ou-se commented Jan 30, 2021

Sounds like this is the best name we can come up with.

So, let's make sure the documentation of the binary_search functions point to partition_point as an alternative, and maybe add a doc(alias) for it as well.

If anybody feels like updating the documentation here: PRs welcome. ^^

Let's kick off the FCP for stabilizing this:

@rfcbot resolve name

@rfcbot
Copy link

rfcbot commented Jan 30, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Jan 30, 2021
@VillSnow
Copy link
Contributor Author

I don't know much about doc(alias), but does it show this?

2021-01-31 (1)

It looks like to make a flow from other names. For example, some languages has reduce, so type 'reduce' at the search box, then hit Iterator::fold.
However, Rust already has binary_search for who want to find any item by value. It seems a bit different usage to suggest similar methods.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Feb 9, 2021
@rfcbot
Copy link

rfcbot commented Feb 9, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Feb 9, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Feb 11, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Feb 12, 2021
… r=matklad

Stabilize the partition_point feature

Stabilize the partition_point feature.
Tracking Issue: rust-lang#73831
First PR: rust-lang#73577
@matklad
Copy link
Member

matklad commented Feb 15, 2021

Closed by #81012

Thanks @VillSnow for bringing this all the way from an idea to the stabilization!

@VillSnow
Copy link
Contributor Author

Thanks for much assistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-slice Area: [T] B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests