ordered query API #1195

Gankra · 2015-07-09T16:20:56Z

rendered

Add the following to BTreeMap

min
max
get_le
get_lt
get_ge
get_gt
min_mut
max_mut
get_le_mut
get_lt_mut
get_ge_mut
get_gt_mut

and to BTreeSet:

min
max
get_le
get_lt
get_ge
get_gt
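
For readers who want a concrete picture of the four directional queries, here is a minimal sketch that emulates them on top of the existing `BTreeMap::range` API. The free functions are illustrative stand-ins, not the inherent methods the RFC proposes, and they take `&K` rather than a borrowed `&Q` to keep the sketch short.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Illustrative stand-ins for the proposed methods, built on `range`.

/// Greatest entry whose key is <= `k`.
fn get_le<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, k: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>(..=k).next_back()
}

/// Greatest entry whose key is < `k`.
fn get_lt<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, k: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>(..k).next_back()
}

/// Smallest entry whose key is >= `k`.
fn get_ge<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, k: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>(k..).next()
}

/// Smallest entry whose key is > `k`.
fn get_gt<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, k: &K) -> Option<(&'a K, &'a V)> {
    // No range syntax excludes the left endpoint, so spell the bounds out.
    let bounds: (Bound<&K>, Bound<&K>) = (Bound::Excluded(k), Bound::Unbounded);
    map.range::<K, _>(bounds).next()
}
```

With keys 1, 3 and 5 in the map, `get_le(&map, &3)` would yield the entry for 3 and `get_lt(&map, &3)` the entry for 1.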

Gankra · 2015-07-09T16:22:56Z

CC @bluss @aturon @apasel422

apasel422 · 2015-07-09T16:33:30Z

Looks good to me.

I assume you went with {first, last} instead of {min, max} to be consistent with the sequence types?

Should there also be remove_{first, last, pred_inc, pred_exc, succ_inc, succ_exc}?

Along those lines, we could also provide something like

fn last_entry(&mut self) -> Option<OccupiedEntry<K, V>>;

fn pred_inc_entry<Q: ?Sized>(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>
    where K: Borrow<Q>, Q: Ord;

...

I've been experimenting with this in my BST library. It's a niche use-case, but allows code to inspect the key and value before deciding whether to remove it.
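
The "inspect before deciding whether to remove" use-case can be sketched with `BTreeMap::last_entry`, which exists in today's standard library. The helper name `remove_last_if` is hypothetical and only serves the illustration.

```rust
use std::collections::BTreeMap;

/// Removes the greatest entry only if `pred` accepts its key and value.
/// The entry is looked up once, inspected, and then either removed or left alone.
fn remove_last_if<K: Ord, V>(
    map: &mut BTreeMap<K, V>,
    pred: impl FnOnce(&K, &V) -> bool,
) -> Option<(K, V)> {
    let entry = map.last_entry()?; // None when the map is empty
    if pred(entry.key(), entry.get()) {
        Some(entry.remove_entry())
    } else {
        None
    }
}
```

An `Option<OccupiedEntry>` for the predecessor and successor queries would support the same pattern.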

bluss · 2015-07-09T16:35:46Z

text/0000-ordered-queries.md

+
+where `pred(Unbounded)` is max, and `succ(Unbounded)` is min by assuming you're getting the
+predecessor and successor of positive and negative infinity. This RFC does not propose this
+API because it is crazy-pants and would make our users cry.


I think this is a serious alternative.

Bound and .range() have existed for a while, are they not something we want to keep? Can I drag the alternative of using range syntax into this? (Bounded2 = Inclusive | Exclusive) so std::ops::Range<Bounded2> etc could be an alternative.

This seems to just shuffle the combinatorics around and make the calling convention more awkward, as far as I can tell. No?

I think it's inconsistent if we want to keep using Bound as it is (or even changed) in some places (.range()), and then have these methods not use it.

I regard Bound as a necessary evil for range because the combinatorics there seem truly catastrophic (18 iterator methods). That said I've never been super happy with the range design. Someone once suggested a builder pattern to me like:

// unbounded RHS
.range().from(x).into_iter()
// bounded RHS
.range().from(x).to(y).into_iter()
...etc

Might be worth considering that more seriously.
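
One way the builder idea could look is sketched below as a thin wrapper over the existing `range` method. Everything named here (`RangeBuilder`, `range_builder`, `from`, `to`) is hypothetical, and the choice of an inclusive lower bound and exclusive upper bound is an assumption the thread does not settle.

```rust
use std::collections::btree_map::{BTreeMap, Range};
use std::ops::Bound;

/// Hypothetical builder that accumulates bounds before iterating.
struct RangeBuilder<'a, K, V> {
    map: &'a BTreeMap<K, V>,
    lo: Bound<&'a K>,
    hi: Bound<&'a K>,
}

/// Starts an unbounded range over `map`.
fn range_builder<K, V>(map: &BTreeMap<K, V>) -> RangeBuilder<'_, K, V> {
    RangeBuilder { map, lo: Bound::Unbounded, hi: Bound::Unbounded }
}

impl<'a, K: Ord, V> RangeBuilder<'a, K, V> {
    /// Sets an inclusive lower bound (assumed semantics).
    fn from(mut self, k: &'a K) -> Self {
        self.lo = Bound::Included(k);
        self
    }

    /// Sets an exclusive upper bound (assumed semantics).
    fn to(mut self, k: &'a K) -> Self {
        self.hi = Bound::Excluded(k);
        self
    }
}

impl<'a, K: Ord, V> IntoIterator for RangeBuilder<'a, K, V> {
    type Item = (&'a K, &'a V);
    type IntoIter = Range<'a, K, V>;

    fn into_iter(self) -> Self::IntoIter {
        // Like `range` itself, this panics if the lower bound exceeds the upper.
        self.map.range::<K, _>((self.lo, self.hi))
    }
}
```

The builder keeps the number of methods linear in the number of bound kinds, at the cost of an extra type and a longer call chain.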

Gankra · 2015-07-09T16:54:54Z

@apasel422 Also if your type is actually ordered max..min; min and max is super confusing IMO.

I actually intended to add remove stuff this morning, but obviously totally forgot!

I had also concluded that an Entry API was silly since VacantEntry is nonsensical, but I suppose Option<OccupiedEntry> actually makes sense. If we have Entry do we also want remove?

apasel422 · 2015-07-09T17:00:12Z

@gankro Presumably a dedicated removal method can be (slightly) more efficient than removing through the entry API, due to less bookkeeping. I hate to increase the combinatoric problem even more, but since the map types already have OccupiedEntry::remove and Map::remove, I'm inclined to add both.

Gankra · 2015-07-09T17:13:55Z

It's not clear to me that only OccupiedEntry would have overhead. Constructing an OccupiedEntry is literally running remove and then copying some local variables into a struct instead of finishing the job. Creating an Entry has extra potential overhead because you need to be ready to make a VacantEntry.

apasel422 · 2015-07-09T17:20:44Z

It might be fine to just have the *_entry methods for now then. We could always add remove_* later.

nrc · 2015-07-09T20:46:21Z

text/0000-ordered-queries.md

+* succ_exc
+* first
+* last
+


Do these names have precedent from other libraries? They seem a bit too succinct to me (although a big plus one to the actual functionality, I've wanted this).

Java: higher/lower/ceil/floor
C++: lower_bound/upper_bound (these names are terrible and I explicitly killed them in collections reform)
Everything else I looked at: chaos or doesn't seem to have this precise collection/functionality.

I briefly pondered before/after and next/prev before letting my theory background take over and demand predecessor/successor.

Another potential naming scheme could involve {lt, le, ge, gt}, optionally with a prefix or suffix if we're concerned about conflicting with PartialOrd's methods.

Some ideas:
before, after, before_eq, after_eq
find_{lt, le, ge, gt}
get_{lt, le, ge, gt}

Incidentally the lack of genericity over mutability is killing me. Don't know how I'd do it, but there's so much repetition in APIs these days because of it.

Oh damn right I wanted to avoid that auuuugh.

Another option is {next, next_or_eq, prev, prev_or_eq, first, last}.

I do really like that lt/leq/etc is an established naming convention that people can bring into understanding.

leq or le? The former might be easier to grok, but the latter is consistent with PartialOrd and has the minor benefit of having the same number of characters as {lt, gt}.

Oh whoops, I thought that PartialOrd used leq.

benaryorg · 2015-07-17T16:14:24Z

Honestly, I would prefer using the builder suggestion from above.

Gankra · 2015-07-17T16:16:16Z

@benaryorg These are orthogonal API discussions. One is for doing direct queries, one is for iterating ranges. While one can be implemented in terms of the other, this is not necessarily efficient or desirable.

benaryorg · 2015-07-17T16:20:55Z

@gankro So you are planning to build two APIs, one of which might (please) use a builder pattern while the other is cursor-like?

Sorry if I do not quite get the idea behind the second API.

Gankra · 2015-07-17T16:29:41Z

This RFC is proposing an API just for answering queries of the form "who is the predecessor/successor/minimum/etc". All it does is return Option<(&K, &V)>. In principle this can be implemented as efficiently as possible.

The range API that was being discussed above would produce an Iterator<Item = (&K, &V)>. In principle, this can be as efficient as the query API, but in general it can't (more state needs to be maintained to be able to go to the next/prev).

Cursors are Yet Another thing that are not currently being proposed here, and that the standard library does not currently have a notion of. Cursors and iterators -- particularly &mut ones -- must be implemented as separate types because they have different semantics. Iterators say you can always call next, which enables you to get multiple mutable references into the same container safely. However Cursors explicitly must forbid this to be able to soundly perform other mutation operations or even to just "revisit" an element to avoid invalidating references or aliasing mutable references (both are Undefined Behaviour).

benaryorg · 2015-07-17T16:45:21Z

Okay, I understand now.

I'll leave function naming to you as I am the worst at that.

Gankra · 2015-07-19T23:50:14Z

@apasel422 Would you be fine with punting on remove/entry APIs until BTreeMap is rewritten to use parent pointers?

I believe they can be added afterwards without an RFC based on "natural API holes" logic.

apasel422 · 2015-07-19T23:51:23Z

@gankro Absolutely.

Gankra · 2015-07-20T00:12:20Z

I've renamed the APIs per discussion.

shepmaster · 2015-07-26T14:47:05Z

@gankro you may want to update your original comment. I read succ_exc_mut and was about to go on a trip to pick out a new color for the bikeshed.

Kimundi · 2015-07-29T19:41:43Z

Hm, how about combating the combinatoric explosion with type parameters?

.get_rel::<LE>(&Q) -> Option<(&K, &V)>;

If get were unstable it could have even been defined as

fn get<Ord=EQ>(&Q) -> Option<(&K, &V)>;
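
A rough sketch of how selection by type parameter might be wired up follows. The trait `Rel`, the marker types `LT`/`LE`/`GE`/`GT`, and the free function `get_rel` are all hypothetical names chosen to mirror the suggestion, and the sketch is written against today's `BTreeMap::range` rather than any proposed API.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical trait: each marker type picks one neighbour of `key`.
trait Rel {
    fn find<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)>;
}

struct LT;
struct LE;
struct GE;
struct GT;

impl Rel for LT {
    fn find<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
        map.range::<K, _>(..key).next_back()
    }
}

impl Rel for LE {
    fn find<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
        map.range::<K, _>(..=key).next_back()
    }
}

impl Rel for GE {
    fn find<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
        map.range::<K, _>(key..).next()
    }
}

impl Rel for GT {
    fn find<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
        let bounds: (Bound<&K>, Bound<&K>) = (Bound::Excluded(key), Bound::Unbounded);
        map.range::<K, _>(bounds).next()
    }
}

/// Stand-in for the suggested `map.get_rel::<LE>(&key)`.
fn get_rel<'a, R: Rel, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    R::find(map, key)
}
```

A call then reads `get_rel::<LE, _, _>(&map, &key)`; the relation is resolved at compile time, so there is no runtime branch on which query was requested.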

apasel422 · 2015-07-29T19:52:20Z

@Kimundi I've talked about something like that with @gankro in the past: Gankra/collect-rs#120 (comment).

Gankra · 2015-07-29T21:30:39Z

🔔 HERE YE HERE YE THIS RFC IS ENTERING ITS FINAL COMMENT PERIOD 🔔

apasel422 · 2015-07-30T12:16:42Z

text/0000-ordered-queries.md

+modulo, but this is a more general problem for the *ordered map* API. There are surely types for
+which a straight-up query will be cheaper than iterator initialization.
+
+It is also siginificantly more ergonomic/discoverable to have `pred_inc_mut(&K)` over


s/pred_inc_mut/get_le_mut/

s/siginificantly/significantly/

huonw · 2015-08-05T16:59:34Z

(It would be good for the typos that @apasel422 has noticed to be fixed if/before this is merged.)

I find the combinatorics here really really bad. It seems a little crazy to add so many methods for what I suspect are relatively niche use-cases. @gankro, I know you're rabidly against any use of enums in APIs to collapse functionality (i.e. the pred/next Bound-based API you propose as an alternative) but it seems to me that this case is a pretty good way to collapse an explosion into something that's actually manageable. The methods could even be named something like max_before/min_after or max_upto/min_from (or something like that) which makes their use much more obvious.

apasel422 · 2015-08-05T17:10:15Z

Returning an OccupiedEntry for {min, max, lt, le, ge, gt} subsumes remove and get_mut. I propose that we only add get_* and *_entry varieties of these queries, regardless of whether we combine them via an enum or provide separate methods:

impl<K, V> Map<K, V> {
    fn get_lt<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;
    fn lt_entry<Q: ?Sized>(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
    // ... `le`, `ge`, `gt`
}

or

enum Query<T> {
    Min,
    Lt(T),
    Le(T),
    Ge(T),
    Gt(T),
    Max,
}

impl<K, V> Map<K, V> {
    fn query<Q: ?Sized = K>(&self, query: Query<&Q>) -> Option<(&K, &V)>;
    fn query_entry<Q: ?Sized = K>(&mut self, query: Query<&Q>) -> Option<OccupiedEntry<K, V>>;
}
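
To make the enum variant concrete, here is a minimal sketch of `query` as a free function over `BTreeMap`, dispatching on the `Query` enum from the comment above. It fixes the query type to `&K` instead of a borrowed `&Q` and leaves out `query_entry`; both simplifications are assumptions made for brevity.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

enum Query<T> {
    Min,
    Lt(T),
    Le(T),
    Ge(T),
    Gt(T),
    Max,
}

/// Free-function sketch of the proposed `Map::query`, built on `range`.
fn query<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, q: Query<&K>) -> Option<(&'a K, &'a V)> {
    match q {
        Query::Min => map.iter().next(),
        Query::Lt(k) => map.range::<K, _>(..k).next_back(),
        Query::Le(k) => map.range::<K, _>(..=k).next_back(),
        Query::Ge(k) => map.range::<K, _>(k..).next(),
        Query::Gt(k) => {
            // Excluding the left endpoint needs explicit bounds.
            let bounds: (Bound<&K>, Bound<&K>) = (Bound::Excluded(k), Bound::Unbounded);
            map.range::<K, _>(bounds).next()
        }
        Query::Max => map.iter().next_back(),
    }
}
```

The single entry point replaces six methods, and the `match` is the runtime dispatch that the separate-method design avoids.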

Gankra · 2015-08-05T18:56:09Z

I disagree that it's niche -- it's one of the primary reasons to use an ordered map.

diwic · 2015-08-05T20:08:55Z

Is there also use for a "nearest" version? I e, if the treemap looks for 1000 and can only find 900 and 1010, it will choose 1010 because it is nearest. That seems useful - although maybe that will require some additional trait bound (e g Sub or Add)?
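
A "nearest" lookup can be sketched from one `le` and one `ge` query. The version below is restricted to `i64` keys, which sidesteps the question of which trait bound a generic version would need; the function name `nearest` and the tie-breaking rule (prefer the smaller key) are assumptions.

```rust
use std::collections::BTreeMap;

/// Returns the entry whose key is closest to `target`.
/// On a tie the smaller key wins (an arbitrary choice for this sketch).
fn nearest<'a, V>(map: &'a BTreeMap<i64, V>, target: i64) -> Option<(&'a i64, &'a V)> {
    let below = map.range::<i64, _>(..=target).next_back(); // like get_le
    let above = map.range::<i64, _>(target..).next(); // like get_ge
    match (below, above) {
        (Some(b), Some(a)) => {
            // `abs_diff` avoids overflow when the keys are far apart.
            if target.abs_diff(*b.0) <= a.0.abs_diff(target) {
                Some(b)
            } else {
                Some(a)
            }
        }
        // At most one side exists: return whichever it is.
        (b, a) => b.or(a),
    }
}
```

With keys 900 and 1010 in the map, looking up 1000 selects 1010, matching the example in the comment.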

Bikeshed wise, I don't know why min and max are not labelled get_min and get_max, if they - like get_le and friends - get an item in the map? That seems inconsistent.

arthurprs · 2015-08-05T23:37:50Z

This, please! I've been missing this for a while. Otherwise there's very little reason to have an ordered map!

It's a shame it requires so much code though, we need those parent pointers.

Gankra · 2015-08-06T00:58:29Z

@arthurprs parent pointers wouldn't solve the duplication, it's a pure descent algorithm. (I suppose it would reduce duplication with other APIs.)

gnzlbg · 2015-08-06T11:46:14Z

I think that the following is missing:

a description of what each function in the API does. I cannot find anywhere a description of what get_le does (should be obvious to everybody, but it also should be in the text of the RFC).
what is the algorithmic complexity of each API function (at least the worst case complexity should be specified)
what is the space complexity of each API function (does any of the functions allocate memory?)

When seeing all the get_le|gt|... I wonder:

why can't a function get(K, CMP) be provided instead, where the user can pass whatever cmp it wants (e.g. via a lambda)?

I see two main use-cases for the query API:

queries with the same "order" as the one used by the data-structure,
queries with an arbitrary different order.

So I basically expected to see two functions get(K) and get(K, cmp) (+ their _mut counterparts) for these two use-cases.

@gankro you asked for feedback, I hope this is some constructive one :P

apasel422 · 2015-08-06T12:01:12Z

The time and space complexity of these operations is an implementation
detail that will end up changing once parent pointers are added. However,
it may be useful to point out that the simplest implementation is based on
iterators. get_lt(k), for example, is map.iter().rev().find(|e| e.0 < k).

Can you provide an example of how you would use the user-provided closure
API, and the full signature of the method? I'm not sure how the predecessor
function, for example, could be written in terms of it.
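
The iterator-based formulation mentioned above can be written out as follows. This is only a sketch of the naive approach: it scans from the greatest key downwards, so its cost grows linearly with the number of entries above `k`. The function name is hypothetical.

```rust
use std::collections::BTreeMap;

/// Naive `get_lt`: walk backwards from the greatest key until one is below `k`.
fn get_lt_naive<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, k: &K) -> Option<(&'a K, &'a V)> {
    map.iter().rev().find(|e| e.0 < k)
}
```

The same answer is available from `map.range(..k).next_back()`, which seeks within the tree instead of scanning every entry.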

gnzlbg · 2015-08-06T12:07:12Z

The time and space complexity of these operations is an implementation
detail that will end up changing once parent pointers are added. However,
it may be useful to point out that the simplest implementation is based on
iterators. get_lt(k), for example, is map.iter().rev().find(|e| e.0 < k).

Basically I just want to know if I can call these in a loop without ending up in N^2 complexity or blowing up the stack. If I'm doing something latency-related I also need to know if they allocate any memory in the heap.

If the implementation improves the complexity in the future, that is a non-breaking change, but the current complexity guarantees should be there.

Can you provide an example of how would you use the user-provided closure
API, and the full signature of the method? I'm not sure how the predecessor
function, for example, could be written in terms of it.

I'm not sure either, but for the _lt|_gt|_... methods I'd rather write get(k, |a, e| e.0 < a.0) or get(k, lt) (supposing we provide a std::lt that works here).

apasel422 · 2015-08-06T12:18:20Z

I don't think this RFC needs to make complexity guarantees. People can decide to use these methods based on the public documentation of their current complexity, not the contents of this RFC. But if you need the predecessor of a key for a certain algorithm, you have to find it somehow, so providing these APIs will be beneficial regardless of their complexity. Even if they are implemented completely naively at first, code that calls the methods instead of doing a manual iterator-based search will be made more efficient automatically when the implementation improves.

I don't understand what get(k, |a, e| e.0 < a.0) would do. What is the purpose of passing k there?

cristicbz · 2015-08-06T13:18:38Z

An alternative to the enum would also be a static dispatch version, similar to the way Range worked out

struct Min;
struct Max;
struct Le<Q: ?Sized>(Q);
// ...

trait Query<K, V, Selector: ?Sized> {
    fn query(&self, query: &Selector) -> Option<(&K, &V)>;
    fn query_mut(&mut self, query: &Selector) -> Option<(&K, &mut V)>;
    // maybe query_entry as well
}

impl<K, V> Query<K, V, Min> for Map<K, V> {
    /* ... */
}

impl<K, V, Q> Query<K, V, Le<Q>> for Map<K, V>
        where K: Borrow<Q> {
    /* ... */
}
// ...

Cons: you'd have to import std::collections::Query to get these methods on your map.

apasel422 · 2015-08-06T13:46:37Z

You actually wouldn't have to import the query trait, because we could add inherent methods to the map that simply call out to the appropriate impl. You would have to import the query structs themselves, though, and there would have to be a different trait for set queries, which won't expose mutable elements.

I'm not opposed to the static dispatch approach, because it can be nice to represent the queries themselves as values (e.g. passing around Lt(&5)).

cristicbz · 2015-08-06T14:38:29Z

To avoid having a separate trait for Set, you could have Query and QueryMut and make the return an associated type and implement them on references (to avoid needing HKT-s).

trait Query<Selector: ?Sized> {
    type Output;
    fn query(self, query: &Selector) -> Option<Self::Output>;
}

trait QueryMut<Selector: ?Sized> {
    type Output;
    fn query_mut(self, query: &Selector) -> Option<Self::Output>;
}

impl<'a, K, V> Query<Min> for &'a Map<K, V> {
    type Output = (&'a K, &'a V);

    fn query(self, query: &Min) -> Option<Self::Output> {
        /* ... */
    }
}

impl<'a, K, V> QueryMut<Min> for &'a mut Map<K, V> {
    type Output = (&'a K, &'a mut V);

    fn query_mut(self, query: &Min) -> Option<Self::Output> {
        /* ... */
    }
}

impl<'a, E> Query<Min> for &'a Set<E> {
    type Output = &'a E;

    fn query(self, query: &Min) -> Option<Self::Output> {
        /* ... */
    }
}

If you do provide inherent methods though, I don't know if there is much value in using the same trait (which would really only ever show up to bound the argument of the inherent methods).
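
For reference, a compilable reduction of the associated-type design might look like the sketch below, with `BTreeMap` and `BTreeSet` standing in for the `Map` and `Set` of the comment. Only the `Min` and `Le` selectors are filled in, `Le` carries a `K` rather than a borrowed form, and `QueryMut` is omitted; all of that is simplification for the sake of a short example.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Selector types; passing them around as values is part of the appeal.
struct Min;
struct Le<Q>(Q);

/// Implemented on references so `Output` can name the borrow's lifetime.
trait Query<Selector> {
    type Output;
    fn query(self, selector: &Selector) -> Option<Self::Output>;
}

impl<'a, K: Ord, V> Query<Min> for &'a BTreeMap<K, V> {
    type Output = (&'a K, &'a V);

    fn query(self, _selector: &Min) -> Option<Self::Output> {
        self.iter().next()
    }
}

impl<'a, K: Ord, V> Query<Le<K>> for &'a BTreeMap<K, V> {
    type Output = (&'a K, &'a V);

    fn query(self, selector: &Le<K>) -> Option<Self::Output> {
        self.range::<K, _>(..=&selector.0).next_back()
    }
}

impl<'a, E: Ord> Query<Min> for &'a BTreeSet<E> {
    type Output = &'a E;

    fn query(self, _selector: &Min) -> Option<Self::Output> {
        self.iter().next()
    }
}
```

A call such as `(&map).query(&Le(2))` picks the implementation from the selector's type, so the map and set share one trait while returning different shapes.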

apasel422 · 2015-08-06T15:51:58Z

@cristicbz I've put a POC implementation of what you are suggesting here: https://github.com/apasel422/bst/tree/query.

aturon · 2015-08-07T16:50:55Z

Here's a thought on an API variant to deal with combinatorics while still being friendly:

fn max<Q: ?Sized, R>(&self, range: R) -> Option<(&K, &V)>
    where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;

fn min<Q: ?Sized, R>(&self, range: R) -> Option<(&K, &V)>
    where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;

fn max_entry<Q: ?Sized, R>(&mut self, range: R) -> Option<OccupiedEntry<K, V>>
    where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;

fn min_entry<Q: ?Sized, R>(&mut self, range: R) -> Option<OccupiedEntry<K, V>>
    where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;

Given inclusive ranges, you can cover all of the cases you wanted to with your API, without requiring any extra imports or names to be used.

UPDATE: in case the above is unclear, here are some examples:

// get_le
map.max(...&k)

// get_lt
map.max(..&k)

// get_ge
map.min(&k..)

// get the smallest element:
map.min(..)

However, @gankro points out on IRC that since exclusive ranges only exclude on the right, we can't express get_gt this way. Too bad!
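
The range-argument idea and its `get_gt` gap can be tried out with today's standard library, where `std::ops::RangeBounds` plays roughly the role of the hypothetical `AnyRange` above. The sketch below uses free functions named `min_in` and `max_in` (hypothetical names) and bounds the range over `K` directly instead of a borrowed `Q`.

```rust
use std::collections::BTreeMap;
use std::ops::RangeBounds;

/// Smallest entry whose key lies in `range`.
fn min_in<K: Ord, V, R: RangeBounds<K>>(map: &BTreeMap<K, V>, range: R) -> Option<(&K, &V)> {
    map.range::<K, _>(range).next()
}

/// Greatest entry whose key lies in `range`.
fn max_in<K: Ord, V, R: RangeBounds<K>>(map: &BTreeMap<K, V>, range: R) -> Option<(&K, &V)> {
    map.range::<K, _>(range).next_back()
}
```

In current syntax, `max_in(&map, ..=k)` corresponds to `get_le`, `max_in(&map, ..k)` to `get_lt`, and `min_in(&map, k..)` to `get_ge`. Range syntax still has no form that excludes the left endpoint, so `get_gt` has to be spelled as `min_in(&map, (Bound::Excluded(k), Bound::Unbounded))` with `std::ops::Bound`.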

cristicbz · 2015-08-07T17:48:05Z

@aturon The inability to express get_gt is what led me to suggest new types. Agreed, ranges would be neater, but short of introducing a new range type (>x.., the unary greater than operator :P) the result would be inconsistent.

huonw · 2015-08-12T20:18:09Z

FWIW, it seems C++'s equivalent container std::map only offers functionality (essentially) equivalent to what we currently offer, but with C++'s iterators (i.e. independent endpoints, and syntactically nicer for the simplest case) and const-overloading.

Gankra · 2015-08-12T20:37:59Z

The libs team has decided to close this RFC pending investigating alternative API solutions.

In particular I think there's a promising opportunity with a range builder pattern.

For now this functionality could be provided by an external crate -- at least semantically, not necessarily perf-wise -- on top of range. A parent pointer impl should address perf.

Gankra added 2 commits July 9, 2015 09:19

ordered query API

9cb1ddf

fixup

243e200

Gankra added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jul 9, 2015

Gankra self-assigned this Jul 9, 2015

bluss reviewed Jul 9, 2015

nrc reviewed Jul 9, 2015

apasel422 mentioned this pull request Jul 12, 2015

Add sorted map and set traits apasel422/eclectic#6

Open

benaryorg mentioned this pull request Jul 17, 2015

B+ Tree in BTreeMap rust-lang/rust#27090

Closed

Gankra mentioned this pull request Jul 19, 2015

RFC: Btree query API rust-lang/rust#27135

Closed

rename query API

7b2618c

Gankra mentioned this pull request Jul 20, 2015

RFC: Add item recovery collection APIs #1194

Merged

Gankra added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jul 29, 2015

apasel422 reviewed Jul 30, 2015

Gankra added 2 commits August 5, 2015 11:51

Update 0000-ordered-queries.md

7a9163d

Fixups and enum nuclear option

309dc21

Gankra closed this Aug 12, 2015

Gankra mentioned this pull request Aug 13, 2015

Tracking issue for sorted collection ranges rust-lang/rust#27787

Closed

ssomers mentioned this pull request Oct 2, 2020

proposal for BTreeMap/Set min/max, #62924 rust-lang/rust#65637

Merged

ssomers mentioned this pull request Dec 10, 2021

Tracking issue for map_first_last: first/last methods on BTreeSet and BTreeMap rust-lang/rust#62924

Closed
