Collection Views #216

Merged
merged 6 commits into from Sep 16, 2014

Conversation

Projects
None yet
@Gankro
Contributor

Gankro commented Aug 28, 2014

Add additional iterator-like View objects to collections.

Views provide a composable mechanism for in-place observation and mutation of a single element in the collection, without having to "re-find" the element multiple times. This deprecates several "internal mutation" methods like hashmap's find_or_insert_with.

Rendered View

active/0000-collection-views.md
+
+```
+let mut view = map.view(key);
+if view.is_empty() {

This comment has been minimized.

@kballard

kballard Aug 28, 2014

Contributor

Presumably this should be !view.is_empty().

@kballard

kballard Aug 28, 2014

Contributor

Presumably this should be !view.is_empty().

This comment has been minimized.

@Gankro

Gankro Aug 28, 2014

Contributor

d'oh! fixed :)

@Gankro

Gankro Aug 28, 2014

Contributor

d'oh! fixed :)

@Valloric

This comment has been minimized.

Show comment
Hide comment
@Valloric

Valloric Aug 28, 2014

This looks really nice. Agreed that the current methods on hashmap are unwieldy; the changes proposed here look like a nice improvement from both a readability/usability perspective and a performance one.

This looks really nice. Agreed that the current methods on hashmap are unwieldy; the changes proposed here look like a nice improvement from both a readability/usability perspective and a performance one.

@huonw

This comment has been minimized.

Show comment
Hide comment
@huonw

huonw Aug 28, 2014

Member

I've been thinking about a view-like objects like this; it would allow us to implement this once for each data structure and get all the functions find_or_insert, insert_or_update_with , ... rather than having to respecify them all the time, which leaves half of them missing for data structures like TreeMap, TrieMap, etc. and this is annoying.

This seems a little like lenses in Haskell, I wonder if we can draw inspiration from them.

Member

huonw commented Aug 28, 2014

I've been thinking about a view-like objects like this; it would allow us to implement this once for each data structure and get all the functions find_or_insert, insert_or_update_with , ... rather than having to respecify them all the time, which leaves half of them missing for data structures like TreeMap, TrieMap, etc. and this is annoying.

This seems a little like lenses in Haskell, I wonder if we can draw inspiration from them.

@pczarn

This comment has been minimized.

Show comment
Hide comment
@pczarn

pczarn Aug 28, 2014

I like this proposal overall. Even though I would rather use an enum with two variants for this: one that allows mutation of the entry, and another that provides means of inserting values into an empty spot. That, or adaptors.

Also, you are stepping out of the formal style. At some point I started wondering if "we" refers to broader audience or only those people that work on APIs.

pczarn commented Aug 28, 2014

I like this proposal overall. Even though I would rather use an enum with two variants for this: one that allows mutation of the entry, and another that provides means of inserting values into an empty spot. That, or adaptors.

Also, you are stepping out of the formal style. At some point I started wondering if "we" refers to broader audience or only those people that work on APIs.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Aug 28, 2014

Contributor

@pczarn: Sorry, I tend to drift into using the academic formal style, in which We is used in odd ways. I am We, you are We, everyone is We, There Has Never Not Been We.

Anyway.

If we're really interested in adapters/traits I would probably lean towards something like the following:

trait View<T> {
    fn get(&self) -> Option<&T>;
    fn get_mut(&mut self) -> Option<&mut T>; 
    fn set(self, T) -> Option<T>;
    fn take(self) -> Option<T>; //sure, why not


    // Optional pre-impl'd junk
    fn is_empty(&self) -> bool { self.get().is_none() }
    fn insert_or_update_with(self, insert: T, update: |&mut T| -> T) -> bool {
        // ehh don't want to have to think about ownership tonight
        if self.is_empty() {
            self.set(insert);
            true
        } else {
            let v = self.get_mut().unwrap();
            let new_v = update(v);
            *v = new_v;
            false
        }
    }
    // ... more of that combinatoric explosion again?
} 

This loses all key information for maps, but it's very generic and simple, and exactly how useful that information is to a user is unclear. We could also bifurcate this into MapView and uh... NotMapView.

However @pczarn raises an interesting idea about the possibility of an empty/full enum. This would cut out a lot of the options and state checks, but would lead to some duplication.

Contributor

Gankro commented Aug 28, 2014

@pczarn: Sorry, I tend to drift into using the academic formal style, in which We is used in odd ways. I am We, you are We, everyone is We, There Has Never Not Been We.

Anyway.

If we're really interested in adapters/traits I would probably lean towards something like the following:

trait View<T> {
    fn get(&self) -> Option<&T>;
    fn get_mut(&mut self) -> Option<&mut T>; 
    fn set(self, T) -> Option<T>;
    fn take(self) -> Option<T>; //sure, why not


    // Optional pre-impl'd junk
    fn is_empty(&self) -> bool { self.get().is_none() }
    fn insert_or_update_with(self, insert: T, update: |&mut T| -> T) -> bool {
        // ehh don't want to have to think about ownership tonight
        if self.is_empty() {
            self.set(insert);
            true
        } else {
            let v = self.get_mut().unwrap();
            let new_v = update(v);
            *v = new_v;
            false
        }
    }
    // ... more of that combinatoric explosion again?
} 

This loses all key information for maps, but it's very generic and simple, and exactly how useful that information is to a user is unclear. We could also bifurcate this into MapView and uh... NotMapView.

However @pczarn raises an interesting idea about the possibility of an empty/full enum. This would cut out a lot of the options and state checks, but would lead to some duplication.

active/0000-collection-views.md
+ pub fn is_empty(&self) -> bool;
+
+ /// Get a reference to the Entry's key
+ pub fn key(&self) -> &K;

This comment has been minimized.

@glaebhoerl

glaebhoerl Aug 28, 2014

Contributor

Why both key() and get_key()? Does get_key() ever return None? (What does key() return then?)

Edit: To answer my own question: The key presently in the map and the one in this view may be "equivalent", but they might not be the same. get_key() is the key (possibly) already there, key() is the one we're "searching" with.

But maybe this could be made clearer somehow.

@glaebhoerl

glaebhoerl Aug 28, 2014

Contributor

Why both key() and get_key()? Does get_key() ever return None? (What does key() return then?)

Edit: To answer my own question: The key presently in the map and the one in this view may be "equivalent", but they might not be the same. get_key() is the key (possibly) already there, key() is the one we're "searching" with.

But maybe this could be made clearer somehow.

This comment has been minimized.

@Gankro

Gankro Aug 28, 2014

Contributor

Yeah this me just being a maximalist. key() yields the guarantor, get_key() gets the key in the actual collection.

@Gankro

Gankro Aug 28, 2014

Contributor

Yeah this me just being a maximalist. key() yields the guarantor, get_key() gets the key in the actual collection.

active/0000-collection-views.md
+ *v = new_v;
+} else {
+ view.set(1);
+}

This comment has been minimized.

@glaebhoerl

glaebhoerl Aug 28, 2014

Contributor

Is there a reason this can't be written to use if let or match on view.get_mut()?

@glaebhoerl

glaebhoerl Aug 28, 2014

Contributor

Is there a reason this can't be written to use if let or match on view.get_mut()?

This comment has been minimized.

@pczarn

pczarn Aug 28, 2014

I'm convinced it's because the borrow of view currently doesn't end even with get_mut returning None. Using an "empty/full" enum would make matching possible (and perhaps easier, too).

@pczarn

pczarn Aug 28, 2014

I'm convinced it's because the borrow of view currently doesn't end even with get_mut returning None. Using an "empty/full" enum would make matching possible (and perhaps easier, too).

This comment has been minimized.

@kballard

kballard Aug 28, 2014

Contributor

Yes. Lexical borrow scopes. view.get_mut() mutably borrows view, so using it in a match expression prevents mutating view in the fallback arm:

match view.get_mut() {
    Some(v) => {
        *v += 1;
    }
    None => {
        view.set(1); // ERROR: view is still mutably borrowed
    }
}

I'm currently working on a modification to if let to fix the expression borrow scope issue, which would let you express this as

if let Some(v) = view.get_mut() {
    *v += 1;
} else {
    // borrow has been returned already
    view.set(1);
}

However, this is not part of the if let RFC, and may or may not be accepted.

FWIW, non-lexical borrow scopes will fix the match approach (and will fix the RFC-specified if let as well).

@kballard

kballard Aug 28, 2014

Contributor

Yes. Lexical borrow scopes. view.get_mut() mutably borrows view, so using it in a match expression prevents mutating view in the fallback arm:

match view.get_mut() {
    Some(v) => {
        *v += 1;
    }
    None => {
        view.set(1); // ERROR: view is still mutably borrowed
    }
}

I'm currently working on a modification to if let to fix the expression borrow scope issue, which would let you express this as

if let Some(v) = view.get_mut() {
    *v += 1;
} else {
    // borrow has been returned already
    view.set(1);
}

However, this is not part of the if let RFC, and may or may not be accepted.

FWIW, non-lexical borrow scopes will fix the match approach (and will fix the RFC-specified if let as well).

This comment has been minimized.

@glaebhoerl

glaebhoerl Aug 28, 2014

Contributor

OK, I figured it might be something like this.

@glaebhoerl

glaebhoerl Aug 28, 2014

Contributor

OK, I figured it might be something like this.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Aug 29, 2014

Contributor

Because I think it's interesting, I ported my Hashmap design to @pczarn's Enum style. The consequence is that basically all of the complexity gets pushed into the types themselves. The resultant API is much simpler and cleaner.

I also took the opportunity to add take and remove all references to keys, to demonstrate what this API would look like when normalized to meet a theoretical trait for Views.

I also impled insert_or_update_with on the enum as a demo of what that would look like.

You can also properly use match instead of an is_empty check to update.

Resultant types (impl details stripped):

/// A View into a single occupied location in a HashMap
pub struct OccupiedEntry<'a, K, V>;

/// A View into a single empty location in a HashMap
pub struct VacantEntry<'a, K, V>;

/// A View into a single location in a HashMap
pub enum Entry<'a, K, V> {
    /// An occupied View
    Occupied(OccupiedEntry<'a, K, V>),
    /// A vacant View
    Vacant(VacantEntry<'a, K, V>),
}

impls (impl details stripped)

impl<'a, K, V> Entry<'a, K, V> {
    /// Insert the value or update it, and return whether an insertion occured
    pub fn insert_or_update_with(self, insert: V, update: |&mut V| -> V) -> bool;
}

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Get a reference to the value of this View
    pub fn get(&self) -> &V;

    /// Get a mutable reference to the value of this View
    pub fn get_mut(&mut self) -> &mut V;

    /// Set the value stored in this View
    pub fn set(mut self, value: V) -> V;

    /// Take the value stored in this View
    pub fn take(self) -> V;
}

impl<'a, K, V> VacantEntry<'a, K, V> {
    /// Set the value stored in this View
    pub fn set(self, value: V);
}
Contributor

Gankro commented Aug 29, 2014

Because I think it's interesting, I ported my Hashmap design to @pczarn's Enum style. The consequence is that basically all of the complexity gets pushed into the types themselves. The resultant API is much simpler and cleaner.

I also took the opportunity to add take and remove all references to keys, to demonstrate what this API would look like when normalized to meet a theoretical trait for Views.

I also impled insert_or_update_with on the enum as a demo of what that would look like.

You can also properly use match instead of an is_empty check to update.

Resultant types (impl details stripped):

/// A View into a single occupied location in a HashMap
pub struct OccupiedEntry<'a, K, V>;

/// A View into a single empty location in a HashMap
pub struct VacantEntry<'a, K, V>;

/// A View into a single location in a HashMap
pub enum Entry<'a, K, V> {
    /// An occupied View
    Occupied(OccupiedEntry<'a, K, V>),
    /// A vacant View
    Vacant(VacantEntry<'a, K, V>),
}

impls (impl details stripped)

impl<'a, K, V> Entry<'a, K, V> {
    /// Insert the value or update it, and return whether an insertion occured
    pub fn insert_or_update_with(self, insert: V, update: |&mut V| -> V) -> bool;
}

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Get a reference to the value of this View
    pub fn get(&self) -> &V;

    /// Get a mutable reference to the value of this View
    pub fn get_mut(&mut self) -> &mut V;

    /// Set the value stored in this View
    pub fn set(mut self, value: V) -> V;

    /// Take the value stored in this View
    pub fn take(self) -> V;
}

impl<'a, K, V> VacantEntry<'a, K, V> {
    /// Set the value stored in this View
    pub fn set(self, value: V);
}
@nham

This comment has been minimized.

Show comment
Hide comment
@nham

nham Aug 29, 2014

You mentioned implementing Views for "collections", but the only example provided was for HashMap. Will this only be needed for Map types, or will other collections have Views as well? If so, how would they look?

nham commented Aug 29, 2014

You mentioned implementing Views for "collections", but the only example provided was for HashMap. Will this only be needed for Map types, or will other collections have Views as well? If so, how would they look?

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Aug 29, 2014

Contributor

Maps are the nicest example, since they often have complex search procedures. However there's nothing preventing this from being implemented on index-based structures. The only API difference (at least with the minimal version posted just above) would be view, which would I guess take an int in that case. For the more complex version, you could have keyish things yield an int, I guess?

I could also see having a view_when type thing that maybe searches the structure with a predicate?

Outside of maps and lists, this functionality seems fairly irrelevant. Sets as boolean maps don't really benefit from this complex behaviour. More exotic things like BitV's and PriorityQueues similarly don't really need this. So really I'd say this is for collections that implement Index<K, V>, where V is truly generic. You view on K to manipulate V in richer ways without incurring the indexing cost more than once. Maybe it should be tied to indexing somehow?

Contributor

Gankro commented Aug 29, 2014

Maps are the nicest example, since they often have complex search procedures. However there's nothing preventing this from being implemented on index-based structures. The only API difference (at least with the minimal version posted just above) would be view, which would I guess take an int in that case. For the more complex version, you could have keyish things yield an int, I guess?

I could also see having a view_when type thing that maybe searches the structure with a predicate?

Outside of maps and lists, this functionality seems fairly irrelevant. Sets as boolean maps don't really benefit from this complex behaviour. More exotic things like BitV's and PriorityQueues similarly don't really need this. So really I'd say this is for collections that implement Index<K, V>, where V is truly generic. You view on K to manipulate V in richer ways without incurring the indexing cost more than once. Maybe it should be tied to indexing somehow?

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Aug 29, 2014

Contributor

On second thought, that needs to be made more extreme: it should also be non-trivial to predetermine if a key will yield a value. If you can know trivially, then get_mut is just as expressive (modulo take). In the case of our List structures, it is trivial, as any index in-bounds is known to be occupied. Thus, Views really are only appropriate for Maps. (Maaaybe DList just to perform take more efficiently)

Contributor

Gankro commented Aug 29, 2014

On second thought, that needs to be made more extreme: it should also be non-trivial to predetermine if a key will yield a value. If you can know trivially, then get_mut is just as expressive (modulo take). In the case of our List structures, it is trivial, as any index in-bounds is known to be occupied. Thus, Views really are only appropriate for Maps. (Maaaybe DList just to perform take more efficiently)

@gsingh93

This comment has been minimized.

Show comment
Hide comment
@gsingh93

gsingh93 Aug 29, 2014

@Gankro
"Outside of maps and lists, this functionality seems fairly irrelevant. Sets as boolean maps don't really benefit from this complex behaviour".

Hmm, I'd still like to see this for sets. There's no guarantee that two elements of a set are "exactly" equal from a perspective of what data they contain even if the eq method says they're equal. Thus, with the view API, you'd be able to update set elements without one call to remove() and another call to insert(). Sure, you could probably switch over your set to a map in many cases, but I think it's still nice to have and it makes things consistent.

"More exotic things like BitV's and PriorityQueues similarly don't really need this."

Maybe they don't need it, but it again would be nice to have. I was implementing Prim's the other day, and was wishing that the PriorityQueue API let me update the edge weights I was putting in the queue. Currently, you have to add new edges to the graph and just leave the old ones in there, which slows down the algorithm a bit.

I think the RFC should specifically state which collections this will be implemented for, just so we're all clear on this.

@Gankro
"Outside of maps and lists, this functionality seems fairly irrelevant. Sets as boolean maps don't really benefit from this complex behaviour".

Hmm, I'd still like to see this for sets. There's no guarantee that two elements of a set are "exactly" equal from a perspective of what data they contain even if the eq method says they're equal. Thus, with the view API, you'd be able to update set elements without one call to remove() and another call to insert(). Sure, you could probably switch over your set to a map in many cases, but I think it's still nice to have and it makes things consistent.

"More exotic things like BitV's and PriorityQueues similarly don't really need this."

Maybe they don't need it, but it again would be nice to have. I was implementing Prim's the other day, and was wishing that the PriorityQueue API let me update the edge weights I was putting in the queue. Currently, you have to add new edges to the graph and just leave the old ones in there, which slows down the algorithm a bit.

I think the RFC should specifically state which collections this will be implemented for, just so we're all clear on this.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Aug 29, 2014

Contributor

@gsingh93 Unfortunately, all of our sets are invariably just a thin wrapper around Map<T, ()>. Maps, meanwhile treat their keys like trash. Almost every operation that displaces a key just throws it away. Consequently, as currently designed, you can't ever see an item in a Set<T> outside of iterators. You can only ask "do you contain me?" and get a boolean back. Literally the only way to move a value out of a Set is with a move_iter.

Further, you can't perform an in-place replacement on a Set because it's structured on its values, unlike a map. To change the value would change where it "goes" in the structure, making the set operation the same as a full insert and remove.

In theory, I could flesh out my minimal enum-style design to regain the notion of "keys", and we could treat a Set as View<T,()>. Then we could provide a swap_keys() method on OccupiedEntry that swaps the guarantor with the key stored in the collection. This would be safe and efficient.

For priority queues it sounds like you want increase_key and decrease_key, which are a whole different ballpark of crazy to implement in Rust. Especially with a Binary Heap like we have, which are notoriously bad at those two operations. Regardless of the heap, to find an element requires a linear search of all the contents, unless you already have a pointer into the structure. But you can't have structure pointers in an array-based binary heap (without a lot of indirection and information duplication). It sounds like you want a pairing heap, which we don't currently provide, and would have serious safety issues to implement efficiently.

Contributor

Gankro commented Aug 29, 2014

@gsingh93 Unfortunately, all of our sets are invariably just a thin wrapper around Map<T, ()>. Maps, meanwhile treat their keys like trash. Almost every operation that displaces a key just throws it away. Consequently, as currently designed, you can't ever see an item in a Set<T> outside of iterators. You can only ask "do you contain me?" and get a boolean back. Literally the only way to move a value out of a Set is with a move_iter.

Further, you can't perform an in-place replacement on a Set because it's structured on its values, unlike a map. To change the value would change where it "goes" in the structure, making the set operation the same as a full insert and remove.

In theory, I could flesh out my minimal enum-style design to regain the notion of "keys", and we could treat a Set as View<T,()>. Then we could provide a swap_keys() method on OccupiedEntry that swaps the guarantor with the key stored in the collection. This would be safe and efficient.

For priority queues it sounds like you want increase_key and decrease_key, which are a whole different ballpark of crazy to implement in Rust. Especially with a Binary Heap like we have, which are notoriously bad at those two operations. Regardless of the heap, to find an element requires a linear search of all the contents, unless you already have a pointer into the structure. But you can't have structure pointers in an array-based binary heap (without a lot of indirection and information duplication). It sounds like you want a pairing heap, which we don't currently provide, and would have serious safety issues to implement efficiently.

@brendanzab

This comment has been minimized.

Show comment
Hide comment
@brendanzab

brendanzab Sep 1, 2014

Member

Looks nice.

I agree with @huonw that we should look at lenses to see if it can inform any design decisions, seeing as they have spent so much effort developing a formal basis for it.

Member

brendanzab commented Sep 1, 2014

Looks nice.

I agree with @huonw that we should look at lenses to see if it can inform any design decisions, seeing as they have spent so much effort developing a formal basis for it.

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 11, 2014

I've been playing with another related idea using Zippers, Editors, and Contexts, but I'm hitting the issue that pub trait A: B {} pub trait B: A {} causes a rustc stack overflow.

reem commented Sep 11, 2014

I've been playing with another related idea using Zippers, Editors, and Contexts, but I'm hitting the issue that pub trait A: B {} pub trait B: A {} causes a rustc stack overflow.

@kballard

This comment has been minimized.

Show comment
Hide comment
@kballard

kballard Sep 11, 2014

Contributor

@reem Why are you doing that? AFAIK even if that worked, there's no benefit to doing that. Since every implementor of A must implement B, and every implementor of B must implement A, then you should just use a single trait with all the methods from both A and B.

In any case, this is covered under rust-lang/rust#12511, any commentary on this stack overflow should go there.

Contributor

kballard commented Sep 11, 2014

@reem Why are you doing that? AFAIK even if that worked, there's no benefit to doing that. Since every implementor of A must implement B, and every implementor of B must implement A, then you should just use a single trait with all the methods from both A and B.

In any case, this is covered under rust-lang/rust#12511, any commentary on this stack overflow should go there.

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 11, 2014

What I'm actually doing involves a slightly more complex recursion, where it's not a bound on Self but on one of the type parameters of the trait. Basically, a Context has to be able to produce an Editor, and an Editor has to be able to produce a Context.

reem commented Sep 11, 2014

What I'm actually doing involves a slightly more complex recursion, where it's not a bound on Self but on one of the type parameters of the trait. Basically, a Context has to be able to produce an Editor, and an Editor has to be able to produce a Context.

@kballard

This comment has been minimized.

Show comment
Hide comment
@kballard

kballard Sep 11, 2014

Contributor

@reem Sounds like rust-lang/rust#12644 then.

Contributor

kballard commented Sep 11, 2014

@reem Sounds like rust-lang/rust#12644 then.

active/0000-collection-views.md
+
+We replace all the internal mutation methods with a single method on a collection: `view`.
+The signature of `view` will depend on the specific collection, but generally it will be similar to
+the signature for searching in that structure. `view` will in turn return a `View` object, which

This comment has been minimized.

@wackywendell

wackywendell Sep 15, 2014

A View object? Do you mean an Entry object, judging from below?

@wackywendell

wackywendell Sep 15, 2014

A View object? Do you mean an Entry object, judging from below?

This comment has been minimized.

@Gankro

Gankro Sep 15, 2014

Contributor

View is to Iterator as Entry is to Entries. View is the abstract notion, where Entry is the concrete implementation.

@Gankro

Gankro Sep 15, 2014

Contributor

View is to Iterator as Entry is to Entries. View is the abstract notion, where Entry is the concrete implementation.

This comment has been minimized.

@wackywendell

wackywendell Sep 15, 2014

Ah, that makes sense!

@wackywendell

wackywendell Sep 15, 2014

Ah, that makes sense!

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@Gankro This is really great work.

I agree that @pczarn's variant is cleaner; can you update the RFC with that as the main proposal? For history, you can keep the original proposal as an alternative. (I think we should use the keyless variant in particular, which is squeaky-clean.)

Really, my only hesitation is the name. Would you consider using "Cursor"? I'm imagining the term as generically referring to an object that points into some other collection, allowing inspection and mutation, but not necessarily navigation.

As far as adapting to other collections: I see no problem with landing this for our map types to begin with, and then being on the lookout for similar opportunities elsewhere. This API will likely remain as experimental for some time in any case, and my general feeling is that we should only standardize general APIs when we have many good concrete instances in hand.

Member

aturon commented Sep 15, 2014

@Gankro This is really great work.

I agree that @pczarn's variant is cleaner; can you update the RFC with that as the main proposal? For history, you can keep the original proposal as an alternative. (I think we should use the keyless variant in particular, which is squeaky-clean.)

Really, my only hesitation is the name. Would you consider using "Cursor"? I'm imagining the term as generically referring to an object that points into some other collection, allowing inspection and mutation, but not necessarily navigation.

As far as adapting to other collections: I see no problem with landing this for our map types to begin with, and then being on the lookout for similar opportunities elsewhere. This API will likely remain as experimental for some time in any case, and my general feeling is that we should only standardize general APIs when we have many good concrete instances in hand.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

@aturon So we're not interested in providing keys for giving some functionality to Sets, then? I suppose key stuff can be bolted on after, anyway.

Do we want to leverage ToOwned here, or can it wait and we'll just upgrade it if the collections reform goes through? If we're using ToOwned what's the semantics we want here? If there's already a key, should we avoid applying that transform on insertion on the assumption that Eq keys are indistinguishable? Or maybe we should provide set as lazy and swap as aggresive wrt keys? Probably too complicated. I've been leaning towards treating keys as indistinguishable. We can add backcompat methods for "when you care" later.

Contributor

Gankro commented Sep 15, 2014

@aturon So we're not interested in providing keys for giving some functionality to Sets, then? I suppose key stuff can be bolted on after, anyway.

Do we want to leverage ToOwned here, or can it wait and we'll just upgrade it if the collections reform goes through? If we're using ToOwned what's the semantics we want here? If there's already a key, should we avoid applying that transform on insertion on the assumption that Eq keys are indistinguishable? Or maybe we should provide set as lazy and swap as aggresive wrt keys? Probably too complicated. I've been leaning towards treating keys as indistinguishable. We can add backcompat methods for "when you care" later.

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@Gankro

So we're not interested in providing keys for giving some functionality to Sets, then? I suppose key stuff can be bolted on after, anyway.

Yeah, that was my thought: we can always add the key manipulation functionality later, if it turns out to be strongly desired. I understand the hypothetical arguments, but I have yet to see a compelling, concrete example where you'd really want that functionality.

Do we want to leverage ToOwned here, or can it wait and we'll just upgrade it if the collections reform goes through? If we're using ToOwned what's the semantics we want here? If there's already a key, should we avoid applying that transform on insertion on the assumption that Eq keys are indistinguishable? Or maybe we should provide set as lazy and swap as aggresive wrt keys? Probably too complicated. I've been leaning towards treating keys as indistinguishable. We can add backcompat methods for "when you care" later.

I'd like to keep the RFCs separate for now, and this one's likely to land first. So let's revise with ToOwned later, if needed.

As to the ownership semantics in general, the easiest solution is the API you've given: take an owned key up front, even though if the item exists it isn't needed.

An alternative that might not be bad is: take a borrowed key for view, and then have the set method on VacantEntry take an owned key. Upsides: save an allocation/copy. Downsides: less convenient, and opens the door to trouble if the owned and borrowed keys hash differently. We could check for the latter, but that's performance penality...

So I think for now, I'd stick with today's ownership story as you have been.

Member

aturon commented Sep 15, 2014

@Gankro

So we're not interested in providing keys for giving some functionality to Sets, then? I suppose key stuff can be bolted on after, anyway.

Yeah, that was my thought: we can always add the key manipulation functionality later, if it turns out to be strongly desired. I understand the hypothetical arguments, but I have yet to see a compelling, concrete example where you'd really want that functionality.

Do we want to leverage ToOwned here, or can it wait and we'll just upgrade it if the collections reform goes through? If we're using ToOwned what's the semantics we want here? If there's already a key, should we avoid applying that transform on insertion on the assumption that Eq keys are indistinguishable? Or maybe we should provide set as lazy and swap as aggresive wrt keys? Probably too complicated. I've been leaning towards treating keys as indistinguishable. We can add backcompat methods for "when you care" later.

I'd like to keep the RFCs separate for now, and this one's likely to land first. So let's revise with ToOwned later, if needed.

As to the ownership semantics in general, the easiest solution is the API you've given: take an owned key up front, even though if the item exists it isn't needed.

An alternative that might not be bad is: take a borrowed key for view, and then have the set method on VacantEntry take an owned key. Upsides: save an allocation/copy. Downsides: less convenient, and opens the door to trouble if the owned and borrowed keys hash differently. We could check for the latter, but that's performance penality...

So I think for now, I'd stick with today's ownership story as you have been.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

@aturon sounds good, go for works-right-now back-compat minimalism.

I'm not a huge fan of calling these things Cursors though, since the "real" Cursor design we've been working on has radically different semantics. To conflate the two seems unhelpful. Unless you want Cursors to have a different name?

Contributor

Gankro commented Sep 15, 2014

@aturon sounds good, go for works-right-now back-compat minimalism.

I'm not a huge fan of calling these things Cursors though, since the "real" Cursor design we've been working on has radically different semantics. To conflate the two seems unhelpful. Unless you want Cursors to have a different name?

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@Gankro The difference doesn't seem so radical to me. But actually, the only place the name "view" is used is the view method, which returns an Entry. What about calling that method entry?

Member

aturon commented Sep 15, 2014

@Gankro The difference doesn't seem so radical to me. But actually, the only place the name "view" is used is the view method, which returns an Entry. What about calling that method entry?

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

@aturon that would be a bit like renaming iter to entries, no?

Contributor

Gankro commented Sep 15, 2014

@aturon that would be a bit like renaming iter to entries, no?

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@Gankro I'm actually planning to propose that all iterator types be renamed to match their corresponding methods (so Entries would become Iter), to drastically simplify the conventions. So, yes :-)

Member

aturon commented Sep 15, 2014

@Gankro I'm actually planning to propose that all iterator types be renamed to match their corresponding methods (so Entries would become Iter), to drastically simplify the conventions. So, yes :-)

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@Gankro less glibly, I actually think that some_map.entry("some key") is more immediately suggestive than view.

Member

aturon commented Sep 15, 2014

@Gankro less glibly, I actually think that some_map.entry("some key") is more immediately suggestive than view.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

Alright, works for me. I'm digging back into this for the first time in a while, and I just found that in my pczarn design I tossed in an impl for insert_or_update_with on the enum itself as a proof-of-concept. Presumably we want to scrap that?

Contributor

Gankro commented Sep 15, 2014

Alright, works for me. I'm digging back into this for the first time in a while, and I just found that in my pczarn design I tossed in an impl for insert_or_update_with on the enum itself as a proof-of-concept. Presumably we want to scrap that?

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@Gankro Yes please.

Member

aturon commented Sep 15, 2014

@Gankro Yes please.

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 15, 2014

I think we need to more carefully consider how this might interact with other ideas for interacting with data structures - the lack of an ability to move from a view in something like a binary tree or linked-list is a huge drawback as is their lack of applicability to persistent collections.

This and a hypothetical cursor-like structure or trait are connected and we should probably consider them together.

I've been considering something like this (doesn't compile because of recursive bounds) that might address some of these issues:

//! Zippers for walking and editing data structures.

pub trait Zipper<Direction> {
    /// Moves the Zipper to a new location. Should return false if the move
    /// is impossible.
    fn go(&mut self, Direction) -> bool;
}

pub trait Accessor<Data, Direction>: Zipper<Direction> {
    /// Get an immutable reference to the data at this point.
    fn focus(&self) -> &Data;
}

pub trait AccessorMut<Data, Direction>: Accessor<Data, Direction> {
    /// Get a mutable reference to the data at this point.
    fn focus_mut(&mut self) -> &mut Data;
}

pub trait Editor<Data, Direction, Cntx>: Zipper<Direction>
where Cntx: Context<Data, Direction, Self> {
    /// Try to remove the data at this point, possibly creating a hole.
    fn remove(self, Direction) -> Result<(Data, Cntx), Self>;

    /// Move the data at this point in the specified direction, creating a hole.
    fn shove(self, Direction) -> Cntx;
}

pub trait Context<Data, Direction, Edtr>: Zipper<Direction>
where Edtr: Editor<Data, Direction, Self> {
    /// Insert data in the hole in this context, closing it.
    fn insert(self, Data) -> Edtr;

    /// Move the hole in the specified direction, shifting the rest
    /// of the data and possibly plugging the hole.
    fn plug(self, Direction) -> Result<Edtr, Self>;
}

/// A data structure which can be edited using Context's and Editor's.
pub trait Editable<Data, Direction, Edtr, Cntx>
where Edtr: Editor<Data, Direction, Cntx>,
      Cntx: Context<Data, Direction, Edtr> {
    /// Edit this data structure through an Editor
    fn deconstruct(self) -> Edtr;

    /// Reconstruct this data structure from an Editor
    fn reconstruct(Edtr) -> Self;
}

I've also pushed this to a repo here: https://github.com/reem/rust-zipper

reem commented Sep 15, 2014

I think we need to more carefully consider how this might interact with other ideas for interacting with data structures - the lack of an ability to move from a view in something like a binary tree or linked-list is a huge drawback as is their lack of applicability to persistent collections.

This and a hypothetical cursor-like structure or trait are connected and we should probably consider them together.

I've been considering something like this (doesn't compile because of recursive bounds) that might address some of these issues:

//! Zippers for walking and editing data structures.

pub trait Zipper<Direction> {
    /// Moves the Zipper to a new location. Should return false if the move
    /// is impossible.
    fn go(&mut self, Direction) -> bool;
}

pub trait Accessor<Data, Direction>: Zipper<Direction> {
    /// Get an immutable reference to the data at this point.
    fn focus(&self) -> &Data;
}

pub trait AccessorMut<Data, Direction>: Accessor<Data, Direction> {
    /// Get a mutable reference to the data at this point.
    fn focus_mut(&mut self) -> &mut Data;
}

pub trait Editor<Data, Direction, Cntx>: Zipper<Direction>
where Cntx: Context<Data, Direction, Self> {
    /// Try to remove the data at this point, possibly creating a hole.
    fn remove(self, Direction) -> Result<(Data, Cntx), Self>;

    /// Move the data at this point in the specified direction, creating a hole.
    fn shove(self, Direction) -> Cntx;
}

pub trait Context<Data, Direction, Edtr>: Zipper<Direction>
where Edtr: Editor<Data, Direction, Self> {
    /// Insert data in the hole in this context, closing it.
    fn insert(self, Data) -> Edtr;

    /// Move the hole in the specified direction, shifting the rest
    /// of the data and possibly plugging the hole.
    fn plug(self, Direction) -> Result<Edtr, Self>;
}

/// A data structure which can be edited using Context's and Editor's.
pub trait Editable<Data, Direction, Edtr, Cntx>
where Edtr: Editor<Data, Direction, Cntx>,
      Cntx: Context<Data, Direction, Edtr> {
    /// Edit this data structure through an Editor
    fn deconstruct(self) -> Edtr;

    /// Reconstruct this data structure from an Editor
    fn reconstruct(Edtr) -> Self;
}

I've also pushed this to a repo here: https://github.com/reem/rust-zipper

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 15, 2014

I'm not sure if remove/shove/plug/insert have exactly the correct API, but I think this is an interesting direction to explore as it works for both persistent collections and non-persistent collections.

reem commented Sep 15, 2014

I'm not sure if remove/shove/plug/insert have exactly the correct API, but I think this is an interesting direction to explore as it works for both persistent collections and non-persistent collections.

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

the lack of an ability to move from a view in something like a binary tree or linked-list is a huge drawback...

Can you elaborate on the inability to move? The API here provides a take method that allows moving out the value of the current entry (and hence removing it).

Member

aturon commented Sep 15, 2014

the lack of an ability to move from a view in something like a binary tree or linked-list is a huge drawback...

Can you elaborate on the inability to move? The API here provides a take method that allows moving out the value of the current entry (and hence removing it).

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 15, 2014

Oh, that's a typo. I meant move a view i.e. go to the next node or go to the child, or for something like a HashMap even move to another key/value pair.

My main gripe is that this won't work at all with persistent collections, which will have to invent their own interface rather than us providing a single unified interface for efficiently editing collections.

reem commented Sep 15, 2014

Oh, that's a typo. I meant move a view i.e. go to the next node or go to the child, or for something like a HashMap even move to another key/value pair.

My main gripe is that this won't work at all with persistent collections, which will have to invent their own interface rather than us providing a single unified interface for efficiently editing collections.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

@reem one reason for this design is simply efficiency. If the cursor/entry/zipper is still good after an insertion/deletion, that means we're maintaining a buffer of actions, or actively tracking where the head should go after each operation. Since this interface is meant as an optimization and replacement for insert_or_update_with, doing lots of extra work like that is a problem.

Contributor

Gankro commented Sep 15, 2014

@reem one reason for this design is simply efficiency. If the cursor/entry/zipper is still good after an insertion/deletion, that means we're maintaining a buffer of actions, or actively tracking where the head should go after each operation. Since this interface is meant as an optimization and replacement for insert_or_update_with, doing lots of extra work like that is a problem.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

That said: that's a super interesting design! However I'm not sure we should be thinking too hard about persistent structures since those aren't a thing yet.

Contributor

Gankro commented Sep 15, 2014

That said: that's a super interesting design! However I'm not sure we should be thinking too hard about persistent structures since those aren't a thing yet.

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@reem This is a pretty interesting design, but one worry is that deconstruct takes self by value. How do you envision that working with a mutable data structure, which you may have only mutably borrowed? Would you implement the trait on &mut T?

Member

aturon commented Sep 15, 2014

@reem This is a pretty interesting design, but one worry is that deconstruct takes self by value. How do you envision that working with a mutable data structure, which you may have only mutably borrowed? Would you implement the trait on &mut T?

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 15, 2014

Contributor

Oh, left out something: the reason we can't move is because we're using the provided value as a guarantor of the entry. Without the guarantor we'd have to validate the input, which would be doing even more work (especially since hashing can be very expensive!).

Contributor

Gankro commented Sep 15, 2014

Oh, left out something: the reason we can't move is because we're using the provided value as a guarantor of the entry. Without the guarantor we'd have to validate the input, which would be doing even more work (especially since hashing can be very expensive!).

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 15, 2014

Since the cursor/editor can own the underlying data structure, a "work-queue" like that is only needed for persistent structures - a mutable data structure can just me mutated by the insert/remove methods directly and then unwrapped in reconstruct.

@aturon I imagined that it would probably be implemented for T, &T, and/or &mut T for mutable data structures and &T, Arc<T> or Rc<T> for Persistent structures.

Since Direction would be an input parameter, this also allows multiple different zipper types for the same data structures.

@Gankro I'm not clear on how a guarantor prevents you from having to hash the key and such when looking in a HashMap.

Notably, this still provides two interfaces (Accessor and Context/Editor) which I think can be reconciled.

For HashMap, using Accessor would look like this could look like:

let key_editor = hashmap.deconstruct();
key_editor.go("some_key");
if let Some(val) = key_editor.focus_mut() {
    *val = "whatever";
}
let hashmap = key_editor.reconstruct();

and using Context/Editor would look like this:

let hashmap = match hashmap.deconstruct().remove("some_key") {
  Ok(_, ctx) => ctx.insert("whatever");
  Err(edtr) => edtr;
}.reconstruct();

reem commented Sep 15, 2014

Since the cursor/editor can own the underlying data structure, a "work-queue" like that is only needed for persistent structures - a mutable data structure can just me mutated by the insert/remove methods directly and then unwrapped in reconstruct.

@aturon I imagined that it would probably be implemented for T, &T, and/or &mut T for mutable data structures and &T, Arc<T> or Rc<T> for Persistent structures.

Since Direction would be an input parameter, this also allows multiple different zipper types for the same data structures.

@Gankro I'm not clear on how a guarantor prevents you from having to hash the key and such when looking in a HashMap.

Notably, this still provides two interfaces (Accessor and Context/Editor) which I think can be reconciled.

For HashMap, using Accessor would look like this could look like:

let key_editor = hashmap.deconstruct();
key_editor.go("some_key");
if let Some(val) = key_editor.focus_mut() {
    *val = "whatever";
}
let hashmap = key_editor.reconstruct();

and using Context/Editor would look like this:

let hashmap = match hashmap.deconstruct().remove("some_key") {
  Ok(_, ctx) => ctx.insert("whatever");
  Err(edtr) => edtr;
}.reconstruct();
@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 15, 2014

Member

@reem OK, that sounds plausible.

My other main concerns are:

  1. This proposal is substantially more general -- and therefore more complex -- than the API @Gankro put forward. I worry that, in making the API consistent across mutable and persistent collections, we'll muddy the waters. How important is it to be able to program generically against a collection that you're not sure is mutable?

    On the other hand, it seems plausible that you could provide impls of the various traits you're proposing on top of "views". Do you see any fundamental problems with "future proofing" in that style?

  2. It seems plausible to provide a variant of "views" that can move around in a binary tree or linked list.

  3. In general, I agree with @Gankro's worry that we don't have a lot of experience with persistent data structures in Rust. My feeling is that the design proposed in this RFC (in particular here) is a clear improvement over today's soup of find_ and insert_ variants, mainly because it is simpler. I'm hesistent to pursue a more general (and therefore complex) design without a solid set of persistent data structures that call for it (and that people are using).

In short, I'm not strongly convinced that there's a forwards-compatibility hazard here. And even if we do eventually want a set of persistent/mutable collection traits with a uniform API for this zipper-like structures, I still think it's worth presenting a specialized, simpler API like the one this RFC proposes.

Member

aturon commented Sep 15, 2014

@reem OK, that sounds plausible.

My other main concerns are:

  1. This proposal is substantially more general -- and therefore more complex -- than the API @Gankro put forward. I worry that, in making the API consistent across mutable and persistent collections, we'll muddy the waters. How important is it to be able to program generically against a collection that you're not sure is mutable?

    On the other hand, it seems plausible that you could provide impls of the various traits you're proposing on top of "views". Do you see any fundamental problems with "future proofing" in that style?

  2. It seems plausible to provide a variant of "views" that can move around in a binary tree or linked list.

  3. In general, I agree with @Gankro's worry that we don't have a lot of experience with persistent data structures in Rust. My feeling is that the design proposed in this RFC (in particular here) is a clear improvement over today's soup of find_ and insert_ variants, mainly because it is simpler. I'm hesistent to pursue a more general (and therefore complex) design without a solid set of persistent data structures that call for it (and that people are using).

In short, I'm not strongly convinced that there's a forwards-compatibility hazard here. And even if we do eventually want a set of persistent/mutable collection traits with a uniform API for this zipper-like structures, I still think it's worth presenting a specialized, simpler API like the one this RFC proposes.

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 15, 2014

I think the major worry is that specific designs like this won't be rolled into a future, HKT + Collection trait driven world where you get to write highly generic code instead of highly specific code and we will end up with two different ways to do the same thing. As the language approaches 1.0 and people will expect more stability from libraries, this is an extremely important thing to consider.

If we had, as part of this RFC, a "plan forward" for integrating with more general approaches, I think it would be fine to have these specific things. However, moving methods and such is a backwards compatibility hazard, so we have to be really really careful about how we organize this.

Just as a proof of concept, this allows you to write a generic replace, like so:

// Yes more parameters, but shortened for explanatory purposes
fn replace<T: Editable>(collection: T, dir: Direction, new: Data) -> T {
    match collection.deconstruct().remove(dir) {
      Ok(_, ctx) => ctx.insert(new),
      Err(edtr) => edtr
    }.reconstruct()
}

and this works for mutable data structures like this:

let hashmap = replace(hashmap, "key", "value");

or like this:

replace(&mut hashmap, "key", "value");

and for persistent structures like this:

let tree = replace(tree, "key", "value"); // where tree: Arc<BSTMap>

Methods like these could even go on the Editable or Editor traits like for Iterator to provide a really flexible but highly generic API for interacting with collections.

reem commented Sep 15, 2014

I think the major worry is that specific designs like this won't be rolled into a future, HKT + Collection trait driven world where you get to write highly generic code instead of highly specific code and we will end up with two different ways to do the same thing. As the language approaches 1.0 and people will expect more stability from libraries, this is an extremely important thing to consider.

If we had, as part of this RFC, a "plan forward" for integrating with more general approaches, I think it would be fine to have these specific things. However, moving methods and such is a backwards compatibility hazard, so we have to be really really careful about how we organize this.

Just as a proof of concept, this allows you to write a generic replace, like so:

// Yes more parameters, but shortened for explanatory purposes
fn replace<T: Editable>(collection: T, dir: Direction, new: Data) -> T {
    match collection.deconstruct().remove(dir) {
      Ok(_, ctx) => ctx.insert(new),
      Err(edtr) => edtr
    }.reconstruct()
}

and this works for mutable data structures like this:

let hashmap = replace(hashmap, "key", "value");

or like this:

replace(&mut hashmap, "key", "value");

and for persistent structures like this:

let tree = replace(tree, "key", "value"); // where tree: Arc<BSTMap>

Methods like these could even go on the Editable or Editor traits like for Iterator to provide a really flexible but highly generic API for interacting with collections.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 16, 2014

Contributor

I've rewritten the RFC to be based on the enum design. I've also refactored the wording to be more "RFCish" based on @pczarn's comments.

Contributor

Gankro commented Sep 16, 2014

I've rewritten the RFC to be based on the enum design. I've also refactored the wording to be more "RFCish" based on @pczarn's comments.

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 16, 2014

Member

@reem

I think the major worry is that specific designs like this won't be rolled into a future, HKT + Collection trait driven world where you get to write highly generic code instead of highly specific code and we will end up with two different ways to do the same thing. As the language approaches 1.0 and people will expect more stability from libraries, this is an extremely important thing to consider.

If we had, as part of this RFC, a "plan forward" for integrating with more general approaches, I think it would be fine to have these specific things. However, moving methods and such is a backwards compatibility hazard, so we have to be really really careful about how we organize this.

I'm sympathetic to these concerns, and much of the Collections Reform RFC is geared toward this kind of conservative API design. But as with so many things, there's a balance to be struck.

In particular, I feel strongly that generic APIs should not come at the cost of clear and simple concrete APIs. In this particular example, the zipper-style interface makes many more distinctions than the Entry interface -- distinctions that make sense for a persistent data structure, but that don't exist for a mutable one. That's the price of generality.

Put another way, a newcomer seeing the Entry interface on a map will be able to very quickly make sense of it, just looking at the names and type signatures, because it's perfectly tailored to maps. But someone encountering a zipper-style interface for the first time, having never heard of the concept before, will likely have a much harder time understanding what's going on.

Given that maps are ubiquitous, and (I suspect) most programming against maps will be against a concrete version like HashMap, providing tailored APIs seems prudent.

On the other hand, having multiple ways to do something -- one concrete, tailored API, and one generic one through traits -- is not such a bad thing. We frequently offer convenience methods that are "redundant" in this sense, but aid ergonomics, performance, or understanding. If or when we add zippers, having them sit along side Entry as a more generic concept doesn't appear to be a great loss. This is, in fact, one of the benefits of the trait system: you can implement new generic interfaces after the fact.

All that said, while I'm hoping to stabilize much of the collections API as part of collections reform, I don't think we need to stabilize this Entry concept for 1.0, since it's a relatively uncommon case.

Member

aturon commented Sep 16, 2014

@reem

I think the major worry is that specific designs like this won't be rolled into a future, HKT + Collection trait driven world where you get to write highly generic code instead of highly specific code and we will end up with two different ways to do the same thing. As the language approaches 1.0 and people will expect more stability from libraries, this is an extremely important thing to consider.

If we had, as part of this RFC, a "plan forward" for integrating with more general approaches, I think it would be fine to have these specific things. However, moving methods and such is a backwards compatibility hazard, so we have to be really really careful about how we organize this.

I'm sympathetic to these concerns, and much of the Collections Reform RFC is geared toward this kind of conservative API design. But as with so many things, there's a balance to be struck.

In particular, I feel strongly that generic APIs should not come at the cost of clear and simple concrete APIs. In this particular example, the zipper-style interface makes many more distinctions than the Entry interface -- distinctions that make sense for a persistent data structure, but that don't exist for a mutable one. That's the price of generality.

Put another way, a newcomer seeing the Entry interface on a map will be able to very quickly make sense of it, just looking at the names and type signatures, because it's perfectly tailored to maps. But someone encountering a zipper-style interface for the first time, having never heard of the concept before, will likely have a much harder time understanding what's going on.

Given that maps are ubiquitous, and (I suspect) most programming against maps will be against a concrete version like HashMap, providing tailored APIs seems prudent.

On the other hand, having multiple ways to do something -- one concrete, tailored API, and one generic one through traits -- is not such a bad thing. We frequently offer convenience methods that are "redundant" in this sense, but aid ergonomics, performance, or understanding. If or when we add zippers, having them sit along side Entry as a more generic concept doesn't appear to be a great loss. This is, in fact, one of the benefits of the trait system: you can implement new generic interfaces after the fact.

All that said, while I'm hoping to stabilize much of the collections API as part of collections reform, I don't think we need to stabilize this Entry concept for 1.0, since it's a relatively uncommon case.

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 16, 2014

I agree that the Entry interface is much simpler and is probably the way to go - for now. For clarity, I think that the Zipper interface would be used behind the curtains of more generic helpers - the same way most new users will never call next or create an Iterator, most users will never call go, shove, etc. they will instead call higher level helpers that generically combine the low level pieces into more interesting functions or methods.

reem commented Sep 16, 2014

I agree that the Entry interface is much simpler and is probably the way to go - for now. For clarity, I think that the Zipper interface would be used behind the curtains of more generic helpers - the same way most new users will never call next or create an Iterator, most users will never call go, shove, etc. they will instead call higher level helpers that generically combine the low level pieces into more interesting functions or methods.

@aturon aturon referenced this pull request in rust-lang/rust Sep 16, 2014

Closed

Implement "entry" API for maps #17320

@aturon aturon merged commit b175db9 into rust-lang:master Sep 16, 2014

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 16, 2014

Member

This RFC was discussed during a weekly meeting and accepted as-is.

Member

aturon commented Sep 16, 2014

This RFC was discussed during a weekly meeting and accepted as-is.

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Sep 16, 2014

@aturon link is broken

reem commented Sep 16, 2014

@aturon link is broken

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 16, 2014

Member

@reem Sorry, the minutes haven't been posted yet, but you can see them here for now: https://etherpad.mozilla.org/Rust-meeting-weekly

Member

aturon commented Sep 16, 2014

@reem Sorry, the minutes haven't been posted yet, but you can see them here for now: https://etherpad.mozilla.org/Rust-meeting-weekly

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 16, 2014

Contributor

\o/

Contributor

Gankro commented Sep 16, 2014

\o/

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 16, 2014

Member

Tracking issue here. @Gankro plans to implement.

Member

aturon commented Sep 16, 2014

Tracking issue here. @Gankro plans to implement.

@sfackler

This comment has been minimized.

Show comment
Hide comment
@sfackler

sfackler Sep 17, 2014

Member

I haven't been following this RFC so this may have been covered already, but what exactly are the costs involved in keeping OccupiedEntry valid after calling set?

Member

sfackler commented Sep 17, 2014

I haven't been following this RFC so this may have been covered already, but what exactly are the costs involved in keeping OccupiedEntry valid after calling set?

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 17, 2014

Contributor

@sfackler For OccupiedEntry it should be free, since it's a swap, and you obviously have another Key afterwards. Destroying it in that case is partially legacy from the old more complicated design, and partially symmetry. I'm amenable to changing it, though I don't think it would be very valuable.

Actually, upon reflection, "set" is really a subset of the get_mut behaviour, modulo the Key getting swapped (which is likely uninteresting).

Contributor

Gankro commented Sep 17, 2014

@sfackler For OccupiedEntry it should be free, since it's a swap, and you obviously have another Key afterwards. Destroying it in that case is partially legacy from the old more complicated design, and partially symmetry. I'm amenable to changing it, though I don't think it would be very valuable.

Actually, upon reflection, "set" is really a subset of the get_mut behaviour, modulo the Key getting swapped (which is likely uninteresting).

@sfackler

This comment has been minimized.

Show comment
Hide comment
@sfackler

sfackler Sep 17, 2014

Member

let v = entry.set(v2) is basically equivalent to let v = mem::replace(entry.get_mut(), v2). I think it's still worth keeping set around since that's a bit of a mouthfull, but it'd be nice if set didn't consume entry, if only for consistency.

Member

sfackler commented Sep 17, 2014

let v = entry.set(v2) is basically equivalent to let v = mem::replace(entry.get_mut(), v2). I think it's still worth keeping set around since that's a bit of a mouthfull, but it'd be nice if set didn't consume entry, if only for consistency.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 17, 2014

Contributor

@sfackler I'm currently working on migrating all the code in Rust to use the new Entry API, so I'll get back to you once I have a better view of how this stuff is used. So far, it seems like set on an occupied entry isn't common. Everyone is basically using this stuff as an accumulator, in which case you just want get_mut.

The nasty case seems to be when they're just using this to guarantee there's something there (e.g. a collection), and then doing complex logic on it.

librustc/metadata/loader.rs in particular had this block:

        let slot = candidates.find_or_insert_with(hash.to_string(), |_| {
            (HashSet::new(), HashSet::new())
        });
        let (ref mut rlibs, ref mut dylibs) = *slot;
        if rlib {
            rlibs.insert(fs::realpath(path).unwrap());
        } else {
            dylibs.insert(fs::realpath(path).unwrap());
        }

which I could only port to

        let realpath = fs::realpath(path).unwrap();
        match candidates.entry(hash.to_string()) {
            Occupied(entry) => {
                let (ref mut rlibs, ref mut dylibs) = *entry.get_mut();
                if rlib {
                    rlibs.insert(realpath);
                } else {
                    dylibs.insert(realpath);
                }
            },
            Vacant(entry) => {
                let (rlibs, dylibs) = (HashSet::new(), HashSet::new());
                if rlib {
                    rlibs.insert(realpath);
                } else {
                    dylibs.insert(realpath);
                }
                entry.set((rlibs, dylibs))
            }
        }

Edit: for the most part though, I've had general code quality improvements, in my subjective opinion!

Contributor

Gankro commented Sep 17, 2014

@sfackler I'm currently working on migrating all the code in Rust to use the new Entry API, so I'll get back to you once I have a better view of how this stuff is used. So far, it seems like set on an occupied entry isn't common. Everyone is basically using this stuff as an accumulator, in which case you just want get_mut.

The nasty case seems to be when they're just using this to guarantee there's something there (e.g. a collection), and then doing complex logic on it.

librustc/metadata/loader.rs in particular had this block:

        let slot = candidates.find_or_insert_with(hash.to_string(), |_| {
            (HashSet::new(), HashSet::new())
        });
        let (ref mut rlibs, ref mut dylibs) = *slot;
        if rlib {
            rlibs.insert(fs::realpath(path).unwrap());
        } else {
            dylibs.insert(fs::realpath(path).unwrap());
        }

which I could only port to

        let realpath = fs::realpath(path).unwrap();
        match candidates.entry(hash.to_string()) {
            Occupied(entry) => {
                let (ref mut rlibs, ref mut dylibs) = *entry.get_mut();
                if rlib {
                    rlibs.insert(realpath);
                } else {
                    dylibs.insert(realpath);
                }
            },
            Vacant(entry) => {
                let (rlibs, dylibs) = (HashSet::new(), HashSet::new());
                if rlib {
                    rlibs.insert(realpath);
                } else {
                    dylibs.insert(realpath);
                }
                entry.set((rlibs, dylibs))
            }
        }

Edit: for the most part though, I've had general code quality improvements, in my subjective opinion!

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 17, 2014

Contributor

It's looking like I could probably eliminate all the troubles here by making VacantEntry.set yield a mutable reference to the inserted values. I know this can be done efficiently with a bit of unsafe code on HashMap (grab a raw ptr to where you're going to insert it), which is the primary use case. BTreeMap will struggle because you can't know where it will be inserted memory-wise until part-way through the operation. I can refactor a bit to expose that information as part of the internal insertion method, though. TreeMap I continue to avoid vehemently, but it shouldn't be too bad since once you've made the node for the element, you know where in memory to find the values.

@aturon thoughts?

Contributor

Gankro commented Sep 17, 2014

It's looking like I could probably eliminate all the troubles here by making VacantEntry.set yield a mutable reference to the inserted values. I know this can be done efficiently with a bit of unsafe code on HashMap (grab a raw ptr to where you're going to insert it), which is the primary use case. BTreeMap will struggle because you can't know where it will be inserted memory-wise until part-way through the operation. I can refactor a bit to expose that information as part of the internal insertion method, though. TreeMap I continue to avoid vehemently, but it shouldn't be too bad since once you've made the node for the element, you know where in memory to find the values.

@aturon thoughts?

@aturon

This comment has been minimized.

Show comment
Hide comment
@aturon

aturon Sep 17, 2014

Member

@Gankro That sounds reasonable, but I wonder if we could/should go a step further:

impl<'a, K, V> VacantEntry<'a, K, V> {
    /// Set the value stored in this Entry
    pub fn set(self, value: V) -> OccupiedEntry<'a, K, V>;
}

That should make it very easy to deal with the kind of example you gave above, and it gives you the full suite of OccupiedEntry methods.

While this makes the signature slightly more complex, I think it's intuitive and of course you can always ignore the result.

You could also imagine doing something similar on the other side:

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Take the value stored in this Entry
    pub fn take(self) -> (V, VacantEntry<'a, K, V>);
}

though I don't think that change is very well-motivated, and it makes it (slightly) harder to use take for its main purpose: getting the value out.

Member

aturon commented Sep 17, 2014

@Gankro That sounds reasonable, but I wonder if we could/should go a step further:

impl<'a, K, V> VacantEntry<'a, K, V> {
    /// Set the value stored in this Entry
    pub fn set(self, value: V) -> OccupiedEntry<'a, K, V>;
}

That should make it very easy to deal with the kind of example you gave above, and it gives you the full suite of OccupiedEntry methods.

While this makes the signature slightly more complex, I think it's intuitive and of course you can always ignore the result.

You could also imagine doing something similar on the other side:

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Take the value stored in this Entry
    pub fn take(self) -> (V, VacantEntry<'a, K, V>);
}

though I don't think that change is very well-motivated, and it makes it (slightly) harder to use take for its main purpose: getting the value out.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 17, 2014

Contributor

@aturon Getting the reference is cheap and easy because we generally know exactly where it will be very soon in the operation, and you only need the address to get the ref. Constructing a full Occupied/VacantEntry afterwards would be much more difficult, and probably the only reasonable way would be to just construct it from scratch, which the user may as well do themselves.

For a hashmap you could definitely do it by just remembering the index and/or cloning the hash, but for the tree-based maps, you generally need a full search path to perform an insertion or deletion correctly. And after an insertion or deletion the search path will in general be quite different.

Contributor

Gankro commented Sep 17, 2014

@aturon Getting the reference is cheap and easy because we generally know exactly where it will be very soon in the operation, and you only need the address to get the ref. Constructing a full Occupied/VacantEntry afterwards would be much more difficult, and probably the only reasonable way would be to just construct it from scratch, which the user may as well do themselves.

For a hashmap you could definitely do it by just remembering the index and/or cloning the hash, but for the tree-based maps, you generally need a full search path to perform an insertion or deletion correctly. And after an insertion or deletion the search path will in general be quite different.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 17, 2014

Contributor

@sfackler After poking at your idea, I did run into one small problem in hashmap. Its internal API wants the items by-value to do the swap, but you can't take the values out by-value if the Entry's &mut. Of course, that can be bypassed with a bit of unsafe code, but it's still a bummer.

Contributor

Gankro commented Sep 17, 2014

@sfackler After poking at your idea, I did run into one small problem in hashmap. Its internal API wants the items by-value to do the swap, but you can't take the values out by-value if the Entry's &mut. Of course, that can be bypassed with a bit of unsafe code, but it's still a bummer.

@sfackler

This comment has been minimized.

Show comment
Hide comment
@sfackler

sfackler Sep 17, 2014

Member

Oh well. It seems reasonable to implement it as accepted and see if this actually ends up being a pain point. We'll have time later to tweak the API before it stabilizes.

Member

sfackler commented Sep 17, 2014

Oh well. It seems reasonable to implement it as accepted and see if this actually ends up being a pain point. We'll have time later to tweak the API before it stabilizes.

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 18, 2014

Contributor

Ahh, much better.

        let slot = match candidates.entry(hash.to_string()) {
            Occupied(entry) => entry.get_mut(),
            Vacant(entry) => entry.set((HashSet::new(), HashSet::new())),
        };
        let (ref mut rlibs, ref mut dylibs) = *slot;
        if rlib {
            rlibs.insert(fs::realpath(path).unwrap());
        } else {
            dylibs.insert(fs::realpath(path).unwrap());
        }
Contributor

Gankro commented Sep 18, 2014

Ahh, much better.

        let slot = match candidates.entry(hash.to_string()) {
            Occupied(entry) => entry.get_mut(),
            Vacant(entry) => entry.set((HashSet::new(), HashSet::new())),
        };
        let (ref mut rlibs, ref mut dylibs) = *slot;
        if rlib {
            rlibs.insert(fs::realpath(path).unwrap());
        } else {
            dylibs.insert(fs::realpath(path).unwrap());
        }
@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Sep 18, 2014

Contributor

Obvious oversight: get_mut has to borrow the entry, but that means get_mut can't outlive the entry, which is necessary for this pattern.

Need:

    /// Convert the OccupiedEntry into a mutable reference to the value in the entry
    /// with a lifetime bound to the map itself
    pub fn into_mut(self) -> &'a mut V;

Easy to provide.

Contributor

Gankro commented Sep 18, 2014

Obvious oversight: get_mut has to borrow the entry, but that means get_mut can't outlive the entry, which is necessary for this pattern.

Need:

    /// Convert the OccupiedEntry into a mutable reference to the value in the entry
    /// with a lifetime bound to the map itself
    pub fn into_mut(self) -> &'a mut V;

Easy to provide.

@gereeter gereeter referenced this pull request in rust-lang/rust Sep 28, 2014

Closed

add find_or_insert method to the Map trait #5568

@michaelsproul

This comment has been minimized.

Show comment
Hide comment
@michaelsproul

michaelsproul Oct 8, 2014

I am loving this new API. I just used it to drastically improve (and speed-up) my Trie's remove method.

It went from this to this.

There's some other junk that I cleaned up, but the main improvement ccomes from being able to find hashmap entries and remove them later without re-finding (I used OccupiedEntry::take).

I am loving this new API. I just used it to drastically improve (and speed-up) my Trie's remove method.

It went from this to this.

There's some other junk that I cleaned up, but the main improvement ccomes from being able to find hashmap entries and remove them later without re-finding (I used OccupiedEntry::take).

@Gankro

This comment has been minimized.

Show comment
Hide comment
@Gankro

Gankro Oct 8, 2014

Contributor

@michaelsproul Awesome! I didn't think anyone would actually have a use for take, glad to see I was wrong! 😍

Since you seem to be a Trie wizard, would you be interested in implementing this API on our TrieMap? I'm a bit too swamped with school work and writing RFCs to tackle this on all of our maps myself atm. 😢

Contributor

Gankro commented Oct 8, 2014

@michaelsproul Awesome! I didn't think anyone would actually have a use for take, glad to see I was wrong! 😍

Since you seem to be a Trie wizard, would you be interested in implementing this API on our TrieMap? I'm a bit too swamped with school work and writing RFCs to tackle this on all of our maps myself atm. 😢

@michaelsproul

This comment has been minimized.

Show comment
Hide comment
@michaelsproul

michaelsproul Oct 8, 2014

@Gankro: Oooh, I'd love to. I've got a bit of uni work at the moment too, but I'll give it a shot.

@Gankro: Oooh, I'd love to. I've got a bit of uni work at the moment too, but I'll give it a shot.

@Gankro Gankro referenced this pull request in rust-lang/rust Jan 16, 2015

Closed

HashSet should have a get function #21234

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment