RFC: Add item recovery collection APIs #1194
Gankro self-assigned this Jul 8, 2015
Gankro added the T-libs label Jul 8, 2015
CC @cmr @eddyb @bluss @seppo0010 (can't remember all the people interested...)
pnkfelix reviewed Jul 9, 2015

    # Alternatives

    Do nothing.

pnkfelix
Jul 9, 2015
Member
Hold on, to be clear: "Do nothing" here means "Do nothing and let users write such caches via, e.g., HashMap<T, ()>" ... right?
I don't particularly mind adding the functionality described here to HashSet, but I'm also not sure it's strictly necessary, unless I have missed something with how HashMap<T, ()> would work.
Update: Ah, re-reading the RFC, I now see that our current HashMap API would not support that.

apasel422
Jul 9, 2015
Author
Member
It's not possible to use HashMap that way, because it doesn't provide any methods that return &K (or K) other than via its iterators.
Gankro referenced this pull request Jul 11, 2015: Add hash_map::Entry.or_insert_with_key() method #1202 (Open)
It'd be cool if you could add another copy of the code that demonstrates the problem but showing how to use your proposed APIs to resolve it.

@blaenk I'd actually like to use a more concrete motivating example, but I'll add the revised code as well.

@Gankro Do you have any ideas for a better motivating example, like an algorithm that uses a set as a cache?

@apasel422 That's the use case in the compiler: sets of hundreds of thousands of elements, used for interning/caching, that have to be identity maps right now, wasting some memory space.
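
The interning use case described above can be illustrated with a small sketch. It is not the compiler's code: the `Interner` type and its methods are hypothetical names, and it relies on `HashSet::get`, which the standard library provides today. The point it shows is that, once a set can hand back the element it stores, the cache no longer needs to be an identity map.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interner: each distinct string is stored exactly once,
/// as an element of the set (no separate key and value copies).
#[derive(Default)]
pub struct Interner {
    strings: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns the shared copy of `s`, allocating only the first time `s` is seen.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        // `get` returns a reference to the element actually stored in the set,
        // which is the "item recovery" operation under discussion.
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&fresh));
        fresh
    }

    /// Number of distinct strings interned so far.
    pub fn len(&self) -> usize {
        self.strings.len()
    }
}
```

Interning the same text twice yields two handles to one allocation, which can be checked with `Rc::ptr_eq`.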

@eddyb Does that mean, if this RFC is implemented, the compiler can be tweaked to use less memory?

I've added a WIP implementation of this RFC for

@apasel422 It could be argued by metaphor to the current naming in #1195 that these methods could just be called

This is a bit more dubious for HashMap, but not crazy.

@Gankro I actually had that same thought a little while ago, but I don't think it fully translates, and would be even weirder for entries:

    impl<'a, K, V> OccupiedEntry<'a, K, V> {
        fn get(&self) -> &V;
        fn get_eq(&self) -> (&K, &V); // what does eq have to do with this?
        ...
    }

Hmm... Entry does seem to mess things up. That said... is it a tragedy if it's a bit misaligned from everything else?

I think they should be consistent. Here are some options that work for both maps and occupied entries (in addition to

And here are some options for sets (assuming that the changes in rust-lang/rust#27135 canonicalize "element" over "value" when referring to sets):

apasel422 referenced this pull request Jul 20, 2015: VacantEntry should provide an accessor for the key #18323 (Closed)
apasel422 added some commits Jul 20, 2015

i30817
commented
Jul 23, 2015
One common optimization that can't be done in Java because of set item recovery is to just store hashes instead of elements, for the case where identity-mapping is not desirable. Are you going to give up this special case?

@i30817 Rust's

i30817
commented
Jul 23, 2015
Mmm, makes sense. Still, it's a somewhat common optimization; maybe a bloom filter type could be added to the language.

@i30817 Off-topic, but https://crates.io/search?q=bloom%20filter

I'm definitely in favor of the ideas laid out here. I have the same motivating problem: a cache of strings. I've used some unsafe code to avoid double-allocating the strings, but I still have essentially a

bkoropoff
commented
Jul 28, 2015
How would you feel about modifying

@bkoropoff That could be done, but it has the problem that a new key is only present for

I'm therefore more inclined to add that kind of key-recovery functionality as

    pub enum Entry<'a, K: 'a, V: 'a> {
        Occupied(OccupiedEntry<'a, K, V>, K),
        Vacant(VacantEntry<'a, K, V>),
    }

instead of

    pub struct OccupiedEntry<'a, K: 'a, V: 'a> {
        new_key: K,
        // ...
    }

    impl<'a, K, V> OccupiedEntry<'a, K, V> {
        /// Returns the key that was used to acquire this entry.
        // This could return `Option<K>` in order to better model the `max_entry` situation
        pub fn into_new_key(self) -> K { self.new_key }
    }

but that would not be a backwards-compatible change. Additionally, storing the new key in the struct itself has the benefit of allowing us to provide an additional

    impl<'a, K, V> OccupiedEntry<'a, K, V> {
        /// Replaces the entry's key with the one that was used to acquire this entry, if any, and
        /// returns the old key.
        ///
        /// This method always returns `None` after the first call to it and for all entries
        /// acquired through `max_entry` etc.
        pub fn replace_key(&mut self) -> Option<K>;
    }

This adds some complexity to the API surface and makes it harder to reason about what the behavior is. It's possible that we could add what you're proposing in a subsequent RFC instead.

Sorry to be late to this party (I also had to miss the libs meeting this week). I'm on board with the basic motivation here, and regret the stabilization of the

That said, I feel like the RFC is proposing significantly more API expansion than is actually needed to solve the original problem -- in particular, I don't see why any changes to the entry API are needed. Could we instead take the following as a starting point (bikesheds painted in my favorite colors):

    impl<T> Set<T> {
        // Like `contains`, but returns a reference to the element if the set contains it.
        fn get<Q: ?Sized>(&self, element: &Q) -> Option<&T>;
        // Like `remove`, but returns the element if the set contained it.
        fn take<Q: ?Sized>(&mut self, element: &Q) -> Option<T>;
        // Like `insert`, but replaces the element with the given one and returns the previous element
        // if the set contained it.
        fn replace(&mut self, element: T) -> Option<T>;
    }

    impl<K, V> Map<K, V> {
        // Like `get`, but additionally returns a reference to the entry's key.
        fn key_value<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;
        // Like `get_mut`, but additionally returns a reference to the entry's key.
        fn key_value_mut<Q: ?Sized>(&mut self, key: &Q) -> Option<(&K, &mut V)>;
        // Like `remove`, but additionally returns the entry's key.
        fn remove_key_value<Q: ?Sized>(&mut self, key: &Q) -> Option<(K, V)>;
        // Like `insert`, but additionally replaces the key with the given one and returns the previous
        // key and value if the map contained it.
        fn replace(&mut self, key: K, value: V) -> Option<(K, V)>;
    }

In particular, the fact that the entry APIs need an owned key to use (today, at least) seems to make the key-accessing functionality questionable. But maybe I'm missing something?
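
For readers who want to try the map half of this proposal: later versions of the standard library offer `HashMap::get_key_value` and `HashMap::remove_entry`, which appear to fill roughly the roles sketched as `key_value` and `remove_key_value` above. The names differ from the proposal, and the helper functions `lookup` and `evict` below are hypothetical wrappers written only for illustration.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Looks up `name` and returns a handle to the key the map already owns,
/// together with a copy of its value.
pub fn lookup(map: &HashMap<Rc<str>, u32>, name: &str) -> Option<(Rc<str>, u32)> {
    // `get_key_value` returns references to both the stored key and the value.
    map.get_key_value(name).map(|(k, v)| (Rc::clone(k), *v))
}

/// Removes `name` and returns the owned key and value that were stored.
pub fn evict(map: &mut HashMap<Rc<str>, u32>, name: &str) -> Option<(Rc<str>, u32)> {
    // `remove_entry` is like `remove`, but also gives back the stored key.
    map.remove_entry(name)
}
```

In both cases the key that comes back is the one stored in the map, not the borrowed lookup key, so a caller can reuse its allocation.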

@aturon We will presumably want the entry methods once #1195 is accepted, but they could be omitted for now. I think that both RFCs need to be considered together though, and it probably makes sense to avoid a proliferation of

    impl<K, V> Map<K, V> {
        fn get_pair(&self, key: &Q) -> Option<(&K, &V)>;
        fn get_entry(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
        fn replace(&mut self, key: K, val: V) -> Option<(K, V)>;
        fn get_max(&self) -> Option<(&K, &V)>;
        fn max_entry(&mut self) -> Option<OccupiedEntry<K, V>>;
        fn get_lt(&self, key: &Q) -> Option<(&K, &V)>;
        fn lt_entry(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
        // get_* and *_entry for le, ge, gt, min
    }

    impl<'a, K, V> OccupiedEntry<'a, K, V> {
        fn pair(&self) -> (&K, &V);
        fn pair_mut(&mut self) -> (&K, &mut V);
        fn into_pair_mut(self) -> (&'a K, &'a mut V);
        fn take(self) -> (K, V);
    }

The libs team discussed this RFC today, and our conclusion was that it may be best to hone this down to what's precisely necessary to satisfy the motivation in the outset. To that end would it be possible to only include the set methods? Specifically:

    impl<T> Set<T> {
        fn get<Q: ?Sized>(&self, element: &Q) -> Option<&T>;
        fn take<Q: ?Sized>(&mut self, element: &Q) -> Option<T>;
        fn replace(&mut self, element: T) -> Option<T>;
    }
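
To make the difference between these three methods concrete, here is a small sketch using `std::collections::HashSet`, which has methods named `get`, `take`, and `replace`. The `Item` type and the `demo` function are hypothetical: `Item` compares and hashes by `id` only, so two "equal" elements can still be told apart by their `label`.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical element type: equality and hashing look only at `id`,
/// so equal elements may carry different labels.
#[derive(Debug, Clone)]
pub struct Item {
    pub id: u32,
    pub label: &'static str,
}

impl PartialEq for Item {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

impl Eq for Item {}

impl Hash for Item {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

/// Returns the labels observed after `insert`, from `replace`, and from `take`,
/// followed by the final size of the set.
pub fn demo() -> (&'static str, &'static str, &'static str, usize) {
    let mut set = HashSet::new();
    set.insert(Item { id: 1, label: "first" });

    // `insert` leaves the stored element untouched when an equal one exists.
    set.insert(Item { id: 1, label: "second" });
    let after_insert = set.get(&Item { id: 1, label: "probe" }).unwrap().label;

    // `replace` stores the new element and returns the one it displaced.
    let displaced = set.replace(Item { id: 1, label: "third" }).unwrap().label;

    // `take` removes the stored element and returns it by value.
    let taken = set.take(&Item { id: 1, label: "probe" }).unwrap().label;

    (after_insert, displaced, taken, set.len())
}
```

The probe value passed to `get` and `take` is only used for comparison; what comes back is always the element that was stored.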

Specifically, I believe the supporting map methods were also decided to just be

I'd personally prefer the methods to be freestanding in the module so they're private to the outside world but public to the crate rather than having them in the inherent API at all.

@alexcrichton How does that work? Privacy can only reach up, and not down or sideways. Maps and Sets are defined in sibling modules.

They could be defined in a crate-private trait. That's how I've gotten
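
A minimal sketch of the crate-private trait idea, under several assumptions: the module layout, the trait name `Recover`, and the toy `TinyMap` and `TinySet` types are all hypothetical and far simpler than real collections, and `pub(crate)` is syntax from later Rust versions. It only shows how a set in one module can reach key-recovery logic on a map in a sibling module without that logic becoming part of the public API.

```rust
mod recover {
    // Visible anywhere inside this crate, invisible to downstream users.
    pub(crate) trait Recover<Q: ?Sized> {
        type Key;
        /// Returns a reference to the stored key equal to `key`, if any.
        fn recover(&self, key: &Q) -> Option<&Self::Key>;
    }
}

pub mod map {
    use super::recover::Recover;

    /// Toy map backed by a vector; lookups are linear.
    pub struct TinyMap<K, V> {
        entries: Vec<(K, V)>,
    }

    impl<K: PartialEq, V> TinyMap<K, V> {
        pub fn new() -> Self {
            TinyMap { entries: Vec::new() }
        }

        /// Inserts the pair unless an equal key is already present.
        pub fn insert(&mut self, key: K, value: V) {
            if !self.entries.iter().any(|(k, _)| *k == key) {
                self.entries.push((key, value));
            }
        }
    }

    // The recovery logic lives on the map but is reachable only via the trait.
    impl<K: PartialEq, V> Recover<K> for TinyMap<K, V> {
        type Key = K;

        fn recover(&self, key: &K) -> Option<&K> {
            self.entries.iter().map(|(k, _)| k).find(|k| *k == key)
        }
    }
}

pub mod set {
    use super::map::TinyMap;
    use super::recover::Recover;

    /// Toy set implemented as a map with unit values.
    pub struct TinySet<T> {
        map: TinyMap<T, ()>,
    }

    impl<T: PartialEq> TinySet<T> {
        pub fn new() -> Self {
            TinySet { map: TinyMap::new() }
        }

        pub fn insert(&mut self, element: T) {
            self.map.insert(element, ());
        }

        /// Public set API built on the crate-private map hook.
        pub fn get(&self, element: &T) -> Option<&T> {
            self.map.recover(element)
        }
    }
}
```

Only `TinySet::get` is exposed; `TinyMap` gains no public key-recovery method, which matches the goal of keeping the map API unchanged for now.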

@apasel422 Can you amend the RFC to be minimal per aturon's request? I think we're good to go when that's done.

seppo0010
commented
Aug 20, 2015
I don't understand the motivation to allow item recovery from a

I was actually expecting that feature to move items from one

Same here, the use case that got me here was with a

@seppo0010 The use case I hit that needed the feature for

@apasel422 ping about the RFC update, would love to merge!

@alexcrichton I haven't updated yet because it seems like there's still some dissent, based on the last few comments.

As another voice, only having it on sets would be acceptable for me. I am in the same boat as @eddyb: a cache.

@alexcrichton Updated.

alexcrichton merged commit b8648e4 into rust-lang:master Aug 27, 2015

Ok, thanks @apasel422! The consensus of the libs team is that this is a great step forward for sets and we can continue to explore the problem space for maps as the needs arise, but it seems like the most pressing parts to work with are sets today. And of course, thanks again for the RFC @apasel422!

apasel422 deleted the apasel422:collection-recovery branch Aug 28, 2015
apasel422 referenced this pull request May 13, 2016: HashMap::extend_with() to handle collisions #33618 (Closed)
mbrubeck unassigned Gankro Apr 17, 2017

rpjohnst
commented
Oct 3, 2017
I find myself needing this for maps. In my case, I am building a string interner using a

The

Alternatively, it would be useful for the

What is the best route forward here? Should I write up a new RFC?

fschutt added a commit to maps4print/polyclip that referenced this pull request Dec 16, 2017

fschutt
commented
Dec 16, 2017
Well, I needed this for a function where I insert into a set, but then I immediately want an iterator to that last, inserted element in the set. Since a

The application is a scanline algorithm, where the set consists of ordered points. I need to insert a point into a scanline and then know where it has been inserted (the position), so that I can construct an iterator to the next and previous point in the (ordered) scanline. So for now I've forked the
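
One way to approximate the scanline need described above with the standard library alone is `BTreeSet::range`, which can look left and right of a freshly inserted element. This is a sketch of an alternative, not what the commenter did: the function name `insert_with_neighbors` is hypothetical, and plain `i64` values stand in for the ordered points.

```rust
use std::collections::BTreeSet;
use std::ops::Bound::{Excluded, Unbounded};

/// Inserts `point` and returns its predecessor and successor in sorted order.
pub fn insert_with_neighbors(
    set: &mut BTreeSet<i64>,
    point: i64,
) -> (Option<i64>, Option<i64>) {
    set.insert(point);
    // Largest element strictly below `point`.
    let prev = set.range(..point).next_back().copied();
    // Smallest element strictly above `point`.
    let next = set.range((Excluded(point), Unbounded)).next().copied();
    (prev, next)
}
```

Each `range` call performs its own tree search, so this costs extra lookups compared with an API that returns a position directly from `insert`.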
apasel422 commented Jul 8, 2015 (edited by mbrubeck)