WIP: add raw_entry API to HashMap #50821
Conversation
rust-highfive assigned withoutboats May 17, 2018
(rust_highfive has picked a reviewer for you, use r? to override) |
rust-highfive added the S-waiting-on-review label May 17, 2018
@Gankro pick a reviewer who is competent pls |
The actual impl is ~trivial since everything that was needed is already in hashmap for implementation reasons. So this mostly just needs API review from the @rust-lang/libs team. |
Gankro requested a review from alexcrichton May 17, 2018
I'm a bit worried removing the |
Oh. Not requiring |
rust-highfive assigned alexcrichton and unassigned withoutboats May 21, 2018
alexcrichton reviewed May 25, 2018
|
Thanks and sorry for the delay @Gankro! Since these are all starting as unstable it's ok to not get a full libs-team sign-off because we'll do that before stabilization anyway. I do think that we'll want these in the long run so it seems good to add them to libstd to start experimenting with them. I'll admit though that I haven't followed the RFC too too closely so this was the first time I was looking at a number of these APIs. I was expecting something to be |
| /// | ||
| /// Immutable raw entries have very limited use; you might instead want `raw_entry`. | ||
| #[unstable(feature = "raw_entry", issue = "42069")] | ||
| pub fn raw_entry_immut(&self) -> RawImmutableEntryBuilder<K, V, S> { |
alexcrichton
May 25, 2018
Member
Naming-wise could this perhaps be raw_entry and the one above be raw_entry_mut?
Gankro
May 25, 2018
Author
Contributor
Yeah I'm split on it
raw_entry/raw_entry_mut is absolutely more idiomatic, but entry matches raw_entry_mut and raw_entry_mut is the really important one.
I'm definitely being "weird" here, and am willing to relent if anyone feels strongly about it.
| /// assert_eq!(map["poneyland"], 22); | ||
| /// ``` | ||
| #[unstable(feature = "raw_entry", issue = "42069")] | ||
| pub fn or_insert(self, default_key: K, default_val: V) -> (&'a mut K, &'a mut V) { |
alexcrichton
May 25, 2018
Member
FWIW Entry::or_insert only returns &mut V so this is somewhat inconsistent with that, albeit more general
Gankro
May 25, 2018
Author
Contributor
that's definitely an intentional design decision, but I could be convinced to go back
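For reference, the existing behaviour alexcrichton is comparing against can be sketched with the stable `Entry` API. The helper name `or_insert_demo` is made up for illustration; only `HashMap::entry` and `Entry::or_insert` come from std.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the final value stored under "poneyland".
pub fn or_insert_demo() -> u32 {
    let mut map: HashMap<&str, u32> = HashMap::new();
    // `Entry::or_insert` hands back only `&mut V`; the key is not returned.
    let value: &mut u32 = map.entry("poneyland").or_insert(12);
    *value += 10;
    map["poneyland"]
}
```

The raw entry `or_insert` in this PR returns `(&'a mut K, &'a mut V)` instead, so a caller would also get access to the stored key.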
| /// use std::collections::hash_map::Entry; | ||
| /// | ||
| /// let mut map: HashMap<&str, u32> = HashMap::new(); | ||
| /// map.entry("poneyland").or_insert(12); |
alexcrichton
May 25, 2018
Member
A number of these doc strings I've noticed are using entry, but I think they may want to move towards raw_entry?
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
TimNN added S-waiting-on-author and removed S-waiting-on-review labels May 29, 2018
Ping from triage, @Gankro ! You got some test failures and had a question from your reviewer. |
@Gankro: We haven't heard from you in two weeks, so we're closing this PR for now. Feel free to re-open in the future. |
TimNN closed this Jun 12, 2018
will pick this up again in a bit, just had a productive meeting with @aturon:
|
@Gankro Are you still working on this? I am interested in taking over if you are too busy to continue working on it. |
fintelia reviewed Jul 3, 2018
| /// In particular, the hash used to initialize the raw entry must still be | ||
| /// consistent with the hash of the key that is ultimately stored in the entry. | ||
| /// This is because implementations of HashMap may need to recompute hashes | ||
| /// when resizing, at which point only the keys are available. |
fintelia
Jul 3, 2018
Contributor
It is worth pointing out that this specific case isn't actually a concern for the current implementation because it always stores the full hash. Rather the main issue is the failure case described below in which the entry is in the wrong place and thus becomes "lost".
Amanieu
Jul 3, 2018
Contributor
While this is true for the current implementation, this might change in the future and code should not rely on that.
fintelia
Jul 3, 2018
Contributor
We should make sure that this section doesn't come across as saying that bogus hashes are OK as long as the HashMap isn't resized.
Actually, reading that paragraph again, it isn't clear to me why the case described would be a problem: if the HashMap is currently in an invalid state due to bogus hashes, then rehashing everything would fix it, not break it further.
Gankro
Jul 5, 2018
Author
Contributor
Consider a type Foo(u32, u32), whose Hash impl is just Foo.0.hash(), and you use this API to actually feed in Foo.1.hash().
If resizing triggers a rehash, then your code will work perfectly reasonably until a resize triggers, at which point everything will be moved to the location indicated by Foo.0, while you're still performing lookups with Foo.1. All existing keys will appear to "vanish" from the map.
fintelia
Jul 5, 2018
Contributor
Ah, I see what you are saying. If you exclusively use the raw entry API with a bogus hash function, things will work until elements are rehashed using the real hash function. Using the normal get/insert/entry functions would have the opposite problem of working only once the resize happened.
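The failure mode discussed in this thread can be reproduced with a toy table. This is only a sketch under stated assumptions: `ToyMap` is a made-up chained table that does not store hashes (it is not std's `HashMap`), and `real_hash` / `bogus_hash` are simplified stand-ins for `Foo.0.hash()` and `Foo.1.hash()`.

```rust
/// Key type from the discussion; field 0 is what the "real" hash looks at.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Foo(pub u32, pub u32);

/// Stand-in for the key's real `Hash` impl (hashes only field 0).
pub fn real_hash(key: &Foo) -> u64 {
    key.0 as u64
}

/// Stand-in for the bogus hash the caller feeds into the raw API (field 1).
pub fn bogus_hash(key: &Foo) -> u64 {
    key.1 as u64
}

/// Hypothetical toy table: hashes are not stored, so growing must
/// recompute them from the keys.
pub struct ToyMap {
    buckets: Vec<Vec<(Foo, u32)>>,
}

impl ToyMap {
    pub fn new(bucket_count: usize) -> Self {
        ToyMap { buckets: vec![Vec::new(); bucket_count] }
    }

    fn index(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    /// Raw insert: the caller supplies the hash, as with the raw entry API.
    pub fn insert_hashed(&mut self, hash: u64, key: Foo, value: u32) {
        let i = self.index(hash);
        self.buckets[i].push((key, value));
    }

    /// Raw lookup: the caller supplies the hash.
    pub fn get_hashed(&self, hash: u64, key: &Foo) -> Option<u32> {
        self.buckets[self.index(hash)]
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| *v)
    }

    /// Double the bucket count and re-place every entry using the real hash.
    pub fn grow(&mut self) {
        let new_len = self.buckets.len() * 2;
        let old = std::mem::replace(&mut self.buckets, vec![Vec::new(); new_len]);
        for (key, value) in old.into_iter().flatten() {
            let hash = real_hash(&key);
            self.insert_hashed(hash, key, value);
        }
    }
}
```

With `Foo(0, 1)` inserted under `bogus_hash` into a 4-bucket table, lookups by `bogus_hash` succeed and lookups by `real_hash` fail; after `grow()` the situation flips, which matches both Gankro's "vanishing" keys and fintelia's observation about the normal lookup path.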
@Amanieu please do! |
Also here is an example of usage: https://gist.github.com/Gankro/fb0bfe6f6770aba09b9a1cdf0ecf47e0 |
fintelia reviewed Jul 5, 2018
| mem::replace(self.get_mut(), value) | ||
| } | ||
|
|
||
| /// Sets the value of the entry, and returns the entry's old value. |
How would people feel extending this API to provide a This addition would probably also require the |
Is this API available in some extern crate by any chance? :-) I'd like to see if I can implement a string interner on top of it as:
struct Interner {
    /// Concatenation of all interned strings
    data: String,
    /// A `HashSet<&'data str>` of interned strings,
    /// but without those annoying ( :) ) lifetimes.
    mapping: HashSet<(u32, u32)>,
}
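Without a raw entry API, a rough approximation of this interner can be built on stable std by keying on a precomputed hash. This is a sketch, not the design proposed above: it swaps the `HashSet<(u32, u32)>` for a `HashMap<u64, Vec<(u32, u32)>>`, and `hash_str`, `intern` and `resolve` are names invented for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub struct Interner {
    /// Concatenation of all interned strings.
    data: String,
    /// Hash of a string -> (start, end) byte spans in `data` with that hash.
    /// Stand-in for the `HashSet<(u32, u32)>` that raw entries would allow.
    mapping: HashMap<u64, Vec<(u32, u32)>>,
}

/// Hypothetical helper: hash a string slice with std's default hasher.
fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

impl Interner {
    pub fn new() -> Self {
        Interner { data: String::new(), mapping: HashMap::new() }
    }

    /// Returns the span of `s` in `data`, appending `s` if it is new.
    pub fn intern(&mut self, s: &str) -> (u32, u32) {
        let hash = hash_str(s);
        // Reuse an existing span with the same hash and the same text.
        if let Some(spans) = self.mapping.get(&hash) {
            for &(start, end) in spans {
                if &self.data[start as usize..end as usize] == s {
                    return (start, end);
                }
            }
        }
        // Not seen before: append the text and record its span.
        let start = self.data.len() as u32;
        self.data.push_str(s);
        let end = self.data.len() as u32;
        self.mapping.entry(hash).or_insert_with(Vec::new).push((start, end));
        (start, end)
    }

    /// Maps a span returned by `intern` back to its string.
    pub fn resolve(&self, (start, end): (u32, u32)) -> &str {
        &self.data[start as usize..end as usize]
    }
}
```

The cost of this workaround is that the `u64` key is hashed again by the outer map and each bucket carries a `Vec`; avoiding that overhead is the kind of thing a raw entry lookup (supplying the hash plus a comparison closure) is meant to enable.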
@fintelia no that would expose too many implementation details that we're not willing to commit to. |
kennytm added S-inactive-closed and removed S-waiting-on-author labels Jul 10, 2018
@Gankro could you clarify which implementation details you'd be concerned about committing to? The very first line of the HashMap documentation already states that it is:
Unless I'm missing something, it seems like that should be sufficient to guarantee that there is at least some way to map from indices to raw entries. |
That sentence is documenting the current implementation, not guaranteeing that will be the implementation for all time. |
@Amanieu Have you had a chance to look at taking this PR and fixing the remaining problems with it? |
@Mark-Simulacrum Actually after talking with @Gankro I won't be working on this. Someone else can take over if they want. |
fintelia reviewed Jul 26, 2018
| where M: DerefMut<Target = RawTable<K, V>>, | ||
| F: FnMut(&mut K) -> bool | ||
| { | ||
| // This is the only function where capacity can be zero. To avoid |
fintelia
Jul 26, 2018
Contributor
This function and the original search_hashed functions are the only two where capacity can be zero. Both comments should be updated...
I'm willing to take over this PR. What is still remaining to do? @jonhoo sadly, the PR in the current form cannot support rahashmap's central feature: there is no way to get a random element from the map. Specifically, a raw_entry can't be constructed without knowing its exact hash in advance, though if RawVacantEntry were to have a next() function that might be enough... |
@alexcrichton @Gankro What is required for me to take over this PR? |
@fintelia The easiest is probably to fetch this PR’s branch, rebase/rework it as needed, open a new PR, and link the new PR from here. Thanks for volunteering! |
I think my comments/checklist still hold.
Gankro commented May 17, 2018 (edited)
This is an implementation of https://internals.rust-lang.org/t/pre-rfc-abandonning-morals-in-the-name-of-performance-the-raw-entry-api/7043 with some minor tweaks.
TODO:
K: Eq requirement from raw_entry? (currently exists to satisfy internal APIs, but I don't think it's strictly needed)