Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for HashMap::raw_entry #56167

Open
sfackler opened this issue Nov 22, 2018 · 32 comments
Open

Tracking issue for HashMap::raw_entry #56167

sfackler opened this issue Nov 22, 2018 · 32 comments
Labels
A-collections B-unstable C-tracking-issue Libs-Tracked S-tracking-design-concerns T-libs-api

Comments

@sfackler
Copy link
Member

@sfackler sfackler commented Nov 22, 2018

Added in #54043.


As of 6ecad33 / 2019-01-09, this feature covers:

impl<K, V, S> HashMap<K, V, S>
    where K: Eq + Hash,
          S: BuildHasher
{
    pub fn raw_entry(&self) -> RawEntryBuilder<K, V, S> {…}
    pub fn raw_entry_mut(&mut self) -> RawEntryBuilderMut<K, V, S> {…}
}

pub struct RawEntryBuilder<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return Option<(&'a K, &'a V)>
pub struct RawEntryBuilderMut<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return RawEntryMut<'a, K, V, S>
pub enum RawEntryMut<'a, K: 'a, V: 'a, S: 'a> {
    Occupied(RawOccupiedEntryMut<'a, K, V>),
    Vacant(RawVacantEntryMut<'a, K, V, S>),
}
pub struct RawOccupiedEntryMut<'a, K: 'a, V: 'a> {…}
pub struct RawVacantEntryMut<'a, K: 'a, V: 'a, S: 'a> {…}

… as well as Debug impls for each 5 new types, and their inherent methods.

@sfackler sfackler added T-libs-api B-unstable C-tracking-issue labels Nov 22, 2018
@Amanieu
Copy link
Member

@Amanieu Amanieu commented Nov 26, 2018

What is the motivation for having separate from_hash and search_bucket methods? It seems that the only difference is whether the hash value is checked before calling is_match. However if the table does not store full hashes (i.e. hashbrown) then there is no difference between these methods.

Could we consider merging these methods into a single one? Or is there some use case where the difference in behavior is useful?

@Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 27, 2018

I am also extremely confused by this distinction, as my original designs didn't include them (I think?) and the documentation that was written is very unclear.

@Amanieu
Copy link
Member

@Amanieu Amanieu commented Nov 27, 2018

@fintelia
Copy link
Contributor

@fintelia fintelia commented Nov 27, 2018

The reason I added search_bucket was because I wanted to be able to delete a random element from a HashMap in O(1) time, without storing an extra copy of all the keys. Basically, instead of doing something like this:

let key = map.iter().nth(rand() % map.len()).0.clone();
map.remove(&key);

I wanted to just be able to pick a random "bucket" and then get an entry/raw entry to the first element in it if any:

loop {
    if let Occupied(o) = map.raw_entry_mut().search_bucket(rand(), || true) {
        o.remove();
        break;
    }
}

(the probabilities aren't uniform in the second version, but close enough for my purposes)

@Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 28, 2018

I continue to not want to support the "random deletion" usecase in std's HashMap. You really, really, really, should be using a linked hashmap or otherwise ordered map for that.

Amanieu added a commit to Amanieu/rust that referenced this issue Dec 8, 2018
It doesn't work in hashbrown anyways (see rust-lang#56167)
@Amanieu
Copy link
Member

@Amanieu Amanieu commented Dec 9, 2018

I have removed this method in the hashbrown PR (#56241). Your code snippet for random deletion won't work with hashbrown anyways since it always checks the hash as part of the search process.

Amanieu added a commit to Amanieu/rust that referenced this issue Dec 11, 2018
It doesn't work in hashbrown anyways (see rust-lang#56167)
@gdzx
Copy link

@gdzx gdzx commented Mar 1, 2019

I can avoid unnecessary clones inherent to the original entry API which is nice. But unless I'm mistaken, the current raw_entry API seems to hash the keys twice in this simple use case:

#![feature(hash_raw_entry)]

use std::collections::HashMap;

let mut map = HashMap::new();

map.raw_entry_mut()
   .from_key("poneyland")
   .or_insert("poneyland", 3);

Currently I use the following function to hash once and automatically provide an owned key if necessary (somewhat similar to what was discussed in rust-lang/rfcs#1769):

use std::borrow::Borrow;
use std::collections::hash_map::RawEntryMut;
use std::hash::{BuildHasher, Hash, Hasher};

fn get_mut_or_insert_with<'a, K, V, Q, F>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: F,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    Q: Eq + Hash + ToOwned<Owned = K>,
    F: FnOnce() -> V,
{
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    let hash = hasher.finish();

    match map.raw_entry_mut().from_key_hashed_nocheck(hash, key) {
        RawEntryMut::Occupied(entry) => entry.into_mut(),
        RawEntryMut::Vacant(entry) => {
            entry
                .insert_hashed_nocheck(hash, key.to_owned(), default())
                .1
        }
    }
}

Given k1 and k2 with the same type K such that hash(k1) != hash(k2), is there a use-case for calling RawEntryBuilderMut::from_key_hashed_nocheck with hash(k1), &k1 and then inserting with RawVacantEntry::or_insert using k2 ?

If there isn't, why not saving the hash in RawVacantEntryMut and using it inside RawVacantEntryMut::insert ? It would even be possible to assert in debug builds that the owned key has indeed the same hash as the borrowed key used to lookup the entry.

@timvermeulen
Copy link
Contributor

@timvermeulen timvermeulen commented Apr 13, 2019

I'm not yet very familiar with this API, but what @gdouezangrard suggested seems like a great idea to me. What even happens currently if the two hashes don't match, is the key-value pair then inserted into the wrong bucket? It's not clear to me from (quickly) reading the source code.

@sujayakar
Copy link

@sujayakar sujayakar commented Apr 26, 2019

I submitted rust-lang/hashbrown#54 to support using a K that doesn't implement Hash via the raw entry API. See rust-lang/hashbrown#44 for the original motivation. Now that hashbrown is merged into std, could we expose this functionality on the std::collections::hash_map types as well?

If so, I'd be happy to submit a PR!

@thomcc
Copy link
Contributor

@thomcc thomcc commented Apr 11, 2020

This is a really great API, it's also what keeps crates (hashlink for example) using hashbrown instead of the stdlib hash map -- since hashbrown exposes this.

What could be next steps here towards stabilization?

@KodrAus KodrAus added I-nominated Libs-Tracked labels Jul 29, 2020
@sanbox-irl
Copy link

@sanbox-irl sanbox-irl commented Nov 7, 2020

Just gonna add another ping here -- what's blocking this right now?

@Amanieu
Copy link
Member

@Amanieu Amanieu commented Nov 12, 2020

I see a few things that need to be resolved:

I would recommend prototyping in the hashbrown crate first, which can then be ported back in the the std HashMap.

@xfix
Copy link
Contributor

@xfix xfix commented Feb 4, 2021

I find raw_entry and raw_entry_mut methods unnecessary - unlike entry method, they don't take any parameters, they just provide access to methods that could as well be in HashMap itself. I think I would consider getting rid of those and putting raw entry APIs directly in HashMap. .raw_entry().from_key(...) is also unnecessary, unless I'm missing something it's identical to already stabilized .get_key_value(...).

I also would like to point out that RawVacantEntryMut doesn't really do much other than providing an API that allows insertion which provides a reference to inserted key and value. This structure doesn't store anything other than a mutable reference to a hash map. This particular API can be used to create unrelated keys, like in this example.

#![feature(hash_raw_entry)]

use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.raw_entry_mut().from_key(&42).or_insert(1, 2);
    println!("{}", map[&1]);
}

This is a bit like calling insert after determining an entry is vacant. I think raw_entry_mut APIs could return Options just like raw_entry APIs.

#![feature(hash_raw_entry)]

use std::collections::hash_map::{HashMap, RawEntryMut};

fn main() {
    let mut map = HashMap::new();
    if let RawEntryMut::Vacant(_) = map.raw_entry_mut().from_key(&42) {
        map.insert(1, 2);
    }
    println!("{}", map[&1]);
}

I think raw entry API is useful, but I don't think its API should be conflated with entry API.

@tkaitchuck
Copy link
Contributor

@tkaitchuck tkaitchuck commented Mar 28, 2021

As discussed here: rust-lang/hashbrown#232
Allowing the user to specify the hashed value with the contract that it is generated in the same way that the map computes that hash has two drawbacks:

  1. It locks in the implementation of the Hashmap to never changing how that code is invoked. In particular this prohibits hashmap from ever using specialization. This is leaving significant performance gains on the table for types with fixed lengths and short strings. (This makes raw_entry a non-zero-cost-abstraction because the cost is incurred even if the feature is not used.)
  2. It creates an opportunity for a bugs in applications that accidently do something different. If for example an application takes advantage of this to create a memoized hash or similar, and their calculation is different in some cases the results will be unexpected and lack a clear error message.

If the feature of a user specified hash is needed, it may be useful to instead provide a method on the raw entry to hash a key. That way the hashmap can implement this however it sees fit and the application code is less error prone because there is an unambiguous way to obtain the hash value if it is not known in advance.

@umanwizard
Copy link
Contributor

@umanwizard umanwizard commented Mar 30, 2021

There is a very simple and common use case that, as far as I can tell, is not currently possible with the standard HashMap, which covers most of the times I have wanted raw_entry_mut . Is it possible to support this without the full generality and complexity of the raw entries API?

Description of the use case:

Imagine I have HashMap<Vec<u8>, usize>. What I want to do is:

(1) check if some key exists,
(2) if so, get the corresponding value; if not, write a new value.

There are two options I can see, each of which has unnecessary overhead.

First option:

let x = my_map.entry(key.clone().or_insert(42);

Second option:

let x = if let Some(x) = my_map.get(&key) {
    *x
} else {
    my_map.insert(key.clone(), 42);
    42
};

The defect of the first option is that it requires a clone of the key even when no insertion is necessary. The defect of the second option is that it requires two hash computations and lookups when an insertion is necessary.

I find myself wanting to do this practically every time I use hash maps, so it's possible this is an already supported basic API and I'm just missing it in the docs.

@thomcc
Copy link
Contributor

@thomcc thomcc commented Mar 30, 2021

I find myself wanting to do this practically every time I use hash maps, so it's possible this is an already supported basic API and I'm just missing it in the docs.

No, this requires raw_entry_mut.

@umanwizard
Copy link
Contributor

@umanwizard umanwizard commented Mar 30, 2021

Thanks for confirming. Has the idea of adding an API for this specific use case (e.g., get_or_clone_key_and_insert_with, which would of course only be available when K: Clone) been discussed? I feel like it would be much simpler than nailing down the fully general raw_entry_mut .

@thomcc
Copy link
Contributor

@thomcc thomcc commented Mar 30, 2021

Yes, there was a great deal of discussion about this in the past.

Note that K: Clone isn't quite enough for common str/String or [T]/Vec<T> cases. It would need to be ToOwned, probably at the very least.

@umanwizard
Copy link
Contributor

@umanwizard umanwizard commented Mar 30, 2021

Do you know where to find the past discussion? I'm interested in learning whether/why that idea was rejected. After that I will stop spamming this issue; sorry!

@Amanieu
Copy link
Member

@Amanieu Amanieu commented Mar 30, 2021

IMO raw_entry is a poor API because it is trying to do too much at once. It is trying to serve two different types of use cases:

  • High level cases where you just want to avoid having to pass ownership of the key into entry. These might be better served with an entry_ref API that takes a T: ToOwned. Or perhaps allowing the user to provide the key separately when inserting (if it doesn't match the key used for the lookup then that's a user error similar to a misbehaving Eq impl).
  • Low level cases where you are doing fancy things like hash memoization, value interning or dealing with values that don't implement Eq/Hash because they refer to data outside the HashMap (e.g. IndexMap). These might be better served by using a lower-level hash table API such as hashbrown's RawTable which has methods like this:
// - Just a single T instead of (K, V).
// - No bounds on T.
// - Hashing and equality checking is completely user-controlled.
// - A hasher is provided by the caller for functions that might cause a table resize.

pub fn get_mut(
    &mut self,
    hash: u64,
    eq: impl FnMut(&T) -> bool
) -> Option<&mut T>;

pub fn insert_entry(
    &mut self,
    hash: u64,
    value: T,
    hasher: impl Fn(&T) -> u64
) -> &mut T;

@umanwizard
Copy link
Contributor

@umanwizard umanwizard commented Mar 30, 2021

Reading the past discussions posted by Thom, I didn't find where entry_ref, entry_or_clone and similar ideas were clearly rejected -- just that discussion on them has fizzled out because raw_entry will subsume those use cases.

It has been five years - perhaps it's time to land those so people can start using HashMaps now without performance overhead, rather than waiting for raw_entry_mut to maybe someday be stabilized.

@SkiFire13
Copy link
Contributor

@SkiFire13 SkiFire13 commented May 25, 2021

Low level cases where you are doing fancy things like hash memoization, value interning or dealing with values that don't implement Eq/Hash because they refer to data outside the HashMap (e.g. IndexMap). These might be better served by using a lower-level hash table API such as hashbrown's RawTable which has methods like this:

This applies to BTreeMap and BinaryHeap too. Maybe we could have a Raw* version of each collection?

@Amanieu
Copy link
Member

@Amanieu Amanieu commented Sep 1, 2021

We reviewed this in today's @rust-lang/libs-api meeting and agreed with the summary in #56167 (comment).

We'd like to see an implementation of entry_ref which allows an Entry to be created from just a reference to a key, which would be held in a Cow internally and upgraded to an owned key when necessary for an insertion.

We feel that any more advanced use cases are better off using hashbrown's RawTable API directly which is much more flexible than the HashMap API.

@rfcbot fcp close

@rfcbot
Copy link

@rfcbot rfcbot commented Sep 1, 2021

Team member @Amanieu has proposed to close this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period disposition-close labels Sep 1, 2021
@Stargateur
Copy link
Contributor

@Stargateur Stargateur commented Nov 6, 2021

I'm fine with the conclusion but I would like we have the alternative in nightly before remove raw_entry() from nightly we loose nothing wait for entry_ref and several code rely on this feature, I'm sure people are fine remove raw_entry from their code but they will want the alternative.

@dtolnay
Copy link
Member

@dtolnay dtolnay commented Nov 10, 2021

I agree with #56167 (comment), don't think there is any rush to delete this. Keeping it in front of people will help people form an opinion of how to serve each of the two use cases better.

@rfcbot concern keep unstable for now

@ajtribick
Copy link
Contributor

@ajtribick ajtribick commented Jan 18, 2022

I created an initial implementation of the entry_ref API (rust-lang/hashbrown#301) which is available in version 0.12 of hashbrown in case people want to try it out.

@m-ou-se m-ou-se added the S-blocked label Jan 19, 2022
@joshtriplett joshtriplett added S-tracking-design-concerns and removed S-blocked labels Jan 19, 2022
@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Jan 19, 2022

@rfcbot cancel

We've discussed this in the libs api meeting. We'll keep this around as unstable until we have something better.

@rfcbot
Copy link

@rfcbot rfcbot commented Jan 19, 2022

@m-ou-se proposal cancelled.

@rfcbot rfcbot removed proposed-final-comment-period disposition-close labels Jan 19, 2022
@scottmcm
Copy link
Member

@scottmcm scottmcm commented Mar 1, 2022

One thing I wanted to record for consideration in whatever this becomes:

It's certainly true that for unknown generic types, one cannot trust the Hash impl, and thus can't trust HashMap for arbitrary generic types when it comes to soundness.

But it might be nice to be able to trust, say, a HashMap<usize> (type and hasher both from the standard library) that someone gave you in your unsafe code actually only has distinct items in it. And that would imply that the various unchecked raw APIs would need to be unsafe or a different type.

For example, imagine a fn deref_mut_multiple<T>(slice: &mut [T], indexes: HashSet<usize>) -> impl Iterator<Item = &mut T>. Without the raw methods, that can be sound without extra checks, just mapping over get_unchecked. But the existence of the raw methods makes that unsound, and in a way that can't be solved by something like TrustedHash/TrustedEq marker unsafe traits.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Mar 3, 2022

I’ve replaced a use of entry with raw_entry_mut in order to avoid memory allocation for keys already present in a HashMap. To simplify, keys in the map are Cow<'a, [u8]> and the input is &'b [u8]. In some cases lifetimes match so Cow::Borrowed can be used, but in other cases Cow::Owned is needed.

entry_ref is not a good fit since it relies on From impls to convert to an owned key, so I’d need two wrapper types for the two cases. (Here "owned" is Cow<[u8]> as opposed to &[u8], not Cow::Owned.) raw_entry_mut provides more flexibility on how to do that conversion.

RawTable provides all of the flexibility but much lower-level, so switching would be a far-reaching change. Even in if that still makes sense (removing the lifetime parameter in maps and associated owning-ref-like hacks may be worth using indices into side storage instead of Cow::Borrowed), RawTable has some rough edges: no documentation, seemingly-unnecessarily unsafe iterators, etc.

Perhaps a middle ground would be to change entry_ref to remove its use of From, and instead have APIs where the owned key is passed explicitly like in raw_entry_mut?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections B-unstable C-tracking-issue Libs-Tracked S-tracking-design-concerns T-libs-api
Projects
None yet
Development

No branches or pull requests