Tracking issue for HashMap::raw_entry #56167

Open
sfackler opened this issue Nov 22, 2018 · 9 comments

Comments

@sfackler
Member

commented Nov 22, 2018

Added in #54043.


As of 6ecad33 / 2019-01-09, this feature covers:

impl<K, V, S> HashMap<K, V, S>
    where K: Eq + Hash,
          S: BuildHasher
{
    pub fn raw_entry(&self) -> RawEntryBuilder<K, V, S> {…}
    pub fn raw_entry_mut(&mut self) -> RawEntryBuilderMut<K, V, S> {…}
}

pub struct RawEntryBuilder<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return Option<(&'a K, &'a V)>
pub struct RawEntryBuilderMut<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return RawEntryMut<'a, K, V, S>
pub enum RawEntryMut<'a, K: 'a, V: 'a, S: 'a> {
    Occupied(RawOccupiedEntryMut<'a, K, V>),
    Vacant(RawVacantEntryMut<'a, K, V, S>),
}
pub struct RawOccupiedEntryMut<'a, K: 'a, V: 'a> {…}
pub struct RawVacantEntryMut<'a, K: 'a, V: 'a, S: 'a> {…}

… as well as Debug impls for each of the 5 new types, and their inherent methods.

@Amanieu

Contributor

commented Nov 26, 2018

What is the motivation for having separate from_hash and search_bucket methods? It seems that the only difference is whether the hash value is checked before calling is_match. However if the table does not store full hashes (i.e. hashbrown) then there is no difference between these methods.

Could we consider merging these methods into a single one? Or is there some use case where the difference in behavior is useful?
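
To make the distinction concrete, here is a toy sketch. It is not the std or hashbrown implementation: `ToyTable`, its one-`Vec`-per-bucket layout, and the full hash stored next to each key are all assumptions made for illustration. Only the method names `from_hash` and `search_bucket` come from the discussion.

```rust
/// Toy chained table that stores the full hash beside each entry.
/// Hypothetical; this is not how std or hashbrown lay out memory.
struct ToyTable<K, V> {
    buckets: Vec<Vec<(u64, K, V)>>,
}

impl<K, V> ToyTable<K, V> {
    /// `n_buckets` must be non-zero.
    fn new(n_buckets: usize) -> Self {
        ToyTable { buckets: (0..n_buckets).map(|_| Vec::new()).collect() }
    }

    fn bucket_of(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    /// Stores the entry in the bucket selected by `hash`, without checking
    /// that `hash` really belongs to `key`.
    fn insert_hashed_nocheck(&mut self, hash: u64, key: K, value: V) {
        let b = self.bucket_of(hash);
        self.buckets[b].push((hash, key, value));
    }

    /// Compares the stored full hash first, and only then calls `is_match`.
    fn from_hash<F: FnMut(&K) -> bool>(&self, hash: u64, mut is_match: F) -> Option<(&K, &V)> {
        self.buckets[self.bucket_of(hash)]
            .iter()
            .find(|(h, k, _)| *h == hash && is_match(k))
            .map(|(_, k, v)| (k, v))
    }

    /// Uses `hash` only to select the bucket; every entry in that bucket
    /// is offered to `is_match`.
    fn search_bucket<F: FnMut(&K) -> bool>(&self, hash: u64, mut is_match: F) -> Option<(&K, &V)> {
        self.buckets[self.bucket_of(hash)]
            .iter()
            .find(|(_, k, _)| is_match(k))
            .map(|(_, k, v)| (k, v))
    }
}
```

With two entries hashed 1 and 5 in a 4-bucket table (same bucket, different full hashes), `from_hash(5, |_| true)` returns the hash-5 entry, while `search_bucket(5, |_| true)` returns whichever entry comes first in the bucket. In a table that does not keep the full hash, the first comparison is unavailable, which is the case described above.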

@Gankra

Contributor

commented Nov 27, 2018

I am also extremely confused by this distinction, as my original designs didn't include them (I think?) and the documentation that was written is very unclear.

@Amanieu

Contributor

commented Nov 27, 2018

@fintelia

Contributor

commented Nov 27, 2018

The reason I added search_bucket was because I wanted to be able to delete a random element from a HashMap in O(1) time, without storing an extra copy of all the keys. Basically, instead of doing something like this:

let key = map.iter().nth(rand() % map.len()).unwrap().0.clone();
map.remove(&key);

I wanted to just be able to pick a random "bucket" and then get an entry/raw entry to the first element in it if any:

loop {
    if let Occupied(o) = map.raw_entry_mut().search_bucket(rand(), |_| true) {
        o.remove();
        break;
    }
}

(the probabilities aren't uniform in the second version, but close enough for my purposes)

@Gankra

Contributor

commented Nov 28, 2018

I continue to not want to support the "random deletion" use case in std's HashMap. You really, really, really should be using a linked hashmap or otherwise ordered map for that.
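
As a rough illustration of the "otherwise ordered map" route, the sketch below pairs a `Vec` of entries with a `HashMap` from key to position, so that removing the entry at a given index is O(1) via `swap_remove`. `IndexedMap` and its methods are hypothetical names. Unlike the `search_bucket` approach, it stores a second copy of every key (hence `K: Clone`), which is exactly the cost the earlier comment wanted to avoid.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical map that supports O(1) removal by position.
struct IndexedMap<K, V> {
    entries: Vec<(K, V)>,
    index: HashMap<K, usize>,
}

impl<K: Eq + Hash + Clone, V> IndexedMap<K, V> {
    fn new() -> Self {
        IndexedMap { entries: Vec::new(), index: HashMap::new() }
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    /// Inserts or overwrites, returning the previous value if any.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        if let Some(&i) = self.index.get(&key) {
            return Some(std::mem::replace(&mut self.entries[i].1, value));
        }
        self.index.insert(key.clone(), self.entries.len());
        self.entries.push((key, value));
        None
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.index.get(key).map(|&i| &self.entries[i].1)
    }

    /// Removes the entry at position `i`. A caller wanting a random removal
    /// would pass something like `rand() % map.len()`.
    fn remove_at(&mut self, i: usize) -> Option<(K, V)> {
        if i >= self.entries.len() {
            return None;
        }
        let (key, value) = self.entries.swap_remove(i);
        self.index.remove(&key);
        // The former last entry (if any) now sits at `i`; fix its position.
        if let Some((moved_key, _)) = self.entries.get(i) {
            if let Some(slot) = self.index.get_mut(moved_key) {
                *slot = i;
            }
        }
        Some((key, value))
    }
}
```

Here the choice of position is uniform over the entries, at the price of the extra key copies and the loss of insertion order after a removal.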

Amanieu added a commit to Amanieu/rust that referenced this issue Dec 8, 2018
It doesn't work in hashbrown anyways (see rust-lang#56167)
@Amanieu

Contributor

commented Dec 9, 2018

I have removed this method in the hashbrown PR (#56241). Your code snippet for random deletion won't work with hashbrown anyways since it always checks the hash as part of the search process.

Amanieu added a commit to Amanieu/rust that referenced this issue Dec 11, 2018
It doesn't work in hashbrown anyways (see rust-lang#56167)
@gdouezangrard


commented Mar 1, 2019

I can avoid the unnecessary clones inherent to the original entry API, which is nice. But unless I'm mistaken, the current raw_entry API seems to hash the key twice in this simple use case:

#![feature(hash_raw_entry)]

use std::collections::HashMap;

let mut map = HashMap::new();

map.raw_entry_mut()
   .from_key("poneyland")
   .or_insert("poneyland", 3);
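
One way to check a claim like this is to count calls to `Hash::hash` with an instrumented key. The sketch below uses only stable `HashMap` methods, so it does not measure `raw_entry` itself; it shows the measuring technique and the analogous two-step versus one-step difference on the stable API. `CountedKey`, `HASH_CALLS`, `count_hashes` and the other helper names are hypothetical.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

thread_local! {
    // Number of times `CountedKey::hash` has run on this thread.
    static HASH_CALLS: Cell<usize> = Cell::new(0);
}

#[derive(PartialEq, Eq)]
struct CountedKey(&'static str);

impl Hash for CountedKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        HASH_CALLS.with(|c| c.set(c.get() + 1));
        self.0.hash(state);
    }
}

/// Runs `f` and returns how many times a `CountedKey` was hashed meanwhile.
fn count_hashes(f: impl FnOnce()) -> usize {
    let before = HASH_CALLS.with(|c| c.get());
    f();
    HASH_CALLS.with(|c| c.get()) - before
}

/// A non-empty map with spare capacity, so that the measured operations
/// neither short-circuit on an empty table nor trigger a resize.
fn prefilled_map() -> HashMap<CountedKey, i32> {
    let mut map = HashMap::with_capacity(8);
    map.insert(CountedKey("other"), 0);
    map
}

/// Two separate operations: a lookup followed by an insert.
fn lookup_then_insert() -> usize {
    let mut map = prefilled_map();
    count_hashes(|| {
        if !map.contains_key(&CountedKey("poneyland")) {
            map.insert(CountedKey("poneyland"), 3);
        }
    })
}

/// A single entry operation.
fn entry_or_insert() -> usize {
    let mut map = prefilled_map();
    count_hashes(|| {
        map.entry(CountedKey("poneyland")).or_insert(3);
    })
}
```

Comparing `lookup_then_insert()` with `entry_or_insert()` should show the two-step version hashing the key more often. The exact counts are an implementation detail of the table, so the same instrumented key would need to be run against `raw_entry_mut()` on nightly to confirm the behavior described above.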

Currently I use the following function to hash once and automatically provide an owned key if necessary (somewhat similar to what was discussed in rust-lang/rfcs#1769):

use std::borrow::Borrow;
use std::collections::hash_map::{HashMap, RawEntryMut};
use std::hash::{BuildHasher, Hash, Hasher};

fn get_mut_or_insert_with<'a, K, V, Q, F>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: F,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    // `?Sized` lets unsized borrowed keys such as `str` be used for the lookup.
    Q: Eq + Hash + ToOwned<Owned = K> + ?Sized,
    F: FnOnce() -> V,
{
    // Hash the borrowed key once, using the map's own hasher.
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    let hash = hasher.finish();

    // Reuse that hash for both the lookup and, if needed, the insertion.
    match map.raw_entry_mut().from_key_hashed_nocheck(hash, key) {
        RawEntryMut::Occupied(entry) => entry.into_mut(),
        RawEntryMut::Vacant(entry) => {
            // Only create an owned key when the entry is actually missing.
            entry
                .insert_hashed_nocheck(hash, key.to_owned(), default())
                .1
        }
    }
}
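
For comparison, a version restricted to stable `HashMap` methods might look like the sketch below. It keeps the property of only calling `to_owned` when the key is missing, but it looks the key up more than once, which is the cost the raw entry version avoids. The name `get_mut_or_insert_with_stable` is made up for this example.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

fn get_mut_or_insert_with_stable<'a, K, V, Q, F>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: F,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    Q: Eq + Hash + ToOwned<Owned = K> + ?Sized,
    F: FnOnce() -> V,
{
    // First lookup: decide whether an owned key is needed at all.
    if !map.contains_key(key) {
        // Second hash of the same logical key, this time via the owned form.
        map.insert(key.to_owned(), default());
    }
    // Another lookup to hand back the mutable reference.
    map.get_mut(key)
        .expect("key was present or has just been inserted")
}
```

Called as `get_mut_or_insert_with_stable(&mut map, "poneyland", || 3)` on a `HashMap<String, i32>`, it allocates a `String` only on the first call for that key.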

Given k1 and k2 with the same type K such that hash(k1) != hash(k2), is there a use case for calling RawEntryBuilderMut::from_key_hashed_nocheck with hash(k1), &k1 and then inserting with RawVacantEntryMut::insert using k2?

If there isn't, why not save the hash in RawVacantEntryMut and use it inside RawVacantEntryMut::insert? It would even be possible to assert in debug builds that the owned key has the same hash as the borrowed key used to look up the entry.
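
The debug-build check proposed here could look roughly like the following, written against stable APIs only. `owned_key_with_hash` is a hypothetical helper, not part of std: it hashes the borrowed key once with the given `BuildHasher`, converts it to an owned key, and asserts in debug builds that both forms hash identically.

```rust
use std::hash::{BuildHasher, Hash};

/// Hypothetical helper: returns the hash of `key` together with its owned
/// form, checking in debug builds that the two hash to the same value.
fn owned_key_with_hash<K, Q, S>(build_hasher: &S, key: &Q) -> (u64, K)
where
    K: Hash,
    Q: Hash + ToOwned<Owned = K> + ?Sized,
    S: BuildHasher,
{
    let hash = build_hasher.hash_one(key);
    let owned = key.to_owned();
    // Compiled out in release builds, so the extra hash is debug-only.
    debug_assert_eq!(
        hash,
        build_hasher.hash_one(&owned),
        "owned key must hash like the borrowed key used for the lookup"
    );
    (hash, owned)
}
```

A stored-hash `insert` could run the same assertion internally. The sketch only covers the check itself, not how `RawVacantEntryMut` would carry the hash.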

@timvermeulen

Contributor

commented Apr 13, 2019

I'm not yet very familiar with this API, but what @gdouezangrard suggested seems like a great idea to me. What even happens currently if the two hashes don't match? Is the key-value pair then inserted into the wrong bucket? It's not clear to me from (quickly) reading the source code.
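
In a simple chained model, at least, a mismatched hash does put the pair in the wrong bucket, and later lookups by key miss it. The sketch below is that toy model only: `ToyMap` and `toy_hash` are hypothetical, and nothing here shows what std's table does internally.

```rust
/// Stand-in for a real hash function: a `u64` key hashes to itself.
fn toy_hash(key: u64) -> u64 {
    key
}

/// Toy chained table with `u64` keys; hypothetical, not std's layout.
struct ToyMap<V> {
    buckets: Vec<Vec<(u64, V)>>,
}

impl<V> ToyMap<V> {
    /// `n_buckets` must be non-zero.
    fn new(n_buckets: usize) -> Self {
        ToyMap { buckets: (0..n_buckets).map(|_| Vec::new()).collect() }
    }

    fn bucket_of(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    /// Trusts the caller: the bucket is chosen from `hash`, not from `key`.
    fn insert_hashed_nocheck(&mut self, hash: u64, key: u64, value: V) {
        let b = self.bucket_of(hash);
        self.buckets[b].push((key, value));
    }

    /// Ordinary lookup: recomputes the hash from the key.
    fn get(&self, key: u64) -> Option<&V> {
        self.buckets[self.bucket_of(toy_hash(key))]
            .iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| v)
    }
}
```

With 4 buckets, inserting key 7 under hash 2 places it in bucket 2, while `get(7)` searches bucket 3 and returns `None`. The entry is still stored and would show up during iteration, but it cannot be reached by key.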

@sujayakar


commented Apr 26, 2019

I submitted rust-lang/hashbrown#54 to support using a K that doesn't implement Hash via the raw entry API. See rust-lang/hashbrown#44 for the original motivation. Now that hashbrown is merged into std, could we expose this functionality on the std::collections::hash_map types as well?
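
For readers unfamiliar with the idea, the sketch below shows one stable way to key a map by a type that implements `Eq` but not `Hash`, by supplying the hash from outside. It is only an illustration of the concept and is not what rust-lang/hashbrown#54 implements: `ExternHashMap`, `Point` and the hash closure are hypothetical, and it pays for an extra `Vec` per distinct hash.

```rust
use std::collections::HashMap;

/// Example key type that deliberately does not implement `Hash`.
#[derive(Debug, PartialEq, Eq)]
struct Point {
    x: i32,
    y: i32,
}

/// Hypothetical map whose hashes come from a caller-supplied function.
struct ExternHashMap<K, V, H: Fn(&K) -> u64> {
    hash_fn: H,
    groups: HashMap<u64, Vec<(K, V)>>,
}

impl<K: Eq, V, H: Fn(&K) -> u64> ExternHashMap<K, V, H> {
    fn new(hash_fn: H) -> Self {
        ExternHashMap { hash_fn, groups: HashMap::new() }
    }

    /// Inserts or overwrites, returning the previous value if any.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        let hash = (self.hash_fn)(&key);
        let group = self.groups.entry(hash).or_default();
        for (k, v) in group.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        group.push((key, value));
        None
    }

    /// Keys with equal hashes share a group and are told apart by `Eq`.
    fn get(&self, key: &K) -> Option<&V> {
        let hash = (self.hash_fn)(key);
        self.groups
            .get(&hash)?
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v)
    }
}
```

For example, `ExternHashMap::new(|p: &Point| p.x as u64)` gives a map keyed by `Point` even though `Point` has no `Hash` impl. A raw entry API that accepts a precomputed hash would make the wrapper and the per-hash `Vec` unnecessary.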

If so, I'd be happy to submit a PR!
