Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support retrieving an arbitrary element #43

Closed
jonhoo opened this issue Feb 7, 2019 · 6 comments
Closed

Support retrieving an arbitrary element #43

jonhoo opened this issue Feb 7, 2019 · 6 comments

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Feb 7, 2019

This may be an exceptionally niche use-case but I figured I'd bring it up anyway. In Noria, we use a hash table as a sort of cache, and occasionally we want to evict a random element from the map. Unfortunately, there is no good way of getting that efficiently (in hashbrown or in std::HashMap). map.keys().skip(rand(map.len())) for example is far too slow. @fintelia has a fork of std::HashMap in rahashmap that we're using, and it:

Together, these three operations let us efficiently find a random key/value pair in a map. I realize that this may be fair more internals than you'd like to expose, but some way of getting at/removing a random element would be really handy, as I don't think there's currently a way of doing it through the normally-exposed APIs. Is there any avenue whereby you could imagine a use-case like this being supported? I was toying with the idea of (ab)using RawEntryBuilder::from_hash, but guessing a hash is very unlikely to yield anything useful, so I think that's out...

@jonhoo
Copy link
Contributor Author

jonhoo commented Feb 7, 2019

cc @ms705

@Amanieu
Copy link
Member

Amanieu commented Feb 8, 2019

As you said, this is a very niche use case and I don't think that it is appropriate to support this API in hashbrown. Have you considered using IndexMap which does support your use case?

@fintelia
Copy link

fintelia commented Feb 8, 2019

Another option is to keep a long lived fork of hashbrown, which I think would be much less painful than a fork of the standard library. The standard hashmap uses a ton of unstable features, including a couple compiler internal features. My fork could thus only run on nightly and even then frequently breaks without warning because of some compiler change. The only way I've stood a chance at keeping it working was to signup for notification emails any time there was a change to the hash_map directory in the main Rust repository.

To me this points to a broader point that the standard library is too tightly coupled to the compiler. hashbrown gets better performance despite not relying on these internal features.

@jonhoo
Copy link
Contributor Author

jonhoo commented Feb 8, 2019

@Amanieu unfortunately it's a little complicated by the fact that we are accessing the map through evmap, and I don't think evmap will want to switch from hashbrown to IndexMap... That said, maybe the trick will be to somehow make evmap generic over the underlying map (if that doesn't require higher-kinded trait bounds that is)... Thanks anyway!

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 23, 2019

An alternative here I suppose is to emulate Python's dict::popitem which just removes and returns an arbitrary (key, value). This has the added benefit that you can do things like:

while let Some((key, value)) = map.popitem() {
    // ...
    map.insert(key2, value2);
}

The downside though is that if it removes, say, the first item by hash value, it will always remove the same key if you re-insert it, which makes it a poor choice for eviction. One workaround for that might be to remove the first element following a random starting position.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 23, 2019

Just for completeness and for others looking at this, this ties in to rust-lang/rust#27804 and rust-lang/rfcs#1800

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants