pre-RFC: HashSet::pop() #1800
Only issue is that it'd have to search linearly through the table for a non-empty bucket, so calling it in a loop would be O(n^2)ish. |
@comex If you stored a "hint" index for the last location with an item, you should get O(n) amortized time to pop every item out of a set. If you had a loop that also inserted new items, you should still avoid quadratic behavior, as long as the hash doesn't place items immediately behind the hint index every time. |
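To make the hint idea above concrete, here is a minimal sketch. It is not std's `HashMap` layout: `Buckets` is a made-up toy bucket array with no collision handling, and the `hint` and `probes` fields are illustrative assumptions. It only shows why remembering where the last scan stopped keeps a full pop loop at roughly one pass over the table.

```rust
/// Toy stand-in for a hash table's bucket array (hypothetical, not std's layout).
struct Buckets {
    slots: Vec<Option<u32>>,
    /// Invariant: every slot before `hint` is known to be empty.
    hint: usize,
    /// Number of slots inspected by `pop`, kept only for illustration.
    probes: usize,
}

impl Buckets {
    /// `capacity` must be greater than zero.
    fn new(capacity: usize) -> Self {
        Buckets { slots: vec![None; capacity], hint: 0, probes: 0 }
    }

    /// Store `value` in slot `value % capacity`. Collisions simply overwrite (toy only).
    fn insert(&mut self, value: u32) {
        let i = value as usize % self.slots.len();
        self.slots[i] = Some(value);
        // An insert behind the hint must pull the hint back to keep the invariant.
        if i < self.hint {
            self.hint = i;
        }
    }

    /// Remove and return some element, resuming the scan at `hint`.
    fn pop(&mut self) -> Option<u32> {
        while self.hint < self.slots.len() {
            self.probes += 1;
            if let Some(v) = self.slots[self.hint].take() {
                return Some(v);
            }
            self.hint += 1;
        }
        None
    }
}

/// Pops everything from a sparse 1024-slot table and reports the probe count.
fn pop_all_counting_probes() -> (Vec<u32>, usize) {
    let mut b = Buckets::new(1024);
    for v in [5u32, 500, 1000].iter().copied() {
        b.insert(v);
    }
    let mut out = Vec::new();
    while let Some(v) = b.pop() {
        out.push(v);
    }
    (out, b.probes)
}
```

Without the hint, each `pop` would rescan from slot 0, which is where the quadratic behaviour comes from. With it, the probe count for a pure pop loop stays near the table capacity; inserts that land behind the hint reset it, which is the caveat raised in the comment above.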
@joshtriplett Sounds more like what a draining iterator might do. |
@eddyb, @joshtriplett specifically requested this so that they can add more items into the set during iteration, so, |
This is efficient (O(1) average) with a hashmap implementation like OrderMap's, but not with Rust's current HashMap. crates.io already provides ordermap and linked-hash-map for two different use cases, but both have O(1) pop. |
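As a rough illustration of why an insertion-ordered layout makes pop cheap, here is a simplified sketch. It is not ordermap's actual implementation: `OrderedSet` and its fields are hypothetical names, and it clones keys into a `HashMap` index for brevity. The point is only that the elements live in a dense `Vec`, so removing the last one needs no scan.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Simplified insertion-ordered set: a dense Vec of items plus a hash index into it.
struct OrderedSet<T> {
    items: Vec<T>,
    index: HashMap<T, usize>,
}

impl<T: Clone + Eq + Hash> OrderedSet<T> {
    fn new() -> Self {
        OrderedSet { items: Vec::new(), index: HashMap::new() }
    }

    /// Returns true if the value was newly inserted.
    fn insert(&mut self, value: T) -> bool {
        if self.index.contains_key(&value) {
            return false;
        }
        self.index.insert(value.clone(), self.items.len());
        self.items.push(value);
        true
    }

    /// Removes the most recently inserted element: O(1) on average, no table scan.
    fn pop(&mut self) -> Option<T> {
        let value = self.items.pop()?;
        self.index.remove(&value);
        Some(value)
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

/// Pops every element while inserting a new one part-way through the loop.
fn drain_with_inserts() -> Vec<u32> {
    let mut set = OrderedSet::new();
    for v in [1u32, 2, 3].iter().copied() {
        set.insert(v);
    }
    let mut seen = Vec::new();
    while let Some(x) = set.pop() {
        seen.push(x);
        if x == 3 {
            set.insert(10); // allowed: no iterator is borrowing `set` here
        }
    }
    seen
}
```

Because `pop` borrows the set only for the duration of the call, the loop body is free to insert, which is the use case requested in this thread.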
@bluss An OrderSet would solve this for me. It'd be nice to have those in core Rust, but I don't mind pulling in another crate. |
@joshtriplett These are all part of the contain-rs organisation, so, I'd talk to them about that. A lot of these data structures are deemed as non-essential and many were actually removed from libstd, but if you can make a compelling case otherwise you could probably write up an RFC for it. I don't know where the original discussion on this was but you could probably search for it. |
Triage ping @joshtriplett |
I'm still interested in this and would still like to see such a method exist, or some other alternative that allows iteration without holding a reference. |
@joshtriplett What I mean is: do you have any plans on making a PR / writing an RFC? 😃 |
@Centril Making a PR, no. Writing an RFC, maybe; depends heavily on whether one would be welcomed or not, and based on this pre-RFC I'm not sure. |
@joshtriplett Personally I like the idea. cc @rust-lang/libs -- what do you think? |
Does the switch to hashbrown affect this at all? Would that make this easier/harder? |
@joshtriplett I suspect it could make this significantly faster, at least the naive linear scan. |
Yes, the linear scan would be a bit faster, but you would still end up with O(n^2) complexity for a pop loop. I don't think that this is something that we want to encourage. |
I wonder if it would be feasible to add this under a name like |
IMO anyone that needs this functionality should be using |
That's what I'm now using in the places where I can, and you're right that that's what to use where possible! It's perhaps somewhat less discoverable, so maybe we should point people towards that from the |
Unfortunately that's not very discoverable, if all you know is that you need a |
I'm nervous about such an innocuous sounding method being O(n) time too. After NLL lands then you could write
If you need several sequential pops, or need to skip some, then adapting this code snippet gives better performance. Could both it and |
On Tue, Apr 23, 2019 at 09:06:41AM -0700, Jeff Burdges wrote:
> I'm nervous about such an innocuous sounding method being O(n) time too.
I would ideally like such a method to be (amortized) O(1), which AFAICT would just require storing a hint value. |
@joshtriplett I don't quite see how that's true? After the value in question has been removed, you need to search for the next hint value, which is the same cost as the original search anyway. |
Not quite; you can skip re-searching large empty areas of the table. That'd help avoid (I'd also be happy with an equivalent to |
I'd think some I'm worried that adding your hint hurts almost all Instead, one could maybe expose the index, like returning it from |
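I wonder if it could work to have the drain iterator provide a couple of methods to modify the HashMap that it holds a mutable reference to? Suppose you could do this:
let mut iter = foo.drain();
// `while let` rather than `for`, so that `iter` stays usable inside the loop body
while let Some(x) = iter.next() {
    // ...
    if bar {
        // `modify` is the hypothetical method being proposed here
        iter.modify(|foo| foo.insert(...));
    }
} |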
I don't think |
Wouldn't it be good if there existed a data structure like indexmap::IndexMap in the Rust standard library? I've always found it surprising that the default hashmap in Rust has outright bad performance for iteration. I suppose std HashMap is much faster for inserts and lookups, which means we can't just change the std HashMap implementation to that of IndexMap. But for certain kinds of problems, for instance breadth-first searches in graphs, the use case where IndexMap excels is very common. It's unsatisfactory that standard C# beats Rust performance-wise for these problems. Couldn't we arrange for IndexMap to be available in the standard library? And have the documentation for HashMap hint that if iteration is needed, IndexMap may give better performance? |
The As best I recall, every time I've wanted
for (k, v) in map.entries().map(OccupiedEntry::remove_entry) {
    ...
}
Or, to avoid borrowing
while let Some((k, v)) = map.entries().map(OccupiedEntry::remove_entry).next() {
    ...
}
In many cases of course |
(Originally filed as rust-lang/rust#37986, but a pre-RFC belongs in the RFC repo, not the rust repo. Re-filing it here.)
I'd like to have a `pop()` method available for `HashSet`, which removes an item from the set if any, and returns an `Option<T>` (returning `None` if empty). That would make it easy to iterate over a set with `while let Some(element) = set.pop() { ... }`, without holding a mutable reference to the set as `HashSet::drain()` does, so you can insert more items into the set as you iterate.
Does that seem like a reasonable method to add to HashSet? Does this need an RFC, or just a patch to std?