pre-RFC: HashSet::pop() #1800

joshtriplett · 2016-11-24T22:39:59Z

(Originally filed as rust-lang/rust#37986, but a pre-RFC belongs in the RFC repo, not the rust repo. Re-filing it here.)

I'd like to have a pop() method available for HashSet, which removes an arbitrary item from the set, if any, and returns an Option<T> (None if the set is empty). That would make it easy to iterate over a set with while let Some(element) = set.pop() { ... }, without holding a mutable reference to the set for the whole loop as HashSet::drain() does, so you can insert more items into the set as you iterate.
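To make the intended usage concrete, here is a minimal sketch of such a loop built only from methods that already exist on HashSet. The free function `pop_any` is a hypothetical stand-in for the proposed method, not an existing std API, and it needs `T: Clone`, which a built-in pop() presumably would not. `drain_worklist` is likewise just an illustrative name.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for the proposed `HashSet::pop()`.
/// Removes and returns an arbitrary element, or `None` if the set is empty.
fn pop_any<T: Clone + Eq + Hash>(set: &mut HashSet<T>) -> Option<T> {
    // Clone one element so the shared borrow from `iter()` ends before mutating.
    let elem = set.iter().next().cloned()?;
    set.take(&elem)
}

/// Worklist loop: for every processed `n > 1`, schedule `n / 2` as new work.
/// Returns the set of all values that were processed.
fn drain_worklist(mut pending: HashSet<u32>) -> HashSet<u32> {
    let mut seen = HashSet::new();
    while let Some(n) = pop_any(&mut pending) {
        if seen.insert(n) && n > 1 {
            // No borrow of `pending` is held across the loop body,
            // so inserting here is allowed.
            pending.insert(n / 2);
        }
    }
    seen
}
```

Starting from a set containing only 8, the loop processes 8, 4, 2 and 1. Note that `pop_any` restarts the iterator on every call, so it says nothing about how a real pop() would perform.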

Does that seem like a reasonable method to add to HashSet? Does this need an RFC, or just a patch to std?


comex · 2016-11-25T00:02:25Z

Only issue is that it'd have to search linearly through the table for a non-empty bucket, so calling it in a loop would be O(n^2)ish.

joshtriplett · 2016-11-25T00:11:14Z

@comex If you stored a "hint" index for the last location with an item, you should get O(n) amortized time to pop every item out of a set.

If you had a loop that also inserted new items, you should still avoid quadratic behavior, as long as the hash doesn't place items immediately behind the hint index every time.
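The difference between scanning from the start and resuming from a hint can be sketched with a toy model. This is an assumption-laden illustration, not how std's HashSet is laid out: `Buckets`, `pop_from_start`, `pop_with_hint` and `tail_loaded` are hypothetical names, hashing is left out entirely, and the `probes` counter exists only to compare the two strategies.

```rust
/// Toy model of a hash table's bucket array. Hashing is deliberately omitted:
/// values are placed into slots directly so that only the scan cost is visible.
struct Buckets {
    slots: Vec<Option<u32>>,
    /// Slot at which the next `pop_with_hint` starts scanning.
    hint: usize,
    /// Number of slots inspected so far.
    probes: usize,
}

impl Buckets {
    fn new(slots: Vec<Option<u32>>) -> Self {
        Buckets { slots, hint: 0, probes: 0 }
    }

    /// Naive pop: always scan from slot 0.
    fn pop_from_start(&mut self) -> Option<u32> {
        for i in 0..self.slots.len() {
            self.probes += 1;
            if let Some(v) = self.slots[i].take() {
                return Some(v);
            }
        }
        None
    }

    /// Hinted pop: resume where the previous pop stopped, wrapping around once.
    fn pop_with_hint(&mut self) -> Option<u32> {
        let cap = self.slots.len();
        for step in 0..cap {
            let i = (self.hint + step) % cap;
            self.probes += 1;
            if let Some(v) = self.slots[i].take() {
                // Slots between the old hint and `i` were empty; skip them next time.
                self.hint = (i + 1) % cap;
                return Some(v);
            }
        }
        None
    }
}

/// Builds `cap` slots in which only the last `n` are occupied (assumes `n <= cap`),
/// which is the worst case for scanning from the start.
fn tail_loaded(cap: usize, n: usize) -> Buckets {
    let slots = (0..cap)
        .map(|i| if i >= cap - n { Some(i as u32) } else { None })
        .collect();
    Buckets::new(slots)
}
```

With 100 slots of which the last 10 are occupied, ten naive pops inspect 955 slots in this model, while ten hinted pops inspect 100. The model does not cover the caveat above: elements inserted just behind the hint are only found after a wrap-around.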

eddyb · 2016-11-25T00:19:42Z

@joshtriplett Sounds more like what a draining iterator might do.

clarfonthey · 2017-02-21T20:31:51Z

@eddyb, @joshtriplett specifically requested this so that they can add more items to the set during iteration, so drain is out for this use case.

bluss · 2017-02-21T20:46:31Z

This is efficient (O(1) average) with a hashmap implementation like OrderMap's, but not with Rust's current HashMap.

crates.io already provides ordermap and linked-hash-map for two different use cases, but both have O(1) pop.

joshtriplett · 2017-02-21T22:17:15Z

@bluss An OrderSet would solve this for me. It'd be nice to have those in core Rust, but I don't mind pulling in another crate.

clarfonthey · 2017-02-21T23:03:36Z

@joshtriplett These are all part of the contain-rs organisation, so, I'd talk to them about that. A lot of these data structures are deemed as non-essential and many were actually removed from libstd, but if you can make a compelling case otherwise you could probably write up an RFC for it.

I don't know where the original discussion on this was but you could probably search for it.

Centril · 2018-04-26T08:29:32Z

Triage ping @joshtriplett

joshtriplett · 2018-04-26T09:24:00Z

I'm still interested in this and would still like to see such a method exist, or some other alternative that allows iteration without holding a reference.

Centril · 2018-04-26T09:25:30Z

@joshtriplett What I mean is: do you have any plans on making a PR / writing an RFC? 😃

joshtriplett · 2018-04-26T17:01:41Z

@Centril Making a PR, no. Writing an RFC, maybe; depends heavily on whether one would be welcomed or not, and based on this pre-RFC I'm not sure.

Centril · 2018-04-26T17:04:00Z

@joshtriplett Personally I like the idea. cc @rust-lang/libs -- what do you think?

joshtriplett · 2019-04-02T01:08:05Z

Does the switch to hashbrown affect this at all? Would that make this easier/harder?

eddyb · 2019-04-20T16:06:14Z

@joshtriplett I suspect it could make this significantly faster, at least the naive linear scan.
cc @Amanieu

Amanieu · 2019-04-20T16:09:46Z

Yes, the linear scan would be a bit faster, but you would still end up with O(n^2) complexity for a pop loop. I don't think that this is something that we want to encourage.

jonhoo · 2019-04-23T14:48:38Z

I wonder if it would be feasible to add this under a name like scan_remove_next to highlight the potential cost?

Amanieu · 2019-04-23T14:50:44Z

IMO anyone that needs this functionality should be using IndexMap instead, which guarantees O(1) pop. Incidentally this is similar to the hash table that Python uses for its dict.

jonhoo · 2019-04-23T15:09:50Z

That's what I'm now using in the places where I can, and you're right that that's what to use where possible! It's perhaps somewhat less discoverable, so maybe we should point people towards that from the HashMap docs (there are other "nice" features it adds that people may be used to coming from things like dict)? Unfortunately, IndexMap has the downside that items are popped in (reverse?) insertion order, which means it doesn't fit all use-cases (see, e.g., https://github.com/Amanieu/hashbrown/issues/43). That said, maybe those cases are too niche to really worry about.
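Why an insertion-ordered table gets O(1) pop, and why it pops in reverse insertion order, can be seen from a minimal sketch of the general idea: a dense vector of items plus a hash index into it. This is not the indexmap implementation. `OrderedSet` is a hypothetical type, and unlike the real crate it clones each item into the index to stay short.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Sketch of an insertion-ordered set: a dense `Vec` of items plus a hash
/// index from item to its position in the vector.
struct OrderedSet<T> {
    items: Vec<T>,
    index: HashMap<T, usize>,
}

impl<T: Clone + Eq + Hash> OrderedSet<T> {
    fn new() -> Self {
        OrderedSet { items: Vec::new(), index: HashMap::new() }
    }

    /// Returns `true` if the value was newly inserted.
    fn insert(&mut self, value: T) -> bool {
        if self.index.contains_key(&value) {
            return false;
        }
        self.index.insert(value.clone(), self.items.len());
        self.items.push(value);
        true
    }

    /// Average O(1): the most recently inserted item sits at the end of the
    /// dense vector, so no scan over empty buckets is needed.
    fn pop(&mut self) -> Option<T> {
        let value = self.items.pop()?;
        self.index.remove(&value);
        Some(value)
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}
```

In this sketch, after inserting 1, 2 and 3, pop() returns 3 first, so the order is last-in, first-out. Inserting between pops is fine because pop() holds no borrow once it returns.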

Diggsey · 2019-04-23T15:50:01Z

> IMO anyone that needs this functionality should be using IndexMap instead, which guarantees O(1) pop. Incidentally this is similar to the hash table that Python uses for its dict.

Unfortunately that's not very discoverable, if all you know is that you need a pop function.

burdges · 2019-04-23T16:06:29Z

I'm nervous about such an innocuous sounding method being O(n) time too.

After NLL lands then you could write pop() like

hm.keys().next().cloned().and_then(|k| hm.remove_entry(&k))

If you need several sequential pops, or need to skip some, then adapting this code snippet gives better performance. Could both it and IndexMap, etc. be mentioned in some book?
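As a hedged illustration of both points, the sketch below wraps the single pop in a function and adds a batched variant for several sequential pops. `pop_entry` and `pop_many` are hypothetical helper names. Both require `K: Clone`, because the key reference returned by keys() borrows the map and has to be released before remove_entry can mutate it.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Removes and returns an arbitrary entry, or `None` if the map is empty.
/// The key is cloned so the shared borrow from `keys()` ends before the removal.
fn pop_entry<K: Clone + Eq + Hash, V>(hm: &mut HashMap<K, V>) -> Option<(K, V)> {
    let key = hm.keys().next().cloned()?;
    hm.remove_entry(&key)
}

/// Removes up to `n` arbitrary entries, walking the key iterator once
/// instead of restarting the scan for every pop.
fn pop_many<K: Clone + Eq + Hash, V>(hm: &mut HashMap<K, V>, n: usize) -> Vec<(K, V)> {
    let keys: Vec<K> = hm.keys().take(n).cloned().collect();
    keys.iter().filter_map(|k| hm.remove_entry(k)).collect()
}
```

Skipping some entries could be done by adding a filter on the key iterator in `pop_many` before the `take(n)`.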

joshtriplett · 2019-04-23T21:52:51Z

On Tue, Apr 23, 2019 at 09:06:41AM -0700, Jeff Burdges wrote:
> I'm nervous about such an innocuous sounding method being O(n) time too.

I would ideally like such a method to be (amortized) O(1), which AFAICT would just require storing a hint value.

jonhoo · 2019-04-23T22:12:56Z

@joshtriplett I don't quite see how that's true? After the value in question has been removed, you need to search for the next hint value, which is the same cost as the original search anyway.

joshtriplett · 2019-04-23T22:25:21Z

Not quite; you can skip re-searching large empty areas of the table. That'd help avoid O(n**2) behavior when using pop in a loop, as long as you didn't happen to insert new elements right behind the hint every time.

(I'd also be happy with an equivalent to drain() that doesn't prevent adding new items in the body of the loop.)
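One way to approximate that with today's API is to move the current contents out with std::mem::take and consume them by value, so that newly inserted items collect in the now-empty set and are handled in the next round. This is a sketch of a workaround rather than an equivalent of the requested method, and `process_in_batches` is a hypothetical helper name.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::mem;

/// Processes a set in batches. `visit` receives each element by value together
/// with a mutable reference to the refilling set, into which it may insert.
fn process_in_batches<T, F>(set: &mut HashSet<T>, mut visit: F)
where
    T: Eq + Hash,
    F: FnMut(T, &mut HashSet<T>),
{
    while !set.is_empty() {
        // Move the current contents out, leaving an empty set behind.
        let batch = mem::take(set);
        for item in batch {
            visit(item, set);
        }
    }
}
```

Two caveats apply: while a batch is being consumed its remaining elements are no longer in the set, so membership checks against the set will not see them, and each round starts from a fresh empty set rather than reusing the old allocation.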

burdges · 2019-04-24T07:01:12Z

I'd think some Drain::abort method achieves that easiest, no?

I'm worried that adding your hint hurts almost all HashMap usages. Also, inserting corrupts the hint for some use cases like resuming after a partial drain. It's kinda neat your hint makes pop like an undo for insert, except this sounds fragile and encourages bugs.

Instead, one could maybe expose the index, like returning it from Drain::abort, and provide some Drain::resume(HashMap<..>, usize) -> Drain method? In resume, there are explicitly no guarantees about behavior since resizing could make an index invalid, but if you want that behavior then the method exists.

joshtriplett · 2019-04-24T18:27:47Z

I wonder if it could work to have the drain iterator provide a couple of methods to modify the HashMap that it holds a mutable reference to?

Suppose you could do this:

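// Hypothetical API: Drain has no `modify` method today.
let mut iter = foo.drain();
// `for x in iter` would move `iter`, so drive the iterator manually.
while let Some(x) = iter.next() {
    // ...
    if bar {
        iter.modify(|foo| foo.insert(...));
    }
}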

Amanieu · 2019-04-24T19:10:03Z

I don't think Drain is the right place for this since it keeps the table in an invalid state while iterating (items have been moved out but not marked as such in the metadata).

avl · 2019-05-19T11:00:11Z

Wouldn't it be good if a data structure like indexmap::IndexMap existed in the Rust standard library? I've always found it surprising that the default hashmap in Rust has outright bad performance for iteration.

I suppose std HashMap is much faster for inserts and lookups, which means we can't just change the std HashMap implementation to that of IndexMap.

But for certain kinds of problems, for instance breadth-first searches in graphs, the use case where IndexMap excels is very common. It's unsatisfactory that standard C# beats Rust performance-wise for these problems.

Couldn't we arrange for IndexMap to be available in the standard library? And have the documentation for HashMap hint that if fast iteration is needed, IndexMap may give better performance?

alecmocatta · 2020-07-04T12:01:55Z

The O(n^2) for a pop loop is sufficiently surprising that I don't believe pop should be in that form in std.

As best I recall, every time I've wanted pop it's been for a pop loop that may return early and can't take ownership of the remainder of the entries. Is it possible to expose iteration of OccupiedEntrys? In which case that use case could be handled like this:

for (k, v) in map.entries().map(OccupiedEntry::remove_entry) {
    ...
}

Or, to avoid borrowing map across the loop body (O(n^2) again, but at least it's more obvious):

while let Some((k, v)) = map.entries().map(OccupiedEntry::remove_entry).next() {
    ...
}

In many cases of course IndexSet/Map is more appropriate. Perhaps that crate can be referenced in the HashSet/Map docs as providing .pop()?

nrc added the T-libs-api label (Relevant to the library API team, which will review and decide on the RFC.) on Nov 27, 2016

schulzch mentioned this issue on Dec 22, 2018: add pop() to HashSet etc.? rust-lang/rust#27804 (Open)

jonhoo mentioned this issue on Apr 23, 2019: Support retrieving an arbitrary element rust-lang/hashbrown#43 (Closed)

ssomers mentioned this issue on Nov 9, 2019: proposal for BTreeMap/Set min/max, #62924 rust-lang/rust#65637 (Merged)
