
Tracking issue for Vec::drain_filter and LinkedList::drain_filter #43244

Open
Gankro opened this Issue Jul 14, 2017 · 66 comments

@Gankro
Contributor

Gankro commented Jul 14, 2017

    /// Creates an iterator which uses a closure to determine if an element should be removed.
    ///
    /// If the closure returns true, then the element is removed and yielded.
    /// If the closure returns false, it will try again, and call the closure
    /// on the next element, seeing if it passes the test.
    ///
    /// Using this method is equivalent to the following code:
    ///
    /// ```
    /// # let mut some_predicate = |x: &mut i32| { *x == 2 };
    /// # let mut vec = vec![1, 2, 3, 4, 5];
    /// let mut i = 0;
    /// while i != vec.len() {
    ///     if some_predicate(&mut vec[i]) {
    ///         let val = vec.remove(i);
    ///         // your code here
    ///     }
    ///     i += 1;
    /// }
    /// ```
    ///
    /// But `drain_filter` is easier to use. `drain_filter` is also more efficient,
    /// because it can backshift the elements of the array in bulk.
    ///
    /// Note that `drain_filter` also lets you mutate every element in the filter closure,
    /// regardless of whether you choose to keep or remove it.
    ///
    ///
    /// # Examples
    ///
    /// Splitting an array into evens and odds, reusing the original allocation:
    ///
    /// ```
    /// let mut numbers = vec![1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15];
    ///
    /// let evens = numbers.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
    /// let odds = numbers;
    ///
    /// assert_eq!(evens, vec![2, 4, 6, 8, 14]);
    /// assert_eq!(odds, vec![1, 3, 5, 9, 11, 13, 15]);
    /// ```
    fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<T, F>
        where F: FnMut(&mut T) -> bool,
    { ... }

I'm sure there's an issue for this somewhere, but I can't find it. Someone nerd sniped me into implementing it. PR incoming.
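Because `drain_filter` is unstable, the semantics described in the doc comment can be approximated on stable Rust. This is a minimal sketch using a hypothetical helper named `drain_where_stable`. Unlike the proposed method, it is eager, its predicate receives `&T` rather than `&mut T`, and it allocates a fresh `Vec` for the kept elements instead of reusing the original allocation:

```rust
/// Hypothetical stable stand-in for `drain_filter`: moves every element for
/// which `pred` returns true into the returned Vec and keeps the rest in `v`,
/// preserving their original order.
fn drain_where_stable<T>(v: &mut Vec<T>, pred: impl FnMut(&T) -> bool) -> Vec<T> {
    // Take ownership of the contents, leaving an empty Vec behind temporarily.
    let (removed, kept): (Vec<T>, Vec<T>) = std::mem::take(v).into_iter().partition(pred);
    *v = kept;
    removed
}
```

With the doc comment's `numbers` example, this returns the evens and leaves the odds in `numbers`. The extra allocation is exactly what an in-place `drain_filter` avoids.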

@RalfJung

Member

RalfJung commented Jul 31, 2017

Related issues:
#25477
#34265

bors added a commit that referenced this issue Aug 15, 2017

Auto merge of #43245 - Gankro:drain-filter, r=sfackler
Add Vec::drain_filter

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for `retain`, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the `retain` impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.

@bluss

Contributor

bluss commented Sep 4, 2017

Maybe this doesn't need to include the kitchen sink, but it could have a range parameter, so that it's a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback: it's another thing that can panic. But drain_filter(.., f) cannot.
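A range parameter might look roughly like this. The sketch is eager and uses a hypothetical free function `drain_filter_range` built on the stable `RangeBounds` trait and `Vec::split_off`, so it allocates temporaries where a real implementation would work in place. Note that for `..` the bounds assertion always holds:

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical: removes and returns the elements *within `range`* for which
/// `pred` returns true; elements outside the range are never tested.
fn drain_filter_range<T, R, F>(v: &mut Vec<T>, range: R, mut pred: F) -> Vec<T>
where
    R: RangeBounds<usize>,
    F: FnMut(&mut T) -> bool,
{
    let len = v.len();
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    // The extra failure mode: a bad range panics. For `..` this always
    // holds, because start == 0 and end == len.
    assert!(start <= end && end <= len, "range out of bounds");

    let tail = v.split_off(end); // elements after the range
    let middle = v.split_off(start); // elements inside the range
    let mut removed = Vec::new();
    for mut x in middle {
        if pred(&mut x) {
            removed.push(x);
        } else {
            v.push(x); // kept elements stay in their original order
        }
    }
    v.extend(tail);
    removed
}
```

For example, on `1..=10` with the range `2..8`, only `3` through `8` are tested, so the evens `4, 6, 8` are removed while `2` and `10` stay.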

@rustonaut

rustonaut commented Sep 11, 2017

Is there any chance this will stabilize in some form in the not too distant future?

@rustonaut

rustonaut commented Sep 11, 2017

If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f) case, I would opt for doing this.

(And I'm pretty sure you can implement it in a way
which makes the compiler clever enough; in the worst
case you could have an "in-function specialization",
basically something like if TypeId::of::<R>() == TypeId::of::<RangeFull>() { skip the bounds checks; return } )
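The "in-function specialization" idea can be sketched with `std::any::TypeId`. This is only an illustration: `TypeId::of` requires `R: 'static`, an extra bound a real `drain_filter` signature probably wouldn't want, and the name `resolve_range` is hypothetical. After monomorphization the comparison is between two constants, so the optimizer can typically drop the untaken branch:

```rust
use std::any::TypeId;
use std::ops::{Bound, RangeBounds, RangeFull};

/// Hypothetical: turns `range` into concrete `(start, end)` indices for a
/// collection of length `len`, skipping all checks when `R` is `RangeFull`.
fn resolve_range<R: RangeBounds<usize> + 'static>(range: &R, len: usize) -> (usize, usize) {
    if TypeId::of::<R>() == TypeId::of::<RangeFull>() {
        // `..` always covers the whole collection: nothing can be out of bounds.
        return (0, len);
    }
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    assert!(start <= end && end <= len, "range out of bounds");
    (start, end)
}
```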

@jonhoo

Contributor

jonhoo commented Sep 22, 2017

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The latter, on the other hand, indicates that we only drain elements where some condition holds.
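To make the distinction concrete: on stable Rust, `.drain(..).filter(pred)` removes *every* element from the Vec and merely discards the non-matching ones, whereas the proposed method leaves the non-matching elements in place. A small sketch (the helper name is hypothetical):

```rust
/// What `.drain(..).filter(pred)` actually does: the Vec ends up empty, and
/// the odd numbers are dropped rather than retained.
fn drain_then_filter(v: &mut Vec<i32>) -> Vec<i32> {
    v.drain(..).filter(|x| *x % 2 == 0).collect()
}
```

Called on `vec![1, 2, 3, 4]`, this returns `[2, 4]` and leaves the vector empty. `drain_filter` with the same predicate would instead leave `[1, 3]` behind.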

@rustonaut

rustonaut commented Sep 23, 2017

No idea, but drain_where sounds much better and is much more intuitive.
Is there still a chance to change it?

@bluss

Contributor

bluss commented Sep 23, 2017

.remove_if has been a prior suggestion too

@rustonaut

rustonaut commented Sep 23, 2017

I think drain_where explains it best. Like drain, it returns values, but it does not drain/remove all values, only those for which a given condition is true.

remove_if sounds a lot like a conditional version of remove, which just removes a single item by index if a condition is true, e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter, on the other hand, is slightly ambiguous: does it drain and then filter in a more optimized way (like filter_map), or does it drain so that an iterator is returned comparable to the one filter would return?
And if so, shouldn't it be called filtered_drain, as the filter logically gets applied before...

@Gankro

Contributor Author

Gankro commented Sep 25, 2017

There is no precedent for using _where or _if anywhere in the standard library.

@jonhoo

Contributor

jonhoo commented Sep 25, 2017

@Gankro is there a precedent for using _filter anywhere? I also don't know that that is a reason for not using less ambiguous terminology. Other places in the standard library already use a variety of suffixes such as _until and _while.

@crlf0710

Contributor

crlf0710 commented Oct 23, 2017

The supposedly "equivalent" code in the doc comment is not correct: you have to subtract one from i at the "your code here" site (because i += 1 still runs after a removal), or bad things happen.
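Decrementing a `usize` index underflows (and panics in debug builds) when the match is at index 0. A more robust way to write the equivalent loop is to advance `i` only when nothing was removed. In this sketch the "your code here" part is replaced by collecting the removed values into a Vec, and the function name is hypothetical:

```rust
/// Corrected form of the "equivalent" loop from the doc comment.
fn drain_filter_loop<T>(vec: &mut Vec<T>, mut some_predicate: impl FnMut(&mut T) -> bool) -> Vec<T> {
    let mut removed = Vec::new();
    let mut i = 0;
    while i != vec.len() {
        if some_predicate(&mut vec[i]) {
            let val = vec.remove(i);
            // "your code here": after `remove`, the next element has shifted
            // into index `i`, so `i` must not be advanced on this path.
            removed.push(val);
        } else {
            i += 1;
        }
    }
    removed
}
```

With the original loop, the input `[1, 2, 2, 3, 2]` would skip the second `2`. The final match would then push `i` past `vec.len()`, so the `i != vec.len()` check never stops the loop, and the next `vec[i]` panics.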

@thegranddesign

thegranddesign commented Oct 25, 2017

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

@thegranddesign

thegranddesign commented Oct 25, 2017

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject, neither of which mutates the original value.

@rpjohnst

Contributor

rpjohnst commented Oct 25, 2017

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.

@thegranddesign

thegranddesign commented Oct 25, 2017

@rpjohnst so I searched here, am I missing filter somewhere?

@rpjohnst

Contributor

rpjohnst commented Oct 25, 2017

Yes, it's a member of Iterator, not Vec.
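For example, the chaining example above can be written with `Iterator::filter`, using iterator adapters instead of an in-place mutation and keeping the elements the proposed `reject` would keep. This sketch uses a hypothetical `join_non_empty` helper, with fixed strings standing in for `a_variable` and `function_call()`:

```rust
/// Joins the non-empty parts with "/", without mutating any Vec in place.
fn join_non_empty(parts: Vec<&str>) -> String {
    parts
        .into_iter()
        .filter(|s| !s.is_empty()) // keep what `.reject(|i| i.is_empty())` would keep
        .collect::<Vec<_>>()
        .join("/")
}
```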

@Gankro

Contributor Author

Gankro commented Oct 25, 2017

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single "atomic" operation).

That said, drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).
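The four ways of iterating a `Vec` that this distinction refers to can be shown side by side. This is a sketch, and `iteration_kinds` is just an illustrative name. `drain` is the only one that hands out owned elements while the container, and its allocation, survive:

```rust
fn iteration_kinds() {
    let mut v = vec![String::from("a"), String::from("b"), String::from("c")];

    // 1. `iter`: shared borrows (&String); `v` is untouched.
    let total: usize = v.iter().map(|s| s.len()).sum();
    assert_eq!(total, 3);

    // 2. `iter_mut`: mutable borrows (&mut String); elements stay in `v`.
    for s in v.iter_mut() {
        s.push('!');
    }
    assert_eq!(v, ["a!", "b!", "c!"]);

    // 3. `drain`: yields owned Strings, but `v` (and its buffer) survives.
    let cap = v.capacity();
    let owned: Vec<String> = v.drain(..).collect();
    assert!(v.is_empty());
    assert_eq!(v.capacity(), cap);

    // 4. `into_iter`: yields owned Strings and consumes the Vec itself.
    let again: Vec<String> = owned.into_iter().collect();
    assert_eq!(again.len(), 3);
}
```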

@kennytm kennytm changed the title Tracking issue for Vec::drain_filter Tracking issue for Vec::drain_filter and LinkedList::drain_filter Nov 27, 2017

@polarathene

polarathene commented Dec 3, 2017

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

@jonhoo

Contributor

jonhoo commented Dec 3, 2017

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

@tmccombs

Contributor

tmccombs commented Dec 7, 2017

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

@Gankro

Contributor Author

Gankro commented Dec 7, 2017

Yes

bors added a commit that referenced this issue Dec 9, 2017

Auto merge of #46581 - tmccombs:drain_filter_drop, r=sfackler
Add Drop impl for linked_list::DrainFilter

This is part of #43244. See #43244 (comment)

@shepmaster

Member

shepmaster commented Apr 22, 2018

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

How and where does the team foresee this type of experimentation happening?

@SimonSapin

Contributor

SimonSapin commented Apr 22, 2018

How: come up with and propose a concrete API design, possibly with a proof-of-concept implementation (which can be done out of tree through at least Vec::as_mut_ptr and Vec::set_len). Where doesn’t matter too much. Could be a new RFC or a thread in https://internals.rust-lang.org/c/libs, and link it from here.
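As a rough illustration of such an out-of-tree proof of concept, here is an eager sketch (hypothetical name `drain_where_raw`) built on `Vec::as_mut_ptr` and `Vec::set_len`, plus `std::ptr` helpers. It sets the length to 0 up front so that if the predicate panics, the remaining elements are leaked rather than dropped twice. Treat it as a sketch, not a reviewed implementation:

```rust
use std::ptr;

/// Hypothetical out-of-tree `drain_filter`-like helper, eager for brevity.
fn drain_where_raw<T>(v: &mut Vec<T>, mut pred: impl FnMut(&mut T) -> bool) -> Vec<T> {
    let len = v.len();
    let mut removed = Vec::new();
    // Panic safety: while elements are being moved around, `v` claims to be
    // empty, so a panicking predicate can only leak elements, never
    // double-drop them.
    unsafe { v.set_len(0) };
    let base = v.as_mut_ptr();
    let mut kept = 0;
    for i in 0..len {
        unsafe {
            let cur = base.add(i);
            if pred(&mut *cur) {
                // Move the element out; slot `i` is now logically uninitialized.
                removed.push(ptr::read(cur));
            } else {
                if i != kept {
                    // Back-shift the kept element into the first free slot.
                    ptr::copy_nonoverlapping(cur, base.add(kept), 1);
                }
                kept += 1;
            }
        }
    }
    // Slots 0..kept now hold exactly the kept elements, in order.
    unsafe { v.set_len(kept) };
    removed
}
```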

@Emerentius

Contributor

Emerentius commented Apr 25, 2018

I've been playing around with this for a bit. I'll open a thread on internals in the next days.

@Boscop

Boscop commented Apr 25, 2018

I think a general API that works like this makes sense:

    v.drain(a..b).where(pred)

So it's a builder-style API: If .where(pred) is not appended, it will drain the whole range unconditionally.
This covers the capabilities of the current .drain(a..b) method as well as .drain_filter(pred).

If the name drain can't be used because it's already in use, it should be a similar name like drain_iter.

The where method shouldn't be named *_filter to avoid confusion with filtering the resulting iterator, especially when where and filter are used in combination like this:

    v.drain(..).where(pred1).filter(pred2)

Here, it will use pred1 to decide what will be drained (and passed on in the iterator) and pred2 is used to filter the resulting iterator.
Any elements that pred1 returns true for but pred2 returns false for will still get drained from v but won't get yielded by this combined iterator.

What do you think about this kind of builder-style API approach?

@Boscop

Boscop commented Apr 25, 2018

For a second I forgot that where can't be used as a function name because it's already a keyword :/

And drain is already stabilized so the name can't be used either..

Then I think the second best overall option is to keep the current drain and rename drain_filter to drain_where, to avoid the confusion with .drain(..).filter().

(As jonhoo said above: )

what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.

@Emerentius

Contributor

Emerentius commented May 3, 2018

I've opened a thread on internals.
The TLDR is that I think that non-selfexhaustion is a bigger can of worms than expected in the general case and that we should stabilize drain_filter sooner rather than later with a RangeBounds parameter. Unless someone has a good idea for solving the issues outlined there.

Edit: I've uploaded my experimental code: drain experiments
There are also drain and clearing benches and some tests but don't expect clean code.

@Popog

Popog commented May 11, 2018

Totally missed out on this thread. I've had an old impl that I've fixed up a bit and copy pasted to reflect a few of the options described in this thread. The one nice thing about the impl that I think will be non-controversial is that it implements DoubleEndedIterator. View it here.

@Boscop

Boscop commented May 11, 2018

@Emerentius but then we should at least rename drain_filter to drain_where, to indicate that the closure has to return true to remove the element!

@Emerentius

Contributor

Emerentius commented May 12, 2018

@Boscop Both imply the same 'polarity' of true => yield. I personally don't care whether it's called drain_filter or drain_where.

@Popog Can you summarize the differences and pros & cons? Ideally over at the internals thread. I think DoubleEndedIterator functionality could be added backwards compatibly with zero or low overhead (but I haven't tested that).

@askeksa

askeksa commented May 26, 2018

How about drain_or_retain? It's a grammatically meaningful action, and it signals that it does one or the other.

@Boscop

Boscop commented Jun 2, 2018

@askeksa But that doesn't make it clear whether returning true from the closure means "drain" or "retain".
I think with a name like drain_where, it's very clear that returning true drains it, and it should be clear to everyone that the elements that aren't drained are retained.

@mjbshaw

Contributor

mjbshaw commented Jun 2, 2018

It would be nice if there was some way to limit/stop/cancel/abort the drain. For example, if I wanted to drain the first N even numbers, it would be nice to be able to just do vec.drain_filter(|x| *x % 2 == 0).take(N).collect() (or some variant of that).

As it's currently implemented, DrainFilter's drop method will always run the drain to completion; it can't be aborted (at least I haven't figured out any trick that would do that).
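A safe, simplified model of this behaviour shows why `.take(N)` cannot cut the drain short: the `Drop` impl exhausts the iterator. The names are hypothetical, and the model uses `Vec::remove`, so each removal is O(n), unlike the bulk back-shifting of the real implementation:

```rust
/// Hypothetical simplified model of `DrainFilter`.
struct DrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    vec: &'a mut Vec<T>,
    idx: usize,
    pred: F,
}

fn drain_where<T, F>(vec: &mut Vec<T>, pred: F) -> DrainWhere<'_, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    DrainWhere { vec, idx: 0, pred }
}

impl<'a, T, F> Iterator for DrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        while self.idx < self.vec.len() {
            if (self.pred)(&mut self.vec[self.idx]) {
                // The following element shifts into `idx`, so don't advance.
                return Some(self.vec.remove(self.idx));
            }
            self.idx += 1;
        }
        None
    }
}

impl<'a, T, F> Drop for DrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    fn drop(&mut self) {
        // Run to completion: every remaining match is removed and dropped,
        // even if the consumer stopped early (e.g. via `take`).
        self.for_each(drop);
    }
}
```

Collecting `.take(2)` from this iterator over `1..=10` yields `[2, 4]`, but once the iterator is dropped the vector holds only `[1, 3, 5, 7, 9]`.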

@Gankro

Contributor Author

Gankro commented Jun 4, 2018

If you want that behaviour you should just close over some state that tracks how many you've seen and start returning false. Running to completion on drop is necessary to make adaptors behave reasonably.
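On stable Rust, the same "close over some state" trick works with `Vec::retain`. This sketch uses a hypothetical helper that removes only the first `n` even numbers. It is limited to `Copy` elements because `retain`'s closure only gets a reference:

```rust
/// Removes and returns at most `n` even numbers; later evens are kept
/// because the closure starts answering "keep" once `n` have been taken.
fn take_first_n_evens(v: &mut Vec<i32>, n: usize) -> Vec<i32> {
    let mut taken = Vec::new();
    v.retain(|&x| {
        if taken.len() < n && x % 2 == 0 {
            taken.push(x); // this element is removed from `v`
            false
        } else {
            true // keep everything else, including evens past the limit
        }
    });
    taken
}
```

Note that the closure is still called for every element. It just stops requesting removals, which is the same trade-off as a stateful `drain_filter` predicate.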

@rustonaut

rustonaut commented Jun 12, 2018

I just noticed that the way drain_filter is currently implemented is not only not unwind safe but
actually a safety hazard wrt. unwind + resume safety. Additionally, it easily causes an abort; both
are behaviours a method in std really shouldn't have. And while writing this I noticed
that its current implementation is unsafe.

I know that Vec is not unwind safe by default, but the behaviour of drain_filter when the
predicate panics is, well, surprising, because:

  1. it will continue calling the closure which panicked when it is dropped;
    if the closure panics again, this will cause an abort, and while some people
    like all panics to be aborts, others work with error-kernel patterns, and for them
    ending up with an abort is quite bad
  2. it will not correctly continue the draining, potentially skipping one value
    and leaving the Vec containing a value that was already dropped, potentially leading to a use-after-free

An example of this behaviour is here:
play.rust-lang.org

@rustonaut

rustonaut commented Jun 12, 2018

While the second point should be solvable, I think the first point by itself should
lead to a reconsideration of DrainFilter's behaviour of running to completion
on drop. Reasons for changing this include:

  • iterators are lazy in Rust; executing an iterator when dropping it is unexpected behaviour
    that deviates from what is normally expected
  • the predicate passed to drain_filter might panic under some circumstances (e.g. a lock
    got poisoned), in which case it is fairly likely to panic again when called during drop, leading
    to a double panic and therefore an abort, which is quite bad for anyone using error-kernel
    patterns or at least wanting to shut down in a controlled way (it's fine if you use panic=abort anyway)
  • if you have side effects in the predicate and don't run DrainFilter to completion, you might get
    surprising bugs when it is then run to completion when dropped (you might have done
    other things between draining it to some point and it being dropped)
  • you cannot opt out of this behaviour without modifying the predicate passed to it, which you
    might not be able to do without wrapping it; on the other hand, you can always opt in to running
    it to completion by just running the iterator to completion (yes, this last argument is a bit
    hand-wavy)

Arguments for running to completion include:

  • drain_filter is similar to retain, which is a function, so people might be surprised when they
    "just" drop DrainFilter instead of running it to completion
    • this argument has been countered many times in other RFCs and is why #[must_use]
      exists, which in some situations already recommends using .for_each(drop), which ironically
      happens to be what DrainFilter does on drop
  • drain_filter is often used for its side effect only, so having to run it explicitly would be too verbose
    • using it that way makes it roughly equal to retain
      • but retain uses &T, while drain_filter uses &mut T
  • others??
  • [EDIT, ADDED LATER, THX @tmccombs ]: not completing on drop can be very confusing when combined with adapters like find, all, any, which is quite a good reason to keep the current behaviour.

It might be just me, or I missed some point, but changing the Drop behaviour and
adding #[must_use] seems preferable?

If .for_each(drop) is too long, we might instead consider an RFC for iterators meant for
their side effects, adding a method like complete() to the iterator (or, well, drain(), but this
is a completely different discussion)

@tmccombs

Contributor

tmccombs commented Jun 12, 2018

others??

I can't find the original reasoning, but I remember there was also some problem with adapters working with a DrainFilter that doesn't run to completion.

See also #43244 (comment)

@rustonaut

rustonaut commented Jun 12, 2018

Good point, e.g. find would cause drain to drain just until it hits the first
match; similarly, all and any short-circuit, which can be quite confusing
wrt. drain.

Hm, maybe I should change my opinion. Though this might be a general problem
with iterators having side effects, and maybe we should consider a general solution
(independent of this tracking issue) like an .always_complete() adapter.
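A general adapter along those lines could look roughly like this. `AlwaysComplete` and `always_complete` are hypothetical names, not an existing std API. The adapter exhausts the wrapped iterator when dropped, so side effects still happen after a short-circuiting consumer such as `find`:

```rust
/// Hypothetical adapter: forwards items, and drains whatever is left on drop.
struct AlwaysComplete<I: Iterator> {
    inner: I,
}

fn always_complete<I: Iterator>(inner: I) -> AlwaysComplete<I> {
    AlwaysComplete { inner }
}

impl<I: Iterator> Iterator for AlwaysComplete<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }
}

impl<I: Iterator> Drop for AlwaysComplete<I> {
    fn drop(&mut self) {
        // Run the remaining items (and any side effects) to completion.
        self.inner.by_ref().for_each(drop);
    }
}
```

With this, `find` still returns as soon as it sees a match, but every element of the underlying iterator has been visited by the time the adapter goes out of scope.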

@Emerentius

Contributor

Emerentius commented Jun 12, 2018

I have personally not found any safety reason why drain needs to run to completion but as I've written here a couple posts above, the side-effects on next() interact in a suboptimal way with adapters such as take_while, peekable and skip_while.

This also brings up the same issues as my RFC on non-selfexhausting drain and its companion selfexhausting iter adapter RFC.

It's true that drain_filter can easily cause aborts but can you show an example of where it violates safety?

@rustonaut

rustonaut commented Jun 12, 2018

Yup, I already did: play.rust-lang.org

Which is this:

    #![feature(drain_filter)]

    use std::panic::catch_unwind;

    struct PrintOnDrop {
        id: u8
    }

    impl Drop for PrintOnDrop {
        fn drop(&mut self) {
            println!("dropped: {}", self.id)
        }
    }

    fn main() {
        println!("-- start --");
        let _ = catch_unwind(move || {
            let mut a: Vec<_> = [0, 1, 4, 5, 6].iter()
                .map(|&id| PrintOnDrop { id })
                .collect::<Vec<_>>();

            let drain = a.drain_filter(|dc| {
                if dc.id == 4 { panic!("let's say a unwrap went wrong"); }
                dc.id < 4
            });

            drain.for_each(::std::mem::drop);
        });
        println!("-- end --");
        // output:
        // -- start --
        // dropped: 0    <-\
        // dropped: 1       \_ this is a double drop
        // dropped: 0  _  <-/
        // dropped: 5   \------ here 4 got leaked (kind of fine)
        // dropped: 6
        // -- end --
    }

But that's an implementation-internal thing which went wrong.
Basically, the open question is how to handle a panic in the predicate function:

  1. skip the element it panicked on, leak it, and increase the del counter
    • requires some form of panic detection
  2. do not advance idx before calling the predicate
    • but this means drop will call the predicate again on the same element

Another question is whether it's a good idea to run functions which can be seen as API user input on drop
in general, but then this is the only way not to make find, any, etc. behave confusingly.

Maybe a consideration could be something like:

  1. set a flag when entering next, unset it before returning from next
  2. on drop, if the flag is still set, we know we panicked and hence leak
    the remaining items OR drop all remaining items
    1. can be quite a big leak with unexpected side effects if you e.g. leak an Arc
    2. can be very surprising if you have Arcs and Weaks

Maybe there is a better solution.
Whichever it is, it should be documented in rustdoc once implemented.
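The flag idea can be sketched in safe code on a simplified model. Everything here is hypothetical: `DrainWhere` is a stand-in that uses `Vec::remove`, and instead of leaking, the "stop" branch simply leaves the unprocessed elements in the Vec. Checking `std::thread::panicking()` in `drop` would be an alternative to the manual flag:

```rust
/// Hypothetical simplified DrainFilter with the proposed "in next" flag.
struct DrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    vec: &'a mut Vec<T>,
    idx: usize,
    pred: F,
    in_next: bool, // set while `next` runs; still set on drop => predicate panicked
}

fn drain_where<T, F>(vec: &mut Vec<T>, pred: F) -> DrainWhere<'_, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    DrainWhere { vec, idx: 0, pred, in_next: false }
}

impl<'a, T, F> Iterator for DrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        self.in_next = true;
        let mut result = None;
        while self.idx < self.vec.len() {
            if (self.pred)(&mut self.vec[self.idx]) {
                result = Some(self.vec.remove(self.idx));
                break;
            }
            self.idx += 1;
        }
        self.in_next = false; // only reached if the predicate did not panic
        result
    }
}

impl<'a, T, F> Drop for DrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    fn drop(&mut self) {
        if self.in_next {
            // The predicate panicked: don't call it again (a second panic
            // would abort); leave the unprocessed elements in the Vec.
            return;
        }
        self.for_each(drop);
    }
}
```

If the predicate panics on `3` while draining evens from `[1, 2, 3, 4, 5]`, the `2` has already been drained and the Vec is left as `[1, 3, 4, 5]`, with no double drop and no second call into the panicking predicate.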

@RalfJung

Member

RalfJung commented Jun 13, 2018

@dathinab

Yup, I already did

Leaking is undesirable but fine and may be hard to avoid here, but a double-drop is definitely not. Good catch! Would you like to report a separate issue about this safety problem?

@vityafx

vityafx commented Aug 10, 2018

Does drain_filter reallocate every time it removes an item from the collection? Or does it reallocate only once and work like std::remove and std::erase (used as a pair) in C++? I'd prefer that behavior because there is exactly one allocation: we simply put our elements at the end of the collection, and then erase shrinks it to the proper size.

Also, why is there no try_drain_filter, which returns an Option, with None meaning we should stop? I have a very big collection, and it is pointless for me to continue once I have already got what I needed.
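For comparison, stable `Vec::retain` already does the single-pass, in-place compaction that C++'s erase-remove idiom does, without reallocating. A sketch (`remove_evens` is just an illustrative name):

```rust
/// Removes even numbers in place: the kept elements are shifted down within
/// the existing buffer, and then the length is shortened.
fn remove_evens(v: &mut Vec<u32>) {
    v.retain(|x| x % 2 != 0);
}
```

After the call, the buffer pointer (`v.as_ptr()`) and `v.capacity()` are the same as before; only the length changes.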

@rustonaut

rustonaut commented Aug 10, 2018

@vityafx

vityafx commented Aug 10, 2018

@rustonaut thanks. What is your opinion about try_drain_filter? :)

P.S. I just looked at the code too; it looks like it works the way we wanted.

@rustonaut

rustonaut commented Aug 10, 2018

@jethrogb

Contributor

jethrogb commented Oct 26, 2018

I think this would be more versatile:

    fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>

@azriel91

azriel91 commented Feb 12, 2019

Hi, I was searching for the drain_filter functionality for HashMap but it doesn't exist, and was asked to open an issue when I found this one. Should it be in a separate issue?
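Until something like that exists for `HashMap`, one stable workaround is `HashMap::drain` plus `Iterator::partition`, putting the kept entries back into the same, already allocated map. This is a sketch with a hypothetical name, `drain_where_map`. Unlike an in-place version, it rehashes every kept entry and briefly allocates a second map:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical: removes and returns the entries for which `pred` is true.
fn drain_where_map<K, V, F>(map: &mut HashMap<K, V>, mut pred: F) -> HashMap<K, V>
where
    K: Eq + Hash,
    F: FnMut(&K, &V) -> bool,
{
    // `drain` empties the map but keeps its allocated memory for reuse.
    let (removed, kept): (HashMap<K, V>, HashMap<K, V>) =
        map.drain().partition(|(k, v)| pred(k, v));
    map.extend(kept); // re-insert the survivors into the original allocation
    removed
}
```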

@ExpHP

Contributor

ExpHP commented Apr 1, 2019

Is anything currently blocking this from stabilization? Is it still unwind-unsafe as reported above?

This seems like a pretty small feature, and it has been in limbo for over a year.
