Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Vec::drain_filter and LinkedList::drain_filter #43244

Open
Gankra opened this issue Jul 14, 2017 · 115 comments
Open

Tracking issue for Vec::drain_filter and LinkedList::drain_filter #43244

Gankra opened this issue Jul 14, 2017 · 115 comments

Comments

@Gankra
Copy link
Contributor

@Gankra Gankra commented Jul 14, 2017

    /// Creates an iterator which uses a closure to determine if an element should be removed.
    ///
    /// If the closure returns true, then the element is removed and yielded.
    /// If the closure returns false, it will try again, and call the closure
    /// on the next element, seeing if it passes the test.
    ///
    /// Using this method is equivalent to the following code:
    ///
    /// ```
    /// # let mut some_predicate = |x: &mut i32| { *x == 2 };
    /// # let mut vec = vec![1, 2, 3, 4, 5];
    /// let mut i = 0;
    /// while i != vec.len() {
    ///     if some_predicate(&mut vec[i]) {
    ///         let val = vec.remove(i);
    ///         // your code here
    ///     }
    ///     i += 1;
    /// }
    /// ```
    ///
    /// But `drain_filter` is easier to use. `drain_filter` is also more efficient,
    /// because it can backshift the elements of the array in bulk.
    ///
    /// Note that `drain_filter` also lets you mutate ever element in the filter closure,
    /// regardless of whether you choose to keep or remove it.
    ///
    ///
    /// # Examples
    ///
    /// Splitting an array into evens and odds, reusing the original allocation:
    ///
    /// ```
    /// let mut numbers = vec![1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15];
    ///
    /// let evens = numbers.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
    /// let odds = numbers;
    ///
    /// assert_eq!(evens, vec![2, 4, 6, 8, 14]);
    /// assert_eq!(odds, vec![1, 3, 5, 9, 11, 13, 15]);
    /// ```
    fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<T, F>
        where F: FnMut(&mut T) -> bool,
    { ... }

I'm sure there's an issue for this somewhere, but I can't find it. Someone nerd sniped me into implementing it. PR incoming.

@RalfJung
Copy link
Member

@RalfJung RalfJung commented Jul 31, 2017

Related issues:
#25477
#34265

bors added a commit that referenced this issue Aug 15, 2017
Add Vec::drain_filter

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for `retain`, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the `retain` impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.
@bluss
Copy link
Member

@bluss bluss commented Sep 4, 2017

Maybe this doesn't need to include the kitchen sink, but it could have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback, it's another thing that can panic. But drain_filter(.., f) can not.

@rustonaut
Copy link

@rustonaut rustonaut commented Sep 11, 2017

Is there any chance this will stabilize in some form in the not to far future?

@rustonaut
Copy link

@rustonaut rustonaut commented Sep 11, 2017

If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f) case I would opt for doing this.

( And I'm pretty sure you can implement it in a way
which makes the compiler clever eneugh, in the worst
case you could have a "in function specialization",
basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return } )

@jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Sep 22, 2017

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.

@rustonaut
Copy link

@rustonaut rustonaut commented Sep 23, 2017

No idea, but drain_where sounds much better and is much more intuitive.
Is there still a chance to change it?

@bluss
Copy link
Member

@bluss bluss commented Sep 23, 2017

.remove_if has been a prior suggestion too

@rustonaut
Copy link

@rustonaut rustonaut commented Sep 23, 2017

I think drain_where does explains it the best. Like drain it returns values, but it does not drain/remove all values but just such where a given condition is true.

remove_if sounds a lot like a conditional version of remove which just removes a single item by index if a condition is true e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter on the other hand is slightly ambiguous, does it drain then filter in a more optimized way (like filter_map) or does if drain so that a iterator is returned comparble to the iterator filter would return,
and if so shouldn't it be called filtered_drain as the filter get logically used before...

@Gankra
Copy link
Contributor Author

@Gankra Gankra commented Sep 25, 2017

There is no precedent for using _where or _if anywhere in the standard library.

@jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Sep 25, 2017

@gankro is there a precedent for using _filter anywhere? I also don't know that that is that a reason for not using the less ambiguous terminology? Other places in the standard library already use a variety of suffixes such as _until and _while.

@crlf0710
Copy link
Contributor

@crlf0710 crlf0710 commented Oct 23, 2017

The "said equivalent" code in the comment is not correct... you have to minus one from i at the "your code here" site, or bad things happens.

@thegranddesign
Copy link

@thegranddesign thegranddesign commented Oct 25, 2017

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

@thegranddesign
Copy link

@thegranddesign thegranddesign commented Oct 25, 2017

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject. Neither of which mutate the original value.

@rpjohnst
Copy link
Contributor

@rpjohnst rpjohnst commented Oct 25, 2017

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.

@thegranddesign
Copy link

@thegranddesign thegranddesign commented Oct 25, 2017

@rpjohnst so I searched here, am I missing filter somewhere?

@rpjohnst
Copy link
Contributor

@rpjohnst rpjohnst commented Oct 25, 2017

Yes, it's a member of Iterator, not Vec.

@Gankra
Copy link
Contributor Author

@Gankra Gankra commented Oct 25, 2017

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single ""atomic"" operation).

Although drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).

@kennytm kennytm changed the title Tracking issue for Vec::drain_filter Tracking issue for Vec::drain_filter and LinkedList::drain_filter Nov 27, 2017
@polarathene
Copy link

@polarathene polarathene commented Dec 3, 2017

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

@jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Dec 3, 2017

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

@tmccombs
Copy link
Contributor

@tmccombs tmccombs commented Dec 7, 2017

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

@Gankra
Copy link
Contributor Author

@Gankra Gankra commented Dec 7, 2017

Yes

bors added a commit that referenced this issue Dec 9, 2017
Add Drop impl for linked_list::DrainFilter

This is part of #43244. See #43244 (comment)
bors added a commit that referenced this issue Dec 9, 2017
Add Drop impl for linked_list::DrainFilter

This is part of #43244. See #43244 (comment)
@johnw42
Copy link

@johnw42 johnw42 commented Feb 17, 2020

Almost, the more general version would take an FnMut(T) -> Option<U>, like Iterator::{filter_map, find_map, map_while} do. I have no idea if it's worth to generalize filter_map in this way, but it might be worth considering.

The function has to return Option<T> because the values it produces are stored in a Vec<T>.

@timvermeulen
Copy link
Contributor

@timvermeulen timvermeulen commented Feb 17, 2020

@johnw42 I'm not sure I follow, wouldn't the Some value immediately be yielded by the iterator?

Actually, I guess the input value of that function still needs to be &T or &mut T instead of T in case you don't want to drain it. Or maybe the function could be something like FnMut(T) -> Result<U, T>. But I don't see why the item type couldn't be some other type.

@johnw42
Copy link

@johnw42 johnw42 commented Feb 18, 2020

@timvermeulen I think we're interpreting the proposal differently.

The way I interpreted it, the Some value is stored back into the Vec, and None means the original value is yielded by the iterator. That allows the closure to either update the value in place, or move it out of the Vec. In writing this, I realized my version doesn't really add anything because you can implement it in terms of drain_filter:

fn drain_filter_map<F>(
    &mut self,
    mut f: F,
) -> DrainFilter<T, impl FnMut(&mut T) -> bool>
where
    F: FnMut(&T) -> Option<T>,
{
    self.drain_filter(move |value| match f(value) {
        Some(new_value) => {
            *value = new_value;
            false
        }
        None => true,
    })
}

Conversely, I was thinking your interpretation isn't very useful because it's equivalent to mapping the result of drain_filter, but I tried to write it, and it's not, for the same reason filter_map isn't equivalent to calling filter followed by map.

@timvermeulen
Copy link
Contributor

@timvermeulen timvermeulen commented Feb 19, 2020

@johnw42 Ah, yeah, I thought you wanted None to mean that the value should stay in the Vec.

So it seems that FnMut(T) -> Result<U, T> would be the most general, though it's probably not very ergonomic. FnMut(&mut T) -> Option<U> isn't really an option actually because that wouldn't allow you to take ownership of the T in the general case. I think FnMut(T) -> Result<U, T> and FnMut(&mut T) -> bool are the only options.

@johnw42
Copy link

@johnw42 johnw42 commented Feb 25, 2020

@timvermeulen I started to say something earlier about a "most general" solution, and my "most general" solution was different from yours, but I reached the same conclusion, which is that trying to make a function too general results in a something you wouldn't actually want to use.

Although maybe there's still some value in making a very general method that advanced users can build nicer abstractions on top of. As far as I can tell, the point of drain and drain_filter isn't that they're particularly ergonomic APIs--they're not--but that they support use cases that occur in practice, and which can't be written any other way without a lot of redundant moves (or using unsafe operations).

With drain, you get the following nice properties:

  • Any contiguous selection of elements can be removed.
  • Dropping the removed items is as simple as discarding the returned iterator.
  • The removed items don't have to be dropped; the caller can examine each one individually and choose what to do with it.
  • The contents of the Vec need not support Copy or Clone.
  • No memory for the Vec itself needs to be allocated or freed.
  • Values left in the Vec are moved at most once.

With drain_filter, you gain the ability to remove an arbitrary set of items from the Vec rather than just a contiguous range. A less obvious advantage is that even if a contiguous range of items is removed, drain_filter can still offer a performance boost if finding the range to pass to drain would involve making a separate pass over the Vec to inspect its contents. Because the argument to the closure is a &mut T, it's even possible to update the items left in the Vec. Hooray!

Here are some other things you might want to do with an in-place operation like drain_filter:

  1. Transform the removed items before returning them through the iterator.
  2. Abort the operation early and report an error.
  3. Instead of just removing the item or leaving it in place (while possibly mutating it), add the ability to remove the item and replace it with a new value (which might be a clone of the original value, or something else entirely).
  4. Replace the removed item with multiple new items.
  5. After doing something with the current item, skip over some number of following items, leaving them in place.
  6. After doing something with the current item, remove some number of following items without inspecting them first.

And here's my analysis of each:

  1. This doesn't add anything useful because the caller can already transform the items as they're returned by the iterator. It also kind of defeats the purpose of the iterator, which is to avoid cloning values by delivering them to the caller only after they've been removed from the Vec.
  2. Being able to abort early could potentially improve the asymptotic complexity in some cases. Reporting an error as part of the API adds nothing new because you could do the same thing by having the closure mutate a captured variable, and it's not clear how to even do it, because the value won't be produced until after the iterator has been consumed.
  3. This, I think, adds some real generality.
  4. This is what I was originally going to propose as the "kitchen sink" option, but I decided it's not helpful, because if a single item can be replaced by multiple items, it's impossible to maintain the property that items in the Vec are moved at most once, and it might be necessary to re-allocate the buffer. If you need to do something like that, it's not necessarily more efficient than just building a whole new Vec, and it could be worse.
  5. This could be helpful if the Vec is organized in such a way that you can just skip over a large portion of the items without stopping to inspect them. I didn't include it in my sample code below, but it could be supported by changing the closure to return an additional usize specifying how many of the following items to jump over before continuing.
  6. This seems complementary to item 5, but it's not very useful if you still need to return the removed items through the iterator. It might still be a useful optimization, though, if the items you've removing have no destructor and you just want to make them disappear. In that case case, the usize above could be replaced with a choice of Keep(usize) or Drop(usize) (where Keep(0) and Drop(0) are semantically equivalent).

I think we can support the essential use cases by having the closure return an enum with 4 cases:

fn super_drain(&mut self, f: F) -> SuperDrainIter<T>
    where F: FnMut(&mut T) -> DrainAction<T>;

enum DrainAction<T>  {
    /// Leave the item in the Vec and don't return anything through
    /// the iterator.
    Keep,

    /// Remove the item from the Vec and return it through the
    /// iterator.
    Remove,

    /// Remove the item from the Vec, return it through the iterator,
    /// and swap a new value into the location of the removed item.
    Replace(T),
    
    /// Leave the item in place, don't return any more items through
    /// the iterator, and don't call the closure again.
    Stop,
}

One last option I'd like to present is to get rid of the iterator entirely, pass items to the closure by value, and allow the caller to leave an item unchanged by replacing it with itself:

fn super_drain_by_value(&mut self, f: F)
    where F: FnMut(T) -> DrainAction<T>;

enum DrainAction<T>  {
    /// Don't replace the item removed from the Vec.
    Remove,

    /// Replace the item removed from the Vec which a new item.
    Replace(T),
    
    Stop,
}

I like this approach best because it's simple and it supports all the same use cases. The potential downside is that even if most of the items are left in place, they still need to be moved into the closure's stack frame and then moved back when the closure returns. One would hope those moves could be reliably optimized away when the closure just returns its argument, but I'm not sure if that's something we should count on. If other people like it enough to include it, I think update would be a good name for it, because if I'm not mistaken, it can be used to implement any single-pass in-place update of a Vec's contents.

(BTW, I completely ignored linked lists above because I forgot about them entirely until I looked at the title of this issue. If we're talking about a linked list, it changes the analysis of points 4-6, so I think a different API would be appropriate for linked lists.)

@jplatte
Copy link
Contributor

@jplatte jplatte commented Feb 25, 2020

@johnw42 you can already do 3. if you have a mutable reference, by using mem::replace or mem::take.

@zserik
Copy link

@zserik zserik commented Feb 25, 2020

@johnw42 @jplatte

(3) only really makes sense if we allow the item type of the returned Iterator to be different from the item type of the collection.
(3) is a special case, because you both return the element from the Iterator, and put a new element back into the Vec.

Bikeshedding: I would kinda reverse the function of Replace(T) and replace it with PushOut(T), with the purpose to "submit" the inner value of PushOut onto the iterator, while keeping the original (parameter) item in the Vec.

Stop should probably carry the ability to return an Error type (or work a bit like try_fold?).

@johnw42
Copy link

@johnw42 johnw42 commented Feb 25, 2020

I implemented my super_drain_by_value function last night, and I learned several things.

The headline item should probably be that, at least w.r.t. Vec, everything we're talking about here is in the "nice to have" category (as opposed to adding a fundamentally new capability), because Vec already essentially provides direct read and write access to all of its fields through the existing API. In the stable version, there's a small caveat that you can't observe the pointer field of an empty Vec, but the unstable into_raw_parts method removes that restriction. What we're really talking about is expanding the set of operations that can be performed efficiently by safe code.

In terms of code generation, I found that in the easy cases (e.g. Vec<i32>), redundant moves in and out of the Vec are a non-issue, and calls that amount to simple things like a no-op or truncating the Vec are transformed into code that's too simple to be improved upon (zero and three instructions, respectively). The bad news is that for harder cases, both my proposal and and the drain_filter method do a lot of unnecessary copying, largely defeating the purpose of the methods. I tested this by looking at the assembly code generated for a Vec<[u8; 1024]>, and in both cases, each iteration has two calls to memcpy that aren't optimized away. Even a no-op call ends up copying the entire buffer twice!

In terms of ergonomics, my API, which looks pretty nice at first glance, isn't so nice in practice; returning an enum value from the closure becomes pretty verbose in all but the simplest cases, and the variant I proposed where the closure returns a pair of enum values is even uglier.

I also tried extending DrainAction::Stop to carry an R value that's returned from super_drain_by_value as an Option<R>, and that's even worse, because in the (presumably typical) case where the returned value isn't needed, the compiler can't infer R and you have to explicitly annotate the type of a value you're not even using. For this reason, I don't think it's a good idea to support returning a value from the closure to the caller of super_drain_by_value; it's roughly analogous to why a loop {} construct can return a value, but any other kind of loop evaluates to ().

Regarding generality, I realized the there are actually two cases for premature termination: one where the rest of the Vec is dropped, and another there it's left in place. If premature termination doesn't carry a value (as I think it shouldn't), it becomes semantically equivalent to returning Keep(n) or Drop(n), where n is the number of items not yet examined. I do, however, think premature termination should be treated as a separate case, because compared to using Keep/Drop, it's easier to use goes through a simpler code path.

To make the API a bit more friendly, I think a better option would be to make the closure return () and pass it a helper object (which I'll refer to here as an "updater") that can be used to inspect each element of the Vec and control what happens to it. These methods can have familiar names like borrow, borrow_mut, and take, with extra methods like keep_next(n) or drop_remainder(). Using this kind of API, the closure is much simpler in simple cases and no more complex in the complex cases. By having most of the updater's methods take self by value, it's easy to prevent the caller from doing things like calling take more than once, or giving conflicting instructions for what to do in subsequent iterations.

But we can still do better! I realized this morning that, as is so often the case, this problem is analogous to one that has been definitively solved in functional languages, and we can solve it with an analogous solution. I'm talking about "zipper" APIs, first described in this short paper with sample code in OCaml, and described here with Haskell code and links to other relevant papers. Zippers provide a very general way to traverse a data structure and update it "in place" using whatever operations that particular data structure supports. Another way to think of it is that a zipper is a sort of turbocharged iterator with extra methods for performing operations on a specific type of data structure.

In Haskell, you get "in place" semantics by making the zipper a monad; in Rust, you can do the same thing using lifetimes by having the zipper hold a mut reference to the Vec. A zipper for a Vec is very similar to the updater I described above, except that instead of passing it repeatedly to a closure, the Vec just provides a method to create a zipper on itself, and the zipper has exclusive access to the Vec as long as it exists. The caller then becomes responsible to writing a loop to traverse the array, calling a method at each step to either remove the current item from the Vec, or leave it in place. Early termination can be implemented by calling a method that consumes the zipper. Because the loop is under the caller's control, it becomes possible to do things like process more than one item in each iteration of the loop, or handle a fixed number of items without using a loop at all.

Here's a very contrived example showing some of the things a zipper can do:

/// Keep the first 100 items of `v`.  In the next 100 items of `v`,
/// double the even values, unconditionally keep anything following an
/// even value, discard negative values, and move odd values into a
/// new Vec.  Leave the rest of `v` unchanged.  Return the odd values
/// that were removed, along with a boolean flag indicating whether
/// the loop terminated early.
fn silly(v: &mut Vec<i32>) -> (bool, Vec<i32>) {
    let mut odds = Vec::new();
    // Create a zipper, which get exclusive access to `v`.
    let mut z = v.zipper();
    // Skip over the first 100 items, leaving them unchanged.
    z.keep_next(100);
    let stopped_early = loop {
        if let Some(item /* &mut i32 */) = z.current_mut() {
            if *item < 0 {
                // Discard the value and advance the zipper.
                z.take();
            } else if *item % 2 == 0 {
                // Update the item in place.
                *item *= 2;

                // Leave the updated item in `v`.  This has the
                // side-effect of advancing `z` to the next item.
                z.keep();

                // If there's another value, keep it regardless of
                // what it is.
                if z.current().is_some() {
                    z.keep();
                }
            } else {
                // Move an odd value out of `v`.
                odds.push(z.take());
            }
            if z.position() >= 200 {
                // This consumes `z`, so we must break out of the
                // loop!
                z.keep_rest();
                break true;
            }
        } else {
            // We've reached the end of `v`.
            break false;
        }
    }
    (stopped_early, odds)

    // If the zipper wasn't already consumed by calling
    // `z.keep_rest()`, the zipper is dropped here, which will shift
    // the contents of `v` to fill in any gaps created by removing
    // values.
}

For comparison, here's more or less the same function using drain_filter, except it only pretends to stop early. It's about the same amount of code, but IMHO it's a lot harder to read because the meaning of the value returned by the closure isn't obvious, and it's using mutable boolean flags to carry information from one iteration to the next, where the zipper version achieves the same thing with control flow. Because removed items are always yielded by the iterator, we need a separate filter step to remove negative values from the output, which means we need to check for negative values in two places instead of one. It's also kind of ugly that it has to keep track of the position in v; the implementation of drain_filter has that information, but the caller has no access to it.

fn drain_filter_silly(v: &mut Vec<i32>) -> (bool, Vec<i32>) {
    let mut position: usize = 0;
    let mut keep_next = false;
    let mut stopped_early = false;
    let removed = v.drain_filter(|item| {
        position += 1;
        if position <= 100 {
            false
        } else if position > 200 {
            stopped_early = true;
            false
        } else if keep_next {
            keep_next = false;
            false
        } else if *item >= 0 && *item % 2 == 0 {
            *item *= 2;
            false
        } else {
            true
        }
    }).filter(|item| item >= 0).collect();
    (stopped_early, removed)
}
@timvermeulen
Copy link
Contributor

@timvermeulen timvermeulen commented Feb 25, 2020

@johnw42 Your earlier post reminded me of the scanmut crate, specifically the Remover struct, and the "zipper" concept you mentioned seems very similar! That does seem a lot more ergonomic than a method that takes a closure for when you want total control.

Either way, this is probably not very relevant to whether drain_filter should be stabilized, since we can always swap out the internals later. drain_filter itself will always be very useful because of how convenient it is. The only change I'd still like to see before stabilization is a RangeBounds parameter.

@Boscop
Copy link

@Boscop Boscop commented Feb 26, 2020

@timvermeulen I think it makes sense to add a RangeBounds parameter, but keep the current closure signature (F: FnMut(&mut T) -> bool).
You can always post-process the drained elements with filter_map or whatever you want.
(For me, it's very important that the closure allows mutating the element, because retain doesn't allow it (it was stabilized before this mistake was discovered).)

@timvermeulen
Copy link
Contributor

@timvermeulen timvermeulen commented Feb 26, 2020

Yeah, that seems to be the perfect balance between convenience and usefulness.

@johnw42
Copy link

@johnw42 johnw42 commented Feb 27, 2020

@timvermeulen Yeah, I was straying rather far off the main topic.

One thing I noticed that is relevant to the original topic is that it's kind of hard to remember what the return value of the closure means--is it saying whether to keep the item or to remove it? I think it would be helpful for the docs to point out that v.drain_filter(p) is equivalent to v.iter().filter(p) with side-effects.

With filter, using a boolean value is still less than ideal for clarity, but it's a very well-known function, and IMHO it's at least somewhat intuitive that the predicate answers the question "should I keep this?" rather than "should I discard this?" With drain_filter, the same logic applies if you think about it from the perspective of the iterator, but if you think about it from the perspective of the input Vec, the question becomes "should I NOT keep this?"

As for the exact wording, I propose renaming the filter parameter to predicate (to match Iterator::filter) and adding this sentence somewhere in the description:

To remember how the return value of predicate is used, it may be helpful to keep in mind that drain_filter is identical to Iterator::filter with the additional side-effect of removing the selected items from self.

@Boscop
Copy link

@Boscop Boscop commented Feb 28, 2020

@johnw42 Yes, good point. I think a name like drain_where would be much clearer.

@shepmaster
Copy link
Member

@shepmaster shepmaster commented Feb 28, 2020

If you are going to get into naming bikeshedding; please make sure you’ve read all the comments; even hidden ones. Many variants have been proposed already, e.g. #43244 (comment)

@BartMassey
Copy link

@BartMassey BartMassey commented Mar 17, 2020

But… it has to be named draintain()! No other name is as beautiful!

@negamartin
Copy link

@negamartin negamartin commented Jun 9, 2020

I'm quite interested in this issue, and I read the whole thread, so I might as well try to summarize what everyone said, in the hopes of helping this get stabilized. I've added some of my own comments along the way, but I've tried to keep them as neutral as possible.

Naming

Here is a non-opinionated summary of the names I saw proposed:

  • drain_filter: The name used in the current implementation. Consistent with other names suchs as filter_map. Has the advantage of being analogous to drain().filter(), but with more side-effects.
  • drain_where: Has the benefit of indicating whether true results in draining out or filtering in, which can be hard to remember with other names. There is no precedent in std for the _where suffix, but there's plenty of precedents for similar suffixes.
  • A variation of drain().where(), since where is already a keyword.
  • drain_retain: Consistent with retain, but retain and drain have opposite interpretations of the boolean values returned by the closure, which may be confusing.
  • filtered_drain
  • drain_if
  • drain_when
  • remove_if

Parameters

It might be worth adding a range argument for consistency with drain.

Two closure formats have been suggested, FnMut(&mut T) -> bool and FnMut(T) -> Result<T, U>. The latter is more flexible, but also clumsier.

Reversing the boolean condition (true means "keep in the Vec") to be consistent with retain was discussed, but then it wouldn't be consistent with drain (true means "drain from the Vec").

Unwinding

When the filter closure panics, the DrainFilter iterator is dropped. The iterator then should finish draining the Vec, but to do so it must call the filter closure again, risking a double panic. There are some solutions, but all of them are compromises:

  • Don't finish draining on drop. This is pretty counterintuitive when used with adaptors such as find or all. Besides, it renders the v.drain_filter(...); idiom useless since iterators are lazy.

  • Always finish draining on drop. This risks double panics (which result in aborts), but makes behaviour consistent.

  • Only finish draining on drop if not currently unwinding. This fixes double panics entirely, but makes the behaviour of drain_filter unpredictable: dropping DrainFilter in a destructor might sometimes not actually do its job.

  • Only finish draining on drop if the filter closure did not panic. This is the current compromise made by drain_filter. A nice property of this approach is that panics in the filter closure "short-circuit", which arguably is quite intuitive.

Note that the current implementation is sound and never leaks as long as the DrainFilter struct is dropped (although it may cause an abort). Previous implementations were not safe/leak-free though.

Drain-on-drop

DrainIter could either finish draining the source vector when it's dropped, or it could only drain when next is called (lazy iteration).

Arguments in favor of drain-on-drop:

  • Consistent with the behaviour of drain.

  • Interacts well with other adaptors such as all, any, find, etc...

  • Enables the vec.drain_filter(...); idiom.

  • The lazy functionality could be explicitly enabled through drain_lazy-style methods or a lazy() adapter on DrainIter (and even on Drain, since it's backwards-compatible to add methods).

Arguments in favor of lazy iteration:

  • Consistent with almost all other iterators.

  • The "drain-on-drop" functionality could be explicitly enabled through adapters on DrainIter, or even through a general Iterator::exhausting adapter (see RFC #2370).

I might have missed some stuff, but at least I hope it helps newcomers when skimming the thread.

@aloucks
Copy link
Contributor

@aloucks aloucks commented Jun 9, 2020

@negamartin

Wouldn't the drain-on-drop option require that the iterator returns a reference to the item instead of the owned value? I think that would make it impossible to use drain_filter as a mechanism to remove and take ownership of items matching a specific condition (which was my original use-case).

@negamartin
Copy link

@negamartin negamartin commented Jun 11, 2020

I don't think so, since the behaviour of the current implementation is precisely drain-on-drop while yielding owned values. Either way, I don't see how drain-on-drop would require borrowing elements, so I think we have two different ideas on what drain-on-drop means.

Just to be clear, when I say drain-on-drop I only mean the behaviour when the iterator is not fully consumed: Should all items matching the closure be drained even if the iterator isn't fully consumed? Or only up to the element that was consumed, leaving the rest untouched?

In particular, it's the difference between:

let mut v = vec![1, 5, 3, 6, 4, 7];
v.drain_where(|e| *e > 4).find(|e| *e == 6);

// Drain-on-drop
assert_eq!(v, &[1, 3, 4]);

// Lazy
assert_eq!(v, &[1, 3, 4, 7]);
@tmccombs
Copy link
Contributor

@tmccombs tmccombs commented Jun 11, 2020

Just throwing out an idea, but another possible API could be something like:

 fn drain_filter_into<F, D>(&mut self, filter: F, drain: D)
        where F: FnMut(&mut T) -> bool, 
                   D: Extend<T>
    { ... }

It's less flexible than the other options, but does avoid the problem of what to do when DrainFilter is dropped.

@BartMassey
Copy link

@BartMassey BartMassey commented Jun 11, 2020

It feels to me like this whole thing is looking to me less and less like retain_mut() (retain() with a mutable reference passed to the closure), which is what it was first and foremost intended to provide. Could we provide retain_mut() for now in addition to working on the design of filtered drain thing? Or am I missing something?

@aloucks
Copy link
Contributor

@aloucks aloucks commented Jun 11, 2020

@BartMassey

which is what it was first and foremost intended to provide.

I don't think that's the case. I specifically use drain_filter to take ownership of items based on the filter criteria. Drain and DrainFilter yeild the item where as retain does not.

@negamartin

Just to be clear, when I say drain-on-drop I only mean the behaviour when the iterator is not fully consumed

Ah, Ok. That's my mistake. I misunderstood your definition. I had interpreted it as "nothing is removed from the vec until dropped", which doesn't really make any sense.

Arguments in favor of lazy iteration

I think it needs be consistent with drain. The Iterator::exhausting RFC was not accepted and it would be very odd if drain and drain_filter had seemingly opposite drain behaviors.

@Boscop
Copy link

@Boscop Boscop commented Jun 12, 2020

@negamartin

drain_filter: The name used in the current implementation. Consistent with other names suchs as filter_map. Has the advantage of being analogous to drain().filter(), but with more side-effects.

It is not analogous (that's why we need retain_mut/drain_filter):
drain().filter() would drain even those elements for which the filter closure returns false!

@negamartin
Copy link

@negamartin negamartin commented Jun 12, 2020

I just noticed a small line in a comment by the lib team in #RFC 2870:

Possibilities might include "overloading" a method by making it generic, or a builder pattern.

Is it backwards-compatible to make a method generic if it still accepts the previous concrete type? If so, I believe that would be the best way forward.

(The builder pattern is a bit unintuitive with iterators, since methods on iterators are usually adaptors, not behaviour-changers. Besides, there are no precedents, for example chunks and chunks_exact are two separate methods, not a chunks().exact() combo.)

@rustonaut
Copy link

@rustonaut rustonaut commented Jun 12, 2020

@jplatte
Copy link
Contributor

@jplatte jplatte commented Jun 12, 2020

Is it backwards-compatible to make a method generic if it still accepts the previous concrete type? If so, I believe that would be the best way forward.

No, since it breaks type inference in some cases. E.g. foo.method(bar.into()) will work with a conrete type argument, but not with a generic argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.