Tracking issue for Vec::drain_filter and LinkedList::drain_filter #43244

Open
Gankro opened this Issue Jul 14, 2017 · 63 comments

@Gankro
Contributor

Gankro commented Jul 14, 2017

    /// Creates an iterator which uses a closure to determine if an element should be removed.
    ///
    /// If the closure returns true, then the element is removed and yielded.
    /// If the closure returns false, it will try again, and call the closure
    /// on the next element, seeing if it passes the test.
    ///
    /// Using this method is equivalent to the following code:
    ///
    /// ```
    /// # let mut some_predicate = |x: &mut i32| { *x == 2 };
    /// # let mut vec = vec![1, 2, 3, 4, 5];
    /// let mut i = 0;
    /// while i != vec.len() {
    ///     if some_predicate(&mut vec[i]) {
    ///         let val = vec.remove(i);
    ///         // your code here
    ///     }
    ///     i += 1;
    /// }
    /// ```
    ///
    /// But `drain_filter` is easier to use. `drain_filter` is also more efficient,
    /// because it can backshift the elements of the array in bulk.
    ///
    /// Note that `drain_filter` also lets you mutate every element in the filter closure,
    /// regardless of whether you choose to keep or remove it.
    ///
    ///
    /// # Examples
    ///
    /// Splitting an array into evens and odds, reusing the original allocation:
    ///
    /// ```
    /// let mut numbers = vec![1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15];
    ///
    /// let evens = numbers.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
    /// let odds = numbers;
    ///
    /// assert_eq!(evens, vec![2, 4, 6, 8, 14]);
    /// assert_eq!(odds, vec![1, 3, 5, 9, 11, 13, 15]);
    /// ```
    fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<T, F>
        where F: FnMut(&mut T) -> bool,
    { ... }

I'm sure there's an issue for this somewhere, but I can't find it. Someone nerd sniped me into implementing it. PR incoming.

Member

RalfJung commented Jul 31, 2017

Related issues:
#25477
#34265

bors added a commit that referenced this issue Aug 15, 2017

Auto merge of #43245 - Gankro:drain-filter, r=sfackler
Add Vec::drain_filter

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for `retain`, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the `retain` impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.

Contributor

bluss commented Sep 4, 2017

Maybe this doesn't need to include the kitchen sink, but it could have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback, it's another thing that can panic. But drain_filter(.., f) can not.
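
As a rough illustration of the range idea (not the proposed std API), here is a sketch on stable Rust. The free function `drain_filter_range` is a hypothetical name, and it uses the simple `Vec::remove` loop rather than a bulk backshift. The bounds check is the extra panic source mentioned above.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical helper: removes and returns the elements inside `range`
/// for which `pred` returns true. Panics if the range is out of bounds.
fn drain_filter_range<T, R, F>(vec: &mut Vec<T>, range: R, mut pred: F) -> Vec<T>
where
    R: RangeBounds<usize>,
    F: FnMut(&mut T) -> bool,
{
    let len = vec.len();
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    // This is the additional check that a plain `drain_filter(f)` would not need.
    assert!(start <= end && end <= len, "range out of bounds");

    let mut removed = Vec::new();
    let mut i = start;
    // Number of original elements in the range that still need to be visited.
    let mut remaining = end - start;
    while remaining > 0 {
        if pred(&mut vec[i]) {
            // Removal shifts the tail left, so `i` already points at the next element.
            removed.push(vec.remove(i));
        } else {
            i += 1;
        }
        remaining -= 1;
    }
    removed
}
```

Passing `..` visits the whole vector, so the bounds check can never fail in that case.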

rustonaut commented Sep 11, 2017

Is there any chance this will stabilize in some form in the not too distant future?

rustonaut commented Sep 11, 2017

If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f) case I would opt for doing this.

( And I'm pretty sure you can implement it in a way
which makes the compiler clever enough; in the worst
case you could have an "in-function specialization",
basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return } )
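
One way the type comparison in that pseudocode could be spelled on stable Rust is with `std::any::TypeId`. This is only a sketch: the helper name `is_range_full` is hypothetical, and `TypeId::of` requires the range type to be `'static`.

```rust
use std::any::TypeId;
use std::ops::RangeFull;

/// Hypothetical helper: returns true when the range argument is `..`,
/// in which case a caller could skip its bounds checks entirely.
fn is_range_full<R: 'static>(_range: &R) -> bool {
    // Both sides are known at monomorphization time, so the optimizer
    // can usually fold this comparison to a constant.
    TypeId::of::<R>() == TypeId::of::<RangeFull>()
}
```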

Contributor

jonhoo commented Sep 22, 2017

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The latter, on the other hand, indicates that we only drain elements where some condition holds.
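
To make the contrast concrete, here is a small sketch on stable Rust of what `.drain(..).filter()` actually does. The helper name is hypothetical. Unlike the method proposed in this issue, it empties the whole vector and silently drops the elements that fail the filter.

```rust
/// Hypothetical helper: drains the *whole* vector, then filters the drained items.
fn drain_then_filter_evens(v: &mut Vec<i32>) -> Vec<i32> {
    // `drain(..)` removes every element; `filter` only decides which of the
    // removed elements end up in the result. The odd ones are dropped.
    v.drain(..).filter(|x| x % 2 == 0).collect()
}
```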

rustonaut commented Sep 23, 2017

No idea, but drain_where sounds much better and is much more intuitive.
Is there still a chance to change it?

Contributor

bluss commented Sep 23, 2017

.remove_if has been a prior suggestion too

rustonaut commented Sep 23, 2017

I think drain_where explains it best. Like drain, it returns values, but it does not drain/remove all values, only those for which a given condition is true.

remove_if sounds a lot like a conditional version of remove, which just removes a single item by index if a condition is true, e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter, on the other hand, is slightly ambiguous: does it drain and then filter in a more optimized way (like filter_map), or does it drain so that an iterator is returned comparable to the iterator filter would return?
And if so, shouldn't it be called filtered_drain, as the filter logically gets applied first...

Contributor

Gankro commented Sep 25, 2017

There is no precedent for using _where or _if anywhere in the standard library.

Contributor

jonhoo commented Sep 25, 2017

@Gankro is there a precedent for using _filter anywhere? I also don't know that that is a reason for not using the less ambiguous terminology. Other places in the standard library already use a variety of suffixes such as _until and _while.

Contributor

crlf0710 commented Oct 23, 2017

The "said equivalent" code in the comment is not correct... you have to subtract one from i at the "your code here" site, or bad things happen.
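
For reference, a sketch of the loop with that index problem addressed. Instead of subtracting one after a removal, it only advances `i` when nothing was removed, which amounts to the same thing. The function name `remove_matching` is hypothetical, and the removed values are collected into a `Vec` to stand in for "your code here".

```rust
/// Hypothetical helper: removes every element for which `pred` returns
/// true and returns the removed elements in their original order.
fn remove_matching<T, F>(vec: &mut Vec<T>, mut pred: F) -> Vec<T>
where
    F: FnMut(&mut T) -> bool,
{
    let mut removed = Vec::new();
    let mut i = 0;
    while i < vec.len() {
        if pred(&mut vec[i]) {
            // `remove` shifts everything after `i` one slot to the left,
            // so the next candidate is already at index `i`.
            removed.push(vec.remove(i));
        } else {
            i += 1;
        }
    }
    removed
}
```

Each `remove` call shifts the tail separately, so this loop is quadratic in the worst case, which is the cost the bulk backshift is meant to avoid.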

thegranddesign commented Oct 25, 2017

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

thegranddesign commented Oct 25, 2017

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject. Neither of which mutate the original value.

Contributor

rpjohnst commented Oct 25, 2017

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.
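
A sketch of that chaining with `Iterator::filter` on stable Rust, mirroring the `reject(...).join("/")` example above. The helper name `join_non_empty` is hypothetical, and the input vector is consumed rather than mutated in place.

```rust
/// Hypothetical helper: drops empty segments and joins the rest with "/".
fn join_non_empty(parts: Vec<&str>) -> String {
    parts
        .into_iter()
        // `filter` keeps the elements for which the closure returns true,
        // so "reject empty" is written as "keep non-empty".
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join("/")
}
```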

thegranddesign commented Oct 25, 2017

@rpjohnst so I searched here, am I missing filter somewhere?

Contributor

rpjohnst commented Oct 25, 2017

Yes, it's a member of Iterator, not Vec.

Contributor

Gankro commented Oct 25, 2017

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single "atomic" operation).

Although drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).

@kennytm kennytm changed the title from Tracking issue for Vec::drain_filter to Tracking issue for Vec::drain_filter and LinkedList::drain_filter Nov 27, 2017

polarathene commented Dec 3, 2017

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

Contributor

jonhoo commented Dec 3, 2017

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

Contributor

tmccombs commented Dec 7, 2017

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

Contributor

Gankro commented Dec 7, 2017

Yes

bors added a commit that referenced this issue Dec 9, 2017

Auto merge of #46581 - tmccombs:drain_filter_drop, r=sfackler
Add Drop impl for linked_list::DrainFilter

This is part of #43244. See #43244 (comment)

Contributor

Emerentius commented Feb 20, 2018

Why exactly does dropping the iterator cause it to run through to the end? I think that's surprising behaviour for an iterator and it could also be, if desired, done explicitly. The inverse of taking only as many elements out as you need is impossible on the other hand because mem::forgeting the iterator runs into leak amplification.

Boscop commented Feb 25, 2018

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut().
On an intuitive logical level, it would make more sense to return true for entries I want to keep, does anyone else feel this way? (Especially considering that retain() already works this way)
Why not do that, and rename drain_filter() to retain_iter() or retain_drain() (or drain_retain())?
Then it would also mirror retain() more closely!

rustonaut commented Feb 25, 2018

Boscop commented Feb 25, 2018

But with drain_where() the closure would still have to return true for elements that should be removed, which is the opposite of retain() which makes it inconsistent..
Maybe retain_where?
But I think you're right that it makes sense to have "drain" in the name, so I think drain_retain() makes the most sense: It's like drain() but retaining the elements where the closure returns true.

rustonaut commented Feb 25, 2018

Boscop commented Feb 25, 2018

But how often would you migrate from drain() to drain_filter()?
In all cases until now, I migrated from retain() to drain_filter() because there is no retain_mut() in std and I need to mutate the element! So then I had to invert the closure return value..
I think drain_retain() makes sense because the drain() method drains unconditionally all elements in the range, whereas drain_retain() retains the elements where the closure returns true, it combines the effects of the drain() and retain() methods.

rustonaut commented Feb 25, 2018

Boscop commented Feb 25, 2018

Ah yes, but I think the "price" of inverting the closures in current code that uses drain_filter() is worth it, to get a consistent and intuitive API in std and then in stable.
It's only a small fixed cost (eased by the fact that it would go along with a renaming of the function, so the compiler error could tell the user that the closure has to be inverted and it wouldn't silently introduce a bug), compared to the cost of standardizing drain_filter() and then people always having to invert the closure when migrating from retain() to drain_filter(). On top of that come the mental cost of remembering to do so and the cost of making it harder to find in the docs when coming from retain() and searching for "something like retain() but passing &mut to its closure". That is why I think it makes sense for the new name of this function to have "retain" in it, so that people find it when searching the docs.

Some anecdotal data: in my code, I only ever needed the retain_mut() aspect of drain_filter() (often the call sites were using retain() before); I never had a use case where I needed to process the drained values. I think this will also be the most common use case for others in the future (since retain() doesn't pass &mut to its closure, drain_filter() has to cover that use case too, and it's a more common use case than needing to process the drained values).

rustonaut commented Feb 25, 2018

The reason why I'm against drain_retain is the way names are currently used in std wrt. collections:

  1. you have function names using predicates which have producing/consuming concepts associated with them (wrt. rust, iterations). For example drain, collect, fold, all, take, ...
  2. these predicates sometimes have modifiers, e.g. *_where, *_while
  3. you have function names using predicates which have modifying properties (map, filter, skip, ...)
    • here it's vague if it is element or iteration modifying (map vs. filter/skip)
  4. function names chaining multiple predicates using modifying properties e.g. filter_map
    • having a concept of roughly apply modifier_1 and then apply modifier_2, just that it's faster or more flexible doing this in one step

You sometimes might have:

  1. function names combining producing/consuming predicates with modifying ones (e.g. drain_filter)
    • but most times it is better/less confusing to combine them with modifiers (e.g. drain_where)

You normally do not have:

  1. two producing/consuming predicates combined into one name, i.e. we do not have things like take_collect, as that is easily confusing

drain_retain does kind of make sense, but it falls into the last category: while you can probably guess what it does, it basically says "remove and return all elements 'somehow specified', and then keep all elements 'somehow specified', discarding the other elements".


On the other hand, I don't know why there should not be a retain_mut. Maybe open a quick RFC introducing retain_mut as an efficient way to combine modify + retain; I have a hunch it might be
stabilized faster than this function. Until then you could consider writing an extension trait providing
your own retain_mut, using iter_mut + a bool array (or bit array, or ...) to keep track of which elements
have to be removed. Or provide your own drain_retain, which internally uses drain_filter/drain_where
but wraps the predicate in a negation: |ele| !predicate(ele).
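
A minimal sketch of the extension-trait idea with iter_mut plus a bool array, on stable Rust. The trait and method names (`RetainMutExt`, `retain_mut_by_flags`) are hypothetical, chosen so they do not clash with any `retain_mut` the standard library itself may provide. The second pass relies on `Vec::retain` visiting each element exactly once in the original order.

```rust
/// Hypothetical extension trait: like `retain`, but the closure gets `&mut T`.
trait RetainMutExt<T> {
    fn retain_mut_by_flags<F>(&mut self, keep: F)
    where
        F: FnMut(&mut T) -> bool;
}

impl<T> RetainMutExt<T> for Vec<T> {
    fn retain_mut_by_flags<F>(&mut self, mut keep: F)
    where
        F: FnMut(&mut T) -> bool,
    {
        // Pass 1: let the closure mutate every element and record its verdict.
        let flags: Vec<bool> = self.iter_mut().map(|elem| keep(elem)).collect();
        // Pass 2: `retain` walks the elements in order, so the flags line up.
        let mut flags = flags.into_iter();
        self.retain(|_| flags.next().unwrap_or(false));
    }
}
```

The price of this approach is the temporary `Vec<bool>` and a second walk over the vector; a dedicated implementation could do both steps in one pass.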

Boscop commented Feb 25, 2018

@dathinab

  1. We are talking about a method on collections here, not on Iterator. map, filter, filter_map, skip, take_while etc are all methods on Iterator. Btw, which methods do you mean that use *_where?
    So we have to compare the naming scheme to methods that already exist on collections, e.g. retain(), drain(). There is no confusion with Iterator methods which transform one iterator into another iterator.
  2. AFAIK the consensus was that retain_mut() would not be added to std because drain_filter() will already be added and people were advised to use that. Which brings us back to the use case of migrating from retain() to drain_filter() being very common, so it should have a similar name and API (closure returning true means keep the entry)..

rustonaut commented Feb 26, 2018

Member

RalfJung commented Feb 26, 2018

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut().

FWIW, I think retain is the counter-intuitive name here. I usually find myself wanting to delete certain elements from a vector, and with retain I always have to invert that logic.
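
The inversion in question, as a small sketch on stable Rust. The wrapper name `delete_where` is hypothetical; it only shows that a "delete these" predicate has to be negated before it can be handed to `retain`.

```rust
/// Hypothetical helper: removes the elements for which `should_delete`
/// returns true by handing the negated predicate to `retain`.
fn delete_where<T, F>(v: &mut Vec<T>, mut should_delete: F)
where
    F: FnMut(&T) -> bool,
{
    // `retain` keeps elements where the closure is true, hence the `!`.
    v.retain(|x| !should_delete(x));
}
```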

Boscop commented Feb 27, 2018

But retain() is already in stable, so we have to live with it.. And so my intuition got used to that..

@rustonaut

This comment has been minimized.

Show comment
Hide comment
@rustonaut

rustonaut Feb 28, 2018

@Boscop: So is drain, which is the inverse of retain but also returns the removed elements. The same goes for the use of suffixes like _until and _while to name functions that are just a slightly modified version of existing functionality.

If I had to describe drain, it would be something like:

remove and return all elements specified "in some way", keep all other elements
where "in some way" is "by slicing" for all sliceable collection types and "all" for the rest.

The description of the function discussed here is the same, except that
"in some way" is "where a given predicate returns true".

On the other hand, the description I would give retain is:
only retain (i.e. keep) elements where a given predicate returns true, discard the rest

(Yes, retain could have been designed so that it does not discard the rest; sadly it wasn't.)

I do think it would have been really nice if retain had
passed &mut T to the predicate and maybe returned the removed values,
because I think retain is a more intuitive base for the name.

But independent of this, I also think that both drain_filter and drain_retain are suboptimal,
as they do not make it clear whether the predicate has to return true or false to keep or drain an entry.
(drain suggests that true removes the element, since it talks about removing elements, while filter
and retain talk about which elements to keep, at least in Rust.)

In the end it's not that important which of the names is used; it would just be nice if it gets stabilized.

Doing a poll and/or letting someone from the language team decide might be the best way to move things forward?
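To make the two descriptions above concrete, here is a minimal sketch that uses only the stable Vec::drain and Vec::retain methods. The helper names drain_range and retain_even are made up for this illustration and are not part of std.

```rust
/// Removes and returns the elements in `start..end`; all other elements stay.
/// (`drain_range` is a hypothetical helper name.)
fn drain_range(v: &mut Vec<i32>, start: usize, end: usize) -> Vec<i32> {
    v.drain(start..end).collect()
}

/// Keeps only the elements for which the predicate returns true.
/// The discarded elements are dropped, not returned.
fn retain_even(v: &mut Vec<i32>) {
    v.retain(|x| x % 2 == 0);
}

fn main() {
    let mut a = vec![1, 2, 3, 4, 5];
    let removed = drain_range(&mut a, 1, 3);
    println!("drained {:?}, left {:?}", removed, a);

    let mut b = vec![1, 2, 3, 4, 5];
    retain_even(&mut b);
    println!("retained {:?}", b);
}
```

Note the opposite polarity: with drain the argument selects what is removed, while with retain the predicate selects what is kept.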

@tmccombs

tmccombs Feb 28, 2018

Contributor

I think something like drain_where, drain_if, or drain_when, is much clearer than drain_filter.

@Boscop

Boscop Feb 28, 2018

@tmccombs Out of those three, I think drain_where makes the most sense. (if implies either doing the whole thing, in this case draining, or not doing it at all, and when is temporal.)
Compared to drain_filter, the closure return value is the same with drain_where (true to remove an element), but the name makes that fact explicit, so it eliminates the risk of misinterpreting the closure's return value.

@SimonSapin

SimonSapin Mar 17, 2018

Contributor

I think it’s more than time to stabilize. Summary of this thread:

  • Should a R: RangeArgument parameter be added?
  • Should the boolean value be inverted? (I think the current logic makes sense: returning true from the callback causes that item to be included in the iterator.)
  • Naming. (I like drain_where.)

@Gankro, what do you think?

@SimonSapin

SimonSapin Mar 28, 2018

Contributor

The libs team discussed this and the consensus was to not stabilize more drain-like methods at the moment. (The existing drain_filter method can stay in Nightly as unstable.) rust-lang/rfcs#2369 proposes adding another drain-like iterator that doesn’t do anything when dropped (as opposed to consuming the iterator to its end).

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

  • A sub-range (through RangeArgument a.k.a. RangeBounds) vs. the entire collection (though the latter could be achieved by passing .., a value of type RangeFull).
  • Draining everything (possibly within that range) vs. only elements that match a boolean predicate.
  • Self-exhausting on drop vs. not (leaving the rest of the elements in the collection).

Possibilities might include "overloading" a method by making it generic, or a builder pattern.

One constraint is that the drain method is stable. It can possibly be generalized, but only in backward-compatible ways.
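As a point of reference for the first and third axes, the following sketch shows how the stable Vec::drain already behaves: it is generic over RangeBounds, and dropping the Drain iterator removes the whole range even if only part of it was consumed. The helper name drain_first_only is hypothetical.

```rust
use std::ops::RangeBounds;

/// Takes only the first element of the drained range.
/// (`drain_first_only` is a hypothetical helper name.)
fn drain_first_only<R: RangeBounds<usize>>(v: &mut Vec<i32>, range: R) -> Option<i32> {
    let mut drain = v.drain(range);
    // `drain` is dropped when this function returns, which removes the rest
    // of the range from `v` as well (the "self-exhausting on drop" behaviour).
    drain.next()
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    let first = drain_first_only(&mut v, 1..4);
    println!("first = {:?}, left = {:?}", first, v);

    let mut w = vec![1, 2, 3];
    // `..` has type RangeFull and covers the entire collection.
    let first = drain_first_only(&mut w, ..);
    println!("first = {:?}, left = {:?}", first, w);
}
```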

@shepmaster

shepmaster Apr 22, 2018

Member

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

How and where does the team foresee this type of experimentation happening?

@SimonSapin

SimonSapin Apr 22, 2018

Contributor

How: come up with and propose a concrete API design, possibly with a proof-of-concept implementation (which can be done out of tree through at least Vec::as_mut_ptr and Vec::set_len). Where doesn’t matter too much. Could be a new RFC or a thread in https://internals.rust-lang.org/c/libs, and link it from here.
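For readers wondering what such an out-of-tree proof of concept might look like, here is a rough sketch built on Vec::as_mut_ptr and Vec::set_len. It is an eager helper rather than an iterator, the name drain_where is hypothetical, and it is not the std implementation. It sets the length to zero up front, so a panicking predicate leaks elements instead of risking a double drop.

```rust
use std::ptr;

/// Removes every element for which `pred` returns true and returns the
/// removed elements in order. Hypothetical helper, not part of std.
fn drain_where<T, F>(v: &mut Vec<T>, mut pred: F) -> Vec<T>
where
    F: FnMut(&mut T) -> bool,
{
    let old_len = v.len();
    let mut removed = Vec::new();
    // If `pred` panics, `v` is left empty and the remaining elements are
    // leaked rather than dropped twice.
    unsafe { v.set_len(0) };
    let base = v.as_mut_ptr();
    let mut kept = 0;
    for i in 0..old_len {
        unsafe {
            let cur = base.add(i);
            if pred(&mut *cur) {
                // Move the element out; its slot is now logically empty.
                removed.push(ptr::read(cur));
            } else {
                if kept != i {
                    // Backshift the kept element into the next free slot.
                    ptr::copy_nonoverlapping(cur, base.add(kept), 1);
                }
                kept += 1;
            }
        }
    }
    // Slots 0..kept now hold exactly the kept elements.
    unsafe { v.set_len(kept) };
    removed
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let evens = drain_where(&mut v, |x| *x % 2 == 0);
    println!("removed {:?}, kept {:?}", evens, v);
}
```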

@Emerentius

Emerentius Apr 25, 2018

Contributor

I've been playing around with this for a bit. I'll open a thread on internals in the next few days.

@Boscop

Boscop Apr 25, 2018

I think a general API that works like this makes sense:

    v.drain(a..b).where(pred)

So it's a builder-style API: If .where(pred) is not appended, it will drain the whole range unconditionally.
This covers the capabilities of the current .drain(a..b) method as well as .drain_filter(pred).

If the name drain can't be used because it's already in use, it should be a similar name like drain_iter.

The where method shouldn't be named *_filter to avoid confusion with filtering the resulting iterator, especially when where and filter are used in combination like this:

    v.drain(..).where(pred1).filter(pred2)

Here, it will use pred1 to decide what will be drained (and passed on in the iterator) and pred2 is used to filter the resulting iterator.
Any elements that pred1 returns true for but pred2 returns false for will still get drained from v but won't get yielded by this combined iterator.
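The second half of that combination can already be observed with the stable API. In this small sketch (drain_then_filter is a hypothetical helper name), .drain(..) removes every element from the vector, while .filter(..) only decides which of the removed elements are yielded:

```rust
/// Drains the whole vector but yields only the even elements.
/// (`drain_then_filter` is a hypothetical helper name.)
fn drain_then_filter(v: &mut Vec<i32>) -> Vec<i32> {
    v.drain(..).filter(|x| x % 2 == 0).collect()
}

fn main() {
    let mut v = vec![1, 2, 3, 4];
    let yielded = drain_then_filter(&mut v);
    // The odd elements were removed from `v` too; they were just not yielded.
    println!("yielded {:?}, left {:?}", yielded, v);
}
```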

What do you think about this kind of builder-style API approach?

@Boscop

Boscop Apr 25, 2018

For a second I forgot that where can't be used as a function name because it's already a keyword :/

And drain is already stabilized, so that name can't be used either.

Then I think the second-best overall option is to keep the current drain and rename drain_filter to drain_where, to avoid the confusion with .drain(..).filter().

(As jonhoo said above:)

what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The latter on the other hand indicates that we only drain elements where some condition holds.

@Emerentius

Emerentius May 3, 2018

Contributor

I've opened a thread on internals.
The TL;DR is that I think non-self-exhaustion is a bigger can of worms than expected in the general case, and that we should stabilize drain_filter sooner rather than later, with a RangeBounds parameter, unless someone has a good idea for solving the issues outlined there.

Edit: I've uploaded my experimental code: drain experiments
There are also drain and clearing benches and some tests but don't expect clean code.

@Popog

Popog May 11, 2018

Totally missed out on this thread. I've had an old impl that I've fixed up a bit and copy pasted to reflect a few of the options described in this thread. The one nice thing about the impl that I think will be non-controversial is that it implements DoubleEndedIterator. View it here.

@Boscop

Boscop May 11, 2018

@Emerentius but then we should at least rename drain_filter to drain_where, to indicate that the closure has to return true to remove the element!

@Emerentius

Emerentius May 12, 2018

Contributor

@Boscop Both imply the same 'polarity' of true => yield. I personally don't care whether it's called drain_filter or drain_where.

@Popog Can you summarize the differences and pros & cons? Ideally over at the internals thread. I think DoubleEndedIterator functionality could be added backwards compatibly with zero or low overhead (but I haven't tested that).

@askeksa

askeksa May 26, 2018

How about drain_or_retain? It's a grammatically meaningful action, and it signals that it does one or the other.

@Boscop

Boscop Jun 2, 2018

@askeksa But that doesn't make it clear whether returning true from the closure means "drain" or "retain".
I think with a name like drain_where, it's very clear that returning true drains it, and it should be clear to everyone that the elements that aren't drained are retained.

@mjbshaw

mjbshaw Jun 2, 2018

It would be nice if there was some way to limit/stop/cancel/abort the drain. For example, if I wanted to drain the first N even numbers, it would be nice to be able to just do vec.drain_filter(|x| *x % 2 == 0).take(N).collect() (or some variant of that).

As it's currently implemented, DrainFilter's drop method will always run the drain to completion; it can't be aborted (at least I haven't figured out any trick that would do that).

@Gankro

Gankro Jun 4, 2018

Contributor

If you want that behaviour you should just close over some state that tracks how many you've seen and start returning false. Running to completion on drop is necessary to make adaptors behave reasonably.
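A minimal sketch of that suggestion follows. Since drain_filter itself is unstable, the example uses a simple safe stand-in named drain_where (a hypothetical helper with the same true-means-remove polarity, written for clarity rather than speed); the point is only the stateful closure in drain_first_n_evens, which is also a made-up name.

```rust
/// Simple safe stand-in for a predicate-based drain (hypothetical helper).
/// It uses `Vec::remove`, so it is quadratic in the worst case.
fn drain_where<T, F: FnMut(&mut T) -> bool>(v: &mut Vec<T>, mut pred: F) -> Vec<T> {
    let mut removed = Vec::new();
    let mut i = 0;
    while i < v.len() {
        if pred(&mut v[i]) {
            // Do not advance `i`: the next element has shifted into slot `i`.
            removed.push(v.remove(i));
        } else {
            i += 1;
        }
    }
    removed
}

/// Drains at most the first `n` even numbers by tracking state in the closure.
fn drain_first_n_evens(v: &mut Vec<i32>, n: usize) -> Vec<i32> {
    let mut seen = 0;
    drain_where(v, |x| {
        if seen < n && *x % 2 == 0 {
            seen += 1;
            true
        } else {
            // After `n` matches, everything else is kept.
            false
        }
    })
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6, 8];
    let removed = drain_first_n_evens(&mut v, 2);
    println!("removed {:?}, kept {:?}", removed, v);
}
```

The closure is still called for every remaining element, so this limits what is removed but does not stop the traversal early.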

@rustonaut

rustonaut Jun 12, 2018

I just noticed that drain_filter as currently implemented is not merely not unwind safe; it is
an actual safety hazard with respect to unwinding and resuming. Additionally, it easily causes an abort. Both
are behaviours a method in std really shouldn't have. And while writing this I noticed
that its current implementation is unsafe.

I know that Vec is not unwind safe by default, but the behaviour of drain_filter when the
predicate panics is rather surprising, because:

  1. On drop, it will continue calling the closure that panicked.
    If the closure panics again, this causes an abort. While some people
    like all panics to be aborts, others work with error-kernel patterns, and for them
    ending up with an abort is quite bad.
  2. It will not continue the draining correctly, potentially leaking one value
    and leaving the Vec containing a value that has already been dropped, which can lead to a use after free.

An example of this behaviour is here:
play.rust-lang.org

@rustonaut

rustonaut Jun 12, 2018

While point 2 should be solvable, I think the first point on its own should
lead to a reconsideration of DrainFilter's behaviour of running to completion
on drop. Reasons for changing this include:

  • Iterators are lazy in Rust; executing an iterator when it is dropped is unexpected behaviour
    that deviates from what is normally expected.
  • The predicate passed to drain_filter might panic under some circumstances (e.g. a lock
    got poisoned), in which case it's likely-ish to panic again when called during drop, leading
    to a double panic and therefore an abort. That is quite bad for anyone using error-kernel
    patterns or at least wanting to shut down in a controlled way; it's fine if you use panic=abort anyway.
  • If you have side effects in the predicate and don't run DrainFilter to completion, you might get
    surprising bugs when it is then run to completion on drop (you might have done
    other things between draining it up to some point and it being dropped).
  • You cannot opt out of this behaviour without modifying the predicate passed to it, which you
    might not be able to do without wrapping it. On the other hand, you can always opt in to running
    it to completion by just running the iterator to completion (yes, this last argument is a bit
    handwavy).

Arguments for running to completion include:

  • drain_filter is similar to retain, which is a function, so people might be surprised when they
    "just" drop DrainFilter instead of running it to completion.
    • This argument has been countered many times in other RFCs and is why #[must_use]
      exists, which in some situations already recommends using .for_each(drop), which ironically
      happens to be what DrainFilter does on drop.
  • drain_filter is often used for its side effect only, so consuming it explicitly is too verbose.
    • Using it that way makes it roughly equal to retain,
      • but retain uses &T, while drain_filter uses &mut T.
  • others??
  • [EDIT, ADDED LATER, THX @tmccombs ]: not completing on drop can be very confusing when combined with adapters like find, all, any, which is quite a good reason to keep the current behaviour.

It might be just me, or I might have missed some point, but changing the Drop behaviour and
adding #[must_use] seems preferable?

If .for_each(drop) is too long, we might instead consider adding an RFC for iterators meant for
their side effects, adding a method like complete() to the iterator (or, well, drain(), but that
is a completely different discussion).
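The laziness argument can be illustrated with an ordinary stable adapter. In this sketch (side_effects_run is a hypothetical helper name), a map with a side effect does nothing when it is merely dropped and only runs when it is explicitly consumed with .for_each(drop):

```rust
use std::cell::Cell;

/// Returns how many times the side-effecting closure ran.
/// (`side_effects_run` is a hypothetical helper name.)
fn side_effects_run(consume: bool) -> u32 {
    let counter = Cell::new(0);
    let iter = (0..3).map(|x| {
        counter.set(counter.get() + 1);
        x
    });
    if consume {
        // Explicitly run the iterator to completion for its side effects.
        iter.for_each(drop);
    } else {
        // Dropping an ordinary lazy iterator runs nothing.
        drop(iter);
    }
    counter.get()
}

fn main() {
    println!("dropped only: {} calls", side_effects_run(false));
    println!("consumed:     {} calls", side_effects_run(true));
}
```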

@tmccombs

tmccombs Jun 12, 2018

Contributor

others??

I can't find the original reasoning, but I remember there was also some problem with adapters working with a DrainFilter that doesn't run to completion.

See also #43244 (comment)

@rustonaut

rustonaut Jun 12, 2018

Good point, e.g. find would cause drain to drain just until it hits the first
match; similarly, all and any short-circuit, which can be quite confusing
with respect to drain.

Hm, maybe I should change my opinion. Though this might be a general problem
with iterators having side effects, and maybe we should consider a general solution
(independent of this tracking issue) like an .allways_complete() adapter.
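One way such an adapter could look is sketched below. The type AlwaysComplete and the helper items_pulled are hypothetical names, not an existing or proposed std API; the wrapper simply exhausts the inner iterator when it is dropped, so a short-circuiting consumer like find no longer leaves the source half-consumed.

```rust
use std::cell::Cell;
use std::iter::Fuse;

/// Hypothetical adapter: exhausts the wrapped iterator when dropped.
/// The inner iterator is fused so calling `next` after `None` is harmless.
struct AlwaysComplete<I: Iterator>(Fuse<I>);

impl<I: Iterator> AlwaysComplete<I> {
    fn new(iter: I) -> Self {
        AlwaysComplete(iter.fuse())
    }
}

impl<I: Iterator> Iterator for AlwaysComplete<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}

impl<I: Iterator> Drop for AlwaysComplete<I> {
    fn drop(&mut self) {
        // Pull and discard whatever is left.
        self.0.by_ref().for_each(drop);
    }
}

/// Counts how many items a side-effecting source produced when `find`
/// stops at the first match, with or without the wrapper.
fn items_pulled(wrap: bool) -> usize {
    let pulled = Cell::new(0);
    let source = (1..=5).inspect(|_| pulled.set(pulled.get() + 1));
    if wrap {
        let mut it = AlwaysComplete::new(source);
        let _ = it.find(|&x| x == 2);
        // `it` is dropped here and pulls the remaining items.
    } else {
        let mut it = source;
        let _ = it.find(|&x| x == 2);
    }
    pulled.get()
}

fn main() {
    println!("plain find:        {} items pulled", items_pulled(false));
    println!("with the wrapper:  {} items pulled", items_pulled(true));
}
```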

@Emerentius

Emerentius Jun 12, 2018

Contributor

I have personally not found any safety reason why drain needs to run to completion but as I've written here a couple posts above, the side-effects on next() interact in a suboptimal way with adapters such as take_while, peekable and skip_while.

This also brings up the same issues as my RFC on non-selfexhausting drain and its companion selfexhausting iter adapter RFC.

It's true that drain_filter can easily cause aborts but can you show an example of where it violates safety?

@rustonaut

rustonaut Jun 12, 2018

Yup, I already did: play.rust-lang.org

Which is this:

#![feature(drain_filter)]

use std::panic::catch_unwind;

struct PrintOnDrop {
    id: u8
}

impl Drop for PrintOnDrop {
    fn drop(&mut self) {
        println!("dropped: {}", self.id)
    }
}

fn main() {
    println!("-- start --");
    let _ = catch_unwind(move || {
        let mut a: Vec<_> = [0, 1, 4, 5, 6].iter()
            .map(|&id| PrintOnDrop { id })
            .collect::<Vec<_>>();
        
        let drain = a.drain_filter(|dc| {
            if dc.id == 4 { panic!("let's say a unwrap went wrong"); }
            dc.id < 4
        });
        
        drain.for_each(::std::mem::drop);
    });
    println!("-- end --");
    //output:
    // -- start --
    // dropped: 0    <-\
    // dropped: 1       \_ this is a double drop
    // dropped: 0  _  <-/
    // dropped: 5   \------ here 4 got leaked (kind of fine)
    // dropped: 6
    // -- end --
    
}

But that's an implementation-internal thing that went wrong.
Basically, the open question is how to handle a panic in the predicate function:

  1. skip the element it panicked on, leak it, and increase the del counter
    • requires some form of panic detection
  2. do not advance idx before calling the predicate
    • but this means drop will call the predicate again with the same element

Another question is whether it's a good idea in general to run functions that can be seen as API user input on drop,
but then this is the only way to keep find, any, etc. from behaving confusingly.

Maybe a consideration could be something like:

  1. set a flag when entering next, unset it before returning from next
  2. on drop, if the flag is still set, we know we panicked and hence leak
    the remaining items OR drop all remaining items
    1. leaking can be quite a big leak with unexpected side effects if you e.g. leak an Arc
    2. dropping can be very surprising if you have Arcs and Weaks

Maybe there is a better solution.
Whichever it is, it should be documented in rustdoc once implemented.
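The flag idea can be sketched in safe code. This is a simplified model, not the std implementation: DrainWhere, in_next and demo_panic_flag are hypothetical names, and because the model only uses safe Vec operations, the elements not yet visited simply stay in the vector on panic instead of being leaked or dropped. What it does show is the flag preventing the predicate from being called again during drop.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical safe model of a predicate-based draining iterator.
struct DrainWhere<'a, T, F: FnMut(&mut T) -> bool> {
    vec: &'a mut Vec<T>,
    idx: usize,
    pred: F,
    /// Set while `next` is running; still set on drop only if `pred` panicked.
    in_next: bool,
}

impl<'a, T, F: FnMut(&mut T) -> bool> Iterator for DrainWhere<'a, T, F> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.in_next = true;
        while self.idx < self.vec.len() {
            if (self.pred)(&mut self.vec[self.idx]) {
                self.in_next = false;
                return Some(self.vec.remove(self.idx));
            }
            self.idx += 1;
        }
        self.in_next = false;
        None
    }
}

impl<'a, T, F: FnMut(&mut T) -> bool> Drop for DrainWhere<'a, T, F> {
    fn drop(&mut self) {
        // Only run to completion if the predicate did not panic in `next`.
        if !self.in_next {
            while self.next().is_some() {}
        }
    }
}

/// Runs a drain whose predicate panics on the element 4 and returns what is
/// left in the vector plus the number of predicate calls.
fn demo_panic_flag() -> (Vec<u8>, usize) {
    let mut v = vec![0u8, 1, 4, 5, 6];
    let mut calls = 0;
    let result = catch_unwind(AssertUnwindSafe(|| {
        let drain = DrainWhere {
            vec: &mut v,
            idx: 0,
            in_next: false,
            pred: |x: &mut u8| {
                calls += 1;
                if *x == 4 {
                    panic!("let's say an unwrap went wrong");
                }
                *x < 4
            },
        };
        drain.for_each(drop);
    }));
    assert!(result.is_err());
    (v, calls)
}

fn main() {
    let (remaining, calls) = demo_panic_flag();
    println!("remaining = {:?}, predicate calls = {}", remaining, calls);
}
```

Without the in_next check, the Drop impl would call the predicate on the same element again while unwinding, which is the double panic described earlier in the thread.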

RalfJung Jun 13, 2018

Member

@dathinab

Yup, I already did

Leaking is undesirable but fine and may be hard to avoid here, but a double-drop is definitely not. Good catch! Would you like to report a separate issue about this safety problem?
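
As a small illustration of why a leak is "fine" in the safety sense: leaking can be written in entirely safe code with `std::mem::forget`, and the only effect is that a destructor never runs. The sketch below is an assumption-laden toy (the `CountOnDrop` type and `leak_one_of_three` function are made up for this example) and has nothing to do with the `drain_filter` internals. A double drop, by contrast, cannot be written in safe code at all.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter when it is dropped.
struct CountOnDrop(Rc<Cell<u32>>);

impl Drop for CountOnDrop {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Creates three items, leaks one, drops the rest, and returns how many
/// destructors actually ran.
fn leak_one_of_three() -> u32 {
    let drops = Rc::new(Cell::new(0));
    let items: Vec<CountOnDrop> = (0..3)
        .map(|_| CountOnDrop(Rc::clone(&drops)))
        .collect();
    let mut iter = items.into_iter();
    // Leak the first item: its destructor never runs, which is safe.
    std::mem::forget(iter.next().unwrap());
    // Dropping the iterator drops the two remaining items exactly once each.
    drop(iter);
    drops.get()
}
```

Here `leak_one_of_three()` returns 2: the forgotten item is never counted, and nothing is counted twice.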

vityafx Aug 10, 2018

Does drain_filter reallocate every time it removes an item from the collection? Or does it reallocate only once and work like std::remove and std::erase (used as a pair) in C++? I'd prefer the latter behavior because it needs exactly one allocation: we simply move our elements to the end of the collection and then shrink it to the proper size.
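
For reference, the remove/erase idiom mentioned above could look roughly like this in Rust. This is only a sketch of the C++ pattern, not a description of how `drain_filter` is implemented, and `remove_erase` is a hypothetical name. The stable `Vec::retain` already offers similar behavior with the predicate inverted (it keeps the elements for which the closure returns true).

```rust
/// Hypothetical helper: removes every element for which `pred` returns true,
/// in one pass and without touching the allocation.
fn remove_erase<T>(v: &mut Vec<T>, mut pred: impl FnMut(&T) -> bool) {
    let mut kept = 0;
    for i in 0..v.len() {
        if !pred(&v[i]) {
            // "remove" step: move the element we keep to the front part,
            // preserving the relative order of the kept elements.
            v.swap(kept, i);
            kept += 1;
        }
    }
    // "erase" step: drop the tail; `truncate` leaves the capacity unchanged.
    v.truncate(kept);
}
```

With `vec![1, 2, 3, 4, 5, 6]` and an "is even" predicate, the vector ends up as `[1, 3, 5]` with the same capacity as before.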

Also, why is there no try_drain_filter, whose closure returns an Option, with None meaning that we should stop? I have a very big collection, and it is pointless for me to continue once I have already got what I needed.
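
To show what is being asked for, here is a rough sketch of the semantics. No such method exists in std; the name `try_drain_filter`, the `Option<bool>` return type of the closure and the eager `Vec<T>` result are all assumptions made for this example. It uses `Vec::remove` for simplicity, so it is quadratic in the worst case and says nothing about how an efficient version would be written.

```rust
/// Hypothetical free function: removes and returns the elements for which
/// `pred` returns `Some(true)`, and stops scanning at the first `None`.
fn try_drain_filter<T>(
    v: &mut Vec<T>,
    mut pred: impl FnMut(&mut T) -> Option<bool>,
) -> Vec<T> {
    let mut removed = Vec::new();
    let mut i = 0;
    while i < v.len() {
        match pred(&mut v[i]) {
            // Remove the element; the next one shifts into index `i`.
            Some(true) => removed.push(v.remove(i)),
            // Keep the element and move on.
            Some(false) => i += 1,
            // Stop early: this element and everything after it are kept.
            None => break,
        }
    }
    removed
}
```

For example, with `vec![1, 2, 3, 4, 5, 6, 8]` and a closure that returns `None` for values above 4 and `Some(x % 2 == 0)` otherwise, the call returns `[2, 4]` and leaves `[1, 3, 5, 6, 8]` in the vector. The 6 and 8 are never inspected.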

rustonaut Aug 10, 2018

vityafx Aug 10, 2018

@rustonaut thanks. What is your opinion about try_drain_filter? :)

P.S. I just looked at the code too; it looks like it works the way we wanted.

rustonaut Aug 10, 2018
