Tracking issue for Vec::drain_filter and LinkedList::drain_filter #43244

Open
1 of 2 tasks
Gankra opened this issue Jul 14, 2017 · 172 comments
Labels
A-collections Area: std::collections. B-unstable Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Gankra
Contributor

Gankra commented Jul 14, 2017

Feature gate: #![feature(drain_filter)]

This is a tracking issue for Vec::drain_filter and LinkedList::drain_filter, which can be used for random deletes using iterators.
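As a rough illustration of the operation being tracked, here is a minimal sketch on stable Rust. The helper name `take_matching` is hypothetical and not a std API; unlike the proposed `drain_filter`, it is eager and returns a `Vec` rather than a lazy iterator.

```rust
/// Hypothetical helper (not a std API): removes every element for which
/// `pred` returns true and returns the removed elements in order.
/// The elements that did not match stay in `v`, also in order.
fn take_matching<T, F>(v: &mut Vec<T>, mut pred: F) -> Vec<T>
where
    F: FnMut(&mut T) -> bool,
{
    let mut removed = Vec::new();
    let mut kept = Vec::with_capacity(v.len());
    // Move all elements out of `v`, then sort them into two buckets.
    for mut item in std::mem::take(v) {
        if pred(&mut item) {
            removed.push(item);
        } else {
            kept.push(item);
        }
    }
    *v = kept;
    removed
}
```

Called as `take_matching(&mut numbers, |n| *n % 2 == 0)`, this hands back the even numbers and leaves the odd ones in `numbers`.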

Public API

pub mod alloc {
    pub mod vec {
        impl<T, A: Allocator> Vec<T, A> {
            pub fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<'_, T, F, A>
            where
                F: FnMut(&mut T) -> bool,
            {
            }
        }

        #[derive(Debug)]
        pub struct DrainFilter<'a, T, F, #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global>
        where
            F: FnMut(&mut T) -> bool, {}

        impl<T, F, A: Allocator> Iterator for DrainFilter<'_, T, F, A>
        where
            F: FnMut(&mut T) -> bool,
        {
            type Item = T;
            fn next(&mut self) -> Option<T> {}
            fn size_hint(&self) -> (usize, Option<usize>) {}
        }

        impl<T, F, A: Allocator> Drop for DrainFilter<'_, T, F, A>
        where
            F: FnMut(&mut T) -> bool,
        {
            fn drop(&mut self) {}
        }
    }

    pub mod collections {
        pub mod linked_list {
            impl<T> LinkedList<T> {
                pub fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<'_, T, F>
                where
                    F: FnMut(&mut T) -> bool,
                {
                }
            }

            pub struct DrainFilter<'a, T: 'a, F: 'a>
            where
                F: FnMut(&mut T) -> bool, {}

            impl<T, F> Iterator for DrainFilter<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                type Item = T;
                fn next(&mut self) -> Option<T> {}
                fn size_hint(&self) -> (usize, Option<usize>) {}
            }

            impl<T, F> Drop for DrainFilter<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                fn drop(&mut self) {}
            }

            impl<T: fmt::Debug, F> fmt::Debug for DrainFilter<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {}
            }
        }
    }
}

Steps / History

Unresolved Questions

  • What should the method be named?
  • Should drain_filter accept a Range argument?
  • How should the iterator behave on drop/panic mid-iteration, or if leaked?
  • Missing Send+Sync impls on linked list's DrainFilter, see comment

See #43244 (comment) for a more detailed summary of open issues.

@Mark-Simulacrum Mark-Simulacrum added B-unstable Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jul 19, 2017
@RalfJung
Member

RalfJung commented Jul 31, 2017

Related issues:
#25477
#34265

bors added a commit that referenced this issue Aug 15, 2017
Add Vec::drain_filter

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for `retain`, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the `retain` impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.
@bluss
Member

bluss commented Sep 4, 2017

Maybe this doesn't need to include the kitchen sink, but it could have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback, it's another thing that can panic. But drain_filter(.., f) can not.
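To make the range idea concrete, here is a minimal eager sketch of what a ranged variant could do. The name `take_matching_in` and the choice of `Range<usize>` are assumptions for illustration only; the actual proposal would return a lazy iterator and would presumably accept any `RangeBounds`.

```rust
use std::ops::Range;

/// Hypothetical helper: like a filtered removal, but the predicate is only
/// applied to elements whose original index lies inside `range`.
fn take_matching_in<T, F>(v: &mut Vec<T>, range: Range<usize>, mut pred: F) -> Vec<T>
where
    F: FnMut(&mut T) -> bool,
{
    // This is the extra bounds check mentioned above: a bad range panics.
    assert!(
        range.start <= range.end && range.end <= v.len(),
        "range out of bounds"
    );
    let mut removed = Vec::new();
    let mut i = range.start;
    let mut end = range.end;
    while i < end {
        if pred(&mut v[i]) {
            removed.push(v.remove(i));
            // The tail shifted left by one, so the range end moves with it.
            end -= 1;
        } else {
            i += 1;
        }
    }
    removed
}
```

Passing `0..v.len()` gives the unranged behavior, which is the sense in which such a method would be a superset of both `drain` and the filter-only version.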

@rustonaut

rustonaut commented Sep 11, 2017

Is there any chance this will stabilize in some form in the not too distant future?

@rustonaut

rustonaut commented Sep 11, 2017

If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f) case I would opt for doing this.

(And I'm pretty sure you can implement it in a way
which makes the compiler clever enough; in the worst
case you could have an "in-function specialization",
basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return })
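One way to read the "in-function specialization" idea is a `TypeId` comparison, sketched below. The function `resolve` is hypothetical, and whether such a check is needed at all is an open question, since after monomorphization the optimizer may well see through the `RangeFull` bounds on its own.

```rust
use std::any::TypeId;
use std::ops::{Bound, RangeBounds, RangeFull};

/// Hypothetical helper: turns any range into concrete `(start, end)` indices
/// for a collection of length `len`, skipping all checks for `..`.
fn resolve<R>(range: R, len: usize) -> (usize, usize)
where
    R: RangeBounds<usize> + 'static,
{
    // Fast path: `..` always covers the whole collection and cannot panic.
    if TypeId::of::<R>() == TypeId::of::<RangeFull>() {
        return (0, len);
    }
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    // Every other range type pays for the bounds check.
    assert!(start <= end && end <= len, "range out of bounds");
    (start, end)
}
```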

@jonhoo
Contributor

jonhoo commented Sep 22, 2017

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The latter, on the other hand, indicates that we only drain elements where some condition holds.
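The difference from `.drain(..).filter()` can be shown with stable APIs only. In this small sketch (the function name is made up for the example) the whole vector is emptied, and the elements that fail the filter are simply dropped instead of staying in the vector:

```rust
/// Drains the entire vector and keeps only the even numbers.
/// Everything else is removed from `v` too, and then discarded.
fn drain_then_filter(v: &mut Vec<i32>) -> Vec<i32> {
    v.drain(..).filter(|n| n % 2 == 0).collect()
}
```

With the proposed method, the odd numbers would remain in `v` afterwards; here `v` ends up empty.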

@rustonaut

No idea, but drain_where sounds much better and is much more intuitive.
Is there still a chance to change it?

@bluss
Member

bluss commented Sep 23, 2017

.remove_if has been a prior suggestion too

@rustonaut

rustonaut commented Sep 23, 2017

I think drain_where explains it best. Like drain, it returns values, but it does not drain/remove all values, only those where a given condition is true.

remove_if sounds a lot like a conditional version of remove which just removes a single item by index if a condition is true e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter, on the other hand, is slightly ambiguous: does it drain and then filter in a more optimized way (like filter_map), or does it drain so that the iterator returned is comparable to the iterator filter would return?
And if so, shouldn't it be called filtered_drain, as the filter logically gets applied first...

@Gankra
Contributor Author

Gankra commented Sep 25, 2017

There is no precedent for using _where or _if anywhere in the standard library.

@jonhoo
Contributor

jonhoo commented Sep 25, 2017

@gankro is there a precedent for using _filter anywhere? I also don't know that that is a reason for not using the less ambiguous terminology. Other places in the standard library already use a variety of suffixes such as _until and _while.

@crlf0710
Member

The "said equivalent" code in the comment is not correct... you have to subtract one from i at the "your code here" site, or bad things happen.
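The code being referred to is not quoted in this thread, so the following is only a sketch of the kind of remove loop it presumably was, written so that the index is handled correctly. The wrapper function and its name are assumptions made to keep the example self-contained.

```rust
/// Hand-written equivalent of a filtered removal using `Vec::remove`.
fn remove_loop<T, P>(vec: &mut Vec<T>, mut some_predicate: P) -> Vec<T>
where
    P: FnMut(&mut T) -> bool,
{
    let mut removed = Vec::new();
    let mut i = 0;
    while i < vec.len() {
        if some_predicate(&mut vec[i]) {
            let val = vec.remove(i);
            // "your code here": this sketch just collects the value.
            removed.push(val);
            // `i` is not advanced: the next element has shifted into slot `i`.
        } else {
            i += 1;
        }
    }
    removed
}
```

Advancing `i` unconditionally would skip the element right after every removed one, which is the kind of bug the comment describes. Only advancing in the `else` branch has the same effect as subtracting one after a removal.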

@thegranddesign

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

@thegranddesign

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject. Neither of which mutate the original value.

@rpjohnst
Contributor

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.
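For reference, a minimal sketch of that chaining with `Iterator::filter` on stable Rust. The function name and the slice-of-`&str` input are assumptions chosen to mirror the example above.

```rust
/// Joins all non-empty parts with "/" without mutating the input.
fn join_non_empty(parts: &[&str]) -> String {
    parts
        .iter()
        .copied()
        // Keep only the parts that are not empty.
        .filter(|s| !s.is_empty())
        .collect::<Vec<_>>()
        .join("/")
}
```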

@thegranddesign

@rpjohnst so I searched here, am I missing filter somewhere?

@rpjohnst
Contributor

Yes, it's a member of Iterator, not Vec.

@Gankra
Contributor Author

Gankra commented Oct 25, 2017

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single "atomic" operation).

Although drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).

@kennytm kennytm changed the title Tracking issue for Vec::drain_filter Tracking issue for Vec::drain_filter and LinkedList::drain_filter Nov 27, 2017
@polarathene

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

@jonhoo
Contributor

jonhoo commented Dec 3, 2017

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

@tmccombs
Contributor

tmccombs commented Dec 7, 2017

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

@Gankra
Contributor Author

Gankra commented Dec 7, 2017

Yes

bors added a commit that referenced this issue Dec 9, 2017
Add Drop impl for linked_list::DrainFilter

This is part of #43244. See #43244 (comment)
@rustonaut

rustonaut commented Jan 3, 2022

Also tbh. I think we should lint by default when implementing Iterator on a type which is not must_use by default, but again out of scope of this issue.

More in scope is the behavior you get if you drop a "consume-on-drop" iterator which you wrapped inside of a map:
The "consume-on-drop" iterator gets consumed, but the mapping (and any side effects resulting from it) is not done. This can be quite surprising for people used to different behavior.
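This can be reproduced with the stable `Vec::drain`. The sketch below (function name made up for the example) wraps a drain in `map`, pulls one element, and drops the rest:

```rust
/// Returns `(remaining_len, closure_calls)` after dropping a half-used
/// `drain(..).map(..)` chain.
fn drop_mapped_drain() -> (usize, usize) {
    let mut v = vec![1, 2, 3, 4];
    let mut calls = 0;
    {
        let mut it = v.drain(..).map(|x| {
            calls += 1; // side effect of the mapping
            x
        });
        let _first = it.next(); // the closure runs exactly once
        // `it` is dropped here; the drain removes the remaining elements,
        // but the closure is never run for them.
    }
    (v.len(), calls)
}
```

The function returns `(0, 1)`: all four elements are gone from the vector, yet the mapping ran for only one of them.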

Which is why all standard iterator combinators are marked as must_use just with a less specific message.

So I would argue that for the with-combinator use case my proposal is still better than consume-on-drop; at the same time the scope has the issue of making it harder/more verbose to use such combinators.

EDIT: Also, in most cases where you combine drain with an iterator you use it as an iterator, not as a "drop elements in range" function, in which case the lazy behavior is likely more consistent/expected than the consume-on-drop behavior.

@rustonaut

Would it be useful to add a method to Iterator that does the equivalent of foreach(drop)?

I think in most cases where you want to use foreach(drop) it's when there are two different kinds of
functions hiding in one, like e.g. for drain it's:

  1. at once remove and drop all elements in a specific range
  2. step by step move all elements in a specific range from the collection to another place

In general I think it's preferable to provide two separate functions for this.

Jumbling them together creates the risk of setting a bad API design precedent (like we have here) or of stumbling
into traps (like dropping a drain + combinator behaving unexpectedly and you not being able to stop draining, both things which normally are not a problem with iterators in Rust, except for drain).

@ssomers
Contributor

ssomers commented Jan 3, 2022

we can by now all(mostly) agree that drain is miss-designed

And yet I have not seen anyone point out what is wrong with its design.

@rustonaut

rustonaut commented Jan 3, 2022

You mean besides:

setting a bad API design precedent

like dropping a drain + combinator behaving unexpectedly

(that point can be super confusing for less experienced programmers)

you not being able to stop draining

And it generally behaving different to more or less all other iterators in std.

Sure, it's not too bad, but it's still a case where forcing two different functions into one led to sub-par results.


drain doesn't really drain-on-drop, it drains instantly, consuming the contents of collection

It does drain-on-drop semantically. Sure, you can force the view that it instantly consumes the collection, but then
it magically gives part of the collection back as a side effect after first removing much more than it should,
which is just as bad IMHO. In either case it's a form of "non-resource-cleanup" side effect on drop.

And if you use {v.drain(..);} (i.e. consume the whole collection instead of parts of it), using truncate(0) would be more idiomatic, I think.

@rustonaut

rustonaut commented Jan 3, 2022

The semantics of drain are:

  • poison the collection (or at least parts of it)
  • move the element step by step out of the poisoned collection
  • when done: un-poison the collection
  • on drop: proceed to move elements out, and drop them directly,
    even if they normally wouldn't have been dropped but passed to a different
    iterator.

Sure, implementation-wise, with the current code you could argue that it instead moves all elements of the collection
into the iterator, then step by step moves out the elements in the drain range, and on completion moves the remaining
parts back to the collection. But IMHO that is nitpicking, and it neither represents the original intent behind drain nor
is it the expected behavior for something which is called "drain", takes a range, and returns an iterator, I would say.

@frank-king
Contributor

frank-king commented Jan 4, 2022

Under any circumstance, I do not think the .drain and .drain_filter methods should return an iterator that can cause UB if it is passed to std::mem::forget.

Especially for BTreeMap and BTreeSet. Imagine there is a .drain(range) method which takes out the entries in the range by iteration and then does the range removal on drop. std::mem::forget can definitely leave the collection poisoned and cause undefined behavior.

For consistency of APIs among the various collections, we may consider restricting the drain iterators so that they do not depend on drop, which can be canceled by the user, as in my previous comment #43244 (comment).

@ssomers
Contributor

ssomers commented Jan 4, 2022

like dropping a drain + combinator behaving unexpectedly

(that point can be super confusing for less experienced programmers)

It's super confusing to me because I haven't seen any example. I know things can get hairy on drop (I've written a drain_filter), but please include / link to an example or unit test using drain.

you not being able to stop draining

Yes you can, you can give the end of the range. If you want the iteration to "interact" with the drain process, you end up with a different algorithm, basically a better version of drain_filter.

And it generally behaving different to more or less all other iterators in std.

As far as iteration is concerned, It's pretty much like into_iter(). By which I don't mean that it was a great idea to make something that is more complicated than into_iter() already is.

drain doesn't really drain-on-drop, it drains instantly, consuming the contents of collection

It does drain-on-drop semantically

Define drain-on-drop. To me, it means that the iterator needs a drop handler that alters the source collection. The contract of drain doesn't require that.

sure you can force the view that it instantly consumes the collection, but then it magically gives part of the collection back as a side-effect after first removing much more than it should.

Not sure what you mean. I take the view that drain consumes exactly what needs to be drained. Do you mean the fact that the current implementation throws away all elements, if you mem::forget the drain iterator? I don't care, that is not part of drain's contract (I hope).
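For anyone who wants to try the `mem::forget` case, here is a minimal sketch using only stable APIs. The function name is made up; the std documentation leaves the resulting contents of the vector unspecified when a `Drain` is leaked, so the code deliberately makes no claim about what comes back beyond it being memory-safe.

```rust
/// Leaks a `Drain` iterator with `mem::forget` and returns what is left.
/// The exact contents depend on the implementation and are not part of
/// the documented contract of `Vec::drain`.
fn forget_drain() -> Vec<i32> {
    let mut v = vec![1, 2, 3, 4, 5];
    // The iterator's destructor never runs, so the tail is never moved back.
    std::mem::forget(v.drain(1..3));
    v
}
```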

And if you use {v.drain(..);}

then you should rather use clear().

The semantics of drain are:

You're describing the current implementation of Vec::drain or possibly others. It doesn't have to be implemented like that.

@ssomers
Contributor

ssomers commented Jan 4, 2022

What if we make the drain_iterator only accessible via a closure scope? Then drop all the un-iterated elements after scope exited

To be clear, this is about the iterator returned by drain_filter. Does such a scope parameter exist elsewhere in libraries?
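A closure-scoped API could look roughly like the sketch below, shown here on top of the stable `Vec::drain`. The function `with_drain` is hypothetical. The closure only receives a `&mut dyn Iterator`, so it cannot take ownership of the iterator and leak it; the wrapper always drops it when the closure returns. As for precedent, `std::thread::scope` in std uses a comparable closure-scoped design.

```rust
use std::ops::Range;

/// Hypothetical scoped wrapper: `f` may pull as many elements as it wants;
/// whatever it leaves un-iterated is removed and dropped afterwards.
fn with_drain<T, R>(
    v: &mut Vec<T>,
    range: Range<usize>,
    f: impl FnOnce(&mut dyn Iterator<Item = T>) -> R,
) -> R {
    let mut drain = v.drain(range);
    let result = f(&mut drain);
    // Dropping the drain removes the rest of the range from `v`.
    drop(drain);
    result
}
```

For example, `with_drain(&mut v, 1..4, |it| it.next())` returns the first element of the range and still removes the whole range from `v`.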

Under any circumstance, I do not think the .drain and .drain_filter methods should return an iterator that can cause UB if it is passed to std::mem::forget.

I think that's a given in all official Rust. There are unit tests against (single) panic in a drop of the elements, panic in predicates, using mem::forget, and they don't allow UB, or leaving behind a poisoned collection that triggers UB later. Though I wouldn't be surprised one can cook up a combination that is still UB.

@rustonaut

rustonaut commented Jan 5, 2022 via email

@rustonaut

Yes you can [..] given range

That is deciding what to drain beforehand, not stopping draining. With any normal rustish iterator I can just decide to stop iterating and drop it. Not with drain, which will then magically continue iterating.

As far as iteration is concerned, It's pretty much like into_iter().

The point of drain is to explicitly not be like into_iter but to drain a pre-specified range over time.

Define drain-on-drop.

It drains the collection when dropped even if it is not iterated/consumed. If it were lazy (like it IMHO should be), then calling drain and dropping the iterator without iterating on it at all would not drain anything, as you never used the "draining" iterator.

It doesn't have to be implemented like that.

That's the point: we don't need to implement drain_filter like drain, and hence should implement it following the general Rust iterator design, i.e. making it lazy, i.e. not continuing to iterate on drop but just doing any necessary memory cleanup (moving remaining elements to fill "gaps" from dropped elements if not already done, setting the length back to what it now should be, etc.).
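A lazy variant along these lines can be sketched in safe Rust. The type `LazyDrainWhere` and the function `lazy_drain_where` are hypothetical names. To stay simple, this version calls `Vec::remove` for every match, so it is quadratic in the worst case, whereas a real implementation would batch the backshifts and fix up the length on drop. Because the vector is always in a valid state, dropping or even leaking the iterator just stops the draining.

```rust
/// Hypothetical lazy draining iterator: elements are only examined and
/// removed while `next` is being called.
struct LazyDrainWhere<'a, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    vec: &'a mut Vec<T>,
    idx: usize,
    pred: F,
}

impl<T, F> Iterator for LazyDrainWhere<'_, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        while self.idx < self.vec.len() {
            if (self.pred)(&mut self.vec[self.idx]) {
                // Remove the match; the next candidate shifts into `idx`.
                return Some(self.vec.remove(self.idx));
            }
            self.idx += 1;
        }
        None
    }
}

fn lazy_drain_where<T, F>(vec: &mut Vec<T>, pred: F) -> LazyDrainWhere<'_, T, F>
where
    F: FnMut(&mut T) -> bool,
{
    LazyDrainWhere { vec, idx: 0, pred }
}
```

Taking one element and then dropping the iterator leaves every later match in place.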

rather use clear().

yes

@rustonaut

rustonaut commented Jan 6, 2022

It's super confusing to me because I haven't seen any example.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9beb242c04ec50e2ebaf2c02fd8151cd

Note that it's just an ad-hoc example and you would normally write the algorithm shown there differently,
though you can totally run into this in practice, just in more complex situations.

@tanriol
Contributor

tanriol commented Jan 6, 2022

[...] With any normal rustish iterator I can just decide to stop iterating and drop it. Not with drain which then will magically continue iterating.

Note that even for normal iterators this is about laziness, not about stopping exactly at the desired point, so it would still be confusing with some combinators ("wait, why does new drain with take_while remove an element not matching the condition?").
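The `take_while` effect mentioned here is visible with any ordinary iterator. A small sketch with a made-up function name:

```rust
/// Shows that `take_while` has to consume the first non-matching element
/// in order to discover that it does not match.
fn take_while_consumes_one_extra() -> (Vec<i32>, Option<i32>) {
    let mut it = vec![1, 2, 10, 3].into_iter();
    // `by_ref` lets us keep using `it` after `take_while` is done.
    let small: Vec<i32> = it.by_ref().take_while(|n| *n < 5).collect();
    // `10` was pulled out of `it` and discarded, so the next element is `3`.
    (small, it.next())
}
```

With a lazy draining iterator underneath, that discarded element would already have been removed from the collection.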

@rustonaut

rustonaut commented Jan 6, 2022 via email

@ssomers
Contributor

ssomers commented Jan 6, 2022

The point of drain is to explicitly not be like into_iter but to drain a pre-specified range over time.

That's apparently many people's expectation. My expectation is that it efficiently removes a range and somehow repurposes the elements. I would say the most explicit difference with into_iter is that there's no into or iter in the name - it's a verb, because it drains itself, it doesn't suggest it creates something that drains.

But clearly that's not how it lands. Therefore I conclude that drain is an unfortunate choice of signature. But it's stable and useful. Don't kill it, perhaps try to change it, avoid the same word in a different method like drain_filter.

It doesn't have to be implemented like that.

That's the point we don't need to implement drain_filter like drain

I think everyone agrees with that. I'm just saying that drain doesn't have to be implemented like Vec::drain is. But now I realize this is more than an implementation detail (and a mem::forget-behaviour detail): ckaran pointed out that the way it's implemented (the exact form of the current signature) keeps the collection locked as long as the iteration over the drained elements lingers on.

@ssomers
Contributor

ssomers commented Jan 6, 2022

Wow, the doc of Vec::drain says it "Creates a draining iterator"… only to describe what it actually accomplishes later.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9beb242c04ec50e2ebaf2c02fd8151cd

I appreciate the effort, but now I'm even more lost. The surprise of the iterator returned by drain is that it acts lazily? Unless you yank its tail, it doesn't fire up the filter with its smelly side effect. What is this but an example of how laziness can surprise those used to an imperative style?

@frank-king
Contributor

frank-king commented Jan 7, 2022

How about adding a marker trait MustDrop to forbid the drain iterators from being forgotten (being put in std::mem::ManuallyDrop)?
At least for the current implementation of Vec::drain, I find the behavior extremely surprising when the drain iterator is passed to std::mem::forget.

To me, I expect .drain(..) iterators to behave like this:

  1. first it acts like an iterator which yields elements when .next() is called, just like .into_iter(), but does not take ownership of the original collection.
  2. after the iteration work is done and before the mutable reference is released, it recovers the collection with a batch remove.

In addition, to be consistent with the contract that iterators are lazy, I think we should only allow the drain iterators (including iterators created by .drain() or .drain_filter) to drain when .next() is called.

For example, a new Rust programmer who has just learned how to use Vec::iter() might expect the following to happen:

let mut foo = vec![0, 1, 2, 3, 4];
let mut iter = foo.drain(1..5);
assert_eq!(iter.next().unwrap(), 1);
drop(iter); // After `drop`, it stops draining and recovers the original collection.
assert_eq!(foo, [0, 2, 3, 4]);
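For comparison, this is what the stable `Vec::drain` actually does with the same steps: dropping the iterator removes the whole requested range, not just the element that was yielded. The wrapper function is only there to make the sketch self-contained.

```rust
/// Runs the same steps against the real `Vec::drain` and returns the vector.
fn drain_then_drop() -> Vec<i32> {
    let mut foo = vec![0, 1, 2, 3, 4];
    let mut iter = foo.drain(1..5);
    assert_eq!(iter.next(), Some(1));
    // Dropping the iterator removes the rest of the range 1..5 as well.
    drop(iter);
    foo
}
```

The returned vector is `[0]`, not the `[0, 2, 3, 4]` that the expectation above describes.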

Dessix added a commit to microsoft/snocat that referenced this issue Feb 23, 2022
Vec::retain has been stabilized since our original usage, and
meets our original intent more concisely than `drain_filter`.

For details on the current status of the `drain_filter` RFC,
including hang-ups on API consistency and unintuitiveness of
lazy evaluation, see its tracking issue at:
rust-lang/rust#43244

Also improves error messages and backtraces for proxy_tcp,
which was altered to remove drain_filter usage.
@Dylan-DPC
Member

@lukesneeringer

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

For what it's worth, I was searching using the word extract before eventually finding this.

@tigregalis

I came across this feature (would be great to see it stabilised!) but what I actually wanted was something like drain_filter_map i.e. what filter_map is to filter.

An illustration:

// loaded_images: Record<ImageHandle, Image>
// loaded_images.get(handle) -> Option<Image>
// loading_image_handles: Vec<ImageHandle>

// if the image is loaded, remove its handle from `loading_image_handles`, and do something with the result
// otherwise leave it in (not yet loaded)

// this
for handle in loading_image_handles
    .drain_filter(|handle| loaded_images.get(handle).is_some())
{
    let image = loaded_images.get(&handle).unwrap();
    // to get `image`, we have to call `loaded_images.get` again and unwrap it,
    // even though we already checked it in the filter
}
// could become this
for image in loading_image_handles
    .drain_filter_map(|handle| loaded_images.get(handle))
{
    // `image` is available
}

As a side note, because of the analogy of filter_map vs filter I like the drain_filter naming if there is also a drain_filter_map - otherwise, what would drain_filter_map be called?
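To pin down the intended semantics, here is an eager sketch of such a method on stable Rust. The name `take_filter_map` is hypothetical, and it returns a `Vec` instead of the lazy iterator that a real `drain_filter_map` would presumably return.

```rust
/// Hypothetical helper: calls `f` on each element; when it returns
/// `Some(value)`, the element is removed and `value` is collected,
/// otherwise the element stays in the vector.
fn take_filter_map<T, U, F>(v: &mut Vec<T>, mut f: F) -> Vec<U>
where
    F: FnMut(&mut T) -> Option<U>,
{
    let mut out = Vec::new();
    let mut i = 0;
    while i < v.len() {
        match f(&mut v[i]) {
            Some(value) => {
                // Remove the element that produced a value.
                v.remove(i);
                out.push(value);
            }
            // Not ready yet: leave it in place and move on.
            None => i += 1,
        }
    }
    out
}
```

In the image example, the closure would look the handle up once and return the loaded image, so no second lookup or `unwrap` is needed.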

@SOF3
Contributor

SOF3 commented Sep 12, 2022

@tigregalis see also rust-lang/rfcs#3299

@dtolnay
Member

dtolnay commented Oct 5, 2022

I noticed linked list's DrainFilter is missing Send and Sync impls. I believe these types should have identical autotrait impls because they do not differ in thread safety.

std::vec::DrainFilter:

https://doc.rust-lang.org/1.64.0/std/vec/struct.DrainFilter.html#synthetic-implementations

std::collections::linked_list::DrainFilter:

https://doc.rust-lang.org/1.64.0/std/collections/linked_list/struct.DrainFilter.html#synthetic-implementations

@Boscop

Boscop commented Dec 2, 2022

I've had more and more use cases where I wish Vec::drain_filter had a range argument like Vec::drain, it would make it much more useful!

(Alternatively, the item index could also be passed to the filter closure, but that would not be consistent with similar methods.)
