Tracking issue for Vec::extract_if and LinkedList::extract_if #43244

Open
2 of 3 tasks
Gankra opened this issue Jul 14, 2017 · 219 comments
Labels
A-collections Area: std::collections. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Gankra
Contributor

Gankra commented Jul 14, 2017

Feature gate: #![feature(extract_if)] (previously drain_filter)

This is a tracking issue for Vec::extract_if and LinkedList::extract_if, which can be used for random deletes using iterators.

Public API

pub mod alloc {
    pub mod vec {
        impl<T, A: Allocator> Vec<T, A> {
            pub fn extract_if<F>(&mut self, filter: F) -> ExtractIf<'_, T, F, A>
            where
                F: FnMut(&mut T) -> bool,
            {
            }
        }

        #[derive(Debug)]
        pub struct ExtractIf<'a, T, F, #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global>
        where
            F: FnMut(&mut T) -> bool, {}

        impl<T, F, A: Allocator> Iterator for ExtractIf<'_, T, F, A>
        where
            F: FnMut(&mut T) -> bool,
        {
            type Item = T;
            fn next(&mut self) -> Option<T> {}
            fn size_hint(&self) -> (usize, Option<usize>) {}
        }

        impl<T, F, A: Allocator> Drop for ExtractIf<'_, T, F, A>
        where
            F: FnMut(&mut T) -> bool,
        {
            fn drop(&mut self) {}
        }
    }

    pub mod collections {
        pub mod linked_list {
            impl<T> LinkedList<T> {
                pub fn extract_if<F>(&mut self, filter: F) -> ExtractIf<'_, T, F>
                where
                    F: FnMut(&mut T) -> bool,
                {
                }
            }

            pub struct ExtractIf<'a, T: 'a, F: 'a>
            where
                F: FnMut(&mut T) -> bool, {}

            impl<T, F> Iterator for ExtractIf<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                type Item = T;
                fn next(&mut self) -> Option<T> {}
                fn size_hint(&self) -> (usize, Option<usize>) {}
            }

            impl<T, F> Drop for ExtractIf<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                fn drop(&mut self) {}
            }

            impl<T: fmt::Debug, F> fmt::Debug for ExtractIf<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {}
            }
        }
    }
}

Steps / History

Unresolved Questions

  • What should the method be named?
  • Should extract_if accept a Range argument?
  • Missing Send+Sync impls on linked list's ExtractIf, see comment

See #43244 (comment) for a more detailed summary of open issues.

@Mark-Simulacrum Mark-Simulacrum added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jul 19, 2017
@RalfJung
Member

RalfJung commented Jul 31, 2017

Related issues:
#25477
#34265

bors added a commit that referenced this issue Aug 15, 2017
Add Vec::drain_filter

This implements the API proposed in #43244.

So I spent like half a day figuring out how to implement this in some awesome super-optimized unsafe way, which had me very confident this was worth putting into the stdlib.

Then I looked at the impl for `retain`, and was like "oh dang". I compared the two and they basically ended up being the same speed. And the `retain` impl probably translates to DoubleEndedIter a lot more cleanly if we ever want that.

So now I'm not totally confident this needs to go in the stdlib, but I've got two implementations and an amazingly robust test suite, so I figured I might as well toss it over the fence for discussion.
@bluss
Member

bluss commented Sep 4, 2017

Maybe this doesn't need to include the kitchen sink, but it could have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback: it's another thing that can panic, whereas drain_filter(.., f) cannot.

@rustonaut

rustonaut commented Sep 11, 2017

Is there any chance this will stabilize in some form in the not too distant future?

@rustonaut

rustonaut commented Sep 11, 2017

If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f) case I would opt for doing this.

( And I'm pretty sure you can implement it in a way
which makes the compiler clever enough, in the worst
case you could have an "in function specialization",
basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return } )
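
A rough sketch of that "in function specialization" idea on stable Rust, using `std::any::TypeId`. The helper name `resolve_range` is made up for illustration, and whether the optimizer really removes the checks for `..` is an assumption that would need measuring:

```rust
use std::any::TypeId;
use std::ops::{Bound, RangeBounds, RangeFull};

/// Hypothetical helper: turns any range into `(start, end)` indices for a
/// collection of length `len`, skipping all checks for `..`.
fn resolve_range<R>(range: R, len: usize) -> (usize, usize)
where
    R: RangeBounds<usize> + 'static,
{
    // The comparison is fixed per monomorphization, so for `R = RangeFull`
    // the compiler can treat everything after this branch as dead code.
    if TypeId::of::<R>() == TypeId::of::<RangeFull>() {
        return (0, len);
    }

    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    // These are the extra panics a range parameter would introduce.
    assert!(start <= end, "range start is after range end");
    assert!(end <= len, "range end is out of bounds");
    (start, end)
}
```

With this shape, `resolve_range(.., len)` can never panic, while any other range type goes through the usual checks.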

@jonhoo
Contributor

jonhoo commented Sep 22, 2017

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The latter, on the other hand, indicates that we only drain elements where some condition holds.

@rustonaut

No idea, but drain_where sounds much better and is much more intuitive.
Is there still a chance to change it?

@bluss
Member

bluss commented Sep 23, 2017

.remove_if has been a prior suggestion too

@rustonaut

rustonaut commented Sep 23, 2017

I think drain_where explains it best. Like drain it returns values, but it does not drain/remove all values, only those for which a given condition is true.

remove_if sounds a lot like a conditional version of remove which just removes a single item by index if a condition is true e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter on the other hand is slightly ambiguous: does it drain and then filter in a more optimized way (like filter_map), or does it drain so that the returned iterator is comparable to the one filter would return?
And if so, shouldn't it be called filtered_drain, as the filter logically gets applied first...

@Gankra
Contributor Author

Gankra commented Sep 25, 2017

There is no precedent for using _where or _if anywhere in the standard library.

@jonhoo
Contributor

jonhoo commented Sep 25, 2017

@gankro is there a precedent for using _filter anywhere? I also don't know that a lack of precedent is a reason not to use the less ambiguous terminology. Other places in the standard library already use a variety of suffixes such as _until and _while.

@crlf0710
Member

The "said equivalent" code in the comment is not correct... you have to subtract one from i at the "your code here" site, or bad things happen.

@thegranddesign

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

@thegranddesign

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject. Neither of which mutate the original value.

@rpjohnst
Contributor

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.

@thegranddesign

@rpjohnst so I searched here, am I missing filter somewhere?

@rpjohnst
Contributor

Yes, it's a member of Iterator, not Vec.
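
For reference, the chaining from the earlier example can be written with `Iterator::filter` today. A small sketch (the function name `join_non_empty` is just for illustration):

```rust
/// Joins the non-empty parts with "/", leaving the input untouched.
fn join_non_empty(parts: &[&str]) -> String {
    parts
        .iter()
        .copied()
        // `filter` keeps the elements for which the closure returns true,
        // so the "reject empty strings" condition is negated here.
        .filter(|s| !s.is_empty())
        .collect::<Vec<_>>()
        .join("/")
}
```

Nothing is mutated, so this composes in a chain; drain_filter is for the case where the original Vec should lose the matching elements.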

@Gankra
Contributor Author

Gankra commented Oct 25, 2017

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single "atomic" operation).

That said, drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).

@kennytm kennytm changed the title Tracking issue for Vec::drain_filter Tracking issue for Vec::drain_filter and LinkedList::drain_filter Nov 27, 2017
@polarathene

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

@jonhoo
Contributor

jonhoo commented Dec 3, 2017

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

@tmccombs
Contributor

tmccombs commented Dec 7, 2017

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

@Gankra
Contributor Author

Gankra commented Dec 7, 2017

Yes

bors added a commit that referenced this issue Dec 9, 2017
Add Drop impl for linked_list::DrainFilter

This is part of #43244. See #43244 (comment)
@cybersoulK

cybersoulK commented Aug 27, 2023

@jsen-

let mut pending_clients: Vec<Connection> = Vec::new();

 //If connection.recv_request is Some(client_params), drain (connection, client_params)
for (connection, client_params) in pending_clients.extract_mapped(|connection| connection.recv_request()) {

    //...
}

edit:
yes! the extract_mapped is what i would need ⏬

edit:
after reviewing this thread again, i have something else to say: I also use drain_filter frequently. So i believe the best would be to have 2 different methods for different purposes:

drain_if and drain_mapped.
or
extract_if and extract_mapped

@jsen-
Contributor

jsen- commented Aug 27, 2023

If I understood correctly, your example is another great use case for extract_mapped

struct ClientParams;
struct HandleRequest<M>(std::marker::PhantomData<M>);
impl<M> HandleRequest<M> {
    pub fn recv_request(&mut self) -> Option<M> {
        None
    }
}

fn main() {
    let mut pending_clients = Vec::<HandleRequest<ClientParams>>::new();
    for (connection, client_params) in pending_clients.extract_mapped(|connection| connection.recv_request())
    {
        // do something with drained and mapped connection
    }
}

// type signatures only (as per my proposal excluding range bounds, which I didn't implement yet), for the actual impl see:
// https://github.com/jsen-/rust/blob/4ee20f29d9e656fbc7a26e297df66054c1fb535a/library/alloc/src/vec/mod.rs#L2961-L2975
// https://github.com/jsen-/rust/blob/4ee20f29d9e656fbc7a26e297df66054c1fb535a/library/alloc/src/vec/extract_mapped.rs
struct ExtractMapped<'a, T, F, R>(std::marker::PhantomData<&'a (T, F, R)>)
where
    F: FnMut(&mut T) -> Option<R>;

impl<T, F, R> Iterator for ExtractMapped<'_, T, F, R>
where
    F: FnMut(&mut T) -> Option<R>,
{
    type Item = (T, R);

    fn next(&mut self) -> Option<Self::Item> { todo!() }
}

// extension trait (will not be included in the actual implementation)
trait ExtractMappedExt<T> {
    fn extract_mapped<'a, F, Ret>(self, mapper: F) -> ExtractMapped<'a, T, F, Ret>
    where
        F: FnMut(&mut T) -> Option<Ret>;
}

impl<T> ExtractMappedExt<T> for Vec<T> {
    fn extract_mapped<'a, F, Ret>(self, _mapper: F) -> ExtractMapped<'a, T, F, Ret>
    where
        F: FnMut(&mut T) -> Option<Ret>,
    {
        ExtractMapped(std::marker::PhantomData)
    }
}

@Mikilio Mikilio mentioned this issue Sep 1, 2023
11 tasks
@avl

avl commented Sep 3, 2023

Maybe the API-description in the first post should be changed to use extract_if-terminology?

As it is now, there's a note that the feature was previously known as drain_filter, but then the 'Public API'-section still talks about 'drain_filter'.

For newcomers to this feature, I think it would reduce confusion!

@the8472
Member

the8472 commented Sep 14, 2023

@cybersoulK you cannot "consume" the item inside the mapping/filtering closure.

rust-lang/rfcs#3299 (comment) proposes a way to do that.

@Lambdaris

Currently there is a bug if extract_if is combined with the allocator API.

impl<T, F, A: Allocator> Iterator for ExtractIf<'_, T, F, A>
where
    F: FnMut(&mut T) -> bool,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
      // ...
                    return Some(Box::from_raw(node.as_ptr()).element);
     // ...
    }
}

It should be something like Box::from_raw_in(node.as_ptr(), &self.list.alloc), otherwise Rust would deallocate it in Global.

@therealbnut

therealbnut commented Sep 28, 2023

I hope this is the right place to discuss, but it'd be nice if this iterator did not store the predicate. Ideally I think the usage would be more like this:

let mut drain_iter = vec.extract_iter();
while let Some(next) = drain_iter.next() {
    if condition(next) {
        // Calling `extract` on the iterator is what makes it different from `Iter`.
        println!("extracted: {:?}", drain_iter.extract());
    }
}

The driving reason behind this suggestion is so that you can extract/retain from multiple Vecs at the same time, and use a predicate involving both (or any other conveniences from using an iterator):

let mut drain_iter_a = vec_a.extract_iter();
let mut drain_iter_b = vec_b.extract_iter();

while let Some((next_a, next_b)) = drain_iter_a.next().zip(drain_iter_b.next()) {
    if condition(next_a, next_b) {
        println!("extracted: {:?} {:?}", drain_iter_a.extract(), drain_iter_b.extract());
    }
}

Doing this sort of thing otherwise is unsafe or very inefficient without access to Vec's internals.
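
To make the shape of the proposal concrete, here is a minimal safe sketch of such a cursor. Everything in it (`ExtractCursor`, `extract_pairs`) is hypothetical and not part of std, and it is the "very inefficient" variant: each `extract` calls `Vec::remove`, so it is O(n) per removal.

```rust
/// Hypothetical cursor, not a real std type. It lends out one element at a
/// time and lets the caller decide afterwards whether to remove it.
struct ExtractCursor<'a, T> {
    vec: &'a mut Vec<T>,
    /// Index of the element that the next call to `next` will visit.
    pos: usize,
    /// Index of the element most recently returned by `next`, if any.
    current: Option<usize>,
}

impl<'a, T> ExtractCursor<'a, T> {
    fn new(vec: &'a mut Vec<T>) -> Self {
        ExtractCursor { vec, pos: 0, current: None }
    }

    /// Visits the next element without removing it.
    fn next(&mut self) -> Option<&T> {
        if self.pos < self.vec.len() {
            self.current = Some(self.pos);
            self.pos += 1;
            self.vec.get(self.pos - 1)
        } else {
            self.current = None;
            None
        }
    }

    /// Removes the element most recently returned by `next`.
    fn extract(&mut self) -> Option<T> {
        let idx = self.current.take()?;
        // The following element has shifted into `idx`, so revisit it.
        self.pos = idx;
        Some(self.vec.remove(idx))
    }
}

/// Removes the pairs `(a[i], b[i])` for which `pred` holds, keeping both
/// vectors aligned.
fn extract_pairs<A, B>(
    a: &mut Vec<A>,
    b: &mut Vec<B>,
    mut pred: impl FnMut(&A, &B) -> bool,
) -> Vec<(A, B)> {
    let mut cur_a = ExtractCursor::new(a);
    let mut cur_b = ExtractCursor::new(b);
    let mut out = Vec::new();
    while let Some((x, y)) = cur_a.next().zip(cur_b.next()) {
        if pred(x, y) {
            // Both calls return `Some` because `next` just succeeded on both.
            out.extend(cur_a.extract().zip(cur_b.extract()));
        }
    }
    out
}
```

Note that `next` here lends a reference tied to the cursor, so this cannot implement `Iterator`; an efficient version would also need to batch the back-shifting the way `retain` does.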

I've had multiple reasons for wanting this, but my current reason is that I want to be able to expose a slice of items from a container, and have the container also store metadata for each item (in a separate Vec).

Here's a rust playground with a very bad implementation, but perhaps enough to play with the ergonomics.

Bonus points if the method on Vec can take a RangeBounds like drain.

@therealbnut

therealbnut commented Sep 28, 2023

I think having it work as above may also address several of the comments on #43244 (comment) - such as unwinding, the FnMut return type, how to signify what the return value means, ...

@neithernut

One significant downside to this approach would be the obstruction of building a pipeline, e.g. with Iterator::map, Iterator::filter_map and Iterator::collect. With an iterator holding the predicate, you can simply write things like:

let res: String = vec
    .extract_if(condition)
    .map(ToString::to_string)
    .filter(|s| !s.is_empty())
    .intersperse(", ".to_string())
    .collect();

With your proposal, we'd either have to write it like it's 1999:

let mut drain_iter = vec.extract_iter();
let mut first = true;
let mut res = String::new();
while let Some(next) = drain_iter.next() {
    if condition(next) {
        let s = drain_iter.extract().to_string();
        if !s.is_empty() {
            if first {
                first = false;
            } else {
                res.push_str(", ");
            }
            res.push_str(s.as_ref());
        }
    }
}

or hand-craft a new Iterator (essentially re-implementing an extract_if that takes a predicate) on top of the extraction iterator:

let mut drain_iter = vec.extract_iter();
let res: String = std::iter::from_fn(|| drain_iter.next().and_then(|i| if condition(i) {
    Some(drain_iter.extract())
} else {
    None
}))
    .map(ToString::to_string)
    .filter(|s| !s.is_empty())
    .intersperse(", ".to_string())
    .collect();

That being said, the existence of a Vec::extract_if doesn't obstruct the Vec::extract_iter you are proposing.

@jplatte
Contributor

jplatte commented Sep 28, 2023

Had an idea but now I think it only works for lending iterators.

Another possibility would be for extract_iter to yield Extract<T>s that Deref to T and provide an associated method to move the element out of the vector so you can do

let mut drain_iter_a = vec_a.extract_iter();
let mut drain_iter_b = vec_b.extract_iter();

while let Some((next_a, next_b)) = drain_iter_a.next().zip(drain_iter_b.next()) {
    if condition(&next_a, &next_b) {
        println!(
            "extracted: {:?} {:?}",
            Extract::extract(next_a),
            Extract::extract(next_b),
        );
    }
}

but also

let res: String = vec
    .extract_iter()
    .filter_map(|it| {
        if condition(&it) {
            Some(Extract::extract(it))
        } else {
            None
        }
    })
    .map(ToString::to_string)
    .filter(|s| !s.is_empty())
    .intersperse(", ".to_string())
    .collect();

@therealbnut

therealbnut commented Sep 28, 2023

@neithernut

One significant downside to this approach would be the obstruction of building a pipeline

I hadn't considered that, I agree.

That being said, the existence of a Vec::extract_if doesn't obstruct the Vec::extract_iter you are proposing.

I think this would be good. We could build one on top of the other, but we'd want to check whether the optimiser could give comparable performance to having separate implementations.

I've been messing around with the retain_mut and drain implementations and come up with this implementation (first revision) of extract_iter. It's not thoroughly tested, but I guess it may optimise to something close to retain_mut's performance (minus the DELETED optimisation, and with some additional state).

@therealbnut

Sorry for all the extra discussion here, let me know if it's better to move it to another issue.

Inspired by @jplatte's suggestion I've come up with an implementation that makes an ExtractIf which is Vec agnostic. I believe it could be one way to do what I wanted and also address @neithernut's concerns:

The summary of this is that there are three pieces:

  • pub struct ExtractIf which provides the logic of extracting.
  • pub trait ExtractingIterator which provides a few methods needed for extracting
  • pub struct VecExtractIf which provides a Vec implementation of ExtractingIterator

The trait ExtractingIterator looks like this:

pub trait ExtractingIterator: ExactSizeIterator {
    type OwnedItem;
    fn peek_mut(&mut self) -> Option<Self::Item>;
    fn extract(&mut self) -> Self::OwnedItem;
}

The implementation of ExtractIf makes use of advance_by and len to work efficiently when the iterator is dropped early.

See this gist for the details:

Granted you do not get combining multiple vectors out of the box, but I believe it's much more flexible this way, and you can construct the multi-vector way fairly simply without any complicated logic or unsafe code.

@therealbnut

ExtractableIterator is probably a better name, although I wasn't intending for any of the names to be final.

@spikespaz

I'm only here to say that I would prefer drain_if. It is the thing I would search for besides drain_filter or filter_drain.

@Kolsky

Kolsky commented Oct 20, 2023

The documentation on extract_if's iterator drop behaviour mentions retaining the remaining elements, but it contradicts the drop impl, which just logically moves non-traversed elements without checking pred. Also I believe that the current name is unwarranted due to being essentially hard to find, containing if and having non-clear intention behind it either way.

@photino

photino commented Dec 10, 2023

I would prefer drain_while, since we already have drain, skip_while, take_while, wait_while.

And the documentation also says that using this method is equivalent to the following code:

let mut i = 0;
while i < vec.len() {
    if some_predicate(&mut vec[i]) {
        let val = vec.remove(i);
        // your code here
    } else {
        i += 1;
    }
}
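
Wrapped into a function, that loop can be run on stable Rust as a sketch of the semantics. The name `extract_where` is hypothetical, and this is the slow form built on `Vec::remove`, not how the standard library implements it:

```rust
/// Hypothetical stand-in built from the documented equivalent loop:
/// removes every element for which `pred` returns true and returns the
/// removed elements in their original order.
fn extract_where<T>(vec: &mut Vec<T>, mut pred: impl FnMut(&mut T) -> bool) -> Vec<T> {
    let mut removed = Vec::new();
    let mut i = 0;
    while i < vec.len() {
        if pred(&mut vec[i]) {
            // `remove` shifts the tail left, so `i` already points at the
            // next unvisited element and must not be advanced.
            removed.push(vec.remove(i));
        } else {
            i += 1;
        }
    }
    removed
}
```

The predicate is evaluated for every element; a `false` result only skips that one element and the loop carries on.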

@tguichaoua
Contributor

tguichaoua commented Dec 10, 2023

I would prefer drain_while, since we already have drain, skip_while, take_while, wait_while.

And the documentation also says that using this method is equivalent to the following code:

let mut i = 0;
while i < vec.len() {
    if some_predicate(&mut vec[i]) {
        let val = vec.remove(i);
        // your code here
    } else {
        i += 1;
    }
}

From the take_while and skip_while documentation:

After false is returned, take_while()’s job is over,

After false is returned, skip_while()’s job is over,

The _while suffix implies its job is over after the first time the predicate returns false.
Something like this:

while predicate(...) {
    // do job
}
// stop job

But as you mention the equivalent code of extract_if/drain_filter/drain_if is different: it doesn't use the predicate in the loop condition.
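
The same contrast can be seen on stable iterators, using `take_while` and `filter` as stand-ins for a `_while`-style and an `_if`-style method. A small sketch (the function `while_vs_if` is only for illustration):

```rust
/// Returns what a `_while`-style and an `_if`-style selection pick from
/// the same input.
fn while_vs_if(input: &[i32]) -> (Vec<i32>, Vec<i32>) {
    let small = |x: &i32| *x < 3;
    // Stops for good at the first element that fails the predicate.
    let while_style: Vec<i32> = input.iter().copied().take_while(small).collect();
    // Tests every element, like the loop from the documentation.
    let if_style: Vec<i32> = input.iter().copied().filter(small).collect();
    (while_style, if_style)
}
```

For the input `[1, 2, 5, 1, 2]` the first result is `[1, 2]` and the second is `[1, 2, 1, 2]`.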

@photino

photino commented Dec 10, 2023

This is a subtle difference. I prefer drain_if now.

The _while suffix implies its job is over after the first time the predicate returns false. Something like this:

while predicate(...) {
    // do job
}
// stop job

But as you mention the equivalent code of extract_if/drain_filter/drain_if is different: it doesn't use the predicate in the loop condition.

@rsalmei

rsalmei commented Dec 11, 2023

I also understand this func in terms of "drain". I think this word should appear in its name somehow.

@cybersoulK

cybersoulK commented Dec 11, 2023

drain/extract already conveys a continuous iteration.
drain_if or extract_if
drain_mapped or extract_mapped (#43244 (comment))

@the8472
Member

the8472 commented Dec 11, 2023

The methods were intentionally renamed because they behave differently compared to drain. Drain keeps draining if you drop the Drain struct. I.e. unlike most iterators it is not lazy.
extract_if on the other hand is lazy and only removes elements as long as the iterator gets spun.
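
The Drain half of that can be checked on stable Rust. A small sketch (the helper `drop_unused_drain` is only for illustration; the extract_if half is not exercised here):

```rust
use std::ops::Range;

/// Creates a `Drain` over `range` and drops it without ever calling `next`.
fn drop_unused_drain(v: &mut Vec<i32>, range: Range<usize>) {
    // Dropping the `Drain` still removes the whole range from the vector.
    drop(v.drain(range));
}
```

After `drop_unused_drain(&mut v, 1..3)` on `[1, 2, 3, 4, 5]`, the vector is `[1, 4, 5]` even though the iterator was never advanced.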

@rsalmei

rsalmei commented Dec 11, 2023

The methods were intentionally renamed because they behave differently compared to drain. Drain keeps draining if you drop the Drain struct. I.e. unlike most iterators it is not lazy.

extract_if on the other hand is lazy and only removes elements as long as the iterator gets spun.

I wasn't aware of that, thanks. It makes sense to be different then.

@Emilgardis
Contributor

Emilgardis commented Dec 11, 2023

Maybe the history section should be updated to highlight the name change and the reason a bit better

@NexusXe

NexusXe commented Mar 18, 2024

Is it possible to point to this feature when a user is trying to use the drain_filter method? This seems like it could cause some confusion.

AaronC81 added a commit to AaronC81/delta-null that referenced this issue Mar 20, 2024