Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add `Iterator::collect_into<E: Extend>(e: E)`? #45840

Open
stuhood opened this Issue Nov 7, 2017 · 10 comments

Comments

Projects
None yet
7 participants
@stuhood
Copy link

stuhood commented Nov 7, 2017

Hey gang.

As originally determined in https://twitter.com/kot_2010/status/927119253324619776, building a pre-size-hinted mutable Vec as the output of a loop (even for very small n) can occasionally be more than twice as fast as collecting into a Vec. As demonstrated here https://github.com/stuhood/rust-vec-bench, using extend into a pre-size-hinted Vec closes most of the gap.

Focusing on improving Iterator::size_hint implementations so that they suggest better sizes for collect would likely help reduce this gap. But there are diminishing returns to heuristics, and in many cases users should be able to give better explicit hints.

This issue initially proposed adding an Iterator::collect_with_size_hint method, but @scottmcm 's suggestion to add Iterator::collect_into<E: Extend>(e: E) -> E seems more general, and suggests a useful analogue in unzip_into<E1: Extend, E2: Extend>(e1: E1, e2: E2) -> (E1, E2).

@durka

This comment has been minimized.

Copy link
Contributor

durka commented Nov 8, 2017

The problem is that Filter et al can't return a "better" size hint because the trait contract is that the lower bound must actually be a lower bound. Maybe that should be relaxed?

Another possibility is collect could look at the upper bound and be smarter about what capacity to use? This wouldn't always be advantageous because you have no idea how many elements will be filtered out.

You can also do this -- but note that OverrideSizeHint is explicitly violating the trait contract:

type SizeHint = (usize, Option<usize>);
struct OverrideSizeHint<I>(I, SizeHint);
extension_trait! {
    <I: Iterator> SizeHintExt<I> for I {
        fn override_size_hint(self, hint: SizeHint) -> OverrideSizeHint<I> {
            OverrideSizeHint(self, hint)
        }
    }
}
impl<I: Iterator> Iterator for OverrideSizeHint<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }

    fn size_hint(&self) -> SizeHint {
        self.1
    }
}

fn collect_hint(ratio: u8, chars: &[char]) -> Vec<char> {
    chars.iter().enumerate().filter_map(|(i, c)| {
        if filter(ratio, i) {
            Some(*c)
        } else {
            None
        }
    }).override_size_hint((chars.len(), Some(chars.len()))).collect()
}
@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Nov 10, 2017

Hmm, what about .collect_into(Vec::with_capacity(5)), or something? (Name TBD of course)

collect is the method-chain-syntax for FromIterator; this would be the equivalent for Extend.

@stuhood

This comment has been minimized.

Copy link
Author

stuhood commented Nov 27, 2017

@scottmcm : I like that alternative as well.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Nov 29, 2017

I noticed the other day that unzip has a similar problem. It always just extends with Some(x), so the size_hint seen by the extend code is always exactly 1, and thus useless for pre-allocating.

@stuhood stuhood changed the title `collect_with_size_hint`? Add `Iterator::collect_into<E: Extend>(e: E)`? Nov 29, 2017

@stuhood

This comment has been minimized.

Copy link
Author

stuhood commented Nov 29, 2017

Updated description/title to apply @scottmcm 's collect_into approach instead.

@dtolnay

This comment has been minimized.

Copy link
Member

dtolnay commented Dec 18, 2017

Do any Iterator impls need to override the default collect_into or would the default always be optimal?

trait Iterator {
    /* ... */

    fn collect_into<E: Extend<Self::Item>>(self, e: E) -> E {
        e.extend(self);
        e
    }
}

As I understand it, the following are always equivalent right?

let vec = whatever.into_iter().etc(...).collect_into(Vec::with_capacity(N));
let mut vec = Vec::with_capacity(N);
vec.extend(whatever.into_iter().etc(...));

Is the advantage of collect_into that the method nesting works out nicer?

@stuhood

This comment has been minimized.

Copy link
Author

stuhood commented Dec 18, 2017

Is the advantage of collect_into that the method nesting works out nicer?

Yes... and avoiding the mutable local variable is nice as well. Finally, the symmetry with collect is appealing.

@CodeSandwich

This comment has been minimized.

Copy link

CodeSandwich commented Jan 27, 2018

I love the idea of collecting into an existing collection, but for another reason: ergonomics. I've proposed something similar on Rust user and internals forums.

I think, that the function (extend_into?) should accept collections as BorrowMut to allow passing them as references. Also it could return the passed collection to allow nice chaining.

@cuviper

This comment has been minimized.

Copy link
Member

cuviper commented Jan 29, 2018

We might also consider partition_into and unzip_into for symmetry. Iterator::partition and unzip already use Extend, but they create new Default collections to extend into.

Also, FWIW rayon already has collect_into and unzip_into with different semantics than those proposed here, but we could consider renaming in rayon before it reaches 1.0 -- rayon-rs/rayon#519.

@CodeSandwich

This comment has been minimized.

Copy link

CodeSandwich commented Feb 16, 2018

I feel that extend_into was received quite well. I want to make a next step and propose an official RFC for it. Here's what I want to issue.

@stuhood I've included your motivation, are you OK with that?

bors added a commit that referenced this issue Aug 5, 2018

Auto merge of #53086 - scottmcm:use-upper-in-collect, r=<try>
Try to guess a smarter initial capacity in Vec::from_iter

> Another possibility is collect could look at the upper bound and be smarter about what capacity to use?  ~ #45840 (comment)

This is obviously good for hints like `(60, Some(61))` where today we allocate space for 60, then double it should the additional element show up, and it'd be much better to just always allocate 61.

More nuanced are hints like `(0, Some(150))`, where today the code uses just the `0`, and thus starts at a capacity of `1`, but with this change will start at `10` instead.

This can undeniably increase memory pressure over where it is today, so I expect at least some controversy 🙂  It does use `try_reserve` for the allocation that's more than the lower bound, and thus shouldn't introduce new aborts, at least.  And the starting point grows by the root, keeping things fairly contained: even with an hint of `(0, Some(1_000_000_000))` it'll only start at `30_517`.

cc @ljedrz
cc #48994

bors added a commit that referenced this issue Aug 8, 2018

Auto merge of #53086 - scottmcm:use-upper-in-collect, r=<try>
Try to guess a smarter initial capacity in Vec::from_iter

> Another possibility is collect could look at the upper bound and be smarter about what capacity to use?  ~ #45840 (comment)

This is obviously good for hints like `(60, Some(61))` where today we allocate space for 60, then double it should the additional element show up, and it'd be much better to just always allocate 61.

More nuanced are hints like `(0, Some(150))`, where today the code uses just the `0`, and thus starts at a capacity of `1`, but with this change will start at `10` instead.

This can undeniably increase memory pressure over where it is today, so I expect at least some controversy 🙂  It does use `try_reserve` for the allocation that's more than the lower bound, and thus shouldn't introduce new aborts, at least.  And the starting point grows by the root, keeping things fairly contained: even with an hint of `(0, Some(1_000_000_000))` it'll only start at `30_517`.

cc @ljedrz
cc #48994
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.