Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::collect_into #2339

Closed
CodeSandwich opened this issue Feb 19, 2018 · 16 comments
Closed

Add Iterator::collect_into #2339

CodeSandwich opened this issue Feb 19, 2018 · 16 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@CodeSandwich
Copy link

Summary

Add Iterator::collect_into method, which takes a collection and extends it with iterator’s content

Motivation

Ergonomics

Sometimes there is a need to create a sophisticated iterator and then put all it's items into a collection. Of cource there is Iterator::collect, which creates a new collection, but there is no convenient method of collecting into an EXISTING one.

Sure, it's possible, but a bit awkward:

fn my_fn(&self, vec: &mut Vec) {
    let iter = self.items.iter()
        .filter(...)
        .enumerate()
        .map(...);
    vec.extend(iter);
}

or more compact, but way harder to follow:

fn my_fn(&self, vec: &mut Vec) {
    vec.extend(self.items.iter()
        .filter(...)
        .enumerate()
        .map(...)
    );
}

but nothing beats THIS:

fn my_fn(&self, vec: &mut Vec) {
    self.items.iter()
        .filter(...)
        .enumerate()
        .map(...);
        .collect_into(vec);
}

collect_into could return the extended collection:

fn my_fn(&self, vec: &mut Vec) -> usize {
    self.items.iter()
        .filter(...)
        .enumerate()
        .map(...);
        .collect_into(vec)
        .len()
}

the function could accept collections by both reference or value:

fn my_fn(&self, mut vec: Vec) -> Vec {
    self.items.iter()
        .filter(...)
        .enumerate()
        .map(...);
        .collect_into(vec)
}

Manually size-hinted collect

This use case was invented by @stuhood in Rust issue 45840.

As originally determined in https://twitter.com/kot_2010/status/927119253324619776), building a pre-size-hinted mutable Vec as the output of a loop (even for very small n) can occasionally be more than twice as fast as collecting into a Vec. As demonstrated here , using extend into a pre-size-hinted Vec closes most of the gap.

Focusing on improving Iterator::size_hint implementations so that they suggest better sizes for collect would likely help reduce this gap. But there are diminishing returns to heuristics, and in many cases users should be able to give better explicit hints.

Guide-level explanation

Iterator::collect_into takes all items from the Iterator and adds them to an existing collection. It's a counter-part for Extend::extend, but instead of being invoked on container, it's called on Iterator making it's usage similar to Iterator::collect.

let mut vec = vec![0, 4];
(3..7)
    .filter(|i| i % 2 == 0)
    .map(|i| i * 2);
    .collect_into(&mut vec);
assert_eq!(vec![0, 4, 8, 12], vec);
}

Iterator::collect_into takes collection either by mutable reference or value. It also returns the extended collection making chaining elegant.

let vec = (2..4)
    .map(|i| i * 2)
    .collect_into(vec![0, 2])
    .windows(2)
    .map(|c| c[0] + c[1])
    .collect::<Vec<_>>();
assert_eq!(vec![2, 6, 10], vec);

Reference-level explanation

No rocket science here.

pub trait Iterator {
    type Item;

    ...

    fn collect_into<B, E>(self, extended: B) -> B
    where
        B: BorrowMut<E>,
        E: Extend<Self::Item> {
        extended.borrow_mut().extend(self);
        extended
     }
}

The BorrowMut allows passing collections either by value or by reference, which makes it much more elastic.

Drawbacks

This proposition makes Iterator interface even broader without extending its functionality significantly, it's just ergonomics.

Rationale and alternatives

This proposition removes a small stdlib papercut making it more pleasant and expressive. If it's not added, people will just keep using workarounds with Extend::extend.

Unresolved questions

None, that I'm aware of.

@stuhood
Copy link

stuhood commented Feb 19, 2018

@CodeSandwich : Thanks for opening this!

An example of the usecase from rust-lang/rust#45840 would be nice to see: ie., just using collect_into as a way to give a precise size hint:

let vec = (1..100)
    .filter(|i| i % 2 == 0)
    .collect_into(Vec::with_capacity(50))

@SimonSapin
Copy link
Contributor

SimonSapin commented Feb 19, 2018

Is this the same as std::iter::Extend? (Sorry, you do mention it.)

@Centril
Copy link
Contributor

Centril commented Feb 20, 2018

We could simply name the method .extend(...). I think that is shorter and clearer.

I don't think this requires an RFC. You can just file a PR against rust-lang/rust.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Feb 20, 2018
@leonardo-m
Copy link

Is it allowed to use collect_into on an fixed-size array?

@CodeSandwich
Copy link
Author

Arrays do not implement Extend, so no.

@Centril
Copy link
Contributor

Centril commented Feb 24, 2018

Arrays of [T; N] could implementExtend by overwriting all elements from the start and there's always ArrayVec which implements the trait.

@leonardo-m
Copy link

Often I have to fill fixed-size arrays.

Sometimes I'd like a syntax like:

let a = vec![|x| (x as u32).pow(2); 9];
let b: [u16; 9] = [|x| x as u16; 9];

@CodeSandwich
Copy link
Author

Smells like Python.
You can do that with Vec, but without fancy syntax:

let v = (0..9).map(|x| x * x).collect::<Vec<_>>();

The arrays are way more restricted (often just useless) because of safety constraints. If the closure panics half way through initialization, Rust will not be able to drop it properly. The workaround could be ArrayVec.

@CodeSandwich
Copy link
Author

CodeSandwich commented Feb 25, 2018

I've hit a usability wall during implementation. When using the method:

(3..5).extend(vec![1, 2]));

Compiler can't infer the type of the Vec. That's because it's hidden behind two layers of inferring:

B: BorrowMut<E>,
E: Extend<Self::Item>,

Rust cant' find the right combination of B and E. Theoretically there could be many, Vec could implement BorrowMut<MyCollection>, where MyCollection implements Extend. The result is that the user is forced to always define the type of E explicitly, which is really ugly and seemingly dumb:

(3..5).extend::<_, Vec<_>>(vec![1, 2]))

There are two solutions:

  1. Keep it, it's not THAT bad
  2. Split extend into extend, which takes E by value and extend_ref, which takes E by mutable reference. This removes the BorrowMut generic making inference trivial, but also limits flexibility.

@SimonSapin
Copy link
Contributor

Why is the BorrowMut indirection useful? What’s an example where you would have B != E?

@CodeSandwich
Copy link
Author

It's useful when you want to pass Extend by mutable reference, for example &mut Vec.

@Centril
Copy link
Contributor

Centril commented Feb 25, 2018

@CodeSandwich I'd say that specifying ::<_, Vec<_>> is pretty awful - any gains in ergonomics from .extend(..) are imo negated. So for resolution, I say provide .extend(..) which takes E by value and .extend_mut(..) which takes &mut E. This follows the naming convention and is terse as well as readable.

@CodeSandwich
Copy link
Author

@Centril Thank you, I split extend into extend and extend_mut.

If this change doesn't have own RFC, should I already mark it as stable? With which since version, 1.26? If I want to mark it as unstable, I have to set some issue number, I could make it 0.

@nagisa
Copy link
Member

nagisa commented Feb 25, 2018

The additions to the standard library always start off as unstable. For features that haven’t gone through the RFC process, pnce the implementation is nearing completion (the PR is submitted and mostly reviewed), a tracking issue is created and unstable annotations are pointed at the newly created tracking issue.

@Centril
Copy link
Contributor

Centril commented Feb 25, 2018

Set the issue number to 0 for now and no since attribute. Once the libs team member asks you, create a new tracking issue and make a commit to the PR and set an issue number.

@Centril
Copy link
Contributor

Centril commented Mar 3, 2018

Closing since the PR was filed.

@Centril Centril closed this as completed Mar 3, 2018
bors added a commit to rust-lang/rust that referenced this issue Mar 6, 2018
Add Iterator::collect_into

This is PR for adding `Iterator::extend` and `Iterator::extend_mut` as discussed in [RFC issue 2339](rust-lang/rfcs#2339). It was advised to bypass RFC and just implement it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

6 participants