Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::intersperse #75784

Closed
wants to merge 8 commits into from
Closed

Add Iterator::intersperse #75784

wants to merge 8 commits into from

Conversation

jonas-schievink
Copy link
Contributor

@jonas-schievink jonas-schievink commented Aug 21, 2020

This is occasionally useful when wanting to separate an iterator's items.

Implementation adapted from itertools.

Drawback: Breaks all code currently using itertools' intersperse method since the method name collides.

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 21, 2020
Copy link
Contributor

@JulianKnodt JulianKnodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any benefit of intersperse over something like:

let range = 0..100;
let a = repeat(3);
range.zip(a).flat_map(|(i, j)| {
    once(i).chain(once(j))
});

The only difference would be that there would be an extra element at the end in this case.

Edit: Sorry for the late edit, but I realize the fold portion of this could also be represented as
.fold_first(|acc, next| f(acc, interspersed_element, next)). You can also create an iterator by taking the first element off and prefixing the rest with the repeated one, which handles the case where you need an interspersed iterator. I think that this iterator can be represented by other more primitive iterators, but if it is a common use case I think it could be useful.

library/core/src/iter/adapters/intersperse.rs Show resolved Hide resolved
library/core/src/iter/adapters/intersperse.rs Outdated Show resolved Hide resolved
@pickfire
Copy link
Contributor

@jonas-schievink
Copy link
Contributor Author

Is there any benefit of intersperse over something like:

let range = 0..100;
let a = repeat(3);
range.zip(a).flat_map(|(i, j)| {
    once(i).chain(once(j))
});

The only difference would be that there would be an extra element at the end in this case.

Avoiding that last element is exactly why you'd use intersperse.

@jonas-schievink Why not add intercalate while we are at this? https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-List.html#v:intercalate

Never heard of that one. It should probably be in its own PR.

@Lonami
Copy link
Contributor

Lonami commented Aug 22, 2020

It should probably be in its own PR.

How far do we want to go with these methods? I'd vote for "probably less commonly used ones" to live in itertools.

@pickfire
Copy link
Contributor

At least this is generating a better assembly than a manual hand rolled iterator and other methods, weird that I can only find this method the only way to join without panicking.

use std::iter::{once, repeat};

pub fn t1() -> String {
    let mut iter = ["hello", "world"].iter().copied();
    let first = iter.next();
    let rest = repeat(" ")
        .zip(iter)
        .flat_map(|(x, y)| once(x).chain(once(y)));
    first.iter().copied().chain(rest).collect()
}

https://rust.godbolt.org/z/hK5jdc cc @lzutao did I did anything wrongly?

Some(self.element.clone())
}
}

Copy link
Contributor

@pickfire pickfire Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a version of try_fold?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hey, I wrote this! Specializing try_fold is probably not necessary. Iterator::for_each is implemented in terms of Iterator::fold, not Iterator::try_fold, so specializing fold alone is sufficient to get really substantial performance improvements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I see that other specializations do include try_fold too. I think there may be a reason they do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several forwarding hierarchies in Iterator. fold and try_fold are separate ones.
See the (now somewhat outdated) analysis in the PR that made fold independent from try_fold: #72139 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the8472 So it means we should implement all the important ones?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the custom fold implementation provides a speedup over the default one which relies on next then the other ones would likely benefit too.
Additionally the other optimization and extension traits that most iterator adapters implement such as ExactSizeIterator, TrustedLen, DoubleEndedIterator, FusedIterator, TrustedRandomAccess all seem applicable here.

Copy link
Contributor Author

@jonas-schievink jonas-schievink Sep 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an implementation of try_fold

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have removed try_fold again, but not the tests. Any reason for that?

@JulianKnodt
Copy link
Contributor

Ah in the link you sent I think the compiled section for intersperse has the -O flag set, and setting it for the hand-made one provides somewhat similar assembly: https://rust.godbolt.org/z/baaP73

@pickfire
Copy link
Contributor

pickfire commented Aug 23, 2020

@JulianKnodt Thanks, I didn't know about the rustc option can be set there. Updated https://rust.godbolt.org/z/az3r89 Looks like all of the versions perform good except there is bound check for the Join version for result.capacity() >= len.

@scottmcm
Copy link
Member

Drawback: Breaks all code currently using itertools' intersperse method since the method name collides.

I really wish we had a nice way to let people disambiguate extension method conflicts 🙁

@Mark-Simulacrum
Copy link
Member

I'm going to r? @LukasKalbertodt as a libs team member. The implementation here looks good to me.

I personally suspect that it might not be a good idea to land this given the itertools conflict, but I did have an idea for how we could alleviate that in the future at least: itertools could have #[cfg(not(rust_has_intersperse))] on its extension trait method, set in build.rs via something like https://docs.rs/autocfg/1.0.1/autocfg/ or ideally cfg accessible (though that would need the feature stabilized and #72011 fixed).

That doesn't help us now though.

@pickfire
Copy link
Contributor

I think we could itertools could use #[cfg(path = "Iterator::intersperse")].

@jyn514 jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-iterators Area: Iterators labels Aug 31, 2020
@bors
Copy link
Contributor

bors commented Sep 3, 2020

☔ The latest upstream changes (presumably #70793) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@jswrenn jswrenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm the current maintainer of Itertools. I'm thrilled to see this PR! Intersperse is widely useful, and very deserving of being a method on Iterator. My very first PR to Itertools was actually optimizing Intersperse::fold, because chains of Intersperseed tokens are widespread in the internals of the standard proc macro libraries.

I personally suspect that it might not be a good idea to land this given the itertools conflict

I really wouldn't want Itertools to get in the way of a good addition. As an Itertools user, the fixes for these sorts of conflicts are pretty simple: you just used the fully-qualified method call syntax, or limit the scope of your import of Itertools so that your call to libcore's intersperse is unaffected. (For the uninitiated, the introduction of Iterator::flatten caused the same issues. I don't think it posed much more than a minor inconvenience.)

It's only as an Itertools maintainer that these sorts of conflicts cause me a real headache. I really don't like that our README has to discourage contributions to minimize these sorts of issues, and I don't like not merging PRs that seem so useful that I can foresee them making their way into Iterator.

On that note, our upcoming 0.10.0 release is going to include intersperse_with, a generalized form of intersperse. If Iterator is going to gain intersperse, it doesn't seem out of the question that it might want intersperse_with too. Should I cut it from the release to avoid a potential future conflict?

in the future at least: itertools could have #[cfg(not(rust_has_intersperse))] on its extension trait method, set in build.rs via something like https://docs.rs/autocfg/1.0.1/autocfg/

This seems promising(!!!), but maybe a little unpleasant for contributors. Any new, non-allocating method added to Itertools poses a potential future conflict with Itertools. We'd need to test for every such method name in our build.rs. I'd much prefer a permanent solution to this problem.

Some(self.element.clone())
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hey, I wrote this! Specializing try_fold is probably not necessary. Iterator::for_each is implemented in terms of Iterator::fold, not Iterator::try_fold, so specializing fold alone is sufficient to get really substantial performance improvements.

library/core/src/iter/adapters/intersperse.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,76 @@
use super::Peekable;

/// An iterator adapter that places a separator between all elements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// An iterator adapter that places a separator between all elements.
/// An iterator adapter that places a separator between all elements.
///
/// This `struct` is created by [`Iterator::intersperse`]. See its
/// documentation for more.

Document how was in created.

Comment on lines +10 to +11
separator: I::Item,
iter: Peekable<I>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe iter should comes first since it is the most important thing?

Suggested change
separator: I::Item,
iter: Peekable<I>,
iter: Peekable<I>,
separator: I::Item,

Comment on lines +50 to +55
// Use `peek()` first to avoid calling `next()` on an empty iterator.
if !self.needs_sep || self.iter.peek().is_some() {
if let Some(x) = self.iter.next() {
accum = f(accum, x);
}
}
Copy link
Contributor

@pickfire pickfire Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this result in sep, item, sep, item if next() was already called once before fold on iterator with three item?

I thought fold should always start with an item instead of a sep?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought fold should always start with an item instead of a sep?

Why would that be? As far as I see it, that's intended behavior. The iterator is just a sequence of items and if you already got the first one, methods like fold will start with the second item (which, in this case, is a separator).

Comment on lines +68 to +73
let next_is_elem = !self.needs_sep;
let lo = lo.saturating_sub(next_is_elem as usize).saturating_add(lo);
let hi = match hi {
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
None => None,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better?

Suggested change
let next_is_elem = !self.needs_sep;
let lo = lo.saturating_sub(next_is_elem as usize).saturating_add(lo);
let hi = match hi {
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
None => None,
};
let next_is_elem = !self.needs_sep as usize;
let lo = lo.saturating_sub(next_is_elem).saturating_add(lo);
let hi = match hi {
Some(hi) => hi.saturating_sub(next_is_elem).checked_add(hi),
None => None,
};

#[test]
fn test_intersperse() {
let xs = ["a", "", "b", "c"];
let v: Vec<&str> = xs.iter().map(|x| x.clone()).intersperse(", ").collect();
Copy link
Contributor

@pickfire pickfire Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let v: Vec<&str> = xs.iter().map(|x| x.clone()).intersperse(", ").collect();
let v: Vec<_> = xs.iter().copied().intersperse(", ").collect();

assert_eq!(text, "a, , b, c".to_string());

let ys = [0, 1, 2, 3];
let mut it = ys[..0].iter().map(|x| *x).intersperse(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut it = ys[..0].iter().map(|x| *x).intersperse(1);
let mut it = ys[..0].iter().copied().intersperse(1);

// copied is the same as .map(|&x| x)

Not sure if this works but probably it does.

assert_eq!(iter.next(), Some(", "));
assert_eq!(iter.size_hint(), (5, Some(5)));

assert_eq!([].iter().intersperse(&()).size_hint(), (0, Some(0)));
Copy link
Contributor

@pickfire pickfire Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for one without upper bound? I think this is sufficient but just wondering if we need it since someone could change None to return Some(T).

Comment on lines +3257 to +3261
let mut iter = (1..3).intersperse(0);
iter.clone().for_each(|x| assert_eq!(Some(x), iter.next()));

let mut iter = (1..4).intersperse(0);
iter.clone().for_each(|x| assert_eq!(Some(x), iter.next()));
Copy link
Contributor

@pickfire pickfire Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what is the difference between these and the above test? Do we need these?

#[test]
fn test_fold_specialization_intersperse() {
let mut iter = (1..2).intersperse(0);
iter.clone().for_each(|x| assert_eq!(Some(x), iter.next()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for fold and try_fold after one item is consumed?

Copy link
Member

@LukasKalbertodt LukasKalbertodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like most people are in favor of adding this despite the conflict with itertools. I'm also fine with it. But recently it seems like these kinds of conflicts popped up a lot, so, as others have said, a proper lang solution would be really great.

I left a few inline comments, but nothing serious.

What worries me a bit about this API is that it encourages lots of clones and makes it easy to accidentally clone a bunch. Sure, if you pay attention, you probably can avoid expensive clones by only using Copy separators. But we all know how "if you pay attention" works out long term in large projects.

However, I don't know how to improve this :/

  • Using Copy instead of Clone as bound seems overly restrictive. Sometimes people might actually want to clone.
  • Forcing I: Iterator<Item = &Separator> could work but is also kinda awkward? And might not even work as iterators can't return references to self.
  • Using Borrow bounds and yielding Cows (owned for items, borrowed for iterator) makes the API way too complicated.

{
let mut accum = init;

// Use `peek()` first to avoid calling `next()` on an empty iterator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only useful for iterators that are not fused, right? Or what do we gain by not calling next() on an empty iterator?

Also, I think this would break for the following case:

// Imagine `it` is a non fused iterator that yields: `None`, `Some('a')`, `None`
it.intersperse('x').fold(String::new(), |mut s, c| { s.push(c); s })

That would result in "xa" with this implementation, but should yield "", right?

Comment on lines +50 to +55
// Use `peek()` first to avoid calling `next()` on an empty iterator.
if !self.needs_sep || self.iter.peek().is_some() {
if let Some(x) = self.iter.next() {
accum = f(accum, x);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought fold should always start with an item instead of a sep?

Why would that be? As far as I see it, that's intended behavior. The iterator is just a sequence of items and if you already got the first one, methods like fold will start with the second item (which, in this case, is a separator).

Comment on lines +70 to +73
let hi = match hi {
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
None => None,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No hard opinion, but this should work as well:

Suggested change
let hi = match hi {
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
None => None,
};
let hi = hi.and_then(|hi| hi.saturating_sub(next_is_elem as usize).checked_add(hi));

Some(self.element.clone())
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have removed try_fold again, but not the tests. Any reason for that?

Comment on lines +548 to +549
/// let hello = ["Hello", "World"].iter().copied().intersperse(" ").collect::<String>();
/// assert_eq!(hello, "Hello World");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example where separator is yielded more than once would be nice. And preferably with slices instead of strings.

@crlf0710 crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2020
@bors
Copy link
Contributor

bors commented Oct 14, 2020

☔ The latest upstream changes (presumably #77926) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 30, 2020
@Dylan-DPC-zz
Copy link

@jonas-schievink if you can resolve the conflict, we can get this reviewed quickly :)

@jonas-schievink
Copy link
Contributor Author

I probably won't have time to drive this further and address all the comments, so closing. In case anyone wants to finish this, feel free!

@Dylan-DPC-zz
Copy link

Thanks :)

m-ou-se added a commit to m-ou-se/rust that referenced this pull request Dec 30, 2020
Add `Iterator::intersperse`

This is a rebase of rust-lang#75784. I'm hoping to push this past the finish line!

cc `@jonas-schievink`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet