Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ER] Iterator::zip_exact #85342

Open
leonardo-m opened this issue May 15, 2021 · 8 comments
Open

[ER] Iterator::zip_exact #85342

leonardo-m opened this issue May 15, 2021 · 8 comments
Labels
A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@leonardo-m
Copy link

Perhaps it's worth having an Iterator::zip_strict as in Python 3.10 in the stdlib:

https://www.python.org/dev/peps/pep-0618/

@the8472
Copy link
Member

the8472 commented May 15, 2021

Interesting idea. It would have the benefit of being easier to optimize than the current implementation, background: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Improving.20TrustedRandomAccess.20and.20its.20Zip.20specialization

And some cases where one zips with an infinite stream could be made compatible by adding take(other.len()) if length is known. So it might even be ok to just move the optimizations over from zip to zip_strict.

I would propose calling it zip_exact, after ExactSizeIterator.

@leonardo-m leonardo-m changed the title [ER] Iterator::zip_strict ? [ER] Iterator::zip_exact May 15, 2021
@cuviper
Copy link
Member

cuviper commented May 17, 2021

Itertools::zip_eq is along these lines.

I think zip_exact sounds ok too, but in any case we still have to decide how to handle incorrect lengths.

@the8472
Copy link
Member

the8472 commented May 17, 2021

With regular Iterator the length match can only checked when reaching the end. So the check also wouldn't happen during short-circuiting terminal operations.
Other options are to require ExactSizeIterator on both, which might be a bit restrictive, or use specialization to fail eagerly on a best-effort basis.
In the 2nd case a Result could be used, otherwise panicing would be required.

@cuviper
Copy link
Member

cuviper commented May 17, 2021

Even if both are ExactSizeIterator, that's a safe trait which is allowed to be incorrect. We could use that to optimize that we'll only try to consume min(a.len(), b.len()) items, even if they have more, but either of those iterators might still come up short.

@the8472
Copy link
Member

the8472 commented May 17, 2021

There are two kinds of checks. One is for safety and the other is just to give the user feedback whether their use of zip_exact is semantically correct. Those are orthogonal.

The advantage of ExactSizeIterator would be that it could be checked at ZipExact construction time, which means it would always work even when the iterator is never consumed or only used in short-circuiting operations. The downside is that it is quite restrictive since many iterators don't implement it.
Knowing the length also allows internal iteration methods to be written as counted loops, which might optimize better.

Thinking about it, we probably don't even need ExactSizeIterator for an early check. We can look at the size_hint() and only take it into account if it is as tight as ExactSizeIterator requires without requiring a bound on that trait.

The safety checks on the other hand can be eliminated via specialization. The default implementation would still use .unwrap() but impls specialized on TrustedLen or TrustedRandomAccess can skip them.

Most importantly, this would allow elimination of all the MAY_HAVE_SIDE_EFFECT mess, which is quite fragile and has led to safety bugs in the past

let sz_a = self.a.size();
let sz_b = self.b.size();
// Adjust a, b to equal length, make sure that only the first call
// of `next_back` does this, otherwise we will break the restriction
// on calls to `self.next_back()` after calling `get_unchecked()`.
if sz_a != sz_b {
let sz_a = self.a.size();
if A::MAY_HAVE_SIDE_EFFECT && sz_a > self.len {
for _ in 0..sz_a - self.len {
self.a.next_back();
}
self.a_len = self.len;
}
let sz_b = self.b.size();
if B::MAY_HAVE_SIDE_EFFECT && sz_b > self.len {
for _ in 0..sz_b - self.len {
self.b.next_back();
}
}
}
}

} else if A::MAY_HAVE_SIDE_EFFECT && self.index < self.a_len {
let i = self.index;
self.index += 1;
self.len += 1;
// match the base implementation's potential side effects
// SAFETY: we just checked that `i` < `self.a.len()`
unsafe {
self.a.__iterator_get_unchecked(i);
}
None

@scottmcm
Copy link
Member

scottmcm commented May 18, 2021

A conversation from itertools about what the semantics of its zip_eq should be when they're not actually the same length: rust-itertools/itertools#531 (comment)


There are a bunch of different ways this could go. For example, zip_exact(a, b) could even be something like

let (a_len, b_len) = (a.len(), b.len());
assert!(a_len == b_len);
return (0..a_len).map(|_| (a.next().unwrap(), b.next().unwrap()))

@the8472
Copy link
Member

the8472 commented May 18, 2021

A conversation from itertools about what the semantics of its zip_eq should be when they're not actually the same length: rust-itertools/itertools#531 (comment)

I don't quite see the advantage of zip_lazy_eq and zip_eager_eq variants discussed there compared to a best-effort early check + additional lazy checks during iteration. At least for forward iteration, i.e. we avoid the ExactSizeIterator bound there.
Backward iteration should be always eager and thus require the bound.

That way we stay as compatible as possible with the current zip implementation which should make it easier for people to switch over if we remove brittle optimizations from zip and offer zip_exact as replacement.

Most optimizations require specialization anyway and we can also use "best effort" as cover for additional checks implemented via specialization.

There are a bunch of different ways this could go. For example, zip_exact(a, b) could even be something like

let (a_len, b_len) = (a.len(), b.len());
assert!(a_len == b_len);
return (0..a_len).map(|_| (a.next().unwrap(), b.next().unwrap()))

That would require -> impl Iterator. Current libs policy seems to be that adapters should be nameable structs. But internal iteration methods could be implemented that way.

@inquisitivecrystal
Copy link
Contributor

@rustbot label A-iterators C-feature-request T-libs

@rustbot rustbot added A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants