Skip to content

Conversation

@krtab
Copy link
Contributor

@krtab krtab commented Oct 20, 2022

Hi!

I'm aware that I'm not exactly following the normal order for adding things to the library but I wanted to try the contributing workflow and it was pretty quick to add a minimal example of what I'd like to do.

The idea is that many iterator adapters own the underlying iterator. In my use-case, the iterator wrapped in a Peekable was xmlparser::Tokenizer which has other methods I may want to access to. I also wanted my struct (a parser for a XML-based language) to own the underlying tokenizer, and was hence missing inner().

To be more general, I think (and propose to implement) that all relevant iterator adapters get inner() and into_inner() methods.

PS: I've tried to see if it was discussed before but didn't find any reference. Let me know if I missed something.

ACP: rust-lang/libs-team#128

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 20, 2022
@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2022
@joshtriplett
Copy link
Member

This seems reasonable to me; the example of having a type that implements Iterator and also has other methods makes the most sense. I can imagine ways to confuse an iterator adapter by accessing the underlying iterator when not expected, but inner here returns a non-mutable reference, and into_inner consumes the wrapper, so either way that doesn't seem likely. With that in mind, happy to r+ this once you create a tracking issue.

(Also, I don't think reason = "wip" fits; I think you want reason = "recently added".)

@krtab
Copy link
Contributor Author

krtab commented Oct 20, 2022

Hi josh!

thanks for replying so quickly. I have created a tracking issue but there are still points that need discussion API wise. For example, in the zulip discussion, the following question surrounding Peekable was raised by @mejrs

I'm not sure whether it makes sense for Peekable to have this. It is very much about peeking not advancing the iterator, but it does advance the inner iterator. if you peeked once and then inner'd it, the returned iterator would not return the first item.

We can discuss it wherever is best between here, the tracking issue (#103302) or zulip.

@the8472
Copy link
Member

the8472 commented Oct 20, 2022

At least for Zip and possibly other adapters this is unsound because TrustedRandomAccess specializations bring the iterator into a state that is not consistent unless used exclusively through the methods whitelisted in the TrustedRandomAccess documentation.

It would also preclude some future optimizations that would similarly assume that internals are not observable once an iterator has been wrapped into an adapter.

@the8472
Copy link
Member

the8472 commented Oct 20, 2022

In my use-case, the iterator wrapped in a Peekable was xmlparser::Tokenizer which has other methods I may want to access to.

To achieve that you can do things like

let token_iter = ...;
let borrowed_iter = token_iter.by_ref().filter(...);
// do something with borrowed_iter here
drop(borrowed_iter);
// token_iter becomes usable again

by_ref uses the impl<I> Iterator for &mut I where I: Iterator implementation which disables all those assumptions about exclusive access and thus doesn't suffer from the problem.

@scottmcm
Copy link
Member

Given that this is a "new thing that every iterator adapter should think about supporting", I think it would be appropriate to have an ACP about it: https://std-dev-guide.rust-lang.org/feature-lifecycle/api-change-proposals.html. That's a better place to have a motivation discussion than a PR.

I do think it's quite reasonable for the simple 1:1 adapters -- like for Copied the implications on the inner iterator from actions on the outer iterator are in a sense "obvious".

For a bunch of them, though, this being an infallible API is more of a commitment than you might be thinking. For example, Take<I> is currently (usize, I) internally, but that's not really fundamental. It could also be Option<(NonZeroUsize, I)> internally, for example, so that it drops the wrapped iterator once it'll no longer take anything from it. And that would mean that into_inner(self) -> I could no longer be implemented.

Even for Skip, where taking out the inner thing is comparatively reasonable because after the first call it doesn't do anything, the behaviour isn't necessarily obvious. It would be very easy for someone to write it = it.skip(10).into_inner(); and expect that to be equivalent to it.advance_by(10), but it's very much not. (As an aside, maybe we should stabilize advance_by one of these days?)

So for me, 👍 to the obvious ones, but I'm less certain about all of them.

@the8472
Copy link
Member

the8472 commented Oct 20, 2022

I do think it's quite reasonable for the simple 1:1 adapters -- like for Copied the implications on the inner iterator from actions on the outer iterator are in a sense "obvious".

Those might also be the least interesting to expose though? E.g. the cited motivation (xmlparser::Tokenizer) doesn't return reference items, so one couldn't even use copied() on it.

I have two questions:

a) is by_ref sufficient?
b) if not, would an iter.unspoolable().adapter().adapter().adapter().unspool() be a better choice? That way we can unpack through several layers even if unpacking some intermediate components makes less sense and we can treat it similar to by_ref() by not implementing optimization traits that require exclusive access, knowing that someone will be able to access the inner state.

@krtab
Copy link
Contributor Author

krtab commented Oct 25, 2022

Thanks a lot for your numerous comments

@the8472

At least for Zip and possibly other adapters this is unsound because TrustedRandomAccess specializations bring the iterator into a state that is not consistent unless used exclusively through the methods whitelisted in the TrustedRandomAccess documentation.

TIL, I would indeed need to dive into TrustedRandomAccess. Do you have a pointer to any doc/blog post about it by any chance?

It would also preclude some future optimizations that would similarly assume that internals are not observable once an iterator has been wrapped into an adapter.

Definitely. I guess this motivates further discussion? I'm not completely aware of what the decision process is for these things.

To achieve that you can do things like
[...]
by_ref uses the impl<I> Iterator for &mut I where I: Iterator implementation which disables all those assumptions about exclusive access and thus doesn't suffer from the problem.

I could but the API would have been better served by not doing it in that case (because I want to hide the Tokenizer from the end user as an implementation detail). I know this is a pattern I've bumped into in the past (wrapping a iterator adapter in a struct) and I would guess I'm not the only one.

Moreover, I would still need an inner like method to acces the &mut Tokenizer wouldn't I?

@scottmcm

Given that this is a "new thing that every iterator adapter should think about supporting", I think it would be appropriate to have an ACP about it: https://std-dev-guide.rust-lang.org/feature-lifecycle/api-change-proposals.html. That's a better place to have a motivation discussion than a PR.

I'll definitely write something then.

For a bunch of them, though, this being an infallible API is more of a commitment than you might be thinking.
[...]

You are 100% right that this is a commitment I hadn't forseen. I guess it is to be discussed in the ACP?

@the8472

a) is by_ref sufficient?

No, cf. above. I would like Tokenizer to be an implementation detail.

b) if not, would an iter.unspoolable().adapter().adapter().adapter().unspool() be a better choice? That way we can unpack through several layers even if unpacking some intermediate components makes less sense and we can treat it similar to by_ref() by not implementing optimization traits that require exclusive access, knowing that someone will be able to access the inner state.

I don't understand what "unspool" means.

Thanks again to the both of you for your precious feedback!

@krtab
Copy link
Contributor Author

krtab commented Oct 28, 2022

I have opened the APC following your advice:
rust-lang/libs-team#128

@the8472
Copy link
Member

the8472 commented Oct 28, 2022

TIL, I would indeed need to dive into TrustedRandomAccess. Do you have a pointer to any doc/blog post about it by any chance?

https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/core/iter/adapters/zip/trait.TrustedRandomAccess.html

I don't understand what "unspool" means.

csm_SGL-SIGRAFIL-Carbon-Endlosfasern_ba3dff8b5e

What I was suggesting is to introduce a new adapter Unspoolable and then each cooperating adapter that wraps it would add an unspool() method (via a trait impl) that'd punch through several adapter layers until you get the Unspoolable (or its inner iterator) back. That way we wouldn't have to worry about the API for the intermediate layers and we'd have a place that would opt out of all the optimization traits.

We already have an internal unsafe trait that does almost that, but it's for a slightly different purpose.

The explicit struct could be optional. Instead it could be implemented for &mut Iterator and unspool() would just unpack to the nearest &mut Iterator.

@Dylan-DPC Dylan-DPC added S-waiting-on-ACP Status: PR has an ACP and is waiting for the ACP to complete. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 17, 2023
@krtab
Copy link
Contributor Author

krtab commented Jul 17, 2023

Closing per rust-lang/libs-team#128 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-ACP Status: PR has an ACP and is waiting for the ACP to complete. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants