Tracking Issue for feature(iter_advance_by) #77404

Open
1 of 3 tasks
timvermeulen opened this issue Oct 1, 2020 · 36 comments
Labels
A-iterators Area: Iterators C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@timvermeulen
Contributor

timvermeulen commented Oct 1, 2020

This is a tracking issue for the methods Iterator::advance_by and DoubleEndedIterator::advance_back_by.
The feature gate for the issue is #![feature(iter_advance_by)].


Previously the recommendation was to implement nth and nth_back on your iterators to efficiently advance them by multiple elements at once (useful for .skip(n) and .step_by(n)). After this feature is stabilized the recommendation will/should be to implement advance_by and advance_back_by instead, because they compose better and are often simpler to implement.

Iterators in libcore that wrap another iterator (possibly from elsewhere than libcore) will need to keep their nth and nth_back implementations for the foreseeable future and perhaps indefinitely, because the inner iterator may have an efficient nth implementation without implementing advance_by as well.
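As a rough illustration of the "simpler to implement" point, here is a minimal sketch for a made-up `Counter` iterator. Because `Iterator::advance_by` is still feature-gated, the sketch uses a hypothetical inherent method named `advance` as a stand-in. It assumes the `Result<(), usize>` shape discussed in this issue, with `Err(k)` reporting how many elements were skipped before the iterator ran out.

```rust
/// Made-up example iterator yielding `pos, pos + 1, ..., end - 1`.
struct Counter {
    pos: usize,
    end: usize,
}

impl Counter {
    /// Hypothetical stand-in for the unstable `Iterator::advance_by`.
    /// Returns `Err(k)` if only `k < n` elements could be skipped.
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        let available = self.end - self.pos; // invariant: pos <= end
        let step = n.min(available);
        self.pos += step;
        if step == n { Ok(()) } else { Err(step) }
    }
}

impl Iterator for Counter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.pos < self.end {
            let value = self.pos;
            self.pos += 1;
            Some(value)
        } else {
            None
        }
    }

    /// `nth` falls out of `advance` in two lines.
    fn nth(&mut self, n: usize) -> Option<usize> {
        self.advance(n).ok()?;
        self.next()
    }
}
```

With this shape, `nth` is just "skip, then take one", and the skipping logic lives in one place.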

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Implementation history

@timvermeulen timvermeulen added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Oct 1, 2020
@timvermeulen
Contributor Author

I’ve already written implementations of these methods for most iterators in core::iter, I’ll submit PRs for them once I have enough tests.

@jonas-schievink jonas-schievink added A-iterators Area: Iterators T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 1, 2020
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 6, 2020
…tmcm

Implement advance_by, advance_back_by for iter::Chain

Part of rust-lang#77404.

This PR does two things:
- implement `Chain::advance[_back]_by` in terms of `advance[_back]_by` on `self.a` and `advance[_back]_by` on `self.b`
- change `Chain::nth[_back]` to use `advance[_back]_by` on `self.a` and `nth[_back]` on `self.b`

This ensures that `Chain::nth` can take advantage of an efficient `nth` implementation on the second iterator, in case it doesn't implement `advance_by`.

cc `@scottmcm` in case you want to review this
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 7, 2020
…=<try>

Implement advance_by, advance_back_by for slice::{Iter, IterMut}

Part of rust-lang#77404.

Would be nice if we can get away with getting rid of `nth[_back]` altogether, but if not, we can keep `nth` and `advance_by` alongside each other.

Also see rust-lang#76909 (comment).

cc `@ecstatic-morse` `@scottmcm`
@GuillaumeGomez
Member

The fact that advance_by returns a Result isn't great in my opinion. In case it moves after the end of the iterator, it's not an error, just that it ran after the end. It would make more sense to return Some(how_much_after_the_end) instead of Err(how_much_after_the_end).

@m-ou-se
Member

m-ou-se commented Oct 23, 2020

> Previously the recommendation was to implement nth and nth_back on your iterators to efficiently advance them by multiple elements at once (useful for .skip(n) and .step_by(n)). After this feature is stabilized the recommendation will/should be to implement advance_by and advance_back_by instead, because they compose better and are often simpler to implement.

That means that every existing Iterator implementation out there that already has an efficient nth() will now automatically get a default slow advance_by() added, implemented as a next() loop, right? I don't think it can be expected that all these implementations will be updated overnight the moment this hits stable, or even get updated at all. Generic code will then have to make a choice between 1) being efficient/simple with std's Iterators but potentially very slow with other Iterators (advance_by) and 2) being efficient with all iterators but with a slightly more confusing function (nth). :(

I like advance_by, but I feel like this problem is not something to ignore.

@m-ou-se
Member

m-ou-se commented Oct 23, 2020

> The fact that advance_by returns a Result isn't great in my opinion. In case it moves after the end of the iterator, it's not an error, just that it ran after the end. It would make more sense to return Some(how_much_after_the_end) instead of Err(how_much_after_the_end).

Returning an Option also isn't great. None in the context of an Iterator usually means the end is reached, so using None here for not reaching the end can also get confusing.

I'm wondering about the downside of having to return the 'missing length' from advance_by. I can imagine there are data structures that do not store their length, but do have an upper bound on their length. (E.g. a fixed size buffer containing a C-style 0-terminated string.) Those can just return None directly from nth(n) if n is >= the maximum length. advance_by(n) would still require counting the elements to return the right Err(i), even if the caller is probably just going to use .is_ok(). Having these two similar functions (nth and advance_by) where you can't know in general which one is the more efficient one would be a shame.
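A minimal sketch of the situation described above, using a made-up `CStrBytes` iterator over a fixed-size, NUL-terminated buffer. The `advance` method is a hypothetical stand-in for `advance_by` with the `Err(k)` convention; none of these names come from std.

```rust
/// Made-up iterator over the bytes of a NUL-terminated string stored in a
/// fixed-size buffer. The string length is not stored anywhere.
struct CStrBytes<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for CStrBytes<'a> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        match self.buf.get(self.pos) {
            Some(&b) if b != 0 => {
                self.pos += 1;
                Some(b)
            }
            _ => None,
        }
    }

    fn nth(&mut self, n: usize) -> Option<u8> {
        // The buffer size is an upper bound on the length, so an
        // out-of-range `n` can be rejected without scanning any bytes.
        if n >= self.buf.len() - self.pos {
            self.pos = self.buf.len();
            return None;
        }
        for _ in 0..n {
            self.next()?;
        }
        self.next()
    }
}

impl<'a> CStrBytes<'a> {
    /// Hypothetical stand-in for `advance_by`: to report `Err(k)` it has to
    /// find the terminator, even if the caller only checks `is_ok()`.
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        for k in 0..n {
            if self.next().is_none() {
                return Err(k);
            }
        }
        Ok(())
    }
}
```

Here `nth(100)` on an 8-byte buffer returns immediately, while `advance(100)` walks the string to learn that it ran out after 2 elements.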

What are the advantages of returning Err(missing_length) as opposed to None or false?

@kevincox
Contributor

Since advance_by returns the number of items, you can't implement it in terms of .nth(), which was the previously recommended way to skip elements. (One exception is that you could use nth for cases where size_hint().1 > 0, but this doesn't solve the recursion problem discussed below.) I think that returning the number of elements is a useful feature, so I wouldn't want to avoid it. This is also something that .nth() lacks today.

However even if it didn't return the length it isn't clear how the default would be implemented. You can't have the defaults of advance_by and nth refer to each other because then if neither was implemented you would have infinite recursion. And if we implement advance_by(n) as if n > 0 { self.nth(n-1) } then for maximum performance you would need to implement both advance_by and nth. I think it is best to avoid this legacy.

So while I admit that it is unfortunate that switching callers from nth to advance_by can bypass an optimized nth until the iterator implementation is updated I think that this is the best solution available.

@timvermeulen
Contributor Author

@m-ou-se

> That means that every existing Iterator implementation out there that already has an efficient nth() will now automatically get a default slow advance_by() added, implemented as a next() loop, right? I don't think it can be expected that all these implementations will be updated overnight the moment this hits stable, or even get updated at all.

You're absolutely right, which is why we're not going to change any uses of nth in libcore to advance_by (on potentially user-defined iterator types), even if it would clean up the code. It's an unfortunate side-effect of the fact that nth was here first.

> What are the advantages of returning Err(missing_length) as opposed to None or false?

The main problem that advance_by solves is that it composes well in ways that nth doesn't for iterators like Chain. For example, in order to efficiently compute (0..10).chain(10..20).chain(20..30).nth(25), it's crucial that the first inner iterator of Chain doesn't just indicate that it doesn't have an nth element (or that it isn't able to advance by n), but that it also somehow indicates how much progress was made in order to know which value of n to pass to the second iterator. nth isn't able to solve this problem on its own.
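The composition argument can be sketched as follows. The `Advance` trait and `Chain2` adapter are hypothetical stand-ins (the real method is feature-gated and the real `Chain` is more involved, e.g. it also fuses its halves); `Err(k)` is assumed to mean "ran out after skipping k elements".

```rust
use std::ops::Range;

/// Hypothetical stand-in for the unstable `Iterator::advance_by`.
trait Advance: Iterator {
    /// Skips `n` elements; `Err(k)` means only `k < n` could be skipped.
    fn advance(&mut self, n: usize) -> Result<(), usize>;
}

impl Advance for Range<usize> {
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        let available = self.end.saturating_sub(self.start);
        let step = n.min(available);
        self.start += step;
        if step == n { Ok(()) } else { Err(step) }
    }
}

/// Simplified two-part chain, for illustration only.
struct Chain2<A, B> {
    a: A,
    b: B,
}

impl<A, B> Iterator for Chain2<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<A::Item> {
        self.a.next().or_else(|| self.b.next())
    }
}

impl<A, B> Advance for Chain2<A, B>
where
    A: Advance,
    B: Advance<Item = A::Item>,
{
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        match self.a.advance(n) {
            Ok(()) => Ok(()),
            // The progress made by `a` tells us how much is left for `b`.
            Err(done) => self.b.advance(n - done).map_err(|k| done + k),
        }
    }
}

/// `nth` expressed through the hypothetical `advance`.
fn nth_via_advance<I: Advance>(iter: &mut I, n: usize) -> Option<I::Item> {
    iter.advance(n).ok()?;
    iter.next()
}
```

For the nested chain of `0..10`, `10..20` and `20..30`, asking for element 25 costs three constant-time range updates instead of 25 calls to `next`.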

> Having these two similar functions (nth and advance_by) where you can't know in general which one is the more efficient one would be a shame.

I don't really have a satisfying answer to this... There are indeed iterators out there for which nth can be faster than advance_by for certain inputs, although I do believe that these are relatively rare. The best way to handle this might just be to only use advance_by if either the iterator type is known to have an equally efficient advance_by implementation, or you need the Err(n) value for something. An example of that is how the current implementation of Chain::nth calls advance_by on the first iterator (because it needs the error value) and nth on the second.

@m-ou-se
Member

m-ou-se commented Oct 23, 2020

I also don't have any solutions here. :( Would've been nice if advance_by was there first, but now we're stuck with these unfortunate problems.

Anyway, let's make sure these concerns are discussed properly before stabilization. Can you add them to the 'steps' above? (The return type, the inefficient default implementation for existing 'random access' iterators, and having to calculate the number of elements even when that's unused in many cases.)

@WaffleLapkin
Member

> You can't have the defaults of advance_by and nth refer to each other because then if neither was implemented you would have infinite recursion.

@kevincox It's unfortunate, but it can be solved by allowing "a minimum complete implementation to be specified mutually for recursive default methods" (see e.g.: rust-lang/rfcs#628). It's a really neat and useful feature, but I'm not sure if the lang team wants to spend its time budget on this currently (it doesn't seem like there has been much progress since 2015...).

@KodrAus KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Nov 6, 2020
@the8472
Member

the8472 commented Apr 2, 2021

> pub fn advance_by(&mut self, n: usize) -> Result<(), usize>
>
> This method will eagerly skip n elements by calling next up to n times until None is encountered.

Is that really the intended specification? Some adapters and sources could implement advance_by far more efficiently than by calling next(). E.g. vec.iter().map().advance_by() could just ignore what happens in map and advance the slice iterator directly.

Wouldn't it be better to explicitly allow implementations to skip side-effects?

Having an advance_back_by without side-effect guarantees would help the impl DoubleEndedIterator for Zip code by being able to bring the tails of two iterators with different lengths into alignment more efficiently.
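To make the trade-off concrete, here is a small sketch of how things behave on stable today: skipping through a `map` adapter runs the closure for every skipped element, whereas skipping on the source before mapping does not. The function names are made up for the example.

```rust
use std::cell::Cell;

/// Takes the `n`-th element *after* mapping.
/// Returns the element and how often the closure ran.
fn nth_through_map(data: &[u32], n: usize) -> (Option<u32>, usize) {
    let calls = Cell::new(0usize);
    let item = data
        .iter()
        .map(|x| {
            calls.set(calls.get() + 1);
            x * 2
        })
        .nth(n);
    (item, calls.get())
}

/// Skips on the slice iterator first and maps only what is left.
fn nth_skipping_source(data: &[u32], n: usize) -> (Option<u32>, usize) {
    let calls = Cell::new(0usize);
    let item = data
        .iter()
        .skip(n)
        .map(|x| {
            calls.set(calls.get() + 1);
            x * 2
        })
        .next();
    (item, calls.get())
}
```

Both functions return the same element; the difference is only in how many times the closure is invoked, which is exactly the observable behaviour under discussion.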

@sollyucko
Contributor

> E.g. vec.iter().map().advance_by() could just ignore what happens in map and advance the slice iterator directly.

If the function passed to map doesn't have any side effects, monomorphization + inlining + dead code elimination (DCE) should take care of it. If it does have side effects, it would be very confusing if specifically this method and nothing else were to skip the potentially-important side effect.

@the8472
Member

the8472 commented Apr 3, 2021

> If the function passed to map doesn't have any side effects, monomorphization + inlining + dead code elimination (DCE) should take care of it.

Optimizations are far from perfect, we have a lot of code that aims to make things more efficient. And I gave a simple example, in practice iterator pipelines are often more complex and can run into inlining thresholds which defeat any further optimizations.

> If it does have side effects, it would be very confusing if specifically this method and nothing else were to skip the potentially-important side effect.

There always has to be a first, so that's not a strong argument. advance_by basically is a seeking method for the explicit purpose of skipping further ahead into an iterator. So it's the ideal place for not doing unnecessary work.

There's prior art for that, the java stream API explicitly doesn't guarantee that side-effects for many of its operations happen at all or happen in any particular order. That enables this kind of optimization.

Of course that behavior should be noted in its documentation, that's why I quoted and asked about the specification.

@sollyucko
Contributor

Perhaps it could be useful to have a version of Iterator, map (this is what makes the most sense to me), or advance_by, or a method on Iterator similar to Java's ordered and parallel, that allows skipping or duplicating side-effects, where that behavior is useful, while retaining the current behavior for code that doesn't opt in?

@the8472
Member

the8472 commented Apr 3, 2021

Since advance_by is a new method I am suggesting making its use that opt-in.

Having separate opt-in methods would be a lot trickier in rust because Java has a private default implementation which examines and optimizes the whole stream pipeline while in Rust the adapters operate more independently and talk less to each other. Propagating behavior backwards without introducing new fields in each adapter that would hold such configuration requires new methods to be implemented that have different behavior.

Opting out of an optimization is a lot easier than opting into it for the whole pipeline. Opt-in requires a more efficient method to be implemented for each adapter. Opting out only needs to add 1 adapter that doesn't have the optimization and uses the naive approach instead.
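A rough sketch of what "opting out with one adapter" could look like. Everything here is hypothetical: `Advance` stands in for an `advance_by` that is allowed to skip side effects, `MapLike` stands in for `Map`, and `Naive` is the opt-out wrapper that falls back to driving `next`.

```rust
/// Hypothetical stand-in for an `advance_by` that may skip side effects.
trait Advance: Iterator {
    /// Naive default: drives `next`, so every side effect still happens.
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        for k in 0..n {
            if self.next().is_none() {
                return Err(k);
            }
        }
        Ok(())
    }
}

impl<'a, T> Advance for std::slice::Iter<'a, T> {
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        let len = self.len();
        if n > 0 {
            // `nth` on a slice iterator is a constant-time jump.
            let _ = self.nth(n - 1);
        }
        if n <= len { Ok(()) } else { Err(len) }
    }
}

/// Simplified `map`-like adapter whose `advance` never calls `f`.
struct MapLike<I, F> {
    inner: I,
    f: F,
}

impl<I: Iterator, B, F: FnMut(I::Item) -> B> Iterator for MapLike<I, F> {
    type Item = B;
    fn next(&mut self) -> Option<B> {
        self.inner.next().map(&mut self.f)
    }
}

impl<I: Advance, B, F: FnMut(I::Item) -> B> Advance for MapLike<I, F> {
    fn advance(&mut self, n: usize) -> Result<(), usize> {
        self.inner.advance(n) // side effects of `f` are skipped
    }
}

/// Opt-out wrapper: forwards `next` and keeps the naive default `advance`.
struct Naive<I>(I);

impl<I: Iterator> Iterator for Naive<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}

impl<I: Iterator> Advance for Naive<I> {}
```

Wrapping a pipeline in `Naive` restores the "every element is visited" behaviour without touching any of the adapters underneath it.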

@kevincox
Contributor

kevincox commented Apr 3, 2021

It is important to note that the compiler's definition of side effects may be very different from a programmer's. For example, you may use .map() to map ids to objects by looking them up from the filesystem, a database or a network request. No matter how good the compiler is, it can't realistically eliminate this work. It would be very helpful to avoid these calls when the result isn't necessary.

@sollyucko
Contributor

sollyucko commented Apr 3, 2021

> Having separate opt-in methods would be a lot trickier in rust because Java has a private default implementation which examines and optimizes the whole stream pipeline while in Rust the adapters operate more independently and talk less to each other. Propagating behavior backwards without introducing new fields in each adapter that would hold such configuration requires new methods to be implemented that have different behavior.
>
> Opting out of an optimization is a lot easier than opting into it for the whole pipeline. Opt-in requires a more efficient method to be implemented for each adapter. Opting out only needs to add 1 adapter that doesn't have the optimization and uses the naive approach instead.

Hmm, that's true... But what about map_assume_no_side_effects (bikeshed) only affecting its mapping function? It would also have the advantage of guaranteeing that if you use the normal version of map, the side effects always happen sequentially, even if you allow calling code to use the iterator.

Unless there are any cases where you would want to sometimes skip side effects and sometimes execute them, with the same iterator?

Also, in terms of what other methods would need multiple versions:

Yes/maybe:

  • map would need it
  • take_while might need it, although it might need to always evaluate all of them to avoid skipping over the end
    • It could potentially be useful to have a version that assumes the predicate always returns false once the end is reached
  • map_while is similar to take_while
  • inspect could have both versions
  • intersperse (clone) and intersperse_with
  • cloned

No:

  • flat_map wouldn't know how many elements to skip, and it would have to delegate to the sub-iterators either way
  • filter has the same problem
  • filter_map has the same problem as flat_map and filter
  • skip_while probably would need to evaluate all of them until the first returned element, at which point it just delegates
  • scan would need to evaluate all of them

Nothing else accepts a function and returns an iterator.

@the8472
Member

the8472 commented Apr 3, 2021

There also are intersperse (we could skip a lot of .clone() calls there) and intersperse_with.

So duplicating those would require several adapter structs and copy-paste (or macros) rather than just implementing advance_by on existing adapters. And of course we would have to repeat that exercise for any new adapters that take functions.

@sollyucko
Contributor

Hmm, that's true... I guess you could instead make wrapper structs, but there would still be a lot of boilerplate, although probably per-method rather than per-implementor... Also, for map/etc. callers that want to opt out, that could also be added, by delegating everything except advance_by, which would use the default instead...

@the8472
Member

the8472 commented Apr 3, 2021

>   • filter has the same problem
>   • skip_while probably would need to evaluate all of them until the first returned element, at which point it just delegates

Actually there are hypothetical optimizations that could skip those. E.g. if you take an iterator from a sorted collection such as BTreeSet and do .filter(...).min() then it can bail out as soon as it finds the first filtered element without evaluating the rest because it knows that the source is sorted and the min elements come first.

The same can be done with DoubleEndedIterator on a sorted collection and .max().
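The equivalence this optimization relies on can be written out by hand today. A minimal sketch, with made-up function names: on an ascending `BTreeSet` iterator, the minimum of the filtered elements is simply the first element that passes the filter.

```rust
use std::collections::BTreeSet;

/// What `.filter(..).min()` does today: evaluates the predicate on every element.
fn min_matching_full_scan(set: &BTreeSet<u32>, pred: impl Fn(&u32) -> bool) -> Option<u32> {
    set.iter().copied().filter(|x| pred(x)).min()
}

/// Hand-written early exit: `BTreeSet` iterates in ascending order, so the
/// first match is already the minimum.
fn min_matching_early_exit(set: &BTreeSet<u32>, pred: impl Fn(&u32) -> bool) -> Option<u32> {
    set.iter().copied().find(|x| pred(x))
}
```

The hypothetical optimization would let the first form behave like the second, at the cost of not running the predicate on the remaining elements.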

> Also, for map/etc. callers that want to opt out, that could also be added, by delegating everything except advance_by, which would use the default instead...

Indeed, that's what I said earlier #77404 (comment)

@the8472
Member

the8472 commented Apr 3, 2021

For an opt-out we could do .map(...).has_sideffects() which indicates that the previous adapter has side-effects that must be evaluated as if the iterator were driven to completion by a dumb while let Some(_) = iter.next() {} loop without any smartness in the other adapters (other than short-circuiting guaranteed by those adapters). It may also disable some optimizations in preceding or following adapters other than the immediately wrapped one but is not guaranteed to do so.

@the8472
Member

the8472 commented Apr 14, 2021

Another adapter that could benefit from skipping side-effects is .cloned().

@the8472
Member

the8472 commented May 18, 2021

step_by could also benefit from advance_by and skip side-effects of items that it is stepping over. That would also make it more efficient to implement TrustedRandomAccess on StepBy since a naive implementation would naturally skip all side-effects and that's currently not justifiable.

Perhaps we could also reduce potential for unexpected behavior by specializing advance_by based on Iterator::Item (e.g. whether it's a mutable reference, which is more likely used for its side-effects, such as manipulating collection contents via iter_mut()) or on the Fn vs. FnMut distinction, for adapters that take functions.
This wouldn't totally exclude the possibility of optimizations eliminating intentional side-effects since the user could still use static variables or interior mutability but it would at least make it much less likely.

But I am not entirely sure if such specializations are possible and permissible under min_specialization.

@kevincox
Contributor

That sounds very surprising and I would recommend against it. It changes the behaviour depending on the type. Furthermore Rust doesn't know which types are logically mutable. &std::sync::Mutex<T> is just as mutable as &mut T to the user.

@the8472
Member

the8472 commented May 18, 2021

Well, eliminating side-effects of closures may also seem surprising, at least when you're explicitly asking for a mutating iterator.

Currently there's a lot of untapped optimization opportunity in iterators. But backwards compatibility makes it difficult to change behavior. Java got this right from the start with its Stream API which explicitly says side-effects may be elided.

I think advance_by would provide a starting point to have an operation that explicitly skips side-effects. Newly added adapters could then use it for optimizations unconditionally.
But that leaves existing adapters. Those either can't exploit advance_by at all or they would have to do so based on some heuristic for iterators that are less likely to be evaluated for their side-effects. That's what I proposed above.

@sollyucko
Contributor

Another possibility could be to have some sort of FnPure trait for closures with no side effects, and a wrapper AssumePure that can be used when the compiler is unable to automatically detect that. This would allow having a uniform interface for adapters accepting closures, rather than having to come up with a modified name for each, and possibly allowing specialization of only the methods that skip over items, rather than having to re-implement or delegate each.

@the8472
Member

the8472 commented May 18, 2021

That would be a much bigger change since it would require compiler changes and people might want to rely on pure functions for other reasons and then maybe ask for AssumePure to be unsafe because it would violate something they want to rely on in unsafe code etc. etc.

The other problem is that what the compiler considers side-effects may not be the same as what the developer thinks are relevant side-effects. E.g. we might want to make different tradeoffs for inspect and map.

@the8472
Member

the8472 commented Jul 12, 2021

I'm currently working on implementing advance_by on more iterators and making advance_by(0) do meaningful work such as bringing a Take, Zip or Flatten into a state where next()/next_back() don't have to unspool the inner iterator anymore to perform their first step. That in turn could lead to advance_by(0) possibly returning Err(0) instead of the currently guaranteed Ok(). That would be useful as a hint that the following call to next() would return None.

I'm also testing whether advance_by(..., skip_sideeffects: EffectTypes) or something like that can be added to the signature which would enable an implementation of advance_by on Map<I, F> which would then, depending on the passed skip_sideeffects, either use the default implementation or forward to the underlying iterator without invoking F.
But even if that can be made to work it would still only be a building block. The question how to use it in places where it would improve performance without breaking compatibility would still remain. So it may not be all that useful by itself.

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 31, 2021
…tmcm

Implement advance_by, advance_back_by for slice::{Iter, IterMut}

Part of rust-lang#77404.

Picking up where rust-lang#77633 was closed.

I have addressed rust-lang#77633 (comment) by restoring `nth` and `nth_back`. So according to that comment this should already be r=m-ou-se, but it has been sitting for a while.
@Ten0

Ten0 commented Aug 20, 2021

Wouldn't the default implementation possibly be more efficient if it were to be implemented using try_fold?
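One way such a default could look, written as a free function because the real method is feature-gated. This is only a sketch under the convention documented at the time (`Err(k)` where `k` is the number of elements skipped before running out); the name `advance_by_try_fold` is made up.

```rust
/// Hypothetical `try_fold`-based equivalent of the default `advance_by`.
fn advance_by_try_fold<I: Iterator>(iter: &mut I, n: usize) -> Result<(), usize> {
    if n == 0 {
        return Ok(());
    }
    // Fold state: number of elements consumed so far.
    // `Err(())` short-circuits the fold once `n` elements were consumed;
    // `Ok(taken)` means the iterator ended first.
    let outcome = iter.try_fold(0usize, |taken, _item| {
        let taken = taken + 1;
        if taken == n { Err(()) } else { Ok(taken) }
    });
    match outcome {
        Err(()) => Ok(()),
        Ok(taken) => Err(taken),
    }
}
```

The potential benefit is that adapters with a specialised `try_fold` (internal iteration) would be used automatically, instead of going through `next` one element at a time.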

Manishearth added a commit to Manishearth/rust that referenced this issue Oct 4, 2021
…shtriplett

implement advance_(back_)_by on more iterators

Add more efficient, non-default implementations for `feature(iter_advance_by)` (rust-lang#77404) on more iterators and adapters.

This PR only contains implementations where skipping over items doesn't elide any observable side-effects such as user-provided closures or `clone()` functions. I'll put those in a separate PR.
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Oct 4, 2021
…shtriplett

implement advance_(back_)_by on more iterators

Add more efficient, non-default implementations for `feature(iter_advance_by)` (rust-lang#77404) on more iterators and adapters.

This PR only contains implementations where skipping over items doesn't elide any observable side-effects such as user-provided closures or `clone()` functions. I'll put those in a separate PR.
@the8472
Member

the8472 commented Oct 13, 2021

While working on another advance_by implementation I realized that I introduced an inconsistency in #87091. Now the documentation says:

> advance_by(n) will return Ok(()) if the iterator successfully advances by n elements, or Err(k) if None is encountered, where k is the number of elements the iterator is advanced by before running out of elements (i.e. the length of the iterator). Note that k is always less than n.
>
> [...] advance_by(0) may either return Ok() or Err(0). The former conveys no information whether the iterator is or is not exhausted, the latter can be treated as if next had returned None. Replacing a Err(0) with Ok is only correct for n = 0.

"k is always less than n" is incompatible with advance_by(0) possibly returning Err(0).

My confusion stems from the Ok/Err distinction conveying the same information as k < n already would. It encodes information redundantly. That might be useful for optimizations or to simplify control flow, e.g. by letting flags on which we would branch bubble up through a return chain. The use of ? does look elegant in the original PR. But the advantages don't seem as obvious in more complex adapters where some arithmetic is needed to calculate the correct amount, and that could overflow because the inner iterator misbehaves and currently this isn't checked properly.

Maybe a different return type would be better. usize would be sufficient to convey the same information (modulo the Err(0) case) but may be unergonomic due to the additional comparisons. (usize, bool) could work nicely with overflowing arithmetic. (usize, Advance::{ExhaustedFellShort, ExhaustedExact, MayHaveMore}) could convey additional information whether the iterator has been exhausted.

Or maybe just the documentation needs to be fixed.

@kevincox
Contributor

I think that k < n is the right invariant. I think advance_by(0) must always succeed.

@the8472
Member

the8472 commented Oct 13, 2021

That presupposes Result is the right return type, which I'm also uncertain about.

@kevincox
Contributor

Right, I should have been explicit. I think it is the right return type. You have the "happy case" and the "unhappy case". At least for the caller this makes sense. However you are right that the type is broader than it needs to be. It is possible for advance_by(10) to return Err(100) which obviously doesn't make sense or Err(0) which doesn't make sense. However that is not uncommon in std APIs and isn't really possible to avoid with Rust's current type system.

I don't think ExhaustedExact is a valuable concept. If people want to know if the iterator is done they should call size_hint. I don't think we should provide this API on advance_by where there is no indication of whether it is actually implemented. Let's leave that logic to the dedicated method that thought this API through.

I think we could come up with other return types but I think the Ok and Err cases are important to make obviously distinct. Just returning usize doesn't really make that easy to do as you either have to store the expected amount to compare or we make it count down to zero but now you need to remember to compare to zero (I guess we could mark it as must_use which makes it not terrible, but the code isn't very self explaining in my opinion).

In fact, thinking about it, I may consider changing the return value to be the amount not advanced, AKA the amount remaining. Basically a fallback advance_by would look like:

fn advance_by(&mut self, mut count: usize) -> Result<(), usize> {
  while count > 0 {
    if self.next().is_none() {
      return Err(count)
    }
    count -= 1;
  }
  Ok(())
}

Then you can easily convert the return value to remaining by advance_by(n).err().unwrap_or(0) if you want the raw count. Err(0) would always be an invalid return as that should be Ok(()).
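For anyone who wants to try the "amount remaining" convention on stable, the fallback above can be written as a free function. This is a sketch only; `advance_remaining` and `remaining_after` are made-up names.

```rust
/// Free-function version of the fallback above: on failure, `Err(r)` carries
/// the number of steps that could NOT be taken.
fn advance_remaining<I: Iterator>(iter: &mut I, mut count: usize) -> Result<(), usize> {
    while count > 0 {
        if iter.next().is_none() {
            return Err(count);
        }
        count -= 1;
    }
    Ok(())
}

/// The conversion mentioned above: collapse the result into a raw
/// "remaining" count, where 0 means the full advance succeeded.
fn remaining_after<I: Iterator>(iter: &mut I, n: usize) -> usize {
    advance_remaining(iter, n).err().unwrap_or(0)
}
```

Under this convention a chain-like adapter could hand the `Err` value of its first half straight to its second half, without the subtraction that the "amount advanced" convention needs.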

Another option would be to make a custom try type where 0 was considered "Ok" and non-zero was considered "Err" and provides Into<usize> but I'm not sure how much more familiar this would be.

@the8472
Member

the8472 commented Oct 13, 2021

> I don't think ExhaustedExact is a valuable concept. If people want to know if the iterator is done they should call size_hint.

It would be mostly useful for minor optimization to skip a followup operation, such as the next() nth or to drop the front iterator in Flatten. We can obviously already do that in the Err case because we know the iterator is exhausted. But in the Ok case we don't know that. (usize, bool) could convey those pieces orthogonally.

Granted, the value of that distinction is minor, all it costs to notice the exhaustion is another call to next or advance_by.

> Just returning usize doesn't really make that easy to do as you either have to store the expected amount to compare

Many of the implementations need to do that anyway.

https://github.com/rust-lang/rust/pull/87091/files#diff-c0d520d60171cd367fdbe3ad1387b13cd2be83597f623d4209f8ca4fc1394f56R398

@the8472
Member

the8472 commented Oct 15, 2021

#89916 removes the inconsistency from the docs and updates the implementations to always return Ok(()) in that case.

@the8472
Member

the8472 commented Dec 26, 2021

For discussion: #92284 which changes the return type to usize. The change does simplify a few things.

@kkharji

kkharji commented Apr 28, 2022

What is blocking this feature? advance_by is super useful.

@jonas-schievink
Member

This method is one of only 4 in the vtable of dyn Iterator (alongside next, size_hint and nth). Are you sure that incurring this codegen bloat for every Rust program that uses dyn Iterator is worth it, or should there be a where Self: Sized bound on this method?
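For readers unfamiliar with the mechanism, here is a minimal sketch of the difference on a made-up `Stepper` trait (not `Iterator` itself): a default method without the bound must be callable through `dyn Stepper`, while one with `where Self: Sized` is only available on concrete types and so does not have to be instantiated for trait objects.

```rust
/// Made-up trait, used only to illustrate the `where Self: Sized` bound.
trait Stepper {
    fn step(&mut self) -> Option<u32>;

    /// No bound: callable through `dyn Stepper`, so every implementor that is
    /// turned into a trait object needs an instance of this method.
    fn skip_ahead(&mut self, n: usize) -> usize {
        let mut skipped = 0;
        while skipped < n && self.step().is_some() {
            skipped += 1;
        }
        skipped
    }

    /// With the bound: only callable on concrete (sized) types, never
    /// through `dyn Stepper`.
    fn skip_ahead_sized(&mut self, n: usize) -> usize
    where
        Self: Sized,
    {
        self.skip_ahead(n)
    }
}

/// Made-up implementor yielding `next, next + 1, ..., end - 1`.
struct UpTo {
    next: u32,
    end: u32,
}

impl Stepper for UpTo {
    fn step(&mut self) -> Option<u32> {
        if self.next < self.end {
            let value = self.next;
            self.next += 1;
            Some(value)
        } else {
            None
        }
    }
}
```

With a `Box<dyn Stepper>`, calling `skip_ahead` compiles and `skip_ahead_sized` is rejected by the compiler; the trade-off raised here is that the bound would also make the method unavailable to code that only holds a `dyn Iterator`.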

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jul 25, 2022
…ochenkov

Expose size_hint() for TokenStream's iterator

The iterator for `proc_macro::TokenStream` is a wrapper around a `Vec` iterator:

https://github.com/rust-lang/rust/blob/babff2211e3ae9ef52852dc1b01f3eacdd94c12e/library/proc_macro/src/lib.rs#L363-L371

so it can cheaply provide a perfectly precise size hint, with just a pointer subtraction:

https://github.com/rust-lang/rust/blob/babff2211e3ae9ef52852dc1b01f3eacdd94c12e/library/alloc/src/vec/into_iter.rs#L170-L177

I need the size hint in syn (https://github.com/dtolnay/syn/blob/1.0.98/src/buffer.rs) to reduce allocations when converting TokenStream into syn's internal TokenBuffer representation.

Aside from `size_hint`, the other non-default methods in `std::vec::IntoIter`'s `Iterator` impl are `advance_by`, `count`, and `__iterator_get_unchecked`. I've included `count` in this PR since it is trivial. I did not include `__iterator_get_unchecked` because it is spoopy and I did not feel like dealing with that. Lastly, I did not include `advance_by` because that requires `feature(iter_advance_by)` (rust-lang#77404) and I noticed this comment at the top of libproc_macro:

https://github.com/rust-lang/rust/blob/babff2211e3ae9ef52852dc1b01f3eacdd94c12e/library/proc_macro/src/lib.rs#L20-L22
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
Expose size_hint() for TokenStream's iterator

The iterator for `proc_macro::TokenStream` is a wrapper around a `Vec` iterator:

https://github.com/rust-lang/rust/blob/0f39245cd48329538f97bdf8796a5b1521dceb52/library/proc_macro/src/lib.rs#L363-L371

so it can cheaply provide a perfectly precise size hint, with just a pointer subtraction:

https://github.com/rust-lang/rust/blob/0f39245cd48329538f97bdf8796a5b1521dceb52/library/alloc/src/vec/into_iter.rs#L170-L177

I need the size hint in syn (https://github.com/dtolnay/syn/blob/1.0.98/src/buffer.rs) to reduce allocations when converting TokenStream into syn's internal TokenBuffer representation.

Aside from `size_hint`, the other non-default methods in `std::vec::IntoIter`'s `Iterator` impl are `advance_by`, `count`, and `__iterator_get_unchecked`. I've included `count` in this PR since it is trivial. I did not include `__iterator_get_unchecked` because it is spoopy and I did not feel like dealing with that. Lastly, I did not include `advance_by` because that requires `feature(iter_advance_by)` (rust-lang/rust#77404) and I noticed this comment at the top of libproc_macro:

https://github.com/rust-lang/rust/blob/0f39245cd48329538f97bdf8796a5b1521dceb52/library/proc_macro/src/lib.rs#L20-L22