Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove finished flag from MapWhile #68820

Merged
merged 3 commits into from
Mar 22, 2020

Conversation

WaffleLapkin
Copy link
Member

This PR removes finished flag from MapWhile as been proposed in #66577 (comment).

This also resolves open questions of the tracking issue (#68537):

  • MapWhile can't implement both
  • Debug output (this pr removes finished flag, so there is no question in including it in debug output)

r? @Mark-Simulacrum

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 4, 2020
@Mark-Simulacrum
Copy link
Member

I don't think we should do this. While it is true that fuse() gives you this behavior anyway, and for most iterators in practice it won't matter (almost all iterators are fused, I'd guess), I think we should be essentially the same as map_while. The fact that we feel the need to call this out in documentation is another red flag for me.

However, I am not the libs team (nor on it), so let's r? @LukasKalbertodt

I would suggest a fcp to close this, personally.

@WaffleLapkin
Copy link
Member Author

I was also afraid of this change. However, later I came with than it simplifies code by removing behaviour that isn't required (nor needed) by map_while (and also this behaviour can be easily achieved by fuse). Moreover map_while from this pr is more powerful because it can be used when you don't need to stop on first None (it's rarely needed but I think it's still a point).

I think this change won't bring a lot of cognitive problems, mostly because in 99% of places where you can use iterator (like fold, for_each, from_iter, for loop, etc) it doesn't matter if iterator is fused or not because everything stops on first None.

The addition to docs is there just to be sure that nobody will ever have problems with this.

But anyway the decision is up to the libs team. Personally I think that is approach is more correct.

@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Feb 7, 2020

Actually, thinking about this more... map_while where we don't fuse on end is just map, right?

Unless I'm missing something.

Edit: No, this is wrong, thinking further.

@Mark-Simulacrum
Copy link
Member

Oh, no, so then map_while's underlying operation is the equivalent of and_then on Option, vs. map being map on Option.

I think doing this makes sense after thinking about it (much) more. However, we should drop the _while suffix and instead call this and_then, per Option's precedent. There's no longer any fusing involved, so we cannot say that it does something "while" it returns Some. The documentation should also call out fusing it as something that users may want to do.

Essentially, an Iterator is a stream of Option<T>, and this takes Option<T> -> Option<U>. The only difference between this and map is that map cannot return None, i.e., map preserves the discriminant of the Option stream, whereas this does not.

@timvermeulen
Copy link
Contributor

However, we should drop the _while suffix and instead call this and_then, per Option's precedent. There's no longer any fusing involved, so we cannot say that it does something "while" it returns Some.

The implementation is indeed exactly like Option::and_then, but it's still very much a "while" because it's crucial to Iterator's semantics that iteration stops at the first None. What next returns after that doesn't matter because it's unspecified behavior. A for loop will still only yield elements "while" the predicate succeeds.

It's unfortunate we can't remove TakeWhile's flag anymore due to the FusedIterator impl, might have given a nice speedup. At this point we probably may as well make it fully fused.

@WaffleLapkin
Copy link
Member Author

@Mark-Simulacrum But there is already and_then (I still can't understand why is it called like that, when it most other langs it's called flat_map...) on Iterator and it's called flat_map... And it has different semantic in comparison with MapWhile:

let vec = vec![0, -1, 2, 3];

assert_eq!(
    vec.iter().copied().flat_map(|x| usize::try_from(x).ok()).collect::<Vec<_>>(),
    &[0, 2, 3],
);

assert_eq!(
    vec.iter().copied().map_while(|x| usize::try_from(x).ok()).collect::<Vec<_>>(),
    &[0],
);

(playground)

There's no longer any fusing involved, so we cannot say that it does something "while" it returns Some.

But it does something while predicate returns Some... It's the difference between filter_map and map_while - filter_map won't stop* while the underlying iterator return Some and map_while won't stop* while both underlying iterator and the predicate return Some.

*by "stop" I mean "return None"

and this takes Option<T> -> Option<U>

T -> Option<U>

@WaffleLapkin
Copy link
Member Author

The implementation is indeed exactly like Option::and_then

I've found out that next() could be rewritten just like self.iter.next().and_then(self.predicate) :D

@Mark-Simulacrum
Copy link
Member

The implementation is indeed exactly like Option::and_then, but it's still very much a "while" because it's crucial to Iterator's semantics that iteration stops at the first None. What next returns after that doesn't matter because it's unspecified behavior. A for loop will still only yield elements "while" the predicate succeeds.

If that's the only reason it's a while, then all iterator predicates would have to be named as such? So that seems like an odd reason. Plus, the adapter should not fuse at None; the caller may do so of course.

This adapter, not fused, would specifically be "just" and_then without any of the "frills" of other iterators atop it. For example, flat_map/filter_map cannot be used to implement and_then, as they "skip" over None's -- see for example, where the inner None in the parent iterator is skipped, whereas and_then would yield Some, Some, None, Some with an identity mapping.

@timvermeulen
Copy link
Contributor

whereas and_then would yield Some, Some, None, Some with an identity mapping.

I'm not sure if I understood your point correctly, but here you're relying on implementation details. The moment None is returned, the iterator is done. What comes after entirely doesn't matter. So yes, this iterator happens to behave like Option::and_then when you call next repeatedly under the implementation proposed by this PR, but this behavior is unspecified.

For instance:

let vec = vec![Some(1), Some(2), None, Some(3)];
for x in vec.into_iter().map_while(|x| x) {
    // the loop body is executed twice
}

x here is never 3, regardless of whether we remove the finished flag or not. The only way to notice a difference is by relying on behavior that is left intentionally unspecified. So if the "while" in the name was appropriate with the finished flag, then it will still be appropriate without. The semantics haven't changed.

@Mark-Simulacrum
Copy link
Member

If what came after the first None didn't matter, then there would be no real reason to fuse iterators. It's not true in my opinion that the rest of the iterator is unspecified; it is very much so specified by all adapters in std at least (or so I would expect).

What an iterator does in for loops or via collect and what it returns are different, but both are important.

@timvermeulen
Copy link
Contributor

If what came after the first None didn't matter, then there would be no real reason to fuse iterators.

This seems somewhat backwards to me. For a plain iterator, what comes after the first None is completely unspecified. In some cases, this can make the iterator simpler or more efficient (like here). It would be wasteful if all iterators were required to keep track of whether they have returned None before because in almost all cases, iterators are discarded once next returns None. But some algorithms rely on next returning None indefinitely, in which case they can opt into fusing the iterator themselves. So the real reason to fuse iterators is the fact that whatever comes after the first None is unspecified, not the other way around.

It's not true in my opinion that the rest of the iterator is unspecified; it is very much so specified by all adapters in std at least (or so I would expect).

It's only specified whenever FusedIterator is implemented, which indeed most libcore iterators do (conditionally). It's actually somewhat unfortunate that non-fused iterators are so rare because it can make certain bugs very hard to find. The iterator tests add several iterator adaptors that are intentionally non-fused in order to test that libcore iterators still behave correctly when the underlying iterator isn't fused.

All this to say that removing the finished flag only changes behavior that was already unspecified to begin with because MapWhile doesn't implement FusedIterator.

@WaffleLapkin
Copy link
Member Author

For a plain iterator, what comes after the first None is completely unspecified. In some cases, this can make the iterator simpler or more efficient (like here).

In my opinion, It's not even unspecified (in this case), MapWhile simply will try to map the next element of the underlying iterator. And this behaviour isn't problematic because in case you want to rely on the fact that iterator will always return None after the first None, you'll just add : FusedIterator bound or use .fuse(). And in most cases, you even don't need this.

I'm wondering if there are cases when you need to continue after first None and finished flag can break these use-cases...

@Mark-Simulacrum
Copy link
Member

Hm, maybe we're talking past each other. I don't disagree that an iterator that is not fused may return Some after the first None. But it is not true that what that iterator returns after the first None is arbitrary and can change over time. i.e., not being fused does not let iterators start returning arbitrary data after the first None in some releases. (Which is what "unspecified" in this context means, I think; it means that it is an unstable implementation detail).

With this PR, map_while (or as I'm proposing, and_then), has this specification:

  • If the inner iterator returns Some, the closure will be called, and its return value used.
  • If the inner iterator returns None, the iterator will return None.

Note, that this implies fused-ness if the underlying iterator is fused, but (like most iterator adapters) does not fuse. This seems reasonable to me.

However, such a specification does not lead me to call it "map_while" -- there's no while here, there's no state in the adapter itself. It doesn't do something while the underlying adapter is doing something, or the provided closure is continuing to yield elements. It specifically only acts on one iterator at a time.

And indeed, we can only make this change because map_while is still unstable -- if it was stabilized, then regardless of whether it was fused officially (i.e., implemented the trait), I would argue that such a behavioral difference would not be acceptable to make.

@timvermeulen
Copy link
Contributor

But it is not true that what that iterator returns after the first None is arbitrary and can change over time. i.e., not being fused does not let iterators start returning arbitrary data after the first None in some releases.

I guess this is the source of our disagreement 🙂 It was my understanding/expectation that non-fused iterators are at liberty to change what next returns after having returned None, without that being a breaking change. I strongly suspect that we've changed this behavior for several adaptors in the past (in obscure cases), but I can't think of anything specific, so maybe not.

Regardless of all this, when used "normally" (i.e. stopping on the first None), this iterator adaptor will still only yield results "while" the predicate succeeds, so to me MapWhile still makes the most sense.

@Mark-Simulacrum
Copy link
Member

We do not call filter filter_while, though it has the same behavior, right?

Plus, it's not just predicate success -- it's either the predicate succeeding or the inner iterator returning None.

@timvermeulen
Copy link
Contributor

I'm not sure I follow, filter and filter_map will still iterate the entire underlying iterator regardless of which items the predicate fails on. A failed filter predicate does not end iteration, unlike with take_while and map_while. This seems to be fairly consistent with what we already have, except for scan I suppose (which IMO should not have been given the ability to stop iteration early).

@Mark-Simulacrum
Copy link
Member

Hm, I guess it's true in the sense that filter will keep calling your predicate while you return false (vs. returning None); it only cares about the underlying iterator returning None.

I would personally prefer and_then as the name I think still, since that to me is much clearer than map_while ever was -- the fact that we "stop" iteration at the first None is a fact that's true of the caller's code, not the adapter itself (even if 99% of code will indeed stop iteration at the first None).

@@ -1114,8 +1114,23 @@ pub trait Iterator {
/// The `-3` is no longer there, because it was consumed in order to see if
/// the iteration should stop, but wasn't placed back into the iterator.
///
/// Note that unlike [`take_while`] this iterator is **not** fused:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be documented. The lack of a FusedIterator impl should be enough to know it's not fused. Although in general if you need a fused iterator you should just call .fuse() whether it happens to implement FusedIterator or not.

Additionally I think that the behavior after .next() has returned None should be left unspecified, unless of course there is some use case for that behavior. I'd imagine that if you need access to more elements after the predicate has returned None you'd be better off just using .map(predicate).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree in that I think mentioning this iterator not being fused in the docs is a good thing. The lack of impls, especially for marker traits, is easily overlooked.

However, I agree that we probably don't want to specify the behavior after the first None is returned. At least for now. If it turns out later that there are use cases for a well defined behavior, we can still add it. So I would replace the code example with one sentence and put it all in one paragraph. Something like this:

Note that unlike take_while this iterator is not fused. It is also not specified what this iterator returns after the first None is returned. If you need fused iterator, use fuse.

@LukasKalbertodt
Copy link
Member

I think we should either make MapWhile be fused (i.e. rejecting this PR) or rename it. Having something so close to take_while not being fused is not a good idea I'd say. However, I'm not a fan of the name and_then in this case, but can't think of a better one either.

What advantages does removing the finished flag have? (Apart from implementation complexity (which is not too bad IMO) and the Debug output.) Do we have reason to believe that this is faster in any significant way?

@timvermeulen
Copy link
Contributor

Do we have reason to believe that this is faster in any significant way?

The overhead of the finished flag should be about the same as the overhead of calling fuse() on a non-fused iterator, in both size and performance (although technically MapWhile::next doesn't change finished when self.iter.next() returns None, so that might help the optimizer). We'll probably need to do benchmarks to see if it impacts performance.

It's probably good to note that Scan has no finished flag, too – and map_while is equivalent to scan with a () state. So there is some precedent for not having it. Granted, Scan has no "while" in its name, but from my experience this has only caused more confusion than it avoided, with people wondering why the closure returns an Option in the first place.

@LukasKalbertodt
Copy link
Member

So if we keep map_while with its finished flag and implement FusedIterator for it, then it works as expected without footguns for most users. And the few users who actually notice the potential performance hit can use scan.

Slightly reduced implementation complexity, the Debug impl and a probably fairly small performance impact do not justify breaking the consistency between map_while and take_while IMO. Especially since this method was specifically introduced in relation to take_while.

@timvermeulen
Copy link
Contributor

then it works as expected without footguns for most users

Are we sure that this actually is a footgun? I've never heard of any bugs that were caused by Scan not being fused, but then again, its usage is probably quite low. Since it's unstable anyway, couldn't we just add the flag back if it turns out that this is causing confusion/bugs? It would be unfortunate if we ended up making the obvious thing slower than the non-obvious thing.

If non-fused iterators are indeed a footgun for people, then you could argue every iterator should be fused.

@WaffleLapkin
Copy link
Member Author

then it works as expected without footguns for most users

I'm also want to argue about this being a footgun. Most users in most cases will use operations that stop on first None (for_each, fold, from_iter, etc) and won't notice this. However, if someone wants to do something complicated that requires not to stop on first None then he probably knows what he is doing and can check for FusedIterator impl.

I think that finished flag won't be a big performance penalty, however, it's work that is not needed anyway. And this is not "zero-cost".

@LukasKalbertodt
Copy link
Member

"Footgun" was probably a too strong of a word. But it still is a strangeness in the API that could lead to bugs.

@WaffleLapkin I just took a look at your original PR again. The example in your top comment requires map_while to be fused, right?

Also, two pieces of the doc comment of Iterator::map_while seem to imply that map_while is fused. If we actually want to make MapWhile non fused, then those places need to be adjusted.

After None is returned, map_while()'s job is over, and the rest of the elements are ignored.

and

Stopping after an initial None:

#![feature(iter_map_while)]
use std::convert::TryFrom;

let a = [0, -1, 1, -2];

let mut iter = a.iter().map_while(|x| u32::try_from(*x).ok());

assert_eq!(iter.next(), Some(0u32));

// We have more elements that are fit in u32, but since we already
// got a None, map_while() isn't used any more
assert_eq!(iter.next(), None);

I'm just a bit confused as your code and comments on the first PR seem like you very much want MapWhile to be fused ^_^

@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented Feb 20, 2020

Oh, my bad, sorry. I've simply added note about unfuseness and forgotten about other parts of the doc ':)

I'll fix doc today tomorrow.

Fix doc of `Iterator::map_while` so it would
be clearer that it isn't fused.
@LukasKalbertodt
Copy link
Member

Sorry for the team ping, bu I think I need some input from @rust-lang/libs here.

In summary: this PR changes the unstable MapWhile iterator. The original PR adding the iterator implemented FusedIterator for it by using a finished flag. This PR proposes to remove that flag and the FusedIterator impl. Arguments in favor of this change:

  • The fused behavior can easily be regained via .fuse(). That's the whole purpose of Fuse: to add a finished flag to make any iterator fused.
  • Artificially fusing an iterator like this has some runtime overhead (at least one additional branch). If the user of map_while doesn't need the fused property, they shouldn't pay for it.
  • Most iterator "consumers" (for loops, for_each, ...) do not require fused iterators anyway.

Arguments against this change:

  • map_while was introduced as companion for take_while (just like filter_map to filter). However, take_while is fused. This inconsistency is not great.
  • The name _while kind of implies a fused iterator IMO. (Some other people interpret the name in another way and wouldn't agree with my previous statement.)

I think the three possible options are: (a) merge as is, (b) rename iterator to better reflect its semantics and merge, (c) decide MapWhile stays fused and reject this PR. Personally I tend towards (b) or (c), but that's not too strong of an opinion.

I know this is just an unstable API and we can change it later anyway, but we might as well decide it now. Otherwise the iterator might accidentally be stabilized without discussing this issue further.

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 2, 2020
@Amanieu
Copy link
Member

Amanieu commented Mar 18, 2020

I'm in favor of merging this. If you rely on fused iterators then you should be using .fuse() anyways (which is a no-op on already fused iterators).

Copy link
Member

@LukasKalbertodt LukasKalbertodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then let's get this merged. I only have two more comments.

src/libcore/iter/traits/iterator.rs Outdated Show resolved Hide resolved
@@ -1114,8 +1114,23 @@ pub trait Iterator {
/// The `-3` is no longer there, because it was consumed in order to see if
/// the iteration should stop, but wasn't placed back into the iterator.
///
/// Note that unlike [`take_while`] this iterator is **not** fused:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree in that I think mentioning this iterator not being fused in the docs is a good thing. The lack of impls, especially for marker traits, is easily overlooked.

However, I agree that we probably don't want to specify the behavior after the first None is returned. At least for now. If it turns out later that there are use cases for a well defined behavior, we can still add it. So I would replace the code example with one sentence and put it all in one paragraph. Something like this:

Note that unlike take_while this iterator is not fused. It is also not specified what this iterator returns after the first None is returned. If you need fused iterator, use fuse.

@LukasKalbertodt
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Mar 22, 2020

📌 Commit e964d71 has been approved by LukasKalbertodt

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Mar 22, 2020
@bors
Copy link
Contributor

bors commented Mar 22, 2020

⌛ Testing commit e964d71 with merge 5ae85f4...

@bors
Copy link
Contributor

bors commented Mar 22, 2020

☀️ Test successful - checks-azure
Approved by: LukasKalbertodt
Pushing 5ae85f4 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 22, 2020
@bors bors merged commit 5ae85f4 into rust-lang:master Mar 22, 2020
@WaffleLapkin WaffleLapkin deleted the remove_finished_from_map_while branch March 22, 2020 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants