Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::iter::unfold #55869

Merged
merged 6 commits into from
Nov 23, 2018
Merged

Add std::iter::unfold #55869

merged 6 commits into from
Nov 23, 2018

Conversation

SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Nov 11, 2018

This adds an unstable std::iter::iterate std::iter::unfold function and std::iter::Iterate std::iter::Unfold type that trivially wrap a FnMut() -> Option<T> FnMut(&mut State) -> Option<T> closure to create an iterator. Iterator state can be kept in the closure’s environment or captures.

This is intended to help reduce amount of boilerplate needed when defining an iterator that is only created in one place. Compare the existing example of the std::iter module: (explanatory comments elided)

struct Counter {
    count: usize,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: 0 }
    }
}

impl Iterator for Counter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        self.count += 1;
        if self.count < 6 {
            Some(self.count)
        } else {
            None
        }
    }
}

… with the same algorithm rewritten to use this new API:

fn counter() -> impl Iterator<Item=usize> {
    std::iter::unfold(0, |count| {
        *count += 1;
        if *count < 6 {
            Some(*count)
        } else {
            None
        }
    })
}

This also add unstable std::iter::successors which takes an (optional) initial item and a closure that takes an item and computes the next one (its successor).

let powers_of_10 = successors(Some(1_u16), |n| n.checked_mul(10));
assert_eq!(powers_of_10.collect::<Vec<_>>(), &[1, 10, 100, 1_000, 10_000]);

@SimonSapin SimonSapin added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-iterators Area: Iterators labels Nov 11, 2018
@rust-highfive
Copy link
Collaborator

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 11, 2018
@SimonSapin
Copy link
Contributor Author

SimonSapin commented Nov 11, 2018

For another example, the other somewhat-general purpose iterator that I initially considered submitting can be expressed on top of iterate like below. However it feels not quite general enough to belong in the standard library. iterate on the other hand feels more fundamental, as a way of bridging closures (and their very nice syntax) to iterators.

fn successors<T, F>(mut next: Option<T>, mut succ: F) -> impl Iterator<Item = T>
where
    F: FnMut(&T) -> Option<T>,
{
    std::iter::iterate(move || {
        next.take().map(|item| {
            next = succ(&item);
            item
        })
    })
}

Edit: with explicit state:

fn successors<T, F>(first: Option<T>, mut succ: F) -> impl Iterator<Item = T>
where
    F: FnMut(&T) -> Option<T>,
{
    std::iter::unfold(first, move |next| {
        next.take().map(|item| {
            *next = succ(&item);
            item
        })
    })
}

@nagisa
Copy link
Member

nagisa commented Nov 11, 2018

NB: this is closer to Haskell’s unfoldr than iterate. I’m fine with the proposed name, but there may be some value in using the same name as Haskell does.


One thing that could be tweaked about this function is making the state explicit instead of implicit in the closure as it is now. That would make the function’s signature

pub fn iterate<T, S, F: FnMut(S) -> Option<(T, S)>>(state: S, f: F) -> Iterate<F>;

and the motivating example look like this:

fn counter() -> impl Iterator<Item=usize> {
    std::iter::iterate(0, move |count| {
        if count < 6 {
            Some((count + 1, count + 1))
        } else {
            None
        }
    })
}

which, among other things, would make this strictly more composable and flexible. It would also be closer design-wise to our other iterators (esp. Iterator::fold).

@SimonSapin
Copy link
Contributor Author

What does the r mean in unfoldr?

How is explicitly state more composable or flexible? Returning an (optional) tuple feels kinda awkward to me.

@nagisa
Copy link
Member

nagisa commented Nov 11, 2018

r stands for "right-associative", and there’s not really a l variant of unfold, at least not for their built-in lazy lists (which are somewhat close to iterators in Rust). I suspect the exact origin of the r in unfoldr is from unfoldr being dual to foldr. foldr has different semantics when compared to foldl (which is what our Iterator::fold is).

Associativity matters more in Haskell than it does in Rust, though, so I think there’s little reason to worry about the associativity here.

How is explicitly state more composable or flexible? Returning an (optional) tuple feels kinda awkward to me.

At the very least it is possible to use non-closures with the function. I agree with the tuple feeling slightly more awkward than captured state in this specific example, but to me it seems that given a different example it could go the other way as well. All that being said, I feel that Iterator::fold having explicit state is a strong incentive to have similarly explicit state in related functions as well.

@ljedrz
Copy link
Contributor

ljedrz commented Nov 11, 2018

A blast from the past.

@bluetech
Copy link

If it will be possible to use generators to create iterators, would this function still be useful?

@SimonSapin
Copy link
Contributor Author

@ljedrz In my opinion we’ve been very conservative in the past with the standard library, especially in the months before and after 1.0. You can see that the given deprecation reason can be summarized as "Meh."

@bluetech Indeed it would be less useful. However as far as I know such generators (or generators at all, other than as an unstable implementation detail of async fns) are not on the roadmap at the moment.

I’ve rename to unfold / Unfold, and added explicit state. However rather than moved, the state is passed to the closure as &mut St and is not part of the closure’s return type.

@SimonSapin SimonSapin changed the title Add std::iter::iterate Add std::iter::unfold Nov 11, 2018
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:00d501e1:start=1541968869333660002,finish=1541968924170331674,duration=54836671672
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-5.0

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@ljedrz
Copy link
Contributor

ljedrz commented Nov 11, 2018

@SimonSapin don't get me wrong, I'm a big fan of the idea :).

@clarfonthey
Copy link
Contributor

I'm worried that this will discourage people providing proper size hints for their iterators. This is also very similar to repeat_with and perhaps that should be defined in terms of this?

@@ -386,3 +386,68 @@ impl<T> FusedIterator for Once<T> {}
pub fn once<T>(value: T) -> Once<T> {
Once { inner: Some(value).into_iter() }
}

/// Creates a new iterator where each iteration calls the provided closure
/// `F: FnMut() -> Option<T>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `F: FnMut() -> Option<T>`.
/// `F: FnMut(&mut St) -> Option<T>`.

/// without using the more verbose syntax of creating a dedicated type
/// and implementing the `Iterator` trait for it.
/// Iterator state can be kept in the closure’s captures and environment.
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
/// An initial state can also be passed in the first argument of `unfold`.
///

/// ```
/// #![feature(iter_unfold)]
/// let counter = std::iter::unfold(0, |count| {
/// // increment our count. This is why we started at zero.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// // increment our count. This is why we started at zero.
/// // Increment our count. This is why we started at zero.

(should be fixed in module level docs also...)

/// // increment our count. This is why we started at zero.
/// *count += 1;
///
/// // check to see if we've finished counting or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// // check to see if we've finished counting or not.
/// // Check to see if we've finished counting or not.

}
}

/// An iterator where each iteration calls the provided closure `F: FnMut() -> Option<T>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// An iterator where each iteration calls the provided closure `F: FnMut() -> Option<T>`.
/// An iterator where each iteration calls the provided closure
/// `F: FnMut(&mut St) -> Option<T>`.

/// }
/// });
/// assert_eq!(counter.collect::<Vec<_>>(), &[1, 2, 3, 4, 5]);
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave a note that this iterator is not fused and that calling it after it returned None the first time might return Some afterwards.

#[unstable(feature = "iter_unfold", issue = /* FIXME */ "0")]
pub struct Unfold<St, F> {
/// The current state of the iterator
pub state: St,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check: are we comfortable with exposing a public field at this point in time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was public in the old Unfold that we removed in 1.3 and I didn’t see a reason for it not to be. But I’ve undone this change for now in case it makes anyone feel better.

/// See its documentation for more.
///
/// [`unfold`]: fn.unfold.html
#[derive(Copy, Clone, Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add an impl Debug for Unfold<St, F> where St: Debug, F: Debug. Note in particular F: Debug; this is probably not what the user wants so I think the Debug implementation should be hand rolled in this instance.

We do this all over the place with iterators, for example:

impl<I: Debug, P> Debug for Filter<I, P>

No P: Debug here. ;)

@Centril
Copy link
Contributor

Centril commented Nov 11, 2018

I think this is a good idea and I agree with @SimonSapin that generators aren't sufficient justification to skip this given that generators haven't even been RFC accepted yet (experimental don't count) and that they thus are far into the future.

The design seems mostly right to me and the name is also good (follows the Haskell theme we generally have for iterators).

@clarcharr I think it's fine to not provide proper size hints for your iterator; premature optimization and all that... First things first, prototype your application and make it work correctly first, then after, if profiling shows up a problem, then you tune things with a custom iterator.

/// }
/// });
/// assert_eq!(counter.collect::<Vec<_>>(), &[1, 2, 3, 4, 5]);
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also leave a note about size_hint.

@SimonSapin
Copy link
Contributor Author

I’ve added a note about size_hint and FusedIterator.

FWIW in the use case that triggered this PR (tree traversal) there is no cheap way to compute a size_hint better than (0, None). (Or perhaps (0, Some(usize::MAX / size_of::<Node>())) would be accurate, but it wouldn’t be particularly useful.)

@cuviper
Copy link
Member

cuviper commented Nov 12, 2018

FWIW, itertools picked up unfold after it was evicted from std::iter. But these are free functions anyway, so they won't conflict the way additions to Iterator can with Itertools.

@alexcrichton
Copy link
Member

On reading the signature here my first reaction was "why take a state parameter because the closure env also has state?" but it looks like that's how this function started and it's ended up here with a state parameter. On reading the discussion here I'm not quite sure why, but @SimonSapin how come you made the switch? (or @cuviper, do you know why it's the way it is in itertools?)

@cuviper
Copy link
Member

cuviper commented Nov 13, 2018

I believe @bluss just took the Unfold design as-is from std: rust-itertools/itertools@edc9295. Recently, @matklad had a similar question about state vs. captures in rust-itertools/itertools#298.

I guess the state parameter can make it easier to write a single-expression unfold, rather than having to let-bind the initial state separately. And with captures, if you need to leave the initial scope, you have to remember to move into the closure too.

In some sense, the unfold state just mirrors the fold accumulator. Maybe that's enough.

@matklad
Copy link
Member

matklad commented Nov 13, 2018

Note that we already can avoid creating a struct using the Iterator::scan method. So, the original example can be written, albeit very unintuitively, as

fn counter() -> impl Iterator<Item=i32> {
    let mut state = 0;
    std::iter::repeat(())
        .scan((), move |(), ()| {
            state += 1;
            if state < 6 {
                Some(state)
            } else {
                None
            }
        })
}

@matklad
Copy link
Member

matklad commented Nov 13, 2018

I personally really enjoy Kotlin's generateSequence design: it fits better for common case, and is much more intuitive then unfold. Using generate, the counter example looks like:

fn counter() -> impl Iterator<Item=usize> {
    generate(Some(1), |it| {
        if it + 1 < 6 { Some(it + 1) } else { None }
    })
}

playground

I'd rather we add generate to std, though, we should probably try it out in itertools first PR.

@SimonSapin
Copy link
Contributor Author

@alexcrichton I initially liked the simplicity of the next() impl that does nothing but call the closure, but I don’t really mind explicit state. I made the switch because I didn’t really have a counter-point to the consistency argument:

#55869 (comment)

making the state explicit instead of implicit in the closure as it is now […] would also be closer design-wise to our other iterators (esp. Iterator::fold).

… but I’m ok with switch back if other people prefer the more "trivial" behavior.

@cuviper Indeed, re move keyword. The earlier version of this PR had an example in the doc-comment that used it with fn counter() -> impl Iterator<Item=u32>.

@matklad I think this generate is identical to my successors in #55869 (comment) ?

@matklad
Copy link
Member

matklad commented Nov 13, 2018

@matklad I think this generate is identical to my successors in #55869 (comment) ?

Agree, didn't saw that comment. From my experience though, generate/successor is what you need in practice in the overwhelming majority of cases (like, fs::read_to_string). I think unfold can be expressed in terms of generate and map, where generate gens a pair of (state, item). There's also a possibility to add both, of course.

Probably a good thing to do would be to grep something like servo for usages of Unfold, and check how many will be simplified with generate.

@bluss
Copy link
Member

bluss commented Nov 13, 2018

Implicit state seems more elegant, but I don't use unfold enough in the wild to really know what works best. What is awkward about the &mut St reference is that you'll often go on to read (*state) and write back (*state = x;) updates.

I think fold has a good reason to use an explicit accumulator: when ownership is passed through each accumulation.

@SimonSapin
Copy link
Contributor Author

(FWIW Servo does not use itertools::Unfold. And std::iter::Unfold was removed long ago, this PR is about adding it back.)

@cuviper
Copy link
Member

cuviper commented Nov 13, 2018

This adds an unstable std::iter::iterate std::iter::unfold function and std::iter::Iterate std::iter::Unfold type that trivially wrap a FnMut() -> Option<T> FnMut(&mut State) -> Option<T> closure to create an iterator. Iterator state can be kept in the closure’s environment or captures.

BTW, I didn't see your original code, but std::iter::iterate did exist once too:

pub fn iterate<T, F>(
    seed: T,
    f: F,
) -> Unfold<(F, Option<T>, bool), fn(&mut (F, Option<T>, bool)) -> Option<T>>
where
    T: Clone,
    F: FnMut(T) -> T,

And there's a similar itertools::iterate, with its own return type and not requiring Clone:

pub fn iterate<St, F>(initial_value: St, f: F) -> Iterate<St, F>
where
    F: FnMut(&St) -> St, 

But unlike @SimonSapin's successors / @matklad's generate, these iterates don't terminate.

@SimonSapin
Copy link
Contributor Author

Hmm, I probably shouldn’t have squashed the commits. The original code was something like this:

impl<T, F: FnMut() -> Option<T>> Iterator for Foo<F> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        (self.0)()
    }
}

@SimonSapin
Copy link
Contributor Author

I’ve optimistically created a tracking issue at #55977.

From my experience though, generate/successor is what you need in practice in the overwhelming majority of cases

I’ve added successors.

@clarfonthey
Copy link
Contributor

Bikeshed: successors as continue_with, maybe?

@alexcrichton
Copy link
Member

Ok these seem reasonable enough to me to add unstable, r=me with @Centril's doc comments as well

@SimonSapin
Copy link
Contributor Author

@bors r=alexcrichton

@bors
Copy link
Contributor

bors commented Nov 20, 2018

📌 Commit a4279a0 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Nov 23, 2018
Add std::iter::unfold

This adds an **unstable** ~`std::iter::iterate`~ `std::iter::unfold` function and ~`std::iter::Iterate`~ `std::iter::Unfold` type that trivially wrap a ~`FnMut() -> Option<T>`~ `FnMut(&mut State) -> Option<T>` closure to create an iterator. ~Iterator state can be kept in the closure’s environment or captures.~

This is intended to help reduce amount of boilerplate needed when defining an iterator that is only created in one place. Compare the existing example of the `std::iter` module: (explanatory comments elided)

```rust
struct Counter {
    count: usize,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: 0 }
    }
}

impl Iterator for Counter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        self.count += 1;
        if self.count < 6 {
            Some(self.count)
        } else {
            None
        }
    }
}
```

… with the same algorithm rewritten to use this new API:

```rust
fn counter() -> impl Iterator<Item=usize> {
    std::iter::unfold(0, |count| {
        *count += 1;
        if *count < 6 {
            Some(*count)
        } else {
            None
        }
    })
}
```

-----

This also add unstable `std::iter::successors` which takes an (optional) initial item and a closure that takes an item and computes the next one (its successor).

```rust
let powers_of_10 = successors(Some(1_u16), |n| n.checked_mul(10));
assert_eq!(powers_of_10.collect::<Vec<_>>(), &[1, 10, 100, 1_000, 10_000]);
```
bors added a commit that referenced this pull request Nov 23, 2018
Rollup of 14 pull requests

Successful merges:

 - #55767 (Disable some pretty-printers when gdb is rust-enabled)
 - #55838 (Fix #[cfg] for step impl on ranges)
 - #55869 (Add std::iter::unfold)
 - #55945 (Ensure that the argument to `static_assert` is a `bool`)
 - #56022 (When popping in CTFE, perform validation before jumping to next statement to have a better span for the error)
 - #56048 (Add rustc_codegen_ssa to sysroot)
 - #56091 (Fix json output in the self-profiler)
 - #56097 (Fix invalid bitcast taking bool out of a union represented as a scalar)
 - #56116 (ci: Download clang/lldb from tarballs)
 - #56120 (Add unstable Literal::subspan().)
 - #56154 (Pass additional linker flags when targeting Fuchsia)
 - #56162 (std::str Adapt documentation to reality)
 - #56163 ([master] Backport 1.30.1 release notes)
 - #56168 (Fix the tracking issue for hash_raw_entry)

Failed merges:

r? @ghost
@bors bors merged commit a4279a0 into rust-lang:master Nov 23, 2018
@SimonSapin SimonSapin deleted the iterate branch November 28, 2019 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet