New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::iter::unfold #55869

Merged
merged 6 commits into from Nov 23, 2018

Conversation

Projects
None yet
@SimonSapin
Contributor

SimonSapin commented Nov 11, 2018

This adds an unstable std::iter::iterate std::iter::unfold function and std::iter::Iterate std::iter::Unfold type that trivially wrap a FnMut() -> Option<T> FnMut(&mut State) -> Option<T> closure to create an iterator. Iterator state can be kept in the closure’s environment or captures.

This is intended to help reduce amount of boilerplate needed when defining an iterator that is only created in one place. Compare the existing example of the std::iter module: (explanatory comments elided)

struct Counter {
    count: usize,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: 0 }
    }
}

impl Iterator for Counter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        self.count += 1;
        if self.count < 6 {
            Some(self.count)
        } else {
            None
        }
    }
}

… with the same algorithm rewritten to use this new API:

fn counter() -> impl Iterator<Item=usize> {
    std::iter::unfold(0, |count| {
        *count += 1;
        if *count < 6 {
            Some(*count)
        } else {
            None
        }
    })
}

This also add unstable std::iter::successors which takes an (optional) initial item and a closure that takes an item and computes the next one (its successor).

let powers_of_10 = successors(Some(1_u16), |n| n.checked_mul(10));
assert_eq!(powers_of_10.collect::<Vec<_>>(), &[1, 10, 100, 1_000, 10_000]);
@rust-highfive

This comment has been minimized.

Collaborator

rust-highfive commented Nov 11, 2018

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 11, 2018

For another example, the other somewhat-general purpose iterator that I initially considered submitting can be expressed on top of iterate like below. However it feels not quite general enough to belong in the standard library. iterate on the other hand feels more fundamental, as a way of bridging closures (and their very nice syntax) to iterators.

fn successors<T, F>(mut next: Option<T>, mut succ: F) -> impl Iterator<Item = T>
where
    F: FnMut(&T) -> Option<T>,
{
    std::iter::iterate(move || {
        next.take().map(|item| {
            next = succ(&item);
            item
        })
    })
}

Edit: with explicit state:

fn successors<T, F>(first: Option<T>, mut succ: F) -> impl Iterator<Item = T>
where
    F: FnMut(&T) -> Option<T>,
{
    std::iter::unfold(first, move |next| {
        next.take().map(|item| {
            *next = succ(&item);
            item
        })
    })
}
@nagisa

This comment has been minimized.

Contributor

nagisa commented Nov 11, 2018

NB: this is closer to Haskell’s unfoldr than iterate. I’m fine with the proposed name, but there may be some value in using the same name as Haskell does.


One thing that could be tweaked about this function is making the state explicit instead of implicit in the closure as it is now. That would make the function’s signature

pub fn iterate<T, S, F: FnMut(S) -> Option<(T, S)>>(state: S, f: F) -> Iterate<F>;

and the motivating example look like this:

fn counter() -> impl Iterator<Item=usize> {
    std::iter::iterate(0, move |count| {
        if count < 6 {
            Some((count + 1, count + 1))
        } else {
            None
        }
    })
}

which, among other things, would make this strictly more composable and flexible. It would also be closer design-wise to our other iterators (esp. Iterator::fold).

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 11, 2018

What does the r mean in unfoldr?

How is explicitly state more composable or flexible? Returning an (optional) tuple feels kinda awkward to me.

@nagisa

This comment has been minimized.

Contributor

nagisa commented Nov 11, 2018

r stands for "right-associative", and there’s not really a l variant of unfold, at least not for their built-in lazy lists (which are somewhat close to iterators in Rust). I suspect the exact origin of the r in unfoldr is from unfoldr being dual to foldr. foldr has different semantics when compared to foldl (which is what our Iterator::fold is).

Associativity matters more in Haskell than it does in Rust, though, so I think there’s little reason to worry about the associativity here.

How is explicitly state more composable or flexible? Returning an (optional) tuple feels kinda awkward to me.

At the very least it is possible to use non-closures with the function. I agree with the tuple feeling slightly more awkward than captured state in this specific example, but to me it seems that given a different example it could go the other way as well. All that being said, I feel that Iterator::fold having explicit state is a strong incentive to have similarly explicit state in related functions as well.

@ljedrz

This comment has been minimized.

Contributor

ljedrz commented Nov 11, 2018

@bluetech

This comment has been minimized.

bluetech commented Nov 11, 2018

If it will be possible to use generators to create iterators, would this function still be useful?

@SimonSapin SimonSapin force-pushed the SimonSapin:iterate branch from eac089d to 515fb03 Nov 11, 2018

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 11, 2018

@ljedrz In my opinion we’ve been very conservative in the past with the standard library, especially in the months before and after 1.0. You can see that the given deprecation reason can be summarized as "Meh."

@bluetech Indeed it would be less useful. However as far as I know such generators (or generators at all, other than as an unstable implementation detail of async fns) are not on the roadmap at the moment.

I’ve rename to unfold / Unfold, and added explicit state. However rather than moved, the state is passed to the closure as &mut St and is not part of the closure’s return type.

@SimonSapin SimonSapin changed the title from Add std::iter::iterate to Add std::iter::unfold Nov 11, 2018

@rust-highfive

This comment has been minimized.

Collaborator

rust-highfive commented Nov 11, 2018

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:00d501e1:start=1541968869333660002,finish=1541968924170331674,duration=54836671672
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-5.0

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@ljedrz

This comment has been minimized.

Contributor

ljedrz commented Nov 11, 2018

@SimonSapin don't get me wrong, I'm a big fan of the idea :).

@SimonSapin SimonSapin force-pushed the SimonSapin:iterate branch from 515fb03 to a25af8e Nov 11, 2018

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Nov 11, 2018

I'm worried that this will discourage people providing proper size hints for their iterators. This is also very similar to repeat_with and perhaps that should be defined in terms of this?

@@ -386,3 +386,68 @@ impl<T> FusedIterator for Once<T> {}
pub fn once<T>(value: T) -> Once<T> {
Once { inner: Some(value).into_iter() }
}
/// Creates a new iterator where each iteration calls the provided closure
/// `F: FnMut() -> Option<T>`.

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor
Suggested change Beta
/// `F: FnMut() -> Option<T>`.
/// `F: FnMut(&mut St) -> Option<T>`.
/// without using the more verbose syntax of creating a dedicated type
/// and implementing the `Iterator` trait for it.
/// Iterator state can be kept in the closure’s captures and environment.
///

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor
Suggested change Beta
///
/// An initial state can also be passed in the first argument of `unfold`.
///
/// ```
/// #![feature(iter_unfold)]
/// let counter = std::iter::unfold(0, |count| {
/// // increment our count. This is why we started at zero.

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor
Suggested change Beta
/// // increment our count. This is why we started at zero.
/// // Increment our count. This is why we started at zero.

(should be fixed in module level docs also...)

/// // increment our count. This is why we started at zero.
/// *count += 1;
///
/// // check to see if we've finished counting or not.

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor
Suggested change Beta
/// // check to see if we've finished counting or not.
/// // Check to see if we've finished counting or not.
}
}
/// An iterator where each iteration calls the provided closure `F: FnMut() -> Option<T>`.

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor
Suggested change Beta
/// An iterator where each iteration calls the provided closure `F: FnMut() -> Option<T>`.
/// An iterator where each iteration calls the provided closure
/// `F: FnMut(&mut St) -> Option<T>`.
/// }
/// });
/// assert_eq!(counter.collect::<Vec<_>>(), &[1, 2, 3, 4, 5]);
/// ```

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor

Maybe leave a note that this iterator is not fused and that calling it after it returned None the first time might return Some afterwards.

#[unstable(feature = "iter_unfold", issue = /* FIXME */ "0")]
pub struct Unfold<St, F> {
/// The current state of the iterator
pub state: St,

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor

Double check: are we comfortable with exposing a public field at this point in time?

This comment has been minimized.

@SimonSapin

SimonSapin Nov 12, 2018

Contributor

It was public in the old Unfold that we removed in 1.3 and I didn’t see a reason for it not to be. But I’ve undone this change for now in case it makes anyone feel better.

/// See its documentation for more.
///
/// [`unfold`]: fn.unfold.html
#[derive(Copy, Clone, Debug)]

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor

This will add an impl Debug for Unfold<St, F> where St: Debug, F: Debug. Note in particular F: Debug; this is probably not what the user wants so I think the Debug implementation should be hand rolled in this instance.

We do this all over the place with iterators, for example:

impl<I: Debug, P> Debug for Filter<I, P>

No P: Debug here. ;)

@Centril

This comment has been minimized.

Contributor

Centril commented Nov 11, 2018

I think this is a good idea and I agree with @SimonSapin that generators aren't sufficient justification to skip this given that generators haven't even been RFC accepted yet (experimental don't count) and that they thus are far into the future.

The design seems mostly right to me and the name is also good (follows the Haskell theme we generally have for iterators).

@clarcharr I think it's fine to not provide proper size hints for your iterator; premature optimization and all that... First things first, prototype your application and make it work correctly first, then after, if profiling shows up a problem, then you tune things with a custom iterator.

/// }
/// });
/// assert_eq!(counter.collect::<Vec<_>>(), &[1, 2, 3, 4, 5]);
/// ```

This comment has been minimized.

@Centril

Centril Nov 11, 2018

Contributor

Maybe also leave a note about size_hint.

@SimonSapin SimonSapin force-pushed the SimonSapin:iterate branch from a25af8e to e009d90 Nov 12, 2018

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 12, 2018

I’ve added a note about size_hint and FusedIterator.

FWIW in the use case that triggered this PR (tree traversal) there is no cheap way to compute a size_hint better than (0, None). (Or perhaps (0, Some(usize::MAX / size_of::<Node>())) would be accurate, but it wouldn’t be particularly useful.)

@cuviper

This comment has been minimized.

Member

cuviper commented Nov 12, 2018

FWIW, itertools picked up unfold after it was evicted from std::iter. But these are free functions anyway, so they won't conflict the way additions to Iterator can with Itertools.

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Nov 13, 2018

On reading the signature here my first reaction was "why take a state parameter because the closure env also has state?" but it looks like that's how this function started and it's ended up here with a state parameter. On reading the discussion here I'm not quite sure why, but @SimonSapin how come you made the switch? (or @cuviper, do you know why it's the way it is in itertools?)

@cuviper

This comment has been minimized.

Member

cuviper commented Nov 13, 2018

I believe @bluss just took the Unfold design as-is from std: bluss/rust-itertools@edc9295. Recently, @matklad had a similar question about state vs. captures in bluss/rust-itertools#298.

I guess the state parameter can make it easier to write a single-expression unfold, rather than having to let-bind the initial state separately. And with captures, if you need to leave the initial scope, you have to remember to move into the closure too.

In some sense, the unfold state just mirrors the fold accumulator. Maybe that's enough.

@matklad

This comment has been minimized.

Member

matklad commented Nov 13, 2018

Note that we already can avoid creating a struct using the Iterator::scan method. So, the original example can be written, albeit very unintuitively, as

fn counter() -> impl Iterator<Item=i32> {
    let mut state = 0;
    std::iter::repeat(())
        .scan((), move |(), ()| {
            state += 1;
            if state < 6 {
                Some(state)
            } else {
                None
            }
        })
}
@matklad

This comment has been minimized.

Member

matklad commented Nov 13, 2018

I personally really enjoy Kotlin's generateSequence design: it fits better for common case, and is much more intuitive then unfold. Using generate, the counter example looks like:

fn counter() -> impl Iterator<Item=usize> {
    generate(Some(1), |it| {
        if it + 1 < 6 { Some(it + 1) } else { None }
    })
}

playground

I'd rather we add generate to std, though, we should probably try it out in itertools first PR.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 13, 2018

@alexcrichton I initially liked the simplicity of the next() impl that does nothing but call the closure, but I don’t really mind explicit state. I made the switch because I didn’t really have a counter-point to the consistency argument:

#55869 (comment)

making the state explicit instead of implicit in the closure as it is now […] would also be closer design-wise to our other iterators (esp. Iterator::fold).

… but I’m ok with switch back if other people prefer the more "trivial" behavior.

@cuviper Indeed, re move keyword. The earlier version of this PR had an example in the doc-comment that used it with fn counter() -> impl Iterator<Item=u32>.

@matklad I think this generate is identical to my successors in #55869 (comment) ?

@matklad

This comment has been minimized.

Member

matklad commented Nov 13, 2018

@matklad I think this generate is identical to my successors in #55869 (comment) ?

Agree, didn't saw that comment. From my experience though, generate/successor is what you need in practice in the overwhelming majority of cases (like, fs::read_to_string). I think unfold can be expressed in terms of generate and map, where generate gens a pair of (state, item). There's also a possibility to add both, of course.

Probably a good thing to do would be to grep something like servo for usages of Unfold, and check how many will be simplified with generate.

@bluss

This comment has been minimized.

Contributor

bluss commented Nov 13, 2018

Implicit state seems more elegant, but I don't use unfold enough in the wild to really know what works best. What is awkward about the &mut St reference is that you'll often go on to read (*state) and write back (*state = x;) updates.

I think fold has a good reason to use an explicit accumulator: when ownership is passed through each accumulation.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 13, 2018

(FWIW Servo does not use itertools::Unfold. And std::iter::Unfold was removed long ago, this PR is about adding it back.)

@cuviper

This comment has been minimized.

Member

cuviper commented Nov 13, 2018

This adds an unstable std::iter::iterate std::iter::unfold function and std::iter::Iterate std::iter::Unfold type that trivially wrap a FnMut() -> Option<T> FnMut(&mut State) -> Option<T> closure to create an iterator. Iterator state can be kept in the closure’s environment or captures.

BTW, I didn't see your original code, but std::iter::iterate did exist once too:

pub fn iterate<T, F>(
    seed: T,
    f: F,
) -> Unfold<(F, Option<T>, bool), fn(&mut (F, Option<T>, bool)) -> Option<T>>
where
    T: Clone,
    F: FnMut(T) -> T,

And there's a similar itertools::iterate, with its own return type and not requiring Clone:

pub fn iterate<St, F>(initial_value: St, f: F) -> Iterate<St, F>
where
    F: FnMut(&St) -> St, 

But unlike @SimonSapin's successors / @matklad's generate, these iterates don't terminate.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 13, 2018

Hmm, I probably shouldn’t have squashed the commits. The original code was something like this:

impl<T, F: FnMut() -> Option<T>> Iterator for Foo<F> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        (self.0)()
    }
}

@SimonSapin SimonSapin force-pushed the SimonSapin:iterate branch from e009d90 to f79b7c6 Nov 15, 2018

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 15, 2018

I’ve optimistically created a tracking issue at #55977.

From my experience though, generate/successor is what you need in practice in the overwhelming majority of cases

I’ve added successors.

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Nov 17, 2018

Bikeshed: successors as continue_with, maybe?

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Nov 19, 2018

Ok these seem reasonable enough to me to add unstable, r=me with @Centril's doc comments as well

@SimonSapin SimonSapin force-pushed the SimonSapin:iterate branch from f79b7c6 to a4279a0 Nov 20, 2018

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 20, 2018

@bors r=alexcrichton

@bors

This comment has been minimized.

Contributor

bors commented Nov 20, 2018

📌 Commit a4279a0 has been approved by alexcrichton

kennytm added a commit to kennytm/rust that referenced this pull request Nov 23, 2018

Rollup merge of rust-lang#55869 - SimonSapin:iterate, r=alexcrichton
Add std::iter::unfold

This adds an **unstable** ~`std::iter::iterate`~ `std::iter::unfold` function and ~`std::iter::Iterate`~ `std::iter::Unfold` type that trivially wrap a ~`FnMut() -> Option<T>`~ `FnMut(&mut State) -> Option<T>` closure to create an iterator. ~Iterator state can be kept in the closure’s environment or captures.~

This is intended to help reduce amount of boilerplate needed when defining an iterator that is only created in one place. Compare the existing example of the `std::iter` module: (explanatory comments elided)

```rust
struct Counter {
    count: usize,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: 0 }
    }
}

impl Iterator for Counter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        self.count += 1;
        if self.count < 6 {
            Some(self.count)
        } else {
            None
        }
    }
}
```

… with the same algorithm rewritten to use this new API:

```rust
fn counter() -> impl Iterator<Item=usize> {
    std::iter::unfold(0, |count| {
        *count += 1;
        if *count < 6 {
            Some(*count)
        } else {
            None
        }
    })
}
```

-----

This also add unstable `std::iter::successors` which takes an (optional) initial item and a closure that takes an item and computes the next one (its successor).

```rust
let powers_of_10 = successors(Some(1_u16), |n| n.checked_mul(10));
assert_eq!(powers_of_10.collect::<Vec<_>>(), &[1, 10, 100, 1_000, 10_000]);
```

bors added a commit that referenced this pull request Nov 23, 2018

Auto merge of #56186 - kennytm:rollup, r=kennytm
Rollup of 14 pull requests

Successful merges:

 - #55767 (Disable some pretty-printers when gdb is rust-enabled)
 - #55838 (Fix #[cfg] for step impl on ranges)
 - #55869 (Add std::iter::unfold)
 - #55945 (Ensure that the argument to `static_assert` is a `bool`)
 - #56022 (When popping in CTFE, perform validation before jumping to next statement to have a better span for the error)
 - #56048 (Add rustc_codegen_ssa to sysroot)
 - #56091 (Fix json output in the self-profiler)
 - #56097 (Fix invalid bitcast taking bool out of a union represented as a scalar)
 - #56116 (ci: Download clang/lldb from tarballs)
 - #56120 (Add unstable Literal::subspan().)
 - #56154 (Pass additional linker flags when targeting Fuchsia)
 - #56162 (std::str Adapt documentation to reality)
 - #56163 ([master] Backport 1.30.1 release notes)
 - #56168 (Fix the tracking issue for hash_raw_entry)

Failed merges:

r? @ghost

@bors bors merged commit a4279a0 into rust-lang:master Nov 23, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment