Add Iterator.unfold and IterableFactory.unfold #6851

Merged (2 commits) on Jul 9, 2018

NthPortal
Contributor

Resolves scala/bug#10955

@NthPortal NthPortal requested a review from julienrf June 24, 2018 04:33
@scala-jenkins scala-jenkins added this to the 2.13.0-M5 milestone Jun 24, 2018
Contributor

@Ichoran Ichoran left a comment

Algorithm assumes immutability, but it shouldn't.

@@ -1,6 +1,7 @@
package scala.collection

import java.io.{ObjectInputStream, ObjectOutputStream}
import java.util.Objects
Contributor

The method you're using is so trivial that I don't think it's worth adding this dependency.


override def hasNext: Boolean = {
if (nextResult eq null) {
nextResult = Objects.requireNonNull(f(state), "null during unfold")
Contributor

This logic isn't safe if f is side-effecting--for instance, if it generates a random sequence that terminates. You need a separate var that remembers when you've hit the last item, or something similar. Personally I'd encode it as an uninitialized bool, and just put (A, S) in its own var, with null indicating that the stream has stopped (and then pack init into (null.asInstanceOf[A], init) to get going).

Contributor

Or better yet use two fields, one for S and one for A, and either bools or integer state to keep track of what's going on.

Contributor Author

I'm not sure I understand where it's unsafe. None indicates that the stream has stopped

Contributor

Huh. I saw that. And I thought I saw a scenario where it wouldn't work, but I'm unable to recreate it. I suspect I got confused somehow.
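
For readers following this thread, here is a minimal sketch of the caching scheme under discussion. It assumes that a `null` field means "f has not been called for the current state yet" and that a cached `None` means "finished". `UnfoldSketch` is a hypothetical name used only for illustration; it is not the class merged in this PR.

```scala
// Hypothetical sketch; not the implementation merged in this PR.
final class UnfoldSketch[A, S](init: S)(f: S => Option[(A, S)]) extends Iterator[A] {
  private var state: S = init
  // null: f has not been called for the current state yet.
  // None: f signalled the end; it stays cached, so f is never called again.
  private var nextResult: Option[(A, S)] = null

  override def hasNext: Boolean = {
    if (nextResult eq null) {
      val res = f(state)
      if (res eq null) throw new NullPointerException("null during unfold")
      nextResult = res
    }
    nextResult.isDefined
  }

  override def next(): A =
    if (hasNext) {
      val (value, newState) = nextResult.get
      state = newState
      nextResult = null // force a fresh call to f on the next hasNext
      value
    } else throw new NoSuchElementException("next on empty iterator")
}

// A side-effecting f: count how often it runs.
var calls = 0
val it = new UnfoldSketch(1)((i: Int) => { calls += 1; if (i > 3) None else Some((i, i + 1)) })
val elems = it.toList
```

Because the terminating `None` is kept in `nextResult`, repeated `hasNext` calls after exhaustion do not invoke `f` again, which seems to be why the side-effect concern above could not be reproduced.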


* @tparam S Type of the internal state
*/
def unfold[A, S](init: S)(f: S => Option[(A, S)]): Iterable[A] = {
val initialState = init // unfortunately, `Iterable` has an `init` method
Contributor Author

@NthPortal NthPortal Jun 24, 2018

This aliasing bugs me, but I'm honestly not sure what the best course of action is.

init is a good, short parameter name, and I'm not a huge fan of a longer name like initialState. Ideally, all of the unfold methods ought to have the same parameter names, so changing this one means changing LazyList.unfold and Iterator.unfold as well.

Perhaps lengthen it slightly to initial?

Let the bikeshedding commence!

Contributor

@Ichoran Ichoran Jun 24, 2018

What if you @inline def initialState = init?

@Ichoran
Contributor

Ichoran commented Jun 24, 2018

I tested it, and the @inline def doesn't actually work--the call is inlined, but the anonymous inner class still picks up a reference to the enclosing class as if it needed to call the method.

The local val is fine as a rename, but having a non-anonymous class generates slightly cleaner bytecode to begin with. (In particular, it avoids an extra

      3: dup
      4: aload_0

). This is probably JITted out immediately, though. (I didn't check the assembly.)

@NthPortal NthPortal changed the title bug#10955 Add Iterator.unfold bug#10955 Add Iterator.unfold and Iterable.unfold Jun 24, 2018
*/
private final class UnfoldIterable[A, S](initial: S)(f: S => Option[(A, S)]) extends AbstractIterable[A] {
override def iterator: Iterator[A] = Iterator.unfold(initial)(f)
}
Contributor

What do you think of adding it to IterableFactory (and SortedIterableFactory, MapFactory, SortedMapFactory) instead, so that this would be available on all collections types?

Also, instead of using an UnfoldIterable class, you can just call from(Iterator.unfold(initial)(f)) (though this would be less efficient than your version).

@tarsa tarsa Jun 25, 2018

UnfoldIterable doesn't memoize items since it creates new UnfoldIterator every time iteration is done. Iterable.from just creates an immutable List IIRC. It depends what we want.

Contributor Author

I should probably document Iterable.unfold better to indicate that it doesn't memoize elements

I've checked scala.collection.View and it seems a better fit than scala.collection.Iterable to put collections without memoization. Views are not memoized by default, while Iterables are backed by immutable.List by default. Instead of changing the semantics for one method (unfold) in Iterable companion object, we can move that method (unfold) to View companion object. WDYT? @LPTK

Contributor Author

@NthPortal NthPortal Jun 28, 2018

If we add a View.Unfold class, then we can have Factory.from(new View.Unfold(initial)(f)) for all types basically for free - similarly to what's done with View.Iterate, View.Tabulate, View.Fill, etc.
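
The memoization point in this thread can be made concrete with a small sketch. It assumes the API as it shipped in Scala 2.13, where `View.unfold` and `List.unfold` are both available through `IterableFactory`; the `calls` counter and `step` function are hypothetical helpers added only to observe when the generator runs.

```scala
import scala.collection.View

var calls = 0
val step: Int => Option[(Int, Int)] = i => {
  calls += 1 // side effect, only to observe when the generator runs
  if (i > 3) None else Some((i, i + 1))
}

// A View stores only the recipe: nothing runs until it is traversed.
val view = View.unfold(1)(step)
val callsAfterCreation = calls

val first = view.toList
val callsAfterFirst = calls
val second = view.toList // re-runs step from the initial state
val callsAfterSecond = calls

// A strict collection runs step once, up front, and stores the elements.
val list = List.unfold(1)(step)
val callsAfterList = calls
val total = list.sum // traversing the List does not call step again
```

Each traversal of the view replays the generator, while the `List` pays that cost once at construction. Which behaviour is preferable depends on whether the generator is cheap and pure.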

@julienrf
Contributor

Also, maybe we should replace LazyList.unfold’s implementation with LazyList.fromIterator(Iterator.unfold(initial)(f)). That would address the fact that the current implementation of LazyList.unfold is not tail recursive.

@NthPortal NthPortal changed the title bug#10955 Add Iterator.unfold and Iterable.unfold bug#10955 Add Iterator.unfold and IterableFactory.unfold Jun 28, 2018
@NthPortal NthPortal changed the title bug#10955 Add Iterator.unfold and IterableFactory.unfold Add Iterator.unfold and IterableFactory.unfold Jun 28, 2018
@julienrf
Contributor

julienrf commented Jun 28, 2018 via email

@NthPortal
Contributor Author

I think this is ready for merge, if anyone wants to re-review

@@ -81,7 +81,7 @@ object SerializationStability {
}
}

// Generated on 20180605-18:45:47 with Scala version 2.13.0-20180604-234247-8dd6ca5)
// Generated on 20180701-21:01:46 with Scala version 2.13.0-20180701-205044-6e3f96b)
Contributor

Should we override this file?

Contributor Author

I'm not sure I understand what you're asking

Contributor

Why was this file overridden? That’s the first time I see it.

Contributor Author

The changes broke the test, so I had to update the test. Doing so automatically also updates the timestamp and hash

Contributor Author

I'm not actually sure why the changes broke the test, since theoretically they only affect factories and companion objects, which shouldn't store any serialization data

Member

This is safe to regenerate at the moment. All collection classes have new SerialVersionUIDs and are serialization incompatible with M3 and earlier versions anyway.


@Test def unfold(): Unit = {
val it = Iterator.unfold(1)(i => if (i > 10) None else Some((i, i + 1)))
assertSameElements(1 to 10, it)
Contributor

Do you mind adding a test that produces an empty Iterator?

Contributor Author

can do :)

What about a test checking that creation of Iterator.UnfoldIterator and invoking View.Unfold.iterator doesn't invoke generator function (i.e. one that takes state and produces optionally next element and state)? LazyList.unfold doesn't have this property - it always invokes generator function once, even if the LazyList is never used after LazyList.unfold.

Contributor Author

@tarsa that's a good idea - can also do that :)
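
The two requested checks might look roughly like the sketch below. These are not the tests that were added to the PR; they use plain values instead of JUnit assertions, and the `called` flag is a hypothetical helper for observing whether the generator function has run.

```scala
// Empty case: the generator returns None for the initial state.
val empty = Iterator.unfold[Int, Int](1)(_ => None)
val emptyHasNext = empty.hasNext

// Laziness: constructing the iterator should not call the generator.
var called = false
val lazyIt = Iterator.unfold(1) { i =>
  called = true
  if (i > 3) None else Some((i, i + 1))
}
val calledAtCreation = called

val firstElem = lazyIt.next() // now the generator has to run
val calledAfterNext = called
```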

* @tparam S Type of the internal state
* @return a $coll that produces elements using `f` until `f` returns `None`
*/
def unfold[A : Ev, S](init: S)(f: S => Option[(A, S)]): CC[A] = from(new View.Unfold(init)(f))
Contributor

If we add it to IterableFactory and EvidenceIterableFactory we should also add it to SortedIterableFactory, MapFactory and SortedMapFactory.

Or we can remove it from XxxFactory types and let users write Iterator.unfold(…)(…).to(Xxx) instead.

Contributor Author

I attempted to be consistent with how iterate, tabulate and fill were defined.

Also, if we remove it from IterableFactory, then it's not defined for any factories at all, which seems undesirable.

Contributor

OK, I’m fine with this argument.

Contributor

@julienrf IMHO it would be best not to expect users to rely on iterators for common useful operations, as iterators should be regarded as an error-prone (stateful) lowish-level abstraction, mainly used as implementation details.

Contributor

@LPTK Then you can just write: View.unfold(…)(…).to(Xxx) :)

Contributor Author

@julienrf you can't, because object View inherits unfold from IterableFactory. You could do new View.Unfold(...)(...).to(Xxx), but that seems not great.
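
To compare the two spellings debated in this thread, here is a short sketch assuming the Scala 2.13 API as released, where the factory method exists on collection companions. The `step` function is a hypothetical example generator.

```scala
// Emits the squares of 1 to 5, then stops.
val step: Int => Option[(Int, Int)] =
  i => if (i > 5) None else Some((i * i, i + 1))

// Via the factory method this PR adds:
val direct: Vector[Int] = Vector.unfold(1)(step)

// Via an iterator plus a conversion, as suggested in the thread:
val viaIterator: Vector[Int] = Iterator.unfold(1)(step).to(Vector)
val asSet: Set[Int] = Iterator.unfold(1)(step).to(Set)
```

Both routes give the same elements; the factory form simply avoids mentioning the intermediate iterator.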

@NthPortal NthPortal mentioned this pull request Jul 4, 2018
* @tparam A Type of the elements
* @tparam S Type of the internal state
*/
def unfold[A, S](init: S)(f: S => Option[(A, S)]): CC[A] = {
Member

Is the new generic implementation as efficient as this one and sufficiently lazy?

Contributor Author

it certainly should be sufficiently lazy, since it just creates a LazyList from a View. I can't speak to its performance (I haven't done any benchmarks), but I wouldn't imagine it to be substantially different.

@julienrf julienrf merged commit f34a1e1 into scala:2.13.x Jul 9, 2018
@NthPortal NthPortal deleted the bug#10955/PR branch July 9, 2018 12:59
@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Jul 9, 2018
@mslinn

mslinn commented Jul 9, 2019

I find it difficult to understand how to use Iterator.unfold, and the Scaladoc is somewhat oblique. I found an SO posting that helped, but the accompanying explanation is terse. Here are the code examples from the SO posting:

val lines1: Try[Seq[String]] =
  Using(new BufferedReader(new FileReader("file.txt"))) { reader =>
    Iterator.unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
  }

val lines2: Seq[String] =
  Using.resource(new BufferedReader(new FileReader("file.txt"))) { reader =>
    Iterator.unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
  }

I wonder if it might be helpful to show an unfold equivalent for Scala 2.12?
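
As a possible aid to reading the quoted examples, the sketch below runs the same pattern against an in-memory `StringReader` instead of a file, so it is self-contained. The input text is made up for illustration.

```scala
import java.io.{BufferedReader, StringReader}
import scala.util.Using

// In-memory stand-in for the FileReader in the quoted example.
val source = new StringReader("first\nsecond\nthird")

val lines: List[String] =
  Using.resource(new BufferedReader(source)) { reader =>
    // The state is Unit: all real state lives inside the reader.
    // readLine() returns null at end of input, and Option(null) is None,
    // which is the termination signal unfold expects.
    Iterator.unfold(())(_ => Option(reader.readLine()).map(line => (line, ()))).toList
  }
```

The `.toList` inside the block matters: it forces the iterator while the reader is still open.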

@NthPortal
Contributor Author

I don't disagree that it's oblique. Unfortunately, I'm not sure how to word and explain it better.

To channel my inner Seth, a PR improving the clarity of the docs would probably be accepted. I would do it myself if I had any ideas.

@mslinn

mslinn commented Jul 9, 2019

It might be helpful to know a bit about how this method came to be. The use case which inspired the idea, how the original user(s) used this method, the types of tasks for which it is particularly well suited, etc. Where did this gem come from?

@NthPortal
Contributor Author

NthPortal commented Jul 9, 2019

it was added to LazyList initially in 1049031

@mslinn

mslinn commented Jul 9, 2019

Looks like @julienrf did the work. @julienrf, would you be willing to type out information in response to the questions I asked?

It might be helpful to know a bit about how this method came to be. The use case which inspired the idea, how the original user(s) used this method, the types of tasks for which it is particularly well suited, etc. Where did this gem come from?

@SethTisue
Member

SethTisue commented Jul 9, 2019

In teaching, I would start by teaching iterate, in which we repeatedly feed the output of a function back into the function, collecting the results as we go: Iterator.iterate(0)(_ + 1) generates Iterator(0, 1, 2, 3, ...).

Then, unfold generalizes that in two ways at once:

  • Instead of always iterating indefinitely, we can indicate when we're done by returning a None instead of a Some
  • The value we emit to be included in the result is separate from the value that we pass to the next round of iteration.

An example I like is:

scala 2.13.0> def segments[T](xs: List[T]): List[List[T]] =
            |   List.unfold(xs)(xs =>
            |     if (xs.isEmpty) None
            |     else Some(xs.span(_ == xs.head)))
segments: [T](xs: List[T])List[List[T]]

scala 2.13.0> segments(List(1,1,2,3,3,3,4,5,5))
res0: List[List[Int]] = List(List(1, 1), List(2), List(3, 3, 3), List(4), List(5, 5))

Another example is good old Fibonacci:

scala 2.13.0> LazyList.unfold((1, 1)){case (a, b) => Some((a, (b, a + b)))}
res5: scala.collection.immutable.LazyList[Int] = LazyList(<not computed>)

scala 2.13.0> res5.take(10).force
res6: scala.collection.immutable.LazyList[Int] = LazyList(1, 1, 2, 3, 5, 8, 13, 21, 34, 55)

At each stage of the loop, we need to have kept track of two Ints in order to compute the next element, but we only want to actually emit one of them into the result.
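
The two generalizations described above can be seen side by side in one more small sketch. `digitsReversed` is a hypothetical example, not taken from the library or this PR.

```scala
// iterate: the emitted value and the carried value are the same thing,
// and the sequence never ends on its own (hence the take).
val naturals = Iterator.iterate(0)(_ + 1).take(5).toList

// unfold: emit the last digit, carry the remaining digits,
// and stop with None once nothing is left.
def digitsReversed(n: Int): List[Int] =
  List.unfold(n)(rest => if (rest == 0) None else Some((rest % 10, rest / 10)))

val digits = digitsReversed(1234)
```

Here the emitted element (`rest % 10`) and the next state (`rest / 10`) are different values, and the `None` branch replaces the explicit `take`.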

@mslinn

mslinn commented Jul 9, 2019

@SethTisue Thanks for the quick response. You addressed "how", which is helpful. I'd also like to know "why" and "when". How does this method compare with alternatives? When is it better, and why? What types of problems might this method be well suited for?

Also I am unclear about how results accumulate.

@NthPortal
Contributor Author

NthPortal commented Jul 9, 2019

@mslinn they accumulate results effectively by calling from(Iterator.unfold(start)(op))

@julienrf
Contributor

julienrf commented Jul 9, 2019

@mslinn

unfold is to producing collections as fold is to consuming collections. It provides a purely functional way of producing a collection whose size and elements we don’t know a priori (we compute the elements one after the other, until we compute a None termination signal, or possibly infinitely). I don’t think we have benchmarks about it, but I doubt it performs better than a while loop and a Builder, which is the imperative analogue of unfold.
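
A rough sketch of the imperative version mentioned above, for comparison. `unfoldWithBuilder` and `countdown` are hypothetical names, and this is not how the library implements `unfold`.

```scala
// Imperative counterpart: a while loop feeding a Builder.
def unfoldWithBuilder[A, S](init: S)(f: S => Option[(A, S)]): List[A] = {
  val builder = List.newBuilder[A]
  var next = f(init)
  while (next.isDefined) {
    val (elem, state) = next.get
    builder += elem // emit the element
    next = f(state) // advance the state
  }
  builder.result()
}

val countdown: Int => Option[(Int, Int)] =
  n => if (n <= 0) None else Some((n, n - 1))

val viaLoop = unfoldWithBuilder(3)(countdown)
val viaUnfold = List.unfold(3)(countdown)
```

Both produce the same list; `unfold` just hides the mutable loop variable and the builder.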

It is more powerful than apply (e.g. List(1, 1, 2, 3)), which constrains the user to enumerate all the elements of the collection. It is more powerful than tabulate, which constrains the user to provide the collection length and is limited to collections whose elements can be computed from their index. It is also more powerful than iterate, which is limited to collections whose nth element can be computed from the n-1th element.

We can prove that by implementing iterate, tabulate, and apply in terms of unfold.

def apply[A](as: A*): List[A] =
  List.unfold(as) { case h +: t => Some((h, t)) case _ => None }

def tabulate[A](length: Int)(f: Int => A): List[A] =
  // count up from 0 so the elements are f(0), f(1), ..., f(length - 1)
  List.unfold(0)(n => if (n < length) Some((f(n), n + 1)) else None)

def iterate[A](start: A, length: Int)(f: A => A): List[A] =
  List.unfold((length, start)) {
    case (0, _) => None
    case (n, a) => Some((a, (n - 1, f(a))))
  }

Following the principle of least powerful abstraction, you should use unfold when none of apply, tabulate or iterate is powerful enough for you.

@Ichoran
Contributor

Ichoran commented Jul 9, 2019

This is great reasoning and I like the explanation, but note that iterate is (approximately) equally powerful when applied to a lazy collection because

def unfold[A, S](start: S)(op: S => Option[(A, S)]): List[A] =
  Iterator.
    iterate(op(start))(_.flatMap{ case (_, s) => op(s) }).
    map(_.map(_._1)).
    takeWhile(_.isDefined).
    flatten.
    toList

Thus, one should unfold not when iterate isn't powerful enough (it always is), but when it is a better match to your meaning (i.e. you create some hidden state that is iterated, from which you can produce your values).

@mslinn

mslinn commented Jul 9, 2019

It is helpful to see unfold expressed in terms of Iterator.iterate, thanks.

Why does unfold use List.from instead of List.apply or calling toList on the Iterator.iterate results? The List.from docs don't offer any reasons for when that method should be preferred.

@julienrf
Contributor

julienrf commented Jul 9, 2019

List.apply takes a Seq as parameter, not an Iterator. Calling .toList or List.from are equivalent.

@Ichoran
Contributor

Ichoran commented Jul 9, 2019

@mslinn - No particular reason. List.from made more sense (stylistically) in the first draft I wrote, but after refactoring it doesn't look as nice as .toList, so I've switched it.

@tgrospic tgrospic mentioned this pull request Aug 2, 2021