What should go in an iterator #17

Open
odersky opened this Issue Jan 17, 2017 · 7 comments

Projects

None yet

5 participants

@odersky
Contributor
odersky commented Jan 17, 2017 edited

This issue is intended to bundle the discussion what should go in an iterator. We had several discussions on PRs before, and Alexandru Nedelcu has written a blog post about some closely related issues:

https://alexn.org/blog/2017/01/16/iterator.html

There's two issues he brings up. I want to concentrate on the second here. He writes:

Well, at this point you should be thinking that this violates the principles of OOP design. When Iterator comes with operations like map and filter, that are polymorphic and can be overridden, it is no longer just a minimal protocol for “iterating over things”, but a big, fat interface.

I think things are not so simple, and the alternatives he quotes are not necessarily better. The first question to ask is:

Do iterators need all these operations?

I think the answer is yes, there is lots of code out there which uses them, and generally users appreciate the convenience that the same methods work everywhere. That's the strength of Scala collections.

[One side remark to illustrate: I observed many newcomers use collections and they were generally delighted by the consistency of the interface. Until they hit sliding and grouped - these are methods that are available only on iterators, not on iterables. The delight turned into disgust - how could one let such an obvious inconsistency persist? So, there's a huge value attached to have the same, rich set of operations work on all collection-like types].

The next question is how to implement these operations:

  1. We could just implement the methods directly or use a mixin trait, as we do now. Then the blog notes that all iterator-yielding methods would have to be overridden if we want them to return a richer iterator such as a CloseableIterator. There are two possible ways to address the problem:

    1. Ignore it. Leave map and friends as they are, returning an iterator. That means if you want
      a closeable iterator, don't use map and friends.
    2. Add machinery as we have for Iterables where map and friends return automatically the
      same kind of Iterable as the one on which they are called.
  2. We could use a "typeclass", i.e. a decorator value class to provide the methods externally. But I don't see how that solves the problem. If we add CloseableIterator then we have the following choices:

    1. Do nothing. Then map and friends will still return Iterators, exactly as before.
    2. Implement another decorator for CloseableIterator. Then we still have to re-implement the same number of transformations as before. Furthermore, we trade dynamic for static dispatch, which means we lose abstraction.

So I don't see the type class approach give us any advantages over plain method definitions here.

My tendency would be to do 1.1., i.e. ignore the problem on the Iterator level. We have much better machinery now to deal with the problem using views. In fact, I can trivially write a CloseableView like this:

trait CloseableView[+A] extends View[A] with IterableLike[A, CloseableView] {
  def close(): Unit
}

And everything works out of the box. In particular, this works:

val xs: CloseableView[Int]
val ys = xs.map(_ + 1)
val zs: CloseableView[Int] = ys

This shows that types of tranforms are indeed preserved on CloseableViews. So, given that we have this nice machinery in views, maybe we should just encourage everyone to use them for fancy stuff and use iterators for basic functionality. In particular when thinking about specializations, one should favor specializing views over specializing iterators.

@alexandru
alexandru commented Jan 17, 2017 edited

@odersky first of all I must say that you paying attention to random ranty blog posts on the Internet and having patience with us is one reason for why Scala is awesome :-)

I think Iterator being fat is a problem, an impediment to people for extending it. I understand that those operators are being used by people, but that's also because Iterable has acted as the substitute for lazy collections, with views not being used much. And there are other use-cases though and I can think of:

  1. to act as the generic interface over collections whenever designing APIs, although currently TraversableOnce is the more generic interface being used thorough Scala's library and in such cases you tend to use the low-level protocol or foldLeft which is basically the same thing; examples being utilities like Future.sequence
  2. because it can describe streaming, it has been abused for describing connections to network sockets (e.g. the sample I can think of is the Apache Kafka consumer, at least before version 10), but in such cases one uses the low-level iteration protocol, because you tend to keep such connections opened and listening for new events for as long as the Java process is active
  3. the Iterator is in essence a mutable and efficient head/tail decomposition, meaning you can process the current element and defer the rest, so even if its protocol cannot describe asynchronous boundaries, you can certainly give an unfinished iterator to some other thread/actor/whatever or pause it for later, but this also implies usage of its low-level iteration protocol

The problem with specialization of something like a CloseableIterator is not the convenience of having the same static type returned, but rather that we can change that communication protocol, which invalidates the correctness of the exposed operators. Now all of a sudden an operation like take(100) is no longer correct and probably needs to be re-implemented, something like this:

def take[A](cursor: CloseableIterator[A], n: Int): CloseableIterator[A] =
  new CloseableIterator[A] {
    private var processed = 0
    def close() = cursor.close()

    def hasNext = {
      try { processed < n && cursor.hasNext }
      catch { case NonFatal(ex) => 
        cursor.close()
        processed = n
        throw ex 
      }
    }

    def next(): A = {
      if (!hasNext) throw new NoSuchElementException
      var canClose = true
      try { 
        val result = cursor.next()
        processed += 1
        if (processed >= n) { canClose = false; cursor.close() } 
        result
      } catch {
        case NonFatal(ex) if canClose => 
          cursor.close()
          throw ex
      }
    }
  }

An interesting question here is whether CloseableIterator upholds the Liskov Substitution Principle, e.g. whether it can really act as a substitute for a plain Iterator and I think it does, as without a take re-implementation, you'd still have the problem of the connection leak with normal iterators.

You can see how this would play out with all the other operators. And of course, polymorphic behavior can be our friend here, after all if you want to take(100), it would be better for the user to have a method on the base interface that does the right thing depending on the underlying type. But then that default implementation provided is now a trap that users can fall into and inheritance itself can raise other issues. For example because of inheritance, the operations available for Iterable don't have consistent behavior with Iterator and that may take people by surprise:

scala> List(1).iterator.map{ x => println("lazy"); x + 1 }
res1: Iterator[Int] = non-empty iterator

scala> List(1).asInstanceOf[Iterable[Int]].map { x => println("strict"); x + 1 }
strict
res2: Iterable[Int] = List(2)

So we have 3 choices:

  1. provide a minimal base version that doesn't expose any operators at all and leave Iterator to be just a minimal interface with which people can do iteration
  2. have those operators specified, but without default implementations, forcing all implementations to deal with it, and maybe throw in a mixin with those implementations for convenience
  3. have those operators with default implementations and assume that if something like CloseableIterator happens, then users are responsible for cleaning after themselves

I really don't know what the best answer is, in spite of the title of that article as this is a really hard design question IMO, but I would have preferred number 1, but wouldn't mind option 2.

@sjrd
Member
sjrd commented Jan 17, 2017

IMO the discussion on CloseableIterator is misleading. From a LSP point of view, Iterator <: CloseableIterator (with a no-op close()), and not the other way around!

Reason: you've given it yourself: if I have an function that takes an Iterator, iterates a bit on it, and throws it away, it's fine by Iterator's standards. But I cannot give it a CloseableIterator because a mandatory call to close() would be missing.

On the other hand, if I have a function taking a CloseableIterator, which properly calls close() when it's done, I can also give it an Iterator with def close(): Unit = ().

Everything I can do with a CloseableIterator, I can also do with an Iterator. But there are things I can do with an Iterator that I am not allowed to do with a CloseableIterator. Therefore, by LSP, Iterator <: CloseableIterator.

@alexandru

@sjrd I don't think it's so clear cut, you can also argue that a take implementation for Iterator behaves just as badly as it does for a CloseableIterator, with CloseableIterator adding extra capabilities or restrictions on what's already there for safety reasons.

We are not changing the way items are pulled from the iterator, we are only adding a recommendation for what should happen when the iteration is stopped, which in practice happens anyway for iterators that have sockets to close, it just doesn't happen by means of the iterator interface.

@szeiger
Collaborator
szeiger commented Jan 17, 2017

I think the CloseableIterator is a red herring. The discussion so far has been about "auto-closing iterators" rather than "closeable iterators".

Unless I'm overlooking something, the implementation of take that @alexandru posted is so complicated because it isn't based on a cohesive specification of what a CloseableIterator should be. You can argue about the details of when the underlying data source should be closed (more about that below) but clearly the iterator that take produces doesn't have to offer more safety guarantees than the iterator that it consumes. If we assume that exhausting the iterator auto-closes the source, we need to add a call to close() in hasNext. But that's all. None of the error handling should be there. If we assume that the iterator itself is responsible for closing the source in case of an error, we do not need it because the underlying iterator has already done it. If we assume that the burden to close the source is on the consumer, we do not need it, either, because we push the responsibility down to our consumer.

So who should be responsible for closing the source on completion or error? As pointed out by @sjrd CloseableIterator <: Iterator doesn't make sense if you assume any auto-closing semantics. We can avoid this problem by treating CloseableIterator a.k.a. Iterator with Closeable as no more than (Iterator, Closeable) and looking at the problem from the perspective of resource ownership (which could be enforced by the type system in Rust). We do not usually think about ownership in Scala because as long as it's only about memory on the heap, the garbage collector takes care of it.

If you get a (Iterator, Closeable) you are the owner of both objects. You can pass the Iterator to a method that consumes it, thereby transferring ownership (i.e. unless the method that you call makes explicit guarantees, you are no longer allowed to use the Iterator afterwards). You are still the owner of the Closeable though, so you are responsible of closing it after the method returns.

@odersky I agree that a CloseableView fits nicely into the current design from a type-level perspective. It should also be easy to implement by adding the required decorators in fromIterable and fromIterableWithSameElemType so you wouldn't have to reimplement any of the actual transformations.

I don't understand your comment regarding specialization though. Views are based on transforming Iterators, so you need to specialize those first (or use a different, specialized abstraction).

@alexandru

I think the CloseableIterator is a red herring

@szeiger it was certainly not intended to mislead you in any way. Your arguments on resource ownership make sense, indeed, we in the managed-heap world don't think about that often enough.

@szeiger
Collaborator
szeiger commented Jan 17, 2017

@alexandru I didn't mean to imply there was any intentional misleading going on, merely that the discussion so far conflated the "closeable" part (which can be specified and implemented cleanly) with auto-closing/

@Ichoran
Ichoran commented Jan 17, 2017

If we don't like big fat interfaces, we can introduce an additional type that is either the underlying mechanism minus the big fat interface, or something which is easily wrapped. For instance,

trait BareIterator[A] {
  def hasNext: Boolean
  def next(): A
}

trait Iterator[A] extends BareIterator[A] {
  def isEmpty = !hasNext
  ...
}

I don't think we actually need to do this--I think the fat interface is fine, and indeed I think the whole point of using Iterator as a thing instead of a pair (A => Boolean, A => B) is to be able to work at a higher level of abstraction--but it hasn't yet been mentioned as an option, so I am mentioning it for completeness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment