New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

foldRight broken for large lists #3295

Closed
scabug opened this Issue Apr 14, 2010 · 18 comments

Comments

Projects
None yet
1 participant
@scabug

scabug commented Apr 14, 2010

Is there a good reason not to implement l.foldRight(z)(f) as l.reverse.foldLeft(z)(flip(f)), or some variation? This would avoid the stack overflow that results when using foldRight with large sequences. As it is implemented, the function is not very useful except for toy examples.

@scabug

This comment has been minimized.

scabug commented Apr 14, 2010

@scabug

This comment has been minimized.

scabug commented Apr 19, 2010

@pchiusano said:
Here's a possible implementation of foldRight - this could be added to the Traversable trait as it depends only on foreach.

def foldRight[A,B](z: B)(op: (A,B) => B): B = {
  import scala.collection.mutable.ArrayStack
  val s = new ArrayStack[A] 
  foreach(a => s += a)
  s.elements.foldLeft(z)((b,a) => op(a,b))                 
}

Here's another implementation that I think could have better gc performance:

def foldRight[A,B](z: B)(op: (A,B) => B): B = {
  import scala.collection.mutable.ArrayStack
  val s = new ArrayStack[A] 
  foreach(a => s += a)
  var r = z           
  while (!s.isEmpty) { r = op(s.pop, r) }
  r
}
@scabug

This comment has been minimized.

scabug commented Apr 20, 2010

@odersky said:
I don't think we want to get into this. The proposed change would penalize the performance of foldRight and confuse the computation model. If you want to fold right a large list its trivial to write

xs.reverse.foldLeft

instead of

xs.foldRight

@scabug

This comment has been minimized.

scabug commented Apr 20, 2010

@pchiusano said:
Martin, have you actually measured the performance of the two solutions? Perhaps we should do that rather than speculate. I just did some informal experiments (see below), my second version looks to be faster even for small lists.

Also, can you explain - how does it "confuse the computation model"? Think about what foldRight (the recursive version) is doing - it builds a stack of call frames containing all the elements of the input list, then pops elements from the stack, combining with the current accumulated value. The second version I gave does almost exactly the same thing, except it uses an explicit stack.

The suggestion that you should "just use foldLeft + reverse for large sequences" is not a good api decision, IMO. It's much more common for sequence processing code to be generally ignorant of the expected size of input sequences.

Incidentally, the standard library could use a version of foldRight where the operator is non-strict (this WOULD actually be a different computation model). Maybe I'll file a separate enhancement for that. :)

Note that this was just done in the interactive prompt - I'm not sure if the classes compiled there will get JIT'ed, so more controlled tests should probably be done.

scala> def foldRight[A,B](i: Iterable[A], z: B, op: (A,B) => B): B = {
     |   import scala.collection.mutable.ArrayStack
     |   val s = new ArrayStack[A]
     |   i.foreach(a => s += a)
     |   var r = z
     |   while (!s.isEmpty) { r = op(s.pop, r) }
     |   r
     | }
foldRight: [A,B](Iterable[A],B,(A, B) => B)B

scala>   def time(block: => Any) = {
     |     for (i <- 1 to 1600) block // trigger JIT on block
     |     var goodSample = false
     |     var start = 0L; var stop = 0L;
     |     var N = 500
     |     while (!goodSample) {
     |       start = System.currentTimeMillis
     |       for (i <- 1 to N) block
     |       stop = System.currentTimeMillis
     |       if (stop - start < 500) N = N*4 // require at least half a second for a decent sample
     |       else goodSample = true
     |     }
     |     val perOp = (stop.toDouble - start.toDouble) / N
     |     perOp * 10000
     |   }
time: (=> Any)Double

scala> var N = 5000
N: Int = 5000

scala> time(foldRight[Int,Int](0 to N, 0, _ + _))
res8: Double = 2890.0

scala> time((0 to N).foldRight(0)(_ + _))
res9: Double = 5625.0

scala> N = 100
N: Int = 100

scala> time((0 to N).foldRight(0)(_ + _))
res11: Double = 82.96875

scala> time(foldRight[Int,Int](0 to N, 0, _ + _))
res12: Double = 59.84375

scala> N = 50
N: Int = 50

scala> time(foldRight[Int,Int](0 to N, 0, _ + _))
res14: Double = 32.05078125

scala> time((0 to N).foldRight(0)(_ + _))
res15: Double = 41.5625
@scabug

This comment has been minimized.

scabug commented Apr 22, 2010

@pchiusano said:
Hi, I don't mean to be rude, but I'm reopening this ticket since I don't think Martin's response settled the matter and I fear the ticket may have entered a black hole upon being closed. :)

Also, in general, what is the policy as far as closing tickets? Closing a ticket without any discussion or agreement from the submitter that the issue has been addressed (or the bug really is invalid, shouldn't be fixed, etc.) is a little discouraging for the submitter. :)

@scabug

This comment has been minimized.

scabug commented Apr 23, 2010

johnconnor said:
I gave a solution in: http://lampsvn.epfl.ch/trac/scala/ticket/2818
that combines both approaches (the current one with what you are suggesting).

Unfortunate Martin did not like that either. As it stands now the function is useless -- using it, is a bug waiting to happen.

@scabug

This comment has been minimized.

scabug commented Apr 23, 2010

@pchiusano said:
Yeah, although I don't think it's worth trying to use the recursive version for small lists - it does not appear to be faster, and having to switch between the two implementations complicates the code.

@scabug

This comment has been minimized.

scabug commented Apr 27, 2010

@odersky said:
I believe I set out the reasons why this is a wontfix and I have not changed my mind. It's obvious that foldright is not tail-recursive, so we should make no effort to have it work for long lists. It would just confuse the computation model, and will not work for streams anyway. Furthermore, you then need to do the same thing for reduceRight, last, and so on and so on. I don't want to go there. I realize that seen in isolation every ticket seems hugely important and therefore one tends to overengineer, at the cost of overall simplicity and consistency. I have to be the guardian of those.

Generally, I am not super keen on tickets being reopened if there was a reasoned discussion before. Given the number of tickets, and the fact that so far I have fixed personally about one third of all opened tickets, you can do the math, and deduce how much time I have per ticket. So I really prefer using this scarce time fixing real problems than continuing dicussions. I do not mean this to be rude, but it's just the reality of things.

@scabug

This comment has been minimized.

scabug commented Apr 27, 2010

@pchiusano said:
Martin, you gave two reasons - performance and "confusion of the computation model". I addressed both in my response (I think - still not really sure what you mean by the latter). If you really think the issue is settled, well, I guess we will have to agree to disagree. :)

One thing in your latest response that did not make sense to me and makes me think we are not on the same page - you say it won't work for Streams? Why not? Since the operator of foldRight is strict in its second argument, it cannot be computed in constant space even if the collection is non-strict:

Stream.from(1).foldRight(Stream[Int]())(Stream.cons(_,_))
java.lang.StackOverflowError
        at scala.Stream$$.from(Stream.scala:168)
        at scala.Stream$$$$anonfun$$from$$1.apply(Stream.scala:168)
        at scala.Stream$$$$anonfun$$from$$1.apply(Stream.scala:168)
        at scala.Stream$$cons$$$$anon$$2.tail(Stream.scala:69)
        at scala.Stream.foldRight(Stream.scala:433)
        at scala.Stream.foldRight(Stream.scala:433)
        at scala.Stream.foldRight(Stream.scala:433)
        at scala.Stream...

So there's really no reason why the default implementation I gave wouldn't work for Streams. As I mentioned earlier, the standard lib could use a version of foldRight where the operator is non-strict in its second argument. This could be computed in constant space for collections with efficient head and tail methods.

All right, I'm going to stop trying to convince you. :) If you do change your mind I'd be happy to submit a patch for this and also reduceRight and any other relevant methods.

@scabug

This comment has been minimized.

scabug commented Apr 27, 2010

@paulp said:
Replying to [comment:11 pchiusano]:

All right, I'm going to stop trying to convince you. :) If you do change your mind I'd be happy to submit a patch for this and also reduceRight and any other relevant methods.

Such a change is among the easiest to do at any time (implementation change without breaking any compat) and for that reason the best thing to do with this and any issues with similar characteristics is wait. The only important thing right now is shipping 2.8. I wish I could get people applying this level of attention to the things in 2.8 which won't be so easy to change later (which is quite a lot of it.)

@scabug

This comment has been minimized.

scabug commented Jul 25, 2011

@pchiusano said:
Hi, I'd like to reopen this ticket or at least get a definite response about if it will ever be changed.

@scabug

This comment has been minimized.

scabug commented Aug 16, 2011

@paulp said:
Nobody has sufficient omniscience to say definitely it will never be changed; for all we know scala will outlive us all. However there is no sign anyone has changed their mind, so if you need to assume something about the future, I'd assume constancy on this matter.

@scabug

This comment has been minimized.

scabug commented Aug 16, 2011

@pchiusano said:
Well, I can't say I agree, but thanks for the definite response.

@scabug

This comment has been minimized.

scabug commented Aug 9, 2012

Ben Wing (benwing) said (edited on Aug 13, 2012 8:29:35 PM UTC):
I'm going to second Paul Chiusano here. I just got bitten by this problem, which turned out to be in a library I was using. Debugging issues like this are difficult in large-scale code because Java cuts off the bottom stack frames, so you can't even see which code triggered the call to foldRight.

I strongly believe we should either

(1) preferably, fix this issue as Paul C. proposed;
(2) deprecate the function and all other non-tail-recursive functions;
(3) issue a clear warning at compile time about stack overflow problems when the function is encountered;
(4) provide a "safeFoldRight" function that won't trigger stack overflows, and similarly for other dangerous functions.

At the very least, clearly document functions that aren't stack-safe, and indicate how to avoid the issue (e.g. through reverse.foldLeft).

In reality, although Martin may personally have a clear idea which functions are and aren't tail-recursively-safe, the average Scala programmer has little or no idea. As a simple example, I verified experimentally that appending to the end of a very large list does not cause a stack overflow -- but how could I possibly know that? My intuition tells me that the most straightforward implementation will be recursive in a non tail-recursive way. In general, programmers should not have to be aware of implementation details like this, and on top of this, currently there's no documentation as to which functions are stack-safe and which ones aren't.

BTW, I see that others have run into this same issue:

http://oldfashionedsoftware.com/2009/07/10/scala-code-review-foldleft-and-foldright/

http://stackoverflow.com/questions/4085118/why-foldright-and-reduceright-are-not-tail-recursive

http://www.scala-lang.org/node/5947

@scabug

This comment has been minimized.

scabug commented Aug 10, 2012

@paulp said:
Given the status and comment history of this ticket, you probably need to express your view on one of the mailing lists to obtain any result.

@scabug

This comment has been minimized.

scabug commented Aug 13, 2012

Ben Wing (benwing) said:
OK thanks, Paul.

@scabug

This comment has been minimized.

scabug commented Oct 19, 2012

@paulp said:
For the record, #6543 gives us an example of the scala compiler blowing the stack via foldRight.

@scabug

This comment has been minimized.

scabug commented Aug 26, 2013

@SethTisue said:
duplicate of #2818 which was fixed in 2.10.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment