Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior of .fold changed with Scala 2.13 #11550

Closed
DaniRey opened this issue May 31, 2019 · 21 comments

Comments

Projects
None yet
8 participants
@DaniRey
Copy link

commented May 31, 2019

The behavior of .fold changed (at least for ArrayOps) with Scala 2.13. If there is only one element in the Array, this element is returned, without executing the fold operation, together with the neutral element.

This breaks code which expects the neutral element to be applied at least once. Although it is correct according to the documentation, it is unexpected to many developers.

z a neutral element for the fold operation; may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).

Example

object Fold extends App {
  val a = "rocks".split(" ").fold("")(_ + "scala " + _ )
  println(a)
}

Prints scala rocks with 2.12, but prints only rocks with 2.13. See https://scastie.scala-lang.org/QFBe1PcoTDu08EbvJMR88A

@Ichoran

This comment has been minimized.

Copy link

commented May 31, 2019

Note that the following also give different answers presently:

Array(1).fold(2)(_ * _)  // 1
List(1).fold(2)(_ * _)   // 2
Vector(1).fold(2)(_ * _) // 2

Given that 2 isn't a zero of _ * _, this is I guess not strictly illegal, but it sure is surprising, especially since you might be thinking that you're good with using the zero as an initial value as long as the operator is commutative and distributive. (I.e. "you can rearrange it in any order".)

@dwijnand

This comment has been minimized.

Copy link
Member

commented May 31, 2019

So if this is undefined behaviour (using fold with unlawful zero and combine operator) that has changed, that's not a regression/bug is it?

@Ichoran

This comment has been minimized.

Copy link

commented May 31, 2019

I would be more inclined to say that this illustrates that the API for fold is buggy and should be fixed or the method should be deprecated. An API for which undefined behavior is easy to evoke by accident is not a very desirable API.

@Jasper-M

This comment has been minimized.

Copy link
Member

commented May 31, 2019

Perhaps fold should enforce its precondition with assert(op(z, z) == z). Or maybe it should indeed be deprecated. Weren't the parallel collections its raison d'être?

@dwijnand

This comment has been minimized.

Copy link
Member

commented May 31, 2019

@Ichoran Normally we refer to behaviours or implementations as buggy. I agree with you that the API is buggy.

@ashawley

This comment has been minimized.

Copy link
Member

commented May 31, 2019

Seems like an implementation bug to me:

scala> Array(1).fold(2)(_ * _)
res0: Int = 1

scala> Array(1).foldLeft(2)(_ * _)
res1: Int = 2

scala> Array(1).foldRight(2)(_ * _)
res2: Int = 2

@SethTisue SethTisue added this to the 2.13.1 milestone May 31, 2019

@SethTisue SethTisue modified the milestones: 2.13.1, 2.14.0-M1 May 31, 2019

@SethTisue

This comment has been minimized.

Copy link
Member

commented May 31, 2019

Seems like an implementation bug to me

Why? given

z a neutral element for the fold operation; may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).

@SethTisue

This comment has been minimized.

Copy link
Member

commented May 31, 2019

Weren't the parallel collections its raison d'être?

Yes, absolutely. It was never intended as "foldLeft or foldRight I don't care which", which seems to be how @ashawley is thinking of it.

I don't see a bug here except perhaps a naming bug. Should we give it a more elaborate name than fold, to signal more strongly that it is a different beast?

@SethTisue

This comment has been minimized.

Copy link
Member

commented May 31, 2019

Or maybe it should indeed be deprecated

That seems like a good suggestion, since I can't think of a use case for it outside of the parallel collections?

maybe @axel22 has an opinion here

@szeiger

This comment has been minimized.

Copy link

commented May 31, 2019

I agree that this is not a bug. Any code that relied on fold to use the neutral element exactly once would already have broken in 2.12 with parallel collections. Now that they don't share the same interfaces anymore we are free to deprecate and remove fold from the standard library if people find it too confusing. It still exists because it can be implemented more efficiently in some cases (but that's only because it is free to use the neutral element any number of times).

@ashawley

This comment has been minimized.

Copy link
Member

commented May 31, 2019

z a neutral element for the fold operation

I saw this, but this reads like a suggestion to the user, and not a requirement of the implementation.

@eed3si9n

This comment has been minimized.

Copy link
Member

commented May 31, 2019

z a neutral element for the fold operation;

I think the key part of the Scaladoc is what follows that sentence.

may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).

Thus given Array(1).fold(2)(_ * _) the following are all valid answers according to Scaladoc:

  • 1
  • 2
  • 4
  • 6
  • and so on...
@ashawley

This comment has been minimized.

Copy link
Member

commented May 31, 2019

That would be interesting. Still, it's just a wordy way to describe to the user that z is supposed to not do anything.

@eed3si9n

This comment has been minimized.

Copy link
Member

commented May 31, 2019

IMO calculation order being non-deterministic doesn't necessarily lead to parallel collections. If that's the case, we should do something about Set(1, 2, 3, 4, 5).toSeq as well.

A better API might be to define scala.math.CommutativeMonoid and accept an instance of CommutativeMonoid[Int] as a parameter like we do with scala.math.Ordering. I also agree the name fold probably should change.

@DaniRey

This comment has been minimized.

Copy link
Author

commented May 31, 2019

Can we add information about this change in behavior to https://docs.scala-lang.org/overviews/core/collections-migration-213.html?

@Ichoran

This comment has been minimized.

Copy link

commented May 31, 2019

@eed3si9n - This isn't just an issue about calculation order, though. The API requires not just commutativity and distributivity of the operation op, but also that z is an identity element. This is only important for parallel computations where you unpack the fold to such an extent that it might even be empty, and then want only a value back.

In particular, Set(1, 2).fold(z)(_ + _) could evaluate to 1 + 2 or 2 + 1 or z + 1 + 2 or 2 + z + z + z + z + 1 + z with the promises that the API currently gives.

DaniRey added a commit to unic/ScalaWebTest that referenced this issue May 31, 2019

Add support for Scala 2.13, remove support for Scala 2.10 #72
  - compile against Scala 2.12.8 instead of 2.12.7
  - add Scala 2.13.0-RC3
  - remove Scala 2.10.7
  - keep default Scala version at 2.12.x because of https://youtrack.jetbrains.com/issue/SCL-15558
  - upgrade ScalaTest dependency to 3.0.8-RC5
  - upgrade Selenium to 3.141.59
  - upgrade HtmlUnit to 2.35.1
  - upgrade slf4j to 1.7.26
  - upgrade play-json to 2.8.0-M1 for Scala 2.13 and 2.12
  - upgrade play-json to 2.7.0 for Scala 2.11.x (as version 2.8.0-M1 is not available for 2.11.x)
  - upgrade sbt from 0.13.16 to 0.13.18 (intermediate step before Update to SBT 1.0 - #75)
  - getting rid of procedure syntax (f(): Unit = {} instead of f(){}), because it is deprecated with 2.13
  - replace .fold with .mkString, because it is simpler and because of scala/bug#11550 (which is not a bug, just a change in behavior)

DaniRey added a commit to unic/ScalaWebTest that referenced this issue May 31, 2019

Add support for Scala 2.13, remove support for Scala 2.10 #72
  - compile against Scala 2.12.8 instead of 2.12.7
  - add Scala 2.13.0-RC3
  - remove Scala 2.10.7
  - keep default Scala version at 2.12.x because of https://youtrack.jetbrains.com/issue/SCL-15558
  - upgrade ScalaTest dependency to 3.0.8-RC5
  - upgrade Selenium to 3.141.59
  - upgrade HtmlUnit to 2.35.1
  - upgrade slf4j to 1.7.26
  - upgrade play-json to 2.8.0-M1 for Scala 2.13 and 2.12
  - upgrade play-json to 2.7.0 for Scala 2.11.x (as version 2.8.0-M1 is not available for 2.11.x)
  - upgrade sbt from 0.13.16 to 0.13.18 (intermediate step before Update to SBT 1.0 - #75)
  - getting rid of procedure syntax (f(): Unit = {} instead of f(){}), because it is deprecated with 2.13
  - replace .fold with .mkString, because it is simpler and because of scala/bug#11550 (which is not a bug, just a change in behavior)
@ashawley

This comment has been minimized.

Copy link
Member

commented May 31, 2019

In scala/scala#8114, I try to fix fold for Array by just making it consistent with fold elsewhere in collections by calling foldLeft. Presumably, there isn't a reason to re-implement fold in ArrayOps, nor a reason to have a different behavior from 2.12 when size is one.

@Ichoran

This comment has been minimized.

Copy link

commented May 31, 2019

@ashawley - Well, if people do a lot of folds over arrays of size one, it makes a huge speed difference that you just return the only value; and for small ones, the overhead of actually using rather than skipping the zero element may be noticeable. That said, I would anticipate that the difference is small.

@Ichoran

This comment has been minimized.

Copy link

commented May 31, 2019

Initial testing in Thyme says that the removed version takes only about 60% of the time of the foldLeft version with a single element, and only about 70% with four elements. However, it crosses over at five, which makes me think that a lot of the difference is the manual loop unrolling of one more element. And for large arrays (100), the removed version actually takes about 20% longer.

All kinda sketchy because I am doing it by hand in the REPL, but it looks self-consistent.

@Ichoran

This comment has been minimized.

Copy link

commented May 31, 2019

Note: the timings are slightly different depending on whether you pull out val length = xs.length - 1 or not, but the take-home message is similar--each is better in a certain regime. (In the case without val inline, foldLeft wins at medium sizes and at large sizes both are identical.)

@ashawley

This comment has been minimized.

Copy link
Member

commented May 31, 2019

That is interesting. Usingreduce, instead of fold, could perhaps be an optimization for such cases.

@dwijnand dwijnand modified the milestones: 2.14.0-M1, 2.13.1 Jun 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.