# lampepfl/dotty

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

# Scala Wart: Convoluted de-sugaring of for-comprehensions #2573

Open
opened this Issue May 29, 2017 · 20 comments

## Comments

Projects
None yet
8 participants
Contributor

### lihaoyi commented May 29, 2017 • edited

 Opening this issue, as suggested by Martin, to provide a place to discuss the individual warts brought up in the blog post Warts of the Scala Programming Language and the possibility of mitigating/fixing them in Dotty (and perhaps later in Scala 2.x). These are based on Scala 2.x behavior, which I understand Dotty follows closely, apologies in advance if it has already been fixed Scala lets you write for-comprehensions, which are converted into a chain of `flatMap`s an `map`s as shown below: ```@ val (x, y, z) = (Some(1), Some(2), Some(3)) x: Some[Int] = Some(1) y: Some[Int] = Some(2) z: Some[Int] = Some(3) @ for{ i <- x j <- y k <- z } yield i + j + k res40: Option[Int] = Some(6) @ desugar{ for{ i <- x j <- y k <- z } yield i + j + k } res41: Desugared = x.flatMap{ i => y.flatMap{ j => z.map{ k => i + j + k } } }``` I have nicely formatted the desugared code for you, but you can try this yourself in the Ammonite Scala REPL to verify that this is what the for-comprehension gets transformed into. This is a convenient way of implementing nested loops over lists, and happily works with things that aren't lists: `Option`s (as shown above), `Future`s, and many other things. You can also assign local values within the for-comprehension, e.g. ```@ for{ i <- x j <- y foo = 5 k <- z } yield i + j + k + foo res42: Option[Int] = Some(11)``` The syntax is a bit wonky (you don't need a `val`, you can't define `def`s or `class`es or run imperative commands without `_ = println("debug")`) but for simple local assignments it works. You may expect the above code to be transformed into something like this ```res43: Desugared = x.flatMap{ i => y.flatMap{ j => val foo = 5 z.map{ k => i + j + k } } }``` But it turns out it instead gets converted into something like this: ```@ desugar{ for{ i <- x j <- y foo = 5 k <- z } yield i + j + k + foo } res43: Desugared = x.flatMap(i => y.map{ j => val foo = 5 scala.Tuple2(j, foo) }.flatMap((x\$1: (Int, Int)) => (x\$1: @scala.unchecked) match { case Tuple2(j, foo) => z.map(k => i + j + k + foo) } ) )``` Although it is roughly equivalent, and ends up with the same result in most cases, this output format is tremendously convoluted and wastefully inefficient (e.g. creating and taking-apart unnecessary `Tuple2`s). As far as I can tell, there is no reason at all not to generated the simpler version of the code shown above. PostScript: Notably, these two desugarings do not always produce the same results! The current desugaring behaves weirdly in certain cases; here is one that just bit me an hour ago: ```Welcome to Scala 2.11.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112). Type in expressions for evaluation. Or try :help. scala> for{ | x <- Right(1).right | y = 2 | z <- Right(3).right | } yield x + y + z :13: error: value flatMap is not a member of Product with Serializable with scala.util.Either[Nothing,(Int, Int)] x <- Right(1).right ^ :16: error: type mismatch; found : Any required: String } yield x + y + z ^``` This specific problem has gone away in 2.12 because `Either` doesn't need `.right` anymore, but the language-level problem is still there: `y = 2` ends up causing strange, difficult-to-debug errors due to the weirdness of the desugaring. This would not be an issue at all given the desugaring I proposed.
Contributor

### odersky commented Jun 19, 2017

 It would be great if we could use a more efficient and intuitive desugaring. But there's the problem with conditions. How would you translate the following: ``````for { x <- List(1, 2, 3) y = x * x if y * x > 4 } yield y + x `````` I don't think there's a way to do it that does not essentially fall back to the scheme we have. We could think of using different schemes for for-comprehensions with val defs before tests and for-comprehensions without. But then the whole thing would get more complex, not less so. Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.
Contributor

### lihaoyi commented Jun 19, 2017 • edited

 Here's one idea for solving the "val assignment before if" problem. Not sure if it's too crazy, but what if your example ```for { x <- List(1, 2, 3) y = x * x if y * x > 4 } yield y + x``` Desugared into ```{ val tmp = List(1, 2, 3) tmp.flatMap{ x => val y = x * x tmp.point(()).filter(y * x > 4).map{ _ => y + x } } }``` Where in addition to needing `.withFilter` on a type to make `if`-guards work, we would ask for a monadic `.point`. `List().point(x)` would be `List(x)`, and similar things could be defined for `Option`, `Future`, etc.. like Scalaz/Cats already do After `.point` and `.filter` are called, the "normal" `.map` or `.flatMap` takes over from there. The choice of methods is a bit arbitrary, there are other pairs of methods that could work, e.g. `.pointUnit: M[Unit]`, rather than a more generic `.point: M[T]` or `.point` and `.empty` rather than `.point` and `.filter`. We could also make it implicit based-on-type (e.g. with some sort of `def point[T](t: T)(implicit point: Point[T]) = point` and `point(tmp)`) rather than as a method, but all that is orthogonal to the core idea. I think this should work for all arrangements of `val` assignments, `if`-guards and `<-` bindings, since in this scheme `if`-guards: `if foo` basically desugar into a `_ <- tmp.point(()).filter(foo)` binding on `tmp` object that was last `<-`ed. There are probably other subtleties with this (performance???) that I haven't thought through, and of course it's a much bigger change than the initial post describes. But the fundamental idea seems like it might work, and even with the unusualness of `.point` it would still be a simpler desugaring than the status quo. EDIT: on further thought, `.point` feels a lot less sketchy than `.forUnit`, so restructured the comment around that
Member

### smarter commented Jun 19, 2017

 @lihaoyi This is pretty close to the way `guard` works in Haskell: https://en.wikibooks.org/wiki/Haskell/Alternative_and_MonadPlus#guard
Contributor

### odersky commented Jun 19, 2017

 It would indeed be interesting to look into how we can use something like `guard` for an alternative desugaring.
Contributor

### lihaoyi commented Jun 19, 2017 • edited

 Thanks for the pointer @smarter! If we follow Haskell's `guard`, and thus choose to go with `point` and `empty` rather than `point` and `filter`, the desugaring ends up being quite nice: ```for { x <- List(1, 2, 3) y = x * x if y * x > 4 } yield y + x``` ```{ val tmp = List(1, 2, 3) tmp.flatMap{ x => val y = x * x if (!(y * x > 4)) tmp.empty else tmp.point(()).map(_ => y + x) // or flatMap if there's more stuff } }``` The `tmp.point(()).map` is a bit redundant: it could be just `tmp.point(...)` in this case: ```{ val tmp = List(1, 2, 3) tmp.flatMap{ x => val y = x * x if (!(y * x > 4)) tmp.empty else tmp.point(y + x) } }``` and if there are subsequent generators we wouldn't need it at all: ```for { x <- List(1, 2, 3) y = x * x if y * x > 4 z <- 0 until y } yield y + x + z``` ```{ val tmp = List(1, 2, 3) tmp.flatMap{ x => val y = x * x if (!(y * x > 4)) tmp.empty else (0 until y).map(z => y + x + z) } }``` I think this beautifully captures the symmetry between the `if`-guard within the for-comprehension and the equivalent `if`-statement in the desugared `flatMap`/`map` code. Alternately, we could go with `flatMap` and always include a `point` at the end, and not use `map` at all. I think that's what Haskell basically does? I guess there's some value in making sure both `point` and `map` always end up being called, even if just for the not-changing-too-many-things-at-once value? If so we could stick with `tmp.point(()).map`. Not really sure what the tradeoffs here are Anyway, I think any of these variants, if we could make it work, would be superior to the status quo desugaring
Contributor

### odersky commented Jun 20, 2017

 It would be great if this could work, but there's a problem. ```for { x <- List(1, 2, 3) y = x * x if y * x > 4 z <- 0 until y } yield y + x + z``` ```{ val tmp = List(1, 2, 3) tmp.flatMap{ x => val y = x * x if (!(y * x > 4)) tmp.empty else (0 until y).map(z => y + x + z) } }``` What is the type of the if-else? It's the lub of the type of `tmp.empty`, which is `List[Int]` and the type of the mapped range, which is `IndexedSeq[Int]`. So the result type is `Seq` here which is fine. But in general the type of the argument to `flatMap` can be completely unrelated to the type of the original generator. Our translations work regardless of types, so we cannot assume that the generated if-else would be typable. Logically we'd have to pick the empty on the argument to `flatmap` argument instead of the empty of the original generator. But I don't see a general way to get that.
Contributor

### odersky commented Jun 20, 2017

 EDIT: Maybe it's not so bad. We could simply require that `empty` is of a type which is compatible with the argument type of the flatMap. Is there a use for-expressions where this would not be the case? So if we accept that, the next problem is how to retrofit all collections and other types on which for-expressions with ifs are performed so that they have an `empty`. That's still a large problem.

Closed

Open

Contributor

### ShaneDelmore commented Jul 20, 2017

 What is the empty value for Either[A, B]? It seems it must be Left[A] but I don’t know where I could get an A value from, unless there was the ability to put an implicit left in scope that would be used by the for comprehension desugaring.
Contributor

### lihaoyi commented Jul 20, 2017

 `Either[A, B]` does not support `.filter` even today, so we probably don't need to worry about it not supporting `.filter` due to lack of empty values or whatever ```@ Right(1).filter(_ => false) cmd0.sc:1: value filter is not a member of scala.util.Right[Nothing,Int] val res0 = Right(1).filter(_ => false) ^ Compilation Failed```
Contributor

### ShaneDelmore commented Jul 20, 2017

 It’s true, but it does confuse the heck out of everyone the first time they hit it so if we were able to come up with a solution at the same time it would be useful. The only idea I have so far is as mentioned above, allow an implicit empty to be found in scope which would allow the use of filter as long as the user is able to supply an empty in scope, which for me would have been possible every time I have run into the issue.
Contributor

### lihaoyi commented Jul 20, 2017

 I think the correct answer to the confusion is to get people used to the fact that you can't always `filter` on a monad. There is a huge number of monads for which `filter` doesn't make sense: `State`, `Reader`, `Writer`, `Function0`, etc.. In a way I feel that Scala's "put `.filter` on anything" convention is itself a bit of a wart
Contributor

### ShaneDelmore commented Jul 20, 2017

 You are winning me over, I agree, the confusion has always arisen from the expectation that it would just work and eliminating that expectation is probably a better way to go.

### pathikrit commented Aug 6, 2017

 Actually ran into a big wart recently related to this. See @frosforever's SO post here: https://stackoverflow.com/questions/45333889/regex-unapply-in-for-comprehension-with-if-guard-not-compiling tl;dr - This does not compile: ```for { s <- List.empty[String] regex <- List.empty[scala.util.matching.Regex] regex(ss) = s if ss == "foo" } yield s``` But removing the if: ```for { s <- List.empty[String] regex <- List.empty[scala.util.matching.Regex] regex(ss) = s } yield s``` or rearranging the order of the two lists in the for comprehension: ```for { regex <- List.empty[scala.util.matching.Regex] s <- List.empty[String] regex(ss) = s if ss == "foo" } yield s``` makes it compile
Contributor

### odersky commented Jan 24, 2018

 It would be nice if we could improve on this. But we'd need someone to put in the effort.

Closed

### LPTK commented Apr 21, 2018 • edited

 Can we at least do something about imperative `for` comprehensions? They have no justification for generating such horrible code. The following: `for { x <- xs; y = x; if y > 0; z <- zs } println(x + z)` should generate: ```xs.foreach { x => val y = x if (y > 0) zs.foreach { z => println(x + z) } }``` instead of the current convoluted and inefficient: ```xs.map { x => val y = x (x, y) }.withFilter { case (x,y) => y > 0 }.foreach { case (x,y) => zs.foreach { z => println(x + z) } }``` which creates an intermediate list, allocates tuples, and calls four closure-consuming methods instead of just 2. No wonder people are surprised how much more efficient it is to use `while` loops rather than `for` loops in Scala 😅 I believe the fix for this one is a low-hanging fruit. I could try to do it at some point if people agree with the idea.

### LPTK commented Apr 21, 2018 • edited

 As for functional loops, I think it could be solved in a less artificial way by using `flatten` and a simple method called `flattenOptions`, of type `M[Option[A]] => M[A]`. This code: `for { x <- xs; t = x+1; if t > 0; y <- ys } yield x + y` would produce: ```xs.map { x => val t = x+1 if (t > 0) Some(ys.map { y => x + y }) else None }.flattenOptions.flatten``` and this code: `for { x <- xs; y <- ys; t = y+1; if x > t } yield x + y` would produce: ```xs.flatMap { x => ys.map { y => val t = y+1 if (x > t) Some(x + y) else None }.flattenOptions }``` Compared to `point` and `empty`, which have no business whatsoever being defined on monad/collection values, `flattenOptions` is an independently useful method. In the case of Scala standard collections, it can actually be the same as `flatten` (which only requires the element type to be traversable, and so is `Option`). In addition, `flattenOptions` can somewhat easily be defined as an extension method: ```implicit class FlattenOptions[A,B](xs: FilterMonadic[Option[A],B]) { def flattenOptions[That](implicit cbf: CanBuildFrom[B,A,That]) = xs.withFilter(_.isDefined).map(_.get) }``` Also, not sure if it would fit in the complexity budget because it makes things less regular, but I would still prefer something easier like: `for { x <- xs; y <- ys if x > 0 } yield x + y` to desugar to the more natural: ```xs.flatMap { x => ys.filter(y => x > 0).map { y => x + y } }``` which does not allocate any options, instead of: ```xs.flatMap { x => ys.map { y => if (x > 0) Some(x + y) else None }.flattenOptions }```
Member

### mdmilosz commented Apr 30, 2018 • edited

 Hi all. I've been thinking about this issue for a while and it looks like the main issue stems from how assignment and condition clauses are desugared in the `for`-comprehensions. With that assumption, I started considering a plausible unification, so to speak, of the different clauses. Let be given the following `for`-comprehension: ``` for { x <- List(1, 2, 3) z = 1 if x > z y <- List(10, 20, 30) } yield x + y``` Let's assume the existence of such a unary constructor `single(a: A): F[A]` that we can rewrite `z = 1` as `z <- single(1)`. (For instance, for the `List` type, `single(a: A): List[A] = List(a)`.) Analogically, let's assume the existence of such a unary constructor `cond(b: Boolean): F[Unit]` that we can rewrite `if x > z` as `_ <- cond(x > z)`. (For instance, for the `List` type, `cond(b: Boolean): List[Unit] = List(()).filter(_ => b)`.) Then we can write the above `for`-comprehension as follows: ``` for { x <- List(1, 2, 3) z <- single(1) _ <- cond(x > z) y <- List(10, 20, 30) } yield x + y``` Or, after replacing the constructors with their implementation: ``` for { x <- List(1, 2, 3) z <- List(1) _ <- List(()).filter(_ => x > z) y <- List(10, 20, 30) } yield x + y``` It might look that, since we only added boilerplate, the resulting desugared code would be longer and more complex. However, if my IDE and Lihaoyi's Ammonite shell are to be believed, the opposite happens. (In fact, the desugared version of the original, generated by IntelliJ IDEA with the Scala plugin, doesn't even compile!) I've tried to write a minimalistic type class system in 2.12 that would encompass implicit resolution of the `single` and `cond` constructors, but encountered either type mismatch errors or ambiguous implicit values errors whenever I tried to implement the constructors for more than one type. Regardless of the need of providing `simple` and `cond` for every type that might need to be used in a `for`-comprehension, there are some other drawbacks to this concept: the source code needs to be pre-processed even before applying the desugaring stage (perhaps a bit of this logic could be refined and integrated into the desugaring proper?); the source after desugaring might need to be post-processed in order to simplify some specific cases (e.g. `single(x).flatMap(f)` → `f(x)`, `single(x).map(f)` → `single(f(x))` etc; as above, maybe it could be turned into a part of a larger piece of logic). I would be happy to receive feedback on whether there is any merit to this idea.

### pathikrit commented Apr 30, 2018

 What is the downside to just desugaring into what this does: https://github.com/oleg-py/better-monadic-for? This is the least surprising behaviour (it is what an user expects in her mental model) and does not break current code (even if it does you will get a compile error)
Contributor

### Blaisorblade commented May 1, 2018 • edited

 This is the least surprising behaviour (it is what an user expects in her mental model) "Desugar bindings as vals instead of tuples" might be fine, but desugaring without `withFilter` isn't what I expect as an user. and does not break current code (even if it does you will get a compile error) Which compile error? If some members fail the match you get a runtime failure instead of skipping the members — nothing in the README makes me expect errors here: ```object TestExtract { def unapply(x: Any): Boolean = false } 1 match { case TestExtract() => 1; case _ => 0 } for (TestExtract() <- List(1, 2, 3)) yield ()``` The only error I anticipate is for extractors that affect the expected type, unlike the above one. Tho the README doesn't present any such example. `(_: List[Int]).map( { case (x, y) => x + y })` EDIT: just confirmed that code gives a runtime failure. ```// without plugin scala> for (TestExtract() <- List(1, 2, 3)) yield () res1: List[Unit] = List() // with plugin scala> for (TestExtract() <- List(1, 2, 3)) yield () scala.MatchError: 1 (of class java.lang.Integer) at .\$anonfun\$res2\$1(:13) at scala.runtime.java8.JFunction1\$mcVI\$sp.apply(JFunction1\$mcVI\$sp.java:12) at scala.collection.immutable.List.map(List.scala:283) ... 36 elided``` Of course that is not meant as a realistic example, but using an extractor with type `unapply(x: T)` on a `List[T]` will not give compiler errors, whether the extractor is going to always succeed, or not. What you'd want is to only omit `withFilter` for irrefutable patterns, that is (maybe) those built out of the following elementary irrefutable patterns: extractors returning type `Some[...]` or `true.type` matches on the only case class of a sealed hierarchy maybe others ???

Open

### odersky referenced this issue May 31, 2018

Merged

#### Stop rewriting withFilter to filter #4602

to join this conversation on GitHub. Already have an account? Sign in to comment