Scala Wart: Convoluted de-sugaring of for-comprehensions #2573

Open
lihaoyi opened this Issue May 29, 2017 · 20 comments

@lihaoyi commented May 29, 2017

Opening this issue, as suggested by Martin, to provide a place to discuss the individual warts brought up in the blog post Warts of the Scala Programming Language and the possibility of mitigating/fixing them in Dotty (and perhaps later in Scala 2.x). These are based on Scala 2.x behavior, which I understand Dotty follows closely; apologies in advance if this has already been fixed.


Scala lets you write for-comprehensions, which are converted into a chain
of flatMaps and maps, as shown below:

@ val (x, y, z) = (Some(1), Some(2), Some(3))
x: Some[Int] = Some(1)
y: Some[Int] = Some(2)
z: Some[Int] = Some(3)
@ for{
    i <- x
    j <- y
    k <- z
  } yield i + j + k
res40: Option[Int] = Some(6)
@ desugar{
    for{
      i <- x
      j <- y
      k <- z
    } yield i + j + k
  }
res41: Desugared = x.flatMap{ i => 
  y.flatMap{ j => 
    z.map{ k => 
      i + j + k
    }
  }
}

I have nicely formatted the desugared code for you, but you can try this
yourself in the Ammonite Scala REPL to
verify that this is what the for-comprehension gets transformed into.

This is a convenient way of implementing nested loops over lists, and happily
works with things that aren't lists: Options (as shown above), Futures,
and many other things.
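
For example, the same shape works over Futures; here is a small sketch using only the standard library:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val sum: Future[Int] = for {
  a <- Future(1)
  b <- Future(2)
} yield a + b
// desugars to roughly: Future(1).flatMap(a => Future(2).map(b => a + b))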

You can also assign local values within the for-comprehension, e.g.

@ for{
    i <- x
    j <- y
    foo = 5
    k <- z
  } yield i + j + k + foo
res42: Option[Int] = Some(11)

The syntax is a bit wonky (you don't need a val, and you can't define defs or
classes or run imperative commands without resorting to _ = println("debug")), but for
simple local assignments it works. You might expect the above code to be
transformed into something like this:

x.flatMap{ i => 
  y.flatMap{ j =>
    val foo = 5
    z.map{ k => 
      i + j + k + foo
    }
  }
}

But it turns out it instead gets converted into something like this:

@ desugar{
    for{
      i <- x
      j <- y
      foo = 5
      k <- z
    } yield i + j + k + foo
  }
res43: Desugared = x.flatMap(i => 
  y.map{ j =>
    val foo = 5
    scala.Tuple2(j, foo)
  }.flatMap((x$1: (Int, Int)) => 
    (x$1: @scala.unchecked) match {
    case Tuple2(j, foo) => z.map(k => i + j + k + foo)
    }
  )
)

Although it is roughly equivalent, and ends up with the same result in most
cases, this output format is tremendously convoluted and wastefully inefficient
(e.g. creating and taking-apart unnecessary Tuple2s). As far as I can tell,
there is no reason at all not to generate the simpler version of the code
shown above.

PostScript:

Notably, these two desugarings do not always produce the same results! The current desugaring behaves weirdly in certain cases; here is one that just bit me an hour ago:

Welcome to Scala 2.11.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala>  for{
     |    x <- Right(1).right
     |    y = 2
     |    z <- Right(3).right
     |  } yield x + y + z
<console>:13: error: value flatMap is not a member of Product with Serializable with scala.util.Either[Nothing,(Int, Int)]
          x <- Right(1).right
            ^
<console>:16: error: type mismatch;
 found   : Any
 required: String
        } yield x + y + z
                    ^

This specific problem has gone away in 2.12 because Either doesn't need .right anymore, but the language-level problem is still there: y = 2 ends up causing strange, difficult-to-debug errors due to the weirdness of the desugaring. This would not be an issue at all given the desugaring I proposed.
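
For reference, the snippet above gets desugared along these lines (a rough reconstruction following the scheme shown earlier), which is where the error comes from: the intermediate .map on the RightProjection returns a plain Either, and Either has no flatMap in 2.11:

Right(1).right.map { x =>
  val y = 2
  (x, y)
}.flatMap { case (x, y) =>   // error: value flatMap is not a member of Either[Nothing, (Int, Int)]
  Right(3).right.map(z => x + y + z)
}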

@odersky commented Jun 19, 2017

It would be great if we could use a more efficient and intuitive desugaring. But there's the problem with conditions. How would you translate the following:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

I don't think there's a way to do it that does not essentially fall back to the scheme we have. We could think of using different schemes for for-comprehensions with val defs before tests and for-comprehensions without. But then the whole thing would get more complex, not less so.

Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.
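
For comparison, this is roughly what the current scheme produces for that example (tuple packing, withFilter, then unpacking):

List(1, 2, 3).map { x =>
  val y = x * x
  (x, y)
}.withFilter { case (x, y) =>
  y * x > 4
}.map { case (x, y) =>
  y + x
}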

@lihaoyi commented Jun 19, 2017

Here's one idea for solving the "val assignment before if" problem. Not sure if it's too crazy, but what if your example

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

desugared into:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    tmp.point(()).filter(_ => y * x > 4).map{ _ =>
      y + x
    }
  }
}

Here, in addition to needing .withFilter on a type to make if-guards work, we would ask for a monadic .point: List().point(x) would be List(x), and similar methods could be defined for Option, Future, etc., as Scalaz/Cats already do.

After .point and .filter are called, the "normal" .map or .flatMap takes over from there.

The choice of methods is a bit arbitrary; there are other pairs that could work, e.g. a .pointUnit: M[Unit] rather than a more generic .point: M[T], or .point and .empty rather than .point and .filter. We could also resolve it implicitly based on the type (e.g. with some sort of def point[T](t: T)(implicit point: Point[T]) and calls like point(tmp)) rather than as a method, but all of that is orthogonal to the core idea.
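
To make the "implicit based on type" variant concrete, here is one possible shape; the name Point and this encoding are hypothetical, and how the desugaring would pick the right F for a given generator is exactly the part that would need working out:

import scala.language.higherKinds

trait Point[F[_]] {
  def point[A](a: A): F[A]
}

object Point {
  implicit val listPoint: Point[List] = new Point[List] {
    def point[A](a: A): List[A] = List(a)
  }
  implicit val optionPoint: Point[Option] = new Point[Option] {
    def point[A](a: A): Option[A] = Some(a)
  }
}

def point[F[_], A](a: A)(implicit F: Point[F]): F[A] = F.point(a)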

I think this should work for all arrangements of val assignments, if-guards and <- bindings, since in this scheme if-guards:

if foo

basically desugar into a

_ <- tmp.point(()).filter(_ => foo)

binding on the tmp object that was last <-ed.

There are probably other subtleties with this (performance???) that I haven't thought through, and of course it's a much bigger change than the initial post describes. But the fundamental idea seems like it might work, and even with the unusualness of .point it would still be a simpler desugaring than the status quo.

EDIT: on further thought, .point feels a lot less sketchy than .forUnit, so I restructured the comment around that.

@smarter commented Jun 19, 2017

@lihaoyi This is pretty close to the way guard works in Haskell: https://en.wikibooks.org/wiki/Haskell/Alternative_and_MonadPlus#guard

@odersky commented Jun 19, 2017

It would indeed be interesting to look into how we can use something like guard for an alternative desugaring.

@lihaoyi commented Jun 19, 2017

Thanks for the pointer @smarter! If we follow Haskell's guard, and thus choose to go with point and empty rather than point and filter, the desugaring ends up being quite nice:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

desugars into:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(()).map(_ => y + x) // or flatMap if there's more stuff
  }
}

The tmp.point(()).map is a bit redundant: it could be just tmp.point(...) in this case:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(y + x)
  }
}

and if there are subsequent generators we wouldn't need it at all:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

desugars into:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}

I think this beautifully captures the symmetry between the if-guard within the for-comprehension and the equivalent if-statement in the desugared flatMap/map code.

Alternatively, we could go with flatMap and always include a point at the end, and not use map at all. I think that's basically what Haskell does?

I guess there's some value in making sure both point and map always end up being called, even if just for the not-changing-too-many-things-at-once value? If so we could stick with tmp.point(()).map. Not really sure what the tradeoffs here are

Anyway, I think any of these variants, if we could make it work, would be superior to the status quo desugaring

@odersky commented Jun 20, 2017

It would be great if this could work, but there's a problem.

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

which under the proposed scheme would desugar into:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}

What is the type of the if-else? It's the lub of the type of tmp.empty, which is List[Int], and the type of the mapped range, which is IndexedSeq[Int]. So the result type is Seq here, which is fine. But in general the type of the argument to flatMap can be completely unrelated to the type of the original generator. Our translations work regardless of types, so we cannot assume that the generated if-else would be typable. Logically we'd have to pick the empty of the argument to flatMap instead of the empty of the original generator. But I don't see a general way to get that.
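
To illustrate the lub in this particular case (plain Scala, nothing new assumed):

val tmp = List(1, 2, 3)
val branch =
  if (false) List.empty[Int]         // stands in for tmp.empty: List[Int]
  else (0 until 3).map(z => z + 1)   // IndexedSeq[Int]
// branch is typed as the least upper bound of the two branches, a Seq[Int]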

@odersky commented Jun 20, 2017

EDIT: Maybe it's not so bad. We could simply require that empty has a type which is compatible with the argument type of the flatMap. Is there a use of for-expressions where this would not be the case?

So if we accept that, the next problem is how to retrofit all collections and other types on which for-expressions with ifs are performed so that they have an empty. That's still a large problem.

@ShaneDelmore commented Jul 20, 2017

What is the empty value for Either[A, B]? It seems it must be a Left[A], but I don't know where I could get an A value from, unless there were a way to put an implicit Left in scope that would be used by the for-comprehension desugaring.

@lihaoyi commented Jul 20, 2017

Either[A, B] does not support .filter even today, so we probably don't need to worry about it not supporting .filter due to lack of empty values or whatever

@ Right(1).filter(_ => false)
cmd0.sc:1: value filter is not a member of scala.util.Right[Nothing,Int]
val res0 = Right(1).filter(_ => false)
                    ^
Compilation Failed

@ShaneDelmore commented Jul 20, 2017

It's true, but it does confuse the heck out of everyone the first time they hit it, so it would be useful if we could come up with a solution at the same time. The only idea I have so far is the one mentioned above: allow an implicit empty to be found in scope, which would permit the use of filter as long as the user can supply such an empty; for me that would have been possible every time I have run into the issue.
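
To sketch that idea concretely: everything below is hypothetical and only meant to show the shape. (Note that since 2.12, Either already offers the explicit filterOrElse(p, zero), which asks for the "empty" at the call site instead.)

case class EmptyLeft[A](value: A)

implicit class EitherWithFilter[A, B](e: Either[A, B]) {
  // a withFilter that falls back to a user-supplied Left value
  def withFilter(p: B => Boolean)(implicit empty: EmptyLeft[A]): Either[A, B] =
    e match {
      case Right(b) if !p(b) => Left(empty.value)
      case other             => other
    }
}

implicit val noMatch: EmptyLeft[String] = EmptyLeft("no match")

val r: Either[String, Int] = Right(1)
// r.withFilter(_ > 2)  ==> Left("no match")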

@lihaoyi commented Jul 20, 2017

I think the correct answer to the confusion is to get people used to the fact that you can't always filter on a monad. There is a huge number of monads for which filter doesn't make sense: State, Reader, Writer, Function0, etc. In a way, I feel that Scala's "put .filter on anything" convention is itself a bit of a wart.

@ShaneDelmore commented Jul 20, 2017

You are winning me over; I agree. The confusion has always arisen from the expectation that it would just work, and eliminating that expectation is probably a better way to go.

@pathikrit commented Aug 6, 2017

Actually ran into a big wart recently related to this. See @frosforever's SO post here: https://stackoverflow.com/questions/45333889/regex-unapply-in-for-comprehension-with-if-guard-not-compiling

tl;dr -

This does not compile:

for {
    s <- List.empty[String]
    regex <- List.empty[scala.util.matching.Regex]
    regex(ss) = s
    if ss == "foo"
} yield s

But removing the if:

for {
    s <- List.empty[String]
    regex <- List.empty[scala.util.matching.Regex]
    regex(ss) = s
} yield s

or rearranging the order of the two lists in the for comprehension:

for {
    regex <- List.empty[scala.util.matching.Regex]
    s <- List.empty[String]
    regex(ss) = s
    if ss == "foo"
} yield s

makes it compile

@odersky commented Jan 24, 2018

It would be nice if we could improve on this. But we'd need someone to put in the effort.

@LPTK commented Apr 21, 2018

Can we at least do something about imperative for comprehensions? They have no justification for generating such horrible code.

The following:

for { x <- xs; y = x; if y > 0; z <- zs } println(x + z)

should generate:

xs.foreach { x =>
  val y = x
  if (y > 0) zs.foreach { z =>
    println(x + z)
  }
}

instead of the current convoluted and inefficient:

xs.map { x =>
  val y = x
  (x, y)
}.withFilter {
  case (x,y) => y > 0
}.foreach {
  case (x,y) => zs.foreach { z => println(x + z) }
}

which creates an intermediate list, allocates tuples, and calls four closure-consuming methods instead of just two.

No wonder people are surprised how much more efficient it is to use while loops rather than for loops in Scala 😅

I believe the fix for this one is low-hanging fruit. I could try to do it at some point if people agree with the idea.

@LPTK commented Apr 21, 2018

As for functional loops, I think it could be solved in a less artificial way by using flatten and a simple method called flattenOptions, of type M[Option[A]] => M[A].

This code:

for { x <- xs; t = x+1; if t > 0; y <- ys } yield x + y

would produce:

xs.map { x =>
  val t = x+1
  if (t > 0) Some(ys.map { y =>
    x + y
  })
  else None
}.flattenOptions.flatten

and this code:

for { x <- xs; y <- ys; t = y+1; if x > t } yield x + y

would produce:

xs.flatMap { x =>
  ys.map { y =>
    val t = y+1
    if (x > t) Some(x + y)
    else None
  }.flattenOptions
}

Compared to point and empty, which have no business whatsoever being defined on monad/collection values, flattenOptions is an independently useful method. In the case of the Scala standard collections, it can actually be the same as flatten (which only requires the element type to be traversable, and Option is).

In addition, flattenOptions can somewhat easily be defined as an extension method:

import scala.collection.generic.{CanBuildFrom, FilterMonadic}

implicit class FlattenOptions[A, B](xs: FilterMonadic[Option[A], B]) {
  def flattenOptions[That](implicit cbf: CanBuildFrom[B, A, That]): That =
    xs.withFilter(_.isDefined).map(_.get)
}
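
A quick usage sketch with that definition in scope:

List(Some(1), None, Some(3)).flattenOptions           // List(1, 3)
Vector(Option("a"), None, Option("b")).flattenOptions  // Vector(a, b)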

Also, not sure if it would fit in the complexity budget because it makes things less regular, but I would still prefer something easier like:

for { x <- xs; y <- ys if x > 0 } yield x + y

to desugar to the more natural:

xs.flatMap { x =>
  ys.filter(y => x > 0).map { y =>
    x + y
  }
}

which does not allocate any options, instead of:

xs.flatMap { x =>
  ys.map { y =>
    if (x > 0) Some(x + y)
    else None
  }.flattenOptions
}

@mdmilosz commented Apr 30, 2018

Hi all.

I've been thinking about this issue for a while, and it looks like the main problem stems from how assignment and condition clauses are desugared in for-comprehensions. With that assumption, I started considering a plausible unification, so to speak, of the different clauses.

Consider the following for-comprehension:

      for {
        x <- List(1, 2, 3)
        z = 1
        if x > z
        y <- List(10, 20, 30)
      } yield x + y

Let's assume the existence of a unary constructor single(a: A): F[A] such that we can rewrite z = 1 as z <- single(1). (For instance, for the List type, single(a: A): List[A] = List(a).)

Analogously, let's assume the existence of a unary constructor cond(b: Boolean): F[Unit] such that we can rewrite if x > z as _ <- cond(x > z). (For instance, for the List type, cond(b: Boolean): List[Unit] = List(()).filter(_ => b).)

Then we can write the above for-comprehension as follows:

      for {
        x <- List(1, 2, 3)
        z <- single(1)
        _ <- cond(x > z)
        y <- List(10, 20, 30)
      } yield x + y

Or, after replacing the constructors with their implementation:

    for {
      x <- List(1, 2, 3)
      z <- List(1)
      _ <- List(()).filter(_ => x > z)
      y <- List(10, 20, 30)
    } yield x + y

It might seem that, since we only added boilerplate, the resulting desugared code would be longer and more complex. However, if my IDE and @lihaoyi's Ammonite shell are to be believed, the opposite happens. (In fact, the desugared version of the original, generated by IntelliJ IDEA with the Scala plugin, doesn't even compile!)
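
For reference, since the rewritten comprehension contains only generators, the ordinary rules turn it into a plain flatMap/map chain along these lines, with no tuples and no pattern matching:

List(1, 2, 3).flatMap { x =>
  List(1).flatMap { z =>
    List(()).filter(_ => x > z).flatMap { _ =>
      List(10, 20, 30).map(y => x + y)
    }
  }
}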

I've tried to write a minimalistic type class system in 2.12 that would encompass implicit resolution of the single and cond constructors, but encountered either type mismatch errors or ambiguous implicit values errors whenever I tried to implement the constructors for more than one type.

(Here's a small semi-working example using one type — GenTraversable.)
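
A minimal, List-only sketch of the two constructors (just restating the definitions given above) shows that the rewritten form is already ordinary Scala:

def single[A](a: A): List[A] = List(a)
def cond(b: Boolean): List[Unit] = List(()).filter(_ => b)

for {
  x <- List(1, 2, 3)
  z <- single(1)
  _ <- cond(x > z)
  y <- List(10, 20, 30)
} yield x + y
// List(12, 22, 32, 13, 23, 33)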

Regardless of the need to provide single and cond for every type that might be used in a for-comprehension, there are some other drawbacks to this concept:

  • the source code needs to be pre-processed even before applying the desugaring stage (perhaps a bit of this logic could be refined and integrated into the desugaring proper?);

  • the source after desugaring might need to be post-processed in order to simplify some specific cases (e.g. single(x).flatMap(f) → f(x), single(x).map(f) → single(f(x)), etc.; as above, maybe this could be turned into a part of a larger piece of logic).

I would be happy to receive feedback on whether there is any merit to this idea.

@pathikrit commented Apr 30, 2018

What is the downside to just desugaring into what this does: https://github.com/oleg-py/better-monadic-for? This is the least surprising behaviour (it is what a user expects in her mental model) and does not break current code (even if it does, you will get a compile error).

@Blaisorblade commented May 1, 2018

This is the least surprising behaviour (it is what a user expects in her mental model)

"Desugar bindings as vals instead of tuples" might be fine, but desugaring without withFilter isn't what I expect as a user.

and does not break current code (even if it does you will get a compile error)

Which compile error? If some members fail the match you get a runtime failure instead of skipping the members — nothing in the README makes me expect errors here:

object TestExtract { def unapply(x: Any): Boolean = false }
1 match { case TestExtract() => 1; case _ => 0 }
for (TestExtract() <- List(1, 2, 3)) yield ()

The only error I anticipate is for extractors that affect the expected type, unlike the above one. Though the README doesn't present any such example.

(_: List[Int]).map( { case (x, y) => x + y })

EDIT: just confirmed that code gives a runtime failure.

// without plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
res1: List[Unit] = List()

// with plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
scala.MatchError: 1 (of class java.lang.Integer)
  at .$anonfun$res2$1(<console>:13)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:12)
  at scala.collection.immutable.List.map(List.scala:283)
  ... 36 elided

Of course that is not meant as a realistic example, but using an extractor with signature unapply(x: T) on a List[T] will not give compiler errors, whether or not the extractor is always going to succeed. What you'd want is to only omit withFilter for irrefutable patterns, that is (maybe) those built out of the following elementary irrefutable patterns (a small illustration follows the list):

  • extractors returning type Some[...] or true.type
  • matches on the only case class of a sealed hierarchy
  • maybe others ???
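
For instance (hypothetical extractors, just to illustrate the distinction):

object Halve  { def unapply(x: Int): Some[Int]   = Some(x / 2) }                        // irrefutable: always matches
object IsEven { def unapply(x: Int): Option[Int] = if (x % 2 == 0) Some(x) else None }  // refutable

for (Halve(h)  <- List(1, 2, 3)) yield h   // could safely skip withFilter
for (IsEven(e) <- List(1, 2, 3)) yield e   // needs withFilter to drop the odd elements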