Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIP-62 - For comprehension improvements #79

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

KacperFKorban
Copy link
Member

No description provided.

@SethTisue
Copy link
Member

SethTisue commented Feb 20, 2024

about allowing a = ... as the first clause, there is previous discussion (and links to other previous discussions) at scala/bug#907 — opened by me all the way back in 2008!

@nafg
Copy link

nafg commented Feb 20, 2024

Just to be clear, is this 3 orthogonal changes or do they build on each other? Just curious.

@bishabosha
Copy link
Member

bishabosha commented Feb 20, 2024

  1. Avoiding redundant map calls if the yielded value is the same as the last bound value.

perhaps some code out there has side effecting map that could change semantics if dropped?

@KacperFKorban
Copy link
Member Author

Just to be clear, is this 3 orthogonal changes or do they build on each other? Just curious.

Theses can be 3 independent changes. 1. and 2. use some of the same new desugaring rules and are implemented in the same commit in the draft implementation, but it is definitely possible to implement just one of them.

@KacperFKorban
Copy link
Member Author

perhaps some code out there has side effecting map that could change semantics if dropped?

Yes, that is possible. But if that's the case, then using map for such an operation probably isn't a good idea in the first place.


This change is binary and TASTY compatible since for-comprehensions are desugared in the Typer. Thus, both class and TASTY files only ever use the desugared versions of programs.

While this change is forward source compatible, it is not backward compatible, as it accepts more syntax.
Copy link

@lihaoyi lihaoyi Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's pretty clear that many of the changes in this proposal can potentially change the behavior of user code. We could argue that such user code is badly written, but nevertheless it is a concern, and we should be rigorous about listing out what kinds of user code would have their behavior changed by this proposal.

Given that we are already choosing to potentially change the behavior of user code via these changes, I wonder if we should take it further. e.g. rather than generating

    Some(1).map { a =>
      val b = a
      (a, b)
    }.withFilter { case (a, b) =>
      b > 1
    }.map { case (a, b) =>
      a + b
    }

we could generate something like

    Some(1).flatMap { a =>
      val b = a
      if (!(b > 1)) Option.empty
      else Some(a + b)
    }

The latter is what people would write by hand, and doesn't have the same surprising behavior around order of evaluation and laziness that the former does (surprising behavior that I have myself been bitten by several times)

If we're going to be changing the desugaring and potentially breaking user code, I feel like it makes sense to change it directly to a desugaring that makes sense, rather than changing it half-way and paying the cost in breakage without reaping the full benefits of a simpler desugaring

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that particular example, how would the compiler know that it should desugar to Option.empty instead of Some.empty? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but AFAIK this would require some refactoring around desugaring of fors. Right now, the desugaring is purely syntactic and it deals with untyped trees. So to implement the desugaring based on empty in an elegant way, fors would have to become fully fledged (typed) trees.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could provide implicitly[Empty[Some[Int]]] or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this would still require the type info. Implementing empty as a method on the class instead of the module might work, but that seems the opposite of elegant.

I think that this change might be way less disruptive than it may seem. Maybe we could run the open community build with vs without the feature flag and check the results. (It's far from good testing, but better than speculation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the old desugaring could remain supported under a flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this would still require the type info. Implementing empty as a method on the class instead of the module might work, but that seems the opposite of elegant.

In fact we already have it. There is an empty method on most (all?) stdlib collections. But we also need a singleton method and we don't have that (the Some in the last line of the example)

Some(1).flatMap { a =>
      val b = a
      if (!(b > 1)) this.empty
      else this.singleton(a + b)
    }

Otherwise we'd need the types, and we definitely do not want that! The untyped translation of for expression is one of the major distinguishing features of Scala.

Copy link

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to go the implicit route, we might be able to do so via an untyped desugaring as well. Something like:

trait Empty[T]{
  def apply(): T
}
object Empty{
  def of[T](v: => T)(implicit e: Empty[T]) = e
}


trait Single[T]{
  def apply(t: T): T
}
object Single{
  def of[T](v: => T)(implicit e: Single[T]) = e
}

Some(1).flatMap { a =>
  val b = a
  if (!(b > 1)) Empty.of(this).apply()
  else Single.of(this).apply(a + b)
}

This could be useful if e.g. we wanted to provide the Single.of value without needing to add a def singleton on every type, allowing us to move forward without modifications to existing standard library classes (e.g. if the std lib was still frozen)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to make this change independent from unfreezing the stdlib would be great. But I'm not sure if it's that straightforward to create those typelcasses. For the sake of inference, we would like the typeclass to be contravariant (we want Empty[Option[?]] <: Empty[Some[?]]). But then IMHO we can't implement it, since it's not the case that ∀(Opt <: Option[?]). None <: Opt. On the other hand, if we make the typeclass invariant, then it will just infer the more specific type. (But maybe I'm missing something obvious)

One additional concern I have here is that this isn't a correct lookup. In this case, it's an owner class/package reference. To get the "monad" lookup, we would have to lift the first GenFrom expression and use it as a lookup afterwards, like so:

val a$lifted = Some(1)
a$lifted.flatMap { a =>
  val b = a
  if (b > 1) a$lifted.singleton(a + b)
  else a$lifted.empty
}

If done correctly, the lifted val should always be immediately before the reference, but might add a lazy to be safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid the by name parameter, as long as the value is bound to a variable so you could have something like Empty.of[b.type]

@bjornregnell
Copy link

Related about for-comprehensions but with focus on performance of for-do:

https://august.nagro.us/scala-for-loop.html

A quote from the above blog post by @AugustNagro :
"I wonder if for-do could be updated to behave like the loop() overloads, falling back to foreach(..) when required. I’m sure it would make some codebases a lot faster."

If we revisit de-sugaring of for-comprehensions perhaps we should also include possible performance optimizations to for-do in the same go?

@odersky
Copy link
Contributor

odersky commented Feb 21, 2024

@bjornregnell If we can unfreeze the standard library, performance for for loops is already a solved issue. We simply make foreach on Range an inline method. And we should be able to do that very soon, it seems.

@odersky
Copy link
Contributor

odersky commented Feb 21, 2024

I believe that is still not quite correct:

val a$lifted = Some(1)
a$lifted.flatMap { a =>
  val b = a
  if (b > 1) a$lifted.singleton(a + b)
  else a$lifted.empty
}

This only works of the result of the function passed to the flatmap is the same as the carrier type of the flatmap. But it general it need not be.

@bjornregnell
Copy link

We simply make foreach on Range an inline method.

Aha! If it can be solved in the std library rather than by the compiler this is yet another cool evidence about the power of the language :) (Again so glad that I'm doing Scala).

@odersky
Copy link
Contributor

odersky commented Feb 21, 2024

I think changes (1) and (2) are definitely worthwhile.

(3) (eliding map) is tricky since it changes behavior if the mapimplementation is side-effecting. A discussion took place in scala/scala3#16703 which is a (closed) PR linked from the SIP.

I wonder whether we should wait with (3) until we have a better grip on effects. There are a number of places where this would be beneficial. But it's clearly a long-term goal only.

Or, try a cooperative solution: If the final expression is x <- xs yield x, translate that to xs.map(Identity) with a special library defined Identity object with an identity apply method. Then the implementations of map can decide to short-circuit identity.

Or, maybe we can declare that side-effecting map is extremely rare and do the spec change anyway. I am on the fence for this one.

@JD557
Copy link

JD557 commented Feb 21, 2024

Or, try a cooperative solution: If the final expression is x <- xs yield x, translate that to xs.map(Identity) with a special library defined Identity object with an identity apply method. Then the implementations of map can decide to short-circuit identity.

I wonder if this behavior would make more sense in mapConserve, with for simply desugaring the last map to mapConserve instead.
Although this would obviously be a huge breaking change.

@odersky
Copy link
Contributor

odersky commented Feb 21, 2024

About the general problem of translating vals in a natural way even in the presence of conditionals: I thought up a scheme that fits all the criteria, but it is not simple.

The criteria being:

  1. Untyped translation
  2. No requirement that methods taking part in the translation have any specific type.
  3. It should be possible to define a lifting behavior that simply maps a for expression to an AST describing its contents.

The new scheme does not need withFilter anymore, but needs three other functions instead (I am using collection meaning for intuition):

  • mapDefined: Takes a function and returns all defined elements obtained by applying the function to the receiver collection. What is "defined" is up to the implementor of mapDefined to decide.
  • flatMapDefined: Takes a function and concatenates all defined collections obtained by applying the function to the receiver collection.
  • applyFilter: Takes a predicate p and a body returning a collection, and marks all elements for which p is false as undefined.

Example

The following for expression mixes generators, val definitions, and conditionals:

    for
      x <- List(1, 2, 3)
      y = x + x
      if x >= 2
      i <- List.range(0, y)
      z = i * i
      if z % 2 == 0
    yield
      i * x

The expression would be translated to the following code:

    val xs = List(1, 2, 3)
    xs.flatMapDefined: x =>
      val y = x + x
      xs.applyFilter(x >= 2):
        val is = List.range(0, y)
        is.mapDefined: i =>
          val z = i * i
          is.applyFilter(z % 2 == 0):
            i * x

I verified that the two expressions give the same results if the three methods are defined like this:

extension [A](as: List[A])

  def applyFilter[B](p: => Boolean)(b: => B): Option[B] =
    if p then Some(b) else None

  def flatMapDefined[B](f: A => Option[IterableOnce[B]]): List[B] =
    as.flatMap: x =>
      f(x).getOrElse(Nil)

  def mapDefined[B](f: A => Option[B]): List[B] =
    as.flatMap(f)

Here we use an Option type for tracking definedness. That's not the only possible choice. here is another one:

object UNDEFINED

extension [A](as: Vector[A])

  def applyFilter[B](p: => Boolean)(b: => B): B | UNDEFINED.type =
    if p then b else UNDEFINED

  def flatMapDefined[B](f: A => IterableOnce[B] | UNDEFINED.type): Vector[B] =
    as.flatMap: x =>
      f(x) match
        case UNDEFINED => Nil
        case y: IterableOnce[B] => y

  def mapDefined[B](f: A => B | UNDEFINED.type): Vector[B] =
    as.flatMap: x =>
      f(x) match
        case UNDEFINED => Nil
        case y: B => y :: Nil

Or, as a third possibility, mapDefined, flatMapDefined and applyFilter could simply generate ASTs that represent their applications.

In the Scala collections library, we build most collections using buffers. In this case mapDefined and flatMapDefined could simply push only defined elements into the buffer, so their implementation could be quite efficient.

Discussion

We can pick old or new scheme depending on the availability of mapDefined, flatMapDefined, and applyFilter vswithFilter on the first generator collection. If there are no conditionals, the new scheme would generate exactly the same code as under points (1) and (2) of this SIP.

The upside of this scheme is that we get nice untupled vals even in the presence of conditionals. The downside is that instead of withFilter we need three new methods to implement that scheme. On the other hand, withFilter turns out to be quite hard to implement since it has to fuse with any following map, flatMap, withFilter, and foreach call. So while the interface comprising three methods instead of one is more complex, the actual implementation might be simpler.

About efficiency: it depends on the implementation of the collection methods. If there are no vals it will be hard to beat a good implementation of withFilter but one might come close. If there are vals the new scheme looks significantly faster since it avoids the tupling operations.

Formal translation rules for the new scheme.

The syntax of for expressions is described as follows:

forExpr ::= for F yield E

where

F ::= x = E; F
      x <- E; G
G ::= []
      x = E; G
      if E; G
      x <- E; G

The translation scheme is defined with semantic brackets {...}. There are two closely related forms, depending whether we see a prefix F up to the first generator or a general sequence G following it:

  • { for F yield E }c where c = undefined
  • { for G yield E }c where c is a reference to the generator preceding the G sequence

The translation is defined as follows:

{ for [] yield E }c              =  E
{ for x = Ex; G yield E }c       =  val x = Ex; { for G yield E }c
{ for if Ep; G yield E}c         =  c.applyFilter(Ep)({ for G yield E }c)
{ for x <- Ex; G yield E }c      =  val c1 = Ex; c1.BIND(x => { for G yield E }c1)    (c1 fresh)

      where BIND = flatMapDefined if isGen(G), isFilter(G)
                 = mapDefined     if !isGen(G), isFilter(G)
                 = flatMap        if isGen(G), !isFilter(G)
                 = map            if !isGen(G), !isFilter(G)

                 isFilter(if E; S)
                 isFilter(x = E; S)      if  isFilter(S)

                 isGen(x <- E; S)
                 isGen(x = E; S)         if isGen(S)
                 isGen(if E; S)          if isGen(S)

@lihaoyi
Copy link

lihaoyi commented Feb 21, 2024

I like the mapDefined/flatMapDefined/applyFilter encoding.

As far as I can tell, it basically is the "minimal" encoding necessary to represent the for comprehension. The various Some/None/Empty/Singleton/etc. things we discussed earlier are collapsed relatively neatly into those three operations.

And as a result of being the minimal encoding of the for comprehension, it seems to be the most direct as well: the for comprehension becomes this set of nested function calls. No tuple packing/unpacking weirdness like the status quo, no ad-hoc if-else-empty conditionals like some of the alternate proposals.

I don't think it's any more complex than what we have now. Rather, it takes a bunch of complexity in the desugaring today and moves it into the implementation of three core methods, allowing the desugared code to be far simpler and allowing freedom to implement those three core methods in a wider variety of ways

x <- combineM(a, b)
yield x
```

Copy link

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere, we discussed how making the final map optional would be a breaking change. But what if we went the haskell route, and only removed the final map if the final yield is also removed?

    val a = largeExpr(b)
    for
      b <- doSth(a)
      combineM(a, b)
  1. This is currently invalid syntax, so no backwards compat concerns
  2. We get to remove the useless map from the generated code
  3. We also get to remove the useless yield from the source code!
  4. The correspondence between the map and yield is direct: yield means map, no-yield means no-map. No "magic" map-removal depending on the shape of the final yield expression
  5. The final source code is shorter and more concise
  6. The final source code looks more like other languages with similar syntax, e.g. Haskell's do or F#'s return! in computation expressions

Seems like a win-win-win-win-win-win overall

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting possibility. Parsing will be a problem since we don't know anymore whether the next token sequence is a pattern or an expression. I am sure it can be overcome, but don't know how much complexity it would add to the parser.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this change, but I didn't manage to get it to work in the parser. (I tried going in the reflect-like direction to get better syntax for this case)
But if we were to be able to distinguish a pattern from an expression, then yet another problem is automatically solved -- we can get rid of the unnecessary _ <- with monadic 'unit' expressions.

One workaround from a pre-SIP is to have a yield <- monadicExpr syntax.

Copy link

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yield <- looks super ugly syntactically IMO, but semantically it's exactly the same as yield from in python and return! in F#. Maybe we can tweak the syntax to make it prettier, but the idea of using a keyword to work around parser ambiguity seems reasonable if a 'direct' syntwx proves difficult to implement

we can get rid of the unnecessary _ <- with monadic 'unit' expressions.

One issue here is we have two things we may want to simplify

  • _ <- too
  • _ = foo

For non-trailing raw expressions in a for block, we can only assign one meaning, and the other meaning will still have to be explicitly spelled out as above. Maybe that's ok, but it's something worth considering

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be controversial to reuse return as in

for x <- List(1,2,3) return List(x, x)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be against reviving return for several reasons, including: 1) yet another syntax to learn besides do and yield, but worse: 2) my experience is that many learners have all sorts of connotations on the return keyword with all sorts of misconceptions about how it works in Scala so I think it is best to say "don't use return" and for some it takes a while not to end everything with return if they are used to that...

@odersky
Copy link
Contributor

odersky commented Feb 23, 2024

About the translation of for expressions with ifs. I omitted before patterns in generators and in particular case p <- expr generators which translate to withFilter applications. Here are translation rules that handle these as well:

{ for [] yield E }c              =  E
{ for p = Ep; G yield E }c       =  val p = Ep; { for G yield E }c
{ for if Ep; G yield E}c         =  c.applyFilter(Ep)({ for G yield E }c)
{ for p <- Ep; G yield E }c      =  val c1 = Ep; c1.BIND{ case p => { for G yield E }c1 }    (c1 fresh)
{ for case p <- Ep; G yield E }c =  { for $x <- Ep; if $x match case p => true case _ => false; p = $x@RuntimeChecked; G yield E }c

For symmetry could also add case binders case p = e, which could translate analogously to case p <- e:

{ for case p = Ep; G yield E }c  =  { for $x = Ep; if $x match case p => true case _ => false; p = $x@RuntimeChecked; G yield E}c

@odersky
Copy link
Contributor

odersky commented Feb 23, 2024

About final map elision. I think I now prefer the original proposal (3) over the alternatives to change the syntax. Side effecting maps should be extreme outliers, and should not be something we wish to keep at all costs. So I believe changing the spec is fine.

@lihaoyi
Copy link

lihaoyi commented Feb 23, 2024

I'm not sure side effecting maps are such outliers. At the very least, the map is needed to force the result of a previous lazy withFilter call. That surely must count as a side effect?

There are also the various Future/Streaming libraries where the final map does affect the topology of the computation graph, albeit in a probably benign way

@odersky
Copy link
Contributor

odersky commented Feb 23, 2024

withFilter won't be a problem since it is only used in cases like this:

for 
  x <- xs
  if p(x)
yield 
  x

But in that case we would not drop the map.

@KacperFKorban
Copy link
Member Author

I think that the desugaring proposed by @odersky will result in a cleaner scheme than my suggested improvements. The only downside is that it is more disruptive, since it requires implementing new functions.

So what should be the steps now?
Do we draft an alternative SIP for this approach and let the SIP committee decide?
Do we consider this SIP as a short-term solution and wait for the stdlib to unfreeze with the new scheme? (I think that at least (1) from this SIP would be nice to have decently quickly)

@odersky
Copy link
Contributor

odersky commented Feb 27, 2024

I am open merging the proposals into one, but we could also have two different ones. As far as I can see, the original SIP scheme is a subset of my proposed extensions, so it could go in separately. On the other hand, maybe it's more economic to discuss everything together.

I believe even the extended the SIP can in principle be adopted without waiting for stdlib to be unfrozen. It's just that its improvements would be more useful if they could be used with stdlib collections.

We can discuss what to do at the next SIP meeting.

@lihaoyi
Copy link

lihaoyi commented Feb 29, 2024

I saw that the 3.4.0 release notes contain:

.withFilter is also no longer inserted for a pattern in a generator unless prefixed by case

That is a breaking change similar to the ones discussed in this proposal, as we have no guarantees that withFilter is pure, even though by convention is typically is.

I suppose that sets the precedence that it's OK to do similar potentially-breaking changes going forward? That category would include the final .map removal proposed here, as well as change in the desugaring to a different set of methods

@bishabosha
Copy link
Member

bishabosha commented Mar 1, 2024

That is a breaking change similar to the ones discussed in this proposal, as we have no guarantees that withFilter is pure, even though by convention is typically is.

edit: If we go with the precedent from this change, then I think we would have a few minor releases where you introduce the warning that its going to change

@lrytz
Copy link
Member

lrytz commented Mar 1, 2024

Untyped translation

We can pick old or new scheme depending on the availability of mapDefined, flatMapDefined, and applyFilter vswithFilter on the first generator collection

So the first generator would be typed to figure out which desugaring to apply?

@lihaoyi
Copy link

lihaoyi commented Mar 16, 2024

The consensus in the SIP committee was that we would prefer the more aggressive change, if it is possible in the short-medium term, rather than making an incremental change now and then possibly making another change a short time later. But that is all contingent on whether or not the more aggressive change, i.e. the desugaring proposed by @odersky, is viable.

So the first generator would be typed to figure out which desugaring to apply?

@odersky this point is worth discussing. AFAIK, one of the goals of many of these desugarings is to make them purely syntactic. The alternate proposal you made would require typechecking the first generator in order to decide how to perform the desugaring.

  1. Is that going to be an implementation issue? Given the desugaring has always been syntactic rather than type-driven, this would be a major shift in how/when such a desugaring can take place, and it seems like something that may have require significant changes in the compiler implementation.

  2. Is it possible to workaround the requirement for typechecking? e.g. could we provide some kind of transparent inline extension methods mapDefined/flatMapDefined/applyFilter that expand to the previous desugaring, such that the compiler can naively desugar to mapDefined/flatMapDefined/applyFilter, and have subsequent typechecking decide whether a defined implementation exists, or whether we use the transparent inline extensions to fall back to the current behavior using the current methods?

@odersky
Copy link
Contributor

odersky commented Mar 16, 2024

@lihaoyi Yes, I think something like that will be needed. Using inline extension methods for that is an interesting idea!

@lrytz
Copy link
Member

lrytz commented Mar 17, 2024

  1. Given the desugaring has always been syntactic rather than type-driven

I believe we had a similar situation in Scala 2.8 where filter was replaced by withFilter, but filter was still supported?

@odersky
Copy link
Contributor

odersky commented Mar 24, 2024

I believe we had a similar situation in Scala 2.8 where filter was replaced by withFilter, but filter was still supported?

Yes indeed. At the time we did not have inline extension methods, so we had to base it on membership.

With inline extension methods, it's still a bit tricky. Here's a possible scheme:

  1. Instead of desugaring a for expression directly wait until typing. When hitting the first generator of a for expression that's followed by a filter, try to expand it with mapDefined or flatMapDefined . If that works, expand the rest of the for expression with the new scheme.
  2. Otherwise, try to translate with withFilter. If that works, expand the rest of the for expression with the old scheme. (including the improvements in the core proposal).
  3. If neither works, report any type errors with the new scheme.

With extension methods we can retro-fit the new scheme to existing libraries such as stdlib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet