
Redesign Tuples with HList-like structure #2199

Closed
wants to merge 11 commits into from

Conversation

OlivierBlanvillain
Contributor

@OlivierBlanvillain OlivierBlanvillain commented Apr 7, 2017

This PR redesigns Tuples to use an HList-like structure instead of flat case classes (scala.TupleN):

package dotty

trait Tuple
trait TupleCons[+H, +T <: Tuple] extends Tuple
object Unit extends Tuple

object TupleCons {
  def apply[H, T <: Tuple](h: H, t: T): TupleCons[H, T] = ...
  def unapply[H, T <: Tuple](t: TupleCons[H, T]): Pair[H, T] =...
}

These types (and values) are user-facing: desugaring turns (a, b, ...) into TupleCons[a, TupleCons[b, ...Unit]] (and TupleCons(a, TupleCons(b, ...()))). At run-time, tuples are represented using case classes for small arities and an Array for large arities.
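As an illustration, the encoding can be modeled in a few lines of plain Scala (a sketch, not the PR's actual code; `TNil` stands in for the PR's `object Unit extends Tuple`, since `Unit` can't be redefined in a standalone example):

```scala
// Minimal standalone model of the PR's HList-like tuples.
sealed trait Tup
case object TNil extends Tup // stands in for the PR's `object Unit extends Tuple`
final case class TupleCons[+H, +T <: Tup](head: H, tail: T) extends Tup

object TupleConsDemo {
  // Under this scheme, (1, "a") would desugar to TupleCons(1, TupleCons("a", TNil))
  val pair: TupleCons[Int, TupleCons[String, TNil.type]] =
    TupleCons(1, TupleCons("a", TNil))

  def main(args: Array[String]): Unit = {
    assert(pair.head == 1)
    assert(pair.tail.head == "a")
    println("ok")
  }
}
```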

A new mini-phase named TupleRewrites takes care of optimizing apply and unapply expressions corresponding to creation and pattern matching on tuples. To preserve binary compatibility, Scala2Unpickler is modified to transform tuple types to the new representation, TypeErasure then takes care of erasing (small) tuple types back to scala.TupleN.

TODO:

  • Restore PatmatExhaustivityTests. The exhaustivity checks need to be updated to properly handle the new representation.
  • Restore runCompilerFromInterface. This now needs an explicit dotty-library dependency. It used to be inherited "automatically" from the dependsOn(dotty-library) in dotty-compiler.
  • Change patdef desugaring? The current implementation uses .unapply + direct calls to _i which is still going to be capped at 22. We could lift that by desugaring patdef using vars and pattern matching instead.
  • Restore pretty printing for tuples.
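The patdef point in the TODO can be illustrated with nested pairs standing in for the HList encoding (a hypothetical sketch, not the PR's actual desugaring): binding all components through a single pattern has no `_22` cap, unlike one `_i` call per field:

```scala
object PatDefDemo {
  // With an HList-like encoding, a tuple is nested pairs:
  val e: (Int, (String, (Boolean, Unit))) = (1, ("a", (true, ())))

  // A pattern-based binding extracts every component in one match,
  // at any nesting depth, instead of capped _1/_2/... calls.
  val (a, (b, (c, _))) = e

  def main(args: Array[String]): Unit = {
    assert(a == 1 && b == "a" && c)
    println("ok")
  }
}
```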

@felixmulder
Contributor

felixmulder commented Apr 7, 2017 via email

@sjrd
Member

sjrd commented Apr 7, 2017

GitHub gets messy with reordered commits in rebases I've noticed

It always does that. GitHub has this stupid notion that the commits of a PR should be displayed in timestamp order, instead of in history order. It's only a display thing, though, so I doubt it would affect your CI (it certainly never affected ours).

@@ -6,7 +6,7 @@ case class Var() extends Expr
object Analyzer {
def substitution(expr: Expr, cls: (Var,Var)): Expr =
expr match {
case cls._2 => cls._1 // source of the error
case e if e == cls._2 => cls._1 // source of the error
Collaborator

Why this change?

Contributor Author

Since _2 is now added through an implicit conversion, the pattern used in this case is no longer stable. (see commit message)

}

c2 match {
case C2(a, b, c) =>
Collaborator

Could you detail what this one expands to?

Contributor Author

That's unchanged. (the first commits come from #1938, which this PR is rebased on to be able to test case classes with size > 23)

v22: Int

// We can't go above that since patDef is implemented with direct _i calls.
// It's should be possible to update the desugaring to use var + pattern
Collaborator

Typo: “it should”

(1d, 2d, 3d)

val l3: (String, Boolean, Double, Double, Double) =
l1 ++ l2
Collaborator

❤️

@DanyMariaLee

Great news!! Will everything from shapeless for operating with HLists the way we used to also be available?

@szeiger
Member

szeiger commented Apr 7, 2017

I'd like to point you to https://github.com/szeiger/ErasedTypes which I wrote half a decade ago with the purpose of eventually replacing Scala tuples with HArrays. It contains abstractions over tuple arity that may be useful to add (although some will not work in the same way with Dotty making arbitrary type projections illegal). It also has a clean separation between cons-style HLists and flat HArrays (but I recommend looking at typical use cases to determine whether or not this should be preferred over the implementation here which combines the two).

@@ -120,7 +120,7 @@ object Test {
v22: Int

// We can't go above that since patDef is implemented with direct _i calls.
// It's should be possible to update the desugaring to use var + pattern
// Its should be possible to update the desugaring to use var + pattern
Collaborator

It?

@OlivierBlanvillain
Contributor Author

@DanyMariaLee It seems like the current trend is to remove things from the standard library, not add more, so I'm not really sure that shapeless-style operations on tuples would have a place in the standard library.

Miles has been talking for a while about a possible shapeless redesign based on Symmetric Monoidal Category (see this branch) that would allow operations like HList#length and Coproduct#length to be factored out into a single implementation over a common structure. The tuple types introduced in this PR would fit perfectly with this design!

@OlivierBlanvillain
Contributor Author

OlivierBlanvillain commented Apr 25, 2017

@szeiger Thanks for pointing out your HArrays project, it's something I studied briefly before I started working on this. I think the main difference with this PR is that here nothing is intended to be exposed to users besides the TupleCons and Unit types, the fact that tuples are backed by case classes then by arrays is an implementation detail.

The question of array vs linked-list representation is easily answered with a binary compatibility requirement: it wouldn't make much sense to use a flat representation up to size 22, then switch to linked afterward. However, if we lower that limit to a more reasonable size, then the question opens again :)

OlivierBlanvillain added a commit to OlivierBlanvillain/scala that referenced this pull request Apr 25, 2017
This change is needed for scala/scala3#2199.

The idea is to have `type Tuple2[A, B] = (A, B) // Desugars into TupleCons[A, TupleCons[B, Unit]]` in Predef, so that users can refer to "a tuple of size 2" either with `(A, B)` or `Tuple2[A, B]`. It's still possible to refer to the actual underlying class with the explicit prefix, `scala.Tuple2`. Because this code is inside the scala package and mixes the two syntaxes, it breaks with the above PR...
@szeiger
Member

szeiger commented Apr 26, 2017

When it comes to such low-level constructs as tuples you may not want to hide too much of the details for performance reasons. I haven't done any benchmarking but exposing the TupleN classes should in theory enable better optimizations because you get monomorphic dispatch.

The motivation for having both HList and HArray in my implementation was not a mixed implementation of tuples depending on size. It doesn't do that (tuples are always flat) and for predictable performance I think it shouldn't do it. But at the type level HArrays are nested, so if you already have a proper HList type you can use that as part of the HArray type, i.e. "here's a flat implementation that corresponds to the following HList". Building up from Nat to HList and finally HArray was a natural thing to do given that my goal was not only to remove the Tuple22 limit but also to allow useful abstractions over arity.

Why would you still have special implementations up to 22 here anyway? I think you should lower that number considerably. 22 was originally chosen as large enough to represent what Scala needed, not because of performance considerations. The latter should be the driving factor for determining a good size for Dotty if you support arbitrary tuple sizes anyway. You are not bound by binary compatibility and you have Scalafix to take care of source compatibility.

@DarkDimius
Member

Why would you still have special implementations up to 22 here anyway? I think you should lower that number considerably. 22 was originally chosen as large enough to represent what Scala needed, not because of performance considerations.

While originally 22 wasn't chosen due to performance considerations, it would be nice to keep the performance properties of existing code. I'd prefer to keep the 22 limit in place and not lower it.

@OlivierBlanvillain
Contributor Author

Why would you still have special implementations up to 22 here anyway?

The current, very practical reason is that the dotty test suite currently uses the 2.11.x standard library compiled by scalac. Methods like Map.apply and Map.head accept and return scala.Tuple2 instances, which is handled in this PR by preserving binary compatibility.

I think you should lower that number considerably.

I definitely agree. I did a quick search on a large corpus of open-source Scala projects (~600K LOC), and every tuple instantiated, pattern-matched on, or accessed with _i explicitly was of size 6 or below. Above that, it's only generic operations (for example, in Play), for which both the array and linked-list representations should be superior. It also seems that the majority of generic operations are based on head & tail, which obviously work better on a linked representation.

TupleN classes should in theory enable better optimizations because you get monomorphic dispatch.

The tuple interface has no methods, so there shouldn't be any dynamic dispatch happening on these tuples. Non-generic operations with statically known length use a cast and a direct call; generic operations must rely on pattern matching. If we imagine case classes up to size 6 and a linked-list representation beyond that, and always put the TupleCons case first, we should get about the same performance as a linked list for head & tail.

I should probably plug in a few benchmarks I did just on the representations themselves: bench.pdf. Each representation is special cased up to size 4 with case classes. UnrolledHList is a linked list with 3 different cons nodes, holding 1, 2 and 3 heads respectively (actual definitions).

@odersky
Contributor

odersky commented Apr 29, 2017

Is this ready for review? If so, it would be good to rebase first.

Changes are as follows:

- New types for tuples in dotty library: Tuple = Unit | TupleCons Head Tail <: Tuple

- Desugaring uses this structure instead of the scala.TupleN for types and expressions

- New TupleRewrites phase rewrites the HList structure into scala.TupleN-like case classes (for small arities) or a case class wrapping an Array (for large arities)
To retain binary compatibility with scala 2.11 binaries, we hijack the
unpickling of types to fake an HList structure. Later on in erasure, HList
types are erased back to scala.TupleN (for N <= 22). In addition, because the
actual scala.TupleN classes are neither `<: Tuple` nor `<: TupleCons`, these
types need to be systematically erased to Object.

In addition to all these carefully compensating hacks, this also imposes a new
constraint: the dotty-library, which defines several `Tuple/TupleCons` methods,
should now *always* be compiled by dotty itself. Indeed, when compiled
with scalac, these types will stay as they are instead of being erased to
Object.
I couldn't get rid of the `dotty-library` project completely; it's still used the first time sbt-bridge is compiled with dotty. The dependencies are as follows:

- (no dep) compile `dotty-compile` with scalac [0]
- (no dep) compile `dotty-library` with scalac [1]
- ([0], [1]) → compile `dotty-sbt-bridge` with dotty [2]
- ([0], [2]) → compile `dotty-library-bootstrapped` with dotty + sbt

After that, [1] should never be used again.
- dotc/scala-collections.blacklist:

Blacklist Unit and TupleN from scala-collections test

- tests/neg/i1653.scala:

Having an additional superclass on Unit seems to affect the error here (not sure why).

- tests/pos/t0301.scala:

Since `_2` is now added through implicit conversion, the pattern used in this case is no longer stable.

- tests/pos/tcpoly_bounds1.scala:

This test uses the fact that `(A, B) <: Product2[A, B]`, which is no longer the case.

- tests/repl/patdef.check:

Having the `_i` via implicit conversions removes the @unchecked here

- tests/run/tuple-match.scala:

Replace Tuple1 in pattern matching with explicit TupleCons pattern. TupleN can still be used in apply position (defined as defs in DottyPredef), but not in unapply position.

- tests/run/unapply.scala:

Also uses the fact that `(A, B) <: Product2[A, B]`.

- tests/run/withIndex.scala:

Might be a limitation in typer?:

14 |        case _: Array[Tuple2[_, _]] => true
   |                      ^^^^^^^^^^^^
   |unreducible application of higher-kinded type [A, B] => dotty.TupleCons[A, dotty.TupleCons[B, Unit]] to wildcard arguments
The LogPendingSubTypesThreshold change is required for tests on size ~25 tuples that would otherwise trigger -Yno-deep-subtypes
Because all tests depend on dotty-library-bootstrapped, publishLocal is always needed to run tests.
@OlivierBlanvillain
Contributor Author

@odersky yes! The last missing part is to update pattern matching exhaustivity. The current implementation is special cased for tuples; to obtain the same warnings when using hlists this code would need to be either generalised for GADTs or special cased for the new structure.

@mdedetrich

mdedetrich commented May 1, 2017

@szeiger

Why would you still have special implementations up to 22 here anyway? I think you should lower that number considerably. 22 was originally chosen as large enough to represent what Scala needed, not because of performance considerations. The latter should be the driving factor for determining a good size for Dotty if you support arbitrary tuple sizes anyway. You are not bound by binary compatibility and you have Scalafix to take care of source compatibility.

AFAIK in scalac, 22 was actually chosen due to the Scala runtime size. The blowup in jar size for the various Tuple/Product/Function etc. classes that get generated is polynomial in the arity, and once you start going above 22 you start adding megabytes onto the Scala runtime

@OlivierBlanvillain

It seems like the current trend is to remove things from the standard library, not add more, so I'm not really sure it shapeless style operations on tuples would have a place in the standard library.

I think that there should be an exception for shapeless style operations because they tend to be central to how the language is used. A lot of the things which shapeless provides, if they were baked into the actual scalac backend, would be a lot more performant and they would also remove a huge amount of boilerplate.

That doesn't mean the entirety of shapeless should be in Scalac, but a lot of the tuple/case class conversion functions, and other methods of abstracting over arity, I think would be welcome. It's painful to see languages like Ruby and Python able to do something in one line which Scala doesn't have an answer for without shapeless (or it does but it takes 20 lines)

@odersky odersky self-requested a review May 5, 2017 12:59
Contributor

@odersky odersky left a comment

Impressive work! I have lots of comments on details but the general approach looks good.

One overall remark: Have a closer look what goes on in Definitions. It seems a lot of the things I criticise really come down to the fact that you don't make optimal use of the things defined there.

case 0 => unitLiteral
case _ if ctx.mode is Mode.Type =>
// Transforming Tuple types: (T1, T2) → TupleCons[T1, TupleCons[T2, Unit]]
val nil: Tree = TypeTree(defn.UnitType)
Contributor

Use hnilType for consistency with hconsType?

ts.foldRight(nil)(hconsType)
case _ =>
// Transforming Tuple trees: (T1, T2, ..., TN) → TupleCons(T1, TupleCons(T2, ... (TupleCons(TN, ())))
val nil: Tree = unitLiteral
Contributor

Again for naming consistency, use hnil, hcons?

@@ -140,7 +140,7 @@ object Config {
final val LogPendingUnderlyingThreshold = 50

/** How many recursive calls to isSubType are performed before logging starts. */
final val LogPendingSubTypesThreshold = 50
final val LogPendingSubTypesThreshold = 70
Contributor

@odersky odersky May 7, 2017

Why was this increased?

Contributor Author

I have some tests with size-24 tuples that trigger this limit; I'm going to revert this change and run these tests without -Yno-deep-subtypes.

lazy val ProductNType = mkArityArray("scala.Product", MaxTupleArity, 0)
lazy val TupleNType = mkArityArray("scala.Tuple", MaxImplementedTupleArity, 1)
lazy val TupleNSymbol = TupleNType.map(t => if (t == null) t else t.classSymbol)
lazy val DottyTupleNType = mkArityArray("dotty.DottyTuple", MaxImplementedTupleArity, 1)
Contributor

@odersky odersky May 7, 2017

Defined like this TupleNSymbol is an Array[AnyRef]. It's better to use if (t == null) NoSymbol which makes it an Array[Symbol].

lazy val TupleNType = mkArityArray("scala.Tuple", MaxImplementedTupleArity, 1)
lazy val TupleNSymbol = TupleNType.map(t => if (t == null) t else t.classSymbol)
lazy val DottyTupleNType = mkArityArray("dotty.DottyTuple", MaxImplementedTupleArity, 1)
lazy val DottyTupleNModule = DottyTupleNType.map(t => if (t == null) t else t.classSymbol.companionModule.symbol)
Contributor

Ditto.

@@ -758,10 +758,18 @@ class Typer extends Namer with TypeAssigner with Applications with Implicits wit
else errorType(i"wrong number of parameters, expected: ${protoFormals.length}", tree.pos)

/** Is `formal` a product type which is elementwise compatible with `params`? */
def ptIsCorrectProduct(formal: Type) = {
def ptIsCorrectProduct(formal: Type): Boolean = {
val cons = defn.TupleConsType.symbol
Contributor

defn.TupleConsClass (once it is introduced, see other comment)

val cons = defn.TupleConsType.symbol

// Flatten types nested in TupleCons as a List[Type].
def tupleType(t: Type, acc: List[Type] = Nil): List[Type] =
Contributor

Rename to tupleTypeArgs

@@ -0,0 +1,471 @@
package dotty
Contributor

I am not sure about the src_dotc scheme or the other aspects of the build. Let's discuss this aspect with the dotty team as a whole.


def Tuple1[A](a: A): TC[A, Unit] = TC(a, ())

implicit class Tuple2Assessors[A, B](l: TC[A, TC[B, Unit]]) {
Contributor

Assessors? Should it not be Accessors?

@julienrf
Collaborator

julienrf commented Jun 30, 2017

What’s the status of this PR?

Also, this might be out of the scope of this PR, but I think you should not stop midway and should do the same for sealed traits ;) (using Either instead of TupleCons).

@OlivierBlanvillain
Contributor Author

OlivierBlanvillain commented Jun 30, 2017

@julienrf Lots of the complexity of this PR is due to having dotty as a cross-compiled scalac/dotty project. Since going for a bootstrap-only compiler is very close on the roadmap, I'm going to wait for that before coming back to this PR.

Sum types will hopefully be the follow up :)

@antonkulaga

Why would you still have special implementations up to 22 here anyway? I think you should lower that number considerably. 22 was originally chosen as large enough to represent what Scala needed

Actually, I hit the 22-field limit in some real-world scenarios (when I had to write large case classes to parse large CSVs)

@odersky odersky mentioned this pull request Apr 1, 2018
@joan38
Contributor

joan38 commented Apr 26, 2018

Hi guys,
Any news on this feature? Is it still just paused?
Cheers

@ctongfei

I wonder if it is possible to create a typesafe apply method on this HList-like Tuple type.

Something like

trait Tuple { self: A =>
  def apply[I <: Int & Singleton, X](i: I)
    (implicit ev: TupleIndexer.Aux[A, I, X]): X = ev(self)
}

So you could do the following at compile time:

val t: (Int, String, Double)
t(0) // type: Int
t(1) // type: String
t(2) // type: Double
t(3) // ERROR: does not compile
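For what it's worth, released Scala 3 ended up shipping essentially this idea: `Tuple#apply` is typed with the `Tuple.Elem` match type, so out-of-bounds indices are rejected at compile time (this sketch requires Scala 3):

```scala
object TupleApplyDemo {
  val t: (Int, String, Double) = (1, "a", 2.0)

  // Each index is typed via the Tuple.Elem match type:
  val i: Int    = t(0)
  val s: String = t(1)
  val d: Double = t(2)
  // t(3)  // does not compile: index out of bounds for this tuple type

  def main(args: Array[String]): Unit = {
    assert(i == 1 && s == "a" && d == 2.0)
    println("ok")
  }
}
```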

@OlivierBlanvillain
Contributor Author

@joan38 It's postponed until scalac can emit TASTY and we can recompile everything from TASTY, given that the stdlib, and thus the entire world, has a binary dependency on Tuple2. Until then, the hacks needed in this PR to retain binary compatibility are a bit too much IMO.

@joan38
Contributor

joan38 commented May 25, 2018

Thanks @OlivierBlanvillain for the update. So I guess until the 2.14 milestone appears? This feature is really promising.

@Glavo
Contributor

Glavo commented May 25, 2018

I am worried that this change will slow down the compilation speed.

@felixmulder
Contributor

@Glavo - don't be :)

Also, dotty has a benchmarking suite that is being used for a lot of pull requests. So now the compiler team can actually know if something would slow down the compiler before merging.

@nicolasstucki
Contributor

@OlivierBlanvillain now we have another implementation of the same idea, can we close this PR? Or is there something useful that was not covered by the current implementation?

@allanrenucci
Contributor

Maybe we should extract the test cases

@joan38
Contributor

joan38 commented Oct 16, 2018

@nicolasstucki @allanrenucci Could you point out to the other implementation?

@allanrenucci
Contributor

@joan38 #4938

OlivierBlanvillain added a commit to dotty-staging/dotty that referenced this pull request Oct 16, 2018
4 of these tests are failing, I put them in pending for now.
Interestingly they all fail in different ways:

Compiler crash
tests/pending/tuple-patmat-extract.scala

Compilation error
tests/pending/tuple-for-comprehension.scala

Runtime error
tests/pending/tuple-erased.scala

Doesn't pass Ycheck
tests/pending/tuple-cons.scala
@nicolasstucki
Contributor

There is also #5182, where the macro implementation changed completely.

nicolasstucki added a commit that referenced this pull request Oct 17, 2018
@lJoublanc

How will this interact with the standard collections zip operator, which is left-associative (rather than right-associative, like the implementation suggested here)?
It would be nice to be able to say e.g. List(a) zip List(b) zip List(c) === List((a,(b,c))) === List((a,b,c)), but with the existing associativity of zip, you end up with List(((a,b),c)) <> List((a,b,c)).
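A small check of the associativity point: the standard collections' `zip` nests to the left when chained, so you get `((a, b), c)` rather than a flat or right-nested triple:

```scala
object ZipAssocDemo {
  def main(args: Array[String]): Unit = {
    val zipped = List(1).zip(List("b")).zip(List(3.0))
    // Left-nested: ((1, "b"), 3.0), not (1, ("b", 3.0)) or (1, "b", 3.0)
    assert(zipped == List(((1, "b"), 3.0)))
    println(zipped)
  }
}
```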

@AnirudhVyas

AnirudhVyas commented Mar 30, 2020

has this been added to Dotty already?

@nicolasstucki
Contributor

Yes. Under scala.Tuple, (a, b, c) is equivalent to a *: b *: c *: ()
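In released Scala 3 the empty tuple is spelled `EmptyTuple` (the `*: ()` form is from an earlier iteration), so the equivalence looks like this (requires Scala 3):

```scala
object ConsTupleDemo {
  // (1, "a", true) is sugar for 1 *: "a" *: true *: EmptyTuple
  val t: Int *: String *: Boolean *: EmptyTuple = (1, "a", true)

  def main(args: Array[String]): Unit = {
    assert(t == (1, "a", true))
    assert(t.head == 1)          // head/tail work on any non-empty tuple
    assert(t.tail == ("a", true))
    println("ok")
  }
}
```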
