Add Records To Dotty #8

Open
odersky opened this issue Nov 18, 2015 · 18 comments

Comments

@odersky
commented Nov 18, 2015

This is a rough sketch of a possible design of records for Dotty.

Types

A single-field record type is of the form (a: T) where a is an identifier and T is a type. It expands into an instance of a trait Labelled$a[T]. We assume that for every field name a present in a program the following trait will be automatically generated:

trait Labelled$a[+T](val a: T)

A multi-field record type (a_1: T_1, ..., a_n: T_n) is equivalent to the intersection of single-field record types (a_1: T_1) & ... & (a_n: T_n). Since & is commutative, order of fields does not matter.

A row type is a tuple of label/value pairs. Each label is a string literal. Unlike for record types, order of labels does matter in a row type.

The base trait Record is defined as follows:

trait Record { def row: Row }

Here, Row is assumed to be a generic base type of HLists. Record values are instances of Record types that refine the type of row.

Values

A record value is of the form (a_1 = v_1, ..., a_n = v_n). Assuming the values v_i have types T_i this is a shorthand for

new Record with
        Labelled$a_1[T_1](v_1) with ...
        ...
        Labelled$a_n[T_n](v_n) {
   def row = (("a_1", a_1), ..., ("a_n", a_n))
}
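
To make the expansion above concrete, here is a minimal hand-written sketch for the record value (name = "Ann", age = 30) in present-day Scala 3 syntax. The traits `LabelledName` and `LabelledAge` are hypothetical stand-ins for the generated `Labelled$name` and `Labelled$age`, and using Scala 3's `Tuple` as the `Row` type is an assumption, not part of the proposal.

```scala
// Hand-written stand-ins for the generated Labelled$name / Labelled$age traits
// (hypothetical names; the proposal would generate one trait per field label).
trait LabelledName[+T](val name: T)
trait LabelledAge[+T](val age: T)

// Assumption: Scala 3's Tuple plays the role of the generic Row base type.
trait Record { def row: Tuple }

object RecordSketch {
  // The record type (name: String, age: Int) as an intersection of single-field types.
  type Person = Record & LabelledName[String] & LabelledAge[Int]

  // Manual expansion of the record value (name = "Ann", age = 30).
  val person: Person =
    new Record with LabelledName[String]("Ann") with LabelledAge[Int](30) {
      def row = (("name", name), ("age", age))
    }

  // & is commutative, so the same value conforms with the fields listed in the other order.
  val reordered: Record & LabelledAge[Int] & LabelledName[String] = person

  // A function that needs only one field accepts any record that has it.
  def greet(r: LabelledName[String]): String = "Hello, " + r.name
}
```

Here `RecordSketch.greet(RecordSketch.person)` type-checks because `Person` is a subtype of `LabelledName[String]`, which is the width subtyping the intersection encoding is meant to provide.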

TODO: Define equality
TODO: Define how to create a record value from a generic HList representation - on the JDK it seems
we can use Java's Proxy mechanism for this.

@cvogt

commented Nov 18, 2015

Very happy to see an effort in this direction in Dotty! Kicking off the discussion here with a few problems that need to be solved. Maybe answers to some of them already exist and would be good to capture here.

  1. If two separate compilation units both define (foo = "bar"), will both have their own trait Labelled$foo defined or will generation happen once in a global place (e.g. at linking time)? If generation is not global, how will the two traits be compatible with each other? Do we anticipate a solution for this without runtime reflection?
  2. Is the fact that records are composed from traits in any way relevant to the user or just an implementation detail that is not even supposed to leak? Or in other words just an internal way to encode records using existing Dotty ASTs instead of new constructs?
  3. How will records behave with regards to subtyping when checking an expected record signature against a given record type? I suppose this just derives from how Dotty treats & intersection types. Are there docs/papers about that aspect yet?
  4. One very interesting aspect of the ScalaRecords implementation pushed by @vjovanov and others is virtualization in the sense that records are just a type-safe view into an arbitrary data structure (defaulting to an underlying Map). This does allow using records where runtime performance is critical. Is there a story for that?
  5. It would be a nice property if Dotty's Record syntax could be used as a surface syntax for different record implementations. This would probably remove implementation specific complexity from the compiler. It would be especially helpful as records are not a very common language construct in type-safe languages yet and real world experience is still to be gained. Allowing different implementations would help exploring the solution space faster (and for example allow eventual standardization of a solution with practice proven properties).

As a side note: if Record syntax were realized as a purely syntactic desugaring, a significant difference from existing desugarings like for-comprehensions would probably be that there needs to be a desugaring for types as well, not only values.

@cvogt

commented Nov 18, 2015

@soc

commented Nov 18, 2015

Very interesting! What use-case did you have in mind for this? Improving the fundamental structure, a general building block for things like HLists or improving things like joins in Slick (a bit like these anonymous types in LINQ)?
Will Tuples be subsumed by this? How will subtyping work in general between row types and classes?

@odersky

Author

commented Nov 18, 2015

@cvogt

If two separate compilation units both define (foo = "bar"), will both have their own trait Labelled$foo defined or will generation happen once in a global place (e.g. at linking time)?

I don't think it matters. We can generate the class as often as we like - it will always be the same class. Of course, the compiler can avoid generating if it knows it exists already.

Is the fact that records are composed from traits in any way relevant to the user or just an implementation detail that is not even supposed to leak? Or in other words just an internal way to encode records using existing Dotty ASTs instead of new constructs?

Since Labelled traits have $'s in them, they are considered an implementation detail. The Record trait matters though. I.e., you could write

Record & { name: String, age: Int }

and get the capability to enumerate all values via row.

How will records behave with regards to subtyping when checking an expected record signature against a given record type? I suppose this just derives from how Dotty treats & intersection types.

Exactly.

One very interesting aspect of the ScalaRecords implementation pushed by @vjovanov and others is virtualization in the sense that records are just a type-safe view into an arbitrary data structure (defaulting to an underlying Map). This does allow using records where runtime performance is critical. Is there a story for that?

I think Proxy could be the link here. Also to be able to use the syntax with different implementations.
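
As a rough illustration of the Proxy idea, the sketch below builds a typed view over an underlying Map using `java.lang.reflect.Proxy`. The interfaces `HasName` and `HasAge` and the helper `view` are hypothetical names introduced for this example only. Proxies dispatch reflectively, so this shows the shape of the approach and says nothing about its performance.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}

// Hypothetical single-field interfaces standing in for generated label traits.
trait HasName { def name: String }
trait HasAge { def age: Int }

object ProxyRecordSketch {
  // Builds a typed view over a Map; each interface method is answered by label lookup.
  def view(fields: Map[String, Any]): HasName & HasAge = {
    val handler = new InvocationHandler {
      def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
        method.getName match {
          case "toString" => fields.toString
          case "hashCode" => Int.box(fields.hashCode)
          case "equals"   => Boolean.box(proxy eq args(0))
          // Any other method name is treated as a field label.
          case label      => fields(label).asInstanceOf[AnyRef]
        }
    }
    Proxy
      .newProxyInstance(
        classOf[HasName].getClassLoader,
        Array[Class[?]](classOf[HasName], classOf[HasAge]),
        handler
      )
      .asInstanceOf[HasName & HasAge]
  }
}
```

The cast at the end is unchecked with respect to the Map's contents: a missing label only fails when the field is accessed.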

@viktorklang

commented Nov 18, 2015

Last time I checked JDK Proxies were quite slow.

@DarkDimius

Member

commented Nov 18, 2015

@viktorklang more details on JDK proxy performance:
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7123493

Some small details can be found here: https://github.com/tootedom/concurrent-reflect-proxy
The last link additionally provides a faster implementation of Proxy classes, but even those are ridiculously slow.

@cvogt

commented Nov 18, 2015

@odersky I could see Proxy as a building block, if we create a compile time Proxy. It sounds like a quite different approach to desugaring than for other Scala concepts though.

@odersky

Author

commented Nov 19, 2015

@soc The original use case is indeed database tables. It complements tuples, does not subsume them. It also assumes that Tuples get some kind of HList structure.

@szeiger

commented Nov 19, 2015

If two separate compilation units both define (foo = "bar"), will both have their own trait Labelled$foo defined or will generation happen once in a global place (e.g. at linking time)?

I don't think it matters. We can generate the class as often as we like - it will always be the same class. Of course, the compiler can avoid generating if it knows it exists already.

I don't think this would work in environments which enforce Classloader isolation, like OSGi or JEE, and possibly Java 9. If multiple parts of your app contain independently generated copies of these classes, you would either have to export them, which causes conflicts, or make them private, which prevents records from working across modules.

@soc

commented Nov 19, 2015

Adding to the thing that @szeiger said: Isn't the class file placement problem very similar to that of the discarded idea to add interfaces to things to make structural types fast?

@szeiger

commented Nov 19, 2015

As an aside, I'm actually skeptical about using records for database joins nowadays. It's simple in relational algebra but not a good fit for a language like Scala. First you have to give up on classes, traits, nesting, and any kind of abstraction that goes above flat records of primitive values. It's not how Scala usually works and even modern SQL databases are more expressive than that.

Then comes the matter of missing values. Inner joins are good for toy implementations but in the real world you need outer joins, too. If you have nullable primitives like C#, you can at least do:

 (v1,...,vn) ⟕ (w1,...,wn) → (v1,...,vn, (nullable w1),...,(nullable wn))

With Option instead of nullable values in Scala, it gets uglier because t → Option[t] is not idempotent:

 (v1,...,vn, o1,...,on) ⟕ (w1,...,wn, p1,...,pn)
        → (v1,...,vn, o1,...,on, Option[w1],...,Option[wn], p1,...,pn)
    where o and p are Option types and v and w are non-Option types

Contrast that with a left outer join in Slick which is simply:

C[v] leftJoin C'[w] on ((v, w) → Boolean) → C[(v, Option[w])]

No distinction between nullable and non-nullable source fields, no need for flat tuples. Semantics are not 100% identical (in Slick you can distinguish between a result row where the right-hand side was missing and one where the right-hand side was matched as all null values) but in practice it usually doesn't matter. You have to give up natural joins but that seems like a small price to pay.
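
The shape of that signature can be sketched on plain in-memory collections. This is not Slick's API; `leftJoin` below is a hypothetical helper that only mirrors the typing `C[v] leftJoin C'[w] → C[(v, Option[w])]`.

```scala
object LeftJoinSketch {
  // Left outer join over in-memory sequences: every left row is kept,
  // paired with Some(match) for each matching right row, or None if there is none.
  def leftJoin[V, W](left: Seq[V], right: Seq[W])(on: (V, W) => Boolean): Seq[(V, Option[W])] =
    left.flatMap { v =>
      val matched: Seq[Option[W]] = right.filter(w => on(v, w)).map(w => Some(w))
      val padded: Seq[Option[W]] = if (matched.isEmpty) Seq(None) else matched
      padded.map(w => (v, w))
    }
}
```

Because the right-hand side is wrapped as a whole, `W` may itself contain `Option` fields without any flattening, which is the point made above about not distinguishing nullable from non-nullable source fields.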

@odersky

Author

commented Nov 19, 2015

@szeiger I weakened the language to admit other implementation schemes that would not rely on class generation. Classloaders are really a nuisance, it would be so nice to be able to ignore them. Maybe using the upcoming(?) ClassDynamic?

@Jentsch

commented Mar 2, 2016

Maybe not that relevant, but I looked into the cases where Scala currently uses tuples rather than classes and found some where records could make code more readable. Assuming that a record {a: 1, b: 2} is somehow a tuple (1, 2) (as defined by the row function), one could come up with the following ideas:

Pattern matching

Names in pattern matching would be an option. This would be useful in cases where most of the extracted values are not required (currently leading to many underscores) or where someone simply gets confused by the order of the values.

case class Person(name: String, age: Int, <more fields>)

val persons: List[Person] = ???
persons.collect {
  case Person(age = a) if a < 18 => "Child"
  case Person(age = 18, name = n) => n + "!" // picked only the required fields
  case Person(name, _, _, _) => name // old syntax should be still valid
}

This is a problem that we currently have in some production code, mostly in tests with Scala Test matchPattern.

Magnet pattern

Records could address a limitation of the magnet pattern: named parameters are not supported. (See)

Give tuple fields a concrete name

The next code snippet could be nice for Scala beginners when writing their first for-each loop over a map:

class Map[K, V] extends Iterable[{key: K, value: V}] { … }
val map = new Map[String, String] { … }

map foreach { entry =>
  println(entry.key + ": " + entry.value)
}

Preserving parameter names after tupled

I think the last example is more interesting from a documentation perspective. Currently we lose the parameter names of a method after wrapping it into other functions like scalaz.Memo, leaving the caller guessing which parameter does what.

def div(dividend: Int, divisor: Int) = dividend / divisor
val memoDiv = scalaz.Memo.weakHashMapMemo((div _).tupled)
// memoDiv is a Function1[{dividend: Int, divisor: Int}, Int],
// not just a Function1[(Int, Int), Int]

memoDiv(divisor = 3, dividend = 9)
memoDiv(9, 3)

The last line would require that an (Int, Int) is somehow a {a: Int, b: Int}. This relation shouldn't be transitive (like an implicit conversion), since, together with the assumption above that a {a: Int, b: Int} is an (Int, Int), transitivity would imply that a {a: Int, b: Int} is a {x: Int, y: Int}.
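
For comparison, this is how the same method behaves in Scala as it stands, without scalaz: a small sketch in which only the names `TupledSketch`, `asFunction` and `tupledDiv` are introduced for illustration.

```scala
object TupledSketch {
  def div(dividend: Int, divisor: Int): Int = dividend / divisor

  // Named arguments work when calling the method directly.
  val direct: Int = div(divisor = 3, dividend = 9)

  // After eta-expansion and tupling, only the positional shape (Int, Int) remains;
  // the labels dividend and divisor are no longer part of the type.
  val asFunction: (Int, Int) => Int = div
  val tupledDiv: ((Int, Int)) => Int = asFunction.tupled

  // Callers must now know the argument order.
  val viaTuple: Int = tupledDiv((9, 3))
}
```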

@julienrf

commented Jun 30, 2016

It would be great to have something similar for sum types.

For instance, consider the following type hierarchy:

sealed trait Foo
case class Bar(s: String, i: Int) extends Foo
case class Baz(b: Boolean) extends Foo

It could desugar to the following:

sealed trait Foo extends Sum[Either[("Bar".type, Bar), Either[("Baz".type, Baz), Nothing]]]
case class Bar(s: String, i: Int) extends Foo {
  val sum = Left(("Bar", this))
}
case class Baz(b: Boolean) extends Foo {
  val sum = Right(Left(("Baz", this)))
}

Where Sum[A] is defined as follows:

trait Sum[A] {
  def sum: A
}

This would enable generic programming on sum types rather than just records:

trait ToJson[A] {
  def toJson(a: A): Json
}
implicit def toJsonSum[A](implicit
  toJsonA: ToJson[A]
): ToJson[Sum[A]] =
  new ToJson[Sum[A]] {
    def toJson(sumA: Sum[A]) = toJsonA.toJson(sumA.sum)
  }

implicit def toJsonEither[A, B](implicit
  toJsonA: ToJson[A],
  toJsonB: ToJson[B]
): ToJson[Either[(String, A), B]] =
  new ToJson[Either[(String, A), B]] {
    def toJson(eitherAB: Either[(String, A), B]): Json =
      eitherAB match {
        case Left((name, a)) => Json.obj(name -> toJsonA.toJson(a))
        case Right(b) => toJsonB.toJson(b)
      }
  }

implicit def toJsonNothing: ToJson[Nothing] =
  new ToJson[Nothing] {
    def toJson(nothing: Nothing): Json = sys.error("This can not happen")
  }

@letalvoj

commented Jul 23, 2018

The reason why Dataframes are so much more pleasant to work with in a language like Python is that typing them strictly is a pain... It would be awesome to be able to keep a notion of column types in a Record-like fashion.

The ultimate dream of mine is something like (vague pseudo scala code):

val df: Dataframe[{name:String, date:LocalDateTime}] = ???

def addDayOfWeek[T <: {name:String, date:LocalDateTime}]
                (df:Dataframe[T]): Dataframe[T & {dayOfWeek: Int}] = 
       df append df.map(_.date.getDayOfWeek)

val df2: Dataframe[{name:String, date:LocalDateTime, dayOfWeek: Int}] = addDayOfWeek(df)

where it would have to hold that

{a: A} & {b: B} =:= {a: A, b: B}

I am no expert in this area but I feel like having a first class support for Record types in the language (compare to having it done by macros in shapeless) might be the way to get closer to that point.

I'd love to contribute. I wish I had these ideas back when I was a student and could do that as a part of some thesis.

I see that Record Types are supposed to be in progress. How far am I from foreseeable reality? Am I breaking any of the assumptions or laws of the dot calculus?

@Blaisorblade

commented Aug 30, 2018

Apparently, this proposal is dormant and overlaps with structural types (#1886), which are also meant to support the database use case and haven't been mentioned here yet. Is there still a use case for records as a separate feature?

OTOH, further library support for structural types might be needed — see
lampepfl/dotty#1886 (comment).
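
For readers unfamiliar with the structural-types route, here is a minimal sketch of a Map-backed record using Scala 3's `Selectable`, following the pattern shown in the Scala 3 reference. The names `MapRecord` and `StructuralSketch` are made up for this example, and the code assumes the `Selectable` design as it later shipped rather than the state of #1886 at the time of this comment.

```scala
// A Map-backed record; field access on a refined type is routed through selectDynamic.
class MapRecord(elems: (String, Any)*) extends Selectable {
  private val fields = elems.toMap
  def selectDynamic(name: String): Any = fields(name)
}

object StructuralSketch {
  // The refinement gives static types to the fields stored in the map.
  type Person = MapRecord { val name: String; val age: Int }

  // The cast is unchecked: nothing verifies that the map really has these fields.
  val person: Person =
    new MapRecord("name" -> "Emma", "age" -> 42).asInstanceOf[Person]

  // Each selection compiles to a selectDynamic call followed by a cast to the field type.
  val summary: String = person.name + " is " + person.age
}
```

Compared with the Labelled trait encoding, no class is generated per field name, at the cost of an unchecked cast when the record is created.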

@milessabin

commented Aug 30, 2018

Most likely there will be, to support generic programming.

@Krever

commented Nov 24, 2018

For the record, I found the following recent work about records for scala/dotty:
https://www.youtube.com/watch?v=ntrSagXL200
http://www.csc.kth.se/~phaller/doc/karlsson-haller18-scala.pdf

It would be nice to have it in Dotty.

@liufengyun liufengyun transferred this issue from lampepfl/dotty May 28, 2019
