Add Records To Dotty #964
Comments
cvogt
Nov 18, 2015
Contributor
Very happy to see an effort happening in this direction in Dotty! Kicking off the discussion here with a few problems that need to be solved. Maybe answers to some of them already exist; it would be good to capture them here.
- If two separate compilation units both define (foo = "bar"), will both have their own trait Labelled$foo defined, or will generation happen once in a global place (e.g. at linking time)? If not globally, how will they be compatible with each other? Do we anticipate a solution for this without runtime reflection?
- Is the fact that records are composed from traits in any way relevant to the user, or just an implementation detail that is not even supposed to leak? In other words, is it just an internal way to encode records using existing Dotty ASTs instead of new constructs?
- How will records behave with regards to subtyping when checking an expected record signature against a given record type? I suppose this just derives from how Dotty treats & intersection types. Are there docs/papers about that aspect yet?
- One very interesting aspect of the ScalaRecords implementation pushed by @vjovanov and others is virtualization, in the sense that records are just a type-safe view into an arbitrary data structure (defaulting to an underlying Map). This allows using records where runtime performance is critical. Is there a story for that?
- It would be a nice property if Dotty's Record syntax could be used as a surface syntax for different record implementations. This would probably remove implementation specific complexity from the compiler. It would be especially helpful as records are not a very common language construct in type-safe languages yet and real world experience is still to be gained. Allowing different implementations would help exploring the solution space faster (and for example allow eventual standardization of a solution with practice proven properties).
As a side note: if Record syntax were realized as a purely syntactic desugaring, a significant difference from existing desugarings like for-comprehensions would probably be that there needs to be a desugaring for types as well, not only values.
soc
Nov 18, 2015
Very interesting! What use-case did you have in mind for this? Improving the fundamental structure, a general building block for things like HLists or improving things like joins in Slick (a bit like these anonymous types in LINQ)?
Will Tuples be subsumed by this? How will subtyping work in general between row types and classes?
odersky
Nov 18, 2015
Contributor
If two separate compilation units both define (foo = "bar"), will both have their own trait Labelled$foo defined or will generation happen once in a global place (e.g. at linking time)?
I don't think it matters. We can generate the class as often as we like - it will always be the same class. Of course, the compiler can avoid generating if it knows it exists already.
Is the fact that records are composed from traits in any way relevant to the user or just an implementation detail that is not even supposed to leak? Or in other words just an internal way to encode records using existing Dotty ASTs instead of new constructs?
Since Labelled traits have $'s in their names, they are considered an implementation detail. The Record trait matters, though. I.e. you could write
Record & { name: String, age: Int }
and get the capability to enumerate all values via row.
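To make that concrete, here is a rough plain-Scala approximation (Labelled_name/Labelled_age stand in for the generated Labelled$name/Labelled$age traits, and a List of pairs stands in for the HList-typed row). Since a record type is just an intersection of field traits, a record with more fields conforms to a type demanding fewer:

```scala
trait Record { def row: List[(String, Any)] }
trait Labelled_name { def name: String }
trait Labelled_age { def age: Int }

// Corresponds to requiring the record type (name: String):
def greet(r: Record with Labelled_name): String = "Hello, " + r.name

// A value of the richer record type (name: String, age: Int):
val person = new Record with Labelled_name with Labelled_age {
  val name = "Ada"
  val age = 36
  def row = List("name" -> name, "age" -> age)
}
```

Here greet(person) type-checks by width subtyping, and person.row enumerates all fields generically.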
How will records behave with regards to subtyping when checking an expected record signature against a given record type? I suppose this just derives from how Dotty treats & intersection types.
Exactly.
One very interesting aspect of the ScalaRecords implementation pushed by @vjovanov and others is virtualization in the sense that records are just a type-safe view into an arbitrary data structure (defaulting to an underlying Map). This does allow using records where runtime performance is critical. Is there a story for that?
I think Proxy could be the link here, and also a way to use the syntax with different implementations.
viktorklang
commented
Nov 18, 2015
Last time I checked, JDK Proxies were quite slow.
DarkDimius
Nov 18, 2015
Member
@viktorklang more details on JDK proxy performance:
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7123493
Some small details can be found here: https://github.com/tootedom/concurrent-reflect-proxy
The last link additionally provides a faster implementation of Proxy classes, but even those are ridiculously slow.
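For concreteness, a record viewed through a JDK proxy over a Map might look like the sketch below (PersonView and its fields are invented for illustration). Every field access goes through InvocationHandler.invoke, i.e. reflective dispatch plus a map lookup, which is where the slowness comes from:

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}

// Invented record-like interface; a trait with only abstract members
// compiles to a plain Java interface, which JDK proxies require.
trait PersonView { def name: String; def age: Int }

val data: Map[String, AnyRef] = Map("name" -> "Ada", "age" -> Int.box(36))

val handler = new InvocationHandler {
  // Every call on the proxy lands here: reflective dispatch + map lookup.
  def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
    data(method.getName)
}

val view = Proxy.newProxyInstance(
  classOf[PersonView].getClassLoader,
  Array[Class[_]](classOf[PersonView]),
  handler
).asInstanceOf[PersonView]
```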
smarter added the itype:language enhancement label Nov 18, 2015
cvogt
Nov 18, 2015
Contributor
@odersky I could see Proxy as a building block, if we create a compile time Proxy. It sounds like a quite different approach to desugaring than for other Scala concepts though.
odersky
Nov 19, 2015
Contributor
@soc The original use case is indeed database tables. It complements tuples, does not subsume them. It also assumes that Tuples get some kind of HList structure.
szeiger
Nov 19, 2015
If two separate compilation units both define (foo = "bar"), will both have their own trait Labelled$foo defined or will generation happen once in a global place (e.g. at linking time)?
I don't think it matters. We can generate the class as often as we like - it will always be the same class. Of course, the compiler can avoid generating if it knows it exists already.
I don't think this would work in environments which enforce Classloader isolation, like OSGi or JEE, and possibly Java 9. If multiple parts of your app contain independently generated copies of these classes, you would either have to export them, which causes conflicts, or make them private, which prevents records from working across modules.
soc
Nov 19, 2015
Adding to what @szeiger said: isn't the class file placement problem very similar to the one in the discarded idea of adding interfaces to things to make structural types fast?
szeiger
Nov 19, 2015
As an aside, I'm actually skeptical about using records for database joins nowadays. It's simple in relational algebra but not a good fit for a language like Scala. First you have to give up on classes, traits, nesting, and any kind of abstraction that goes above flat records of primitive values. It's not how Scala usually works and even modern SQL databases are more expressive than that.
Then comes the matter of missing values. Inner joins are good for toy implementations, but in the real world you need outer joins, too. If you have nullable primitives like C#, you can at least do:
(v1,...,vn) ⟕ (w1,...,wn) → (v1,...,vn, (nullable w1),...,(nullable wn))
With Option instead of nullable values in Scala, it gets uglier because t → Option[t] is not idempotent:
(v1,...,vn, o1,...,on) ⟕ (w1,...,wn, p1,...,pn)
→ (v1,...,vn, o1,...,on, Option[w1],...,Option[wn], p1,...,pn)
where o and p are Option types and v and w are non-Option types
Contrast that with a left outer join in Slick which is simply:
C[v] leftJoin C'[w] on ((c, w) → Boolean) → C[(v, Option[w])]
No distinction between nullable and non-nullable source fields, no need for flat tuples. Semantics are not 100% identical (in Slick you can distinguish between a result row where the right-hand side was missing and one where the right-hand side was matched as all null values) but in practice it usually doesn't matter. You have to give up natural joins but that seems like a small price to pay.
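On plain collections, that Slick-style signature can be sketched like this (leftJoin is a made-up helper, not Slick's actual API). Note that the inputs need no nullable/non-nullable distinction, and the optional right-hand side appears exactly once, as Option[B]:

```scala
// Left outer join on in-memory sequences, mirroring the
// C[v] leftJoin C'[w] on ... => C[(v, Option[w])] shape above.
def leftJoin[A, B](as: Seq[A], bs: Seq[B])(on: (A, B) => Boolean): Seq[(A, Option[B])] =
  as.flatMap { a =>
    val matches = bs.filter(b => on(a, b))
    if (matches.isEmpty) Seq((a, None)) // unmatched left rows survive with None
    else matches.map(b => (a, Some(b)))
  }

val people = Seq(("Ada", 1), ("Bob", 2))
val pets   = Seq((1, "cat"))
val joined = leftJoin(people, pets)((p, q) => p._2 == q._1)
```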
odersky
Nov 19, 2015
Contributor
@szeiger I weakened the language to admit other implementation schemes that would not rely on class generation. Classloaders are really a nuisance; it would be so nice to be able to ignore them. Maybe using the upcoming(?) ClassDynamic?
Jentsch
Mar 2, 2016
Maybe not that relevant, but I looked into the cases where Scala uses tuples rather than classes nowadays and found some where records could make code more readable. Assuming that a record {a: 1, b: 2} is somehow a tuple (1, 2) (as defined by the row function), one could come up with the following ideas:
Pattern matching
Names in pattern matching would be an option. This would be useful in cases where most of the extracted values are not required (currently leading to many underscores), or where one gets confused by the order of the values.
case class Person(name: String, age: Int, <more fields>)
val persons: List[Person] = ???
persons.collect {
  case Person(age = a) if a < 18 => "Child"
  case Person(age = 18, name = n) => n + "!" // picked only the required fields
  case Person(name, _, _, _) => name // old syntax should still be valid
}
This is a problem that we currently have in some production code, mostly in tests with ScalaTest's matchPattern.
Magnet pattern
Records could address the problem in the magnet pattern that named parameters are not supported.
Give tuple fields a concrete name
The next code snippet could be nice for Scala beginners writing their first for-each loop over a map:
class Map[K, V] extends Iterable[{key: K, value: V}] { … }
val map = new Map[String, String] { … }
map foreach { entry =>
println(entry.key + ": " + entry.value)
}
Preserving parameter names after tupled
I think the last example is more interesting from a documentation perspective. Currently we lose the parameter names of a method after wrapping it into other functions like scalaz.Memo, leaving the caller guessing which parameter does what.
def div(dividend: Int, divisor: Int) = dividend / divisor
val memoDiv = scalaz.Memo.weakHashMapMemo((div _).tupled)
// memoDiv is a Function1[{dividend: Int, divisor: Int}, Int],
// not just a Function1[(Int, Int), Int]
memoDiv(divisor = 3, dividend = 9)
memoDiv(9, 3)
The last line would require that an (Int, Int) is somehow a {a: Int, b: Int}. This relation shouldn't be transitive (like an implicit conversion): combined with the assumption above that a {a: Int, b: Int} is an (Int, Int), transitivity would make a {a: Int, b: Int} a {x: Int, y: Int}.
julienrf
Jun 30, 2016
It would be great to have something similar for sum types.
For instance, consider the following type hierarchy:
sealed trait Foo
case class Bar(s: String, i: Int) extends Foo
case class Baz(b: Boolean) extends Foo
It could desugar to the following:
sealed trait Foo extends Sum[Either[("Bar".type, Bar), Either[("Baz".type, Baz), Nothing]]]
case class Bar(s: String, i: Int) extends Foo {
val sum = Left(("Bar", this))
}
case class Baz(b: Boolean) extends Foo {
val sum = Right(Left(("Baz", this)))
}
Where Sum[A] is defined as follows:
trait Sum[A] {
def sum: A
}
This would enable generic programming on sum types rather than just records:
trait ToJson[A] {
def toJson(a: A): Json
}
implicit def toJsonSum[A](implicit
toJsonA: ToJson[A]
): ToJson[Sum[A]] =
new ToJson[Sum[A]] {
def toJson(sumA: Sum[A]) = toJsonA.toJson(sumA.sum)
}
implicit def toJsonEither[A, B](implicit
toJsonA: ToJson[A],
toJsonB: ToJson[B]
): ToJson[Either[(String, A), B]] =
new ToJson[Either[(String, A), B]] {
def toJson(eitherAB: Either[(String, A), B]): Json =
eitherAB match {
case Left((name, a)) => Json.obj(name -> toJsonA.toJson(a))
case Right(b) => toJsonB.toJson(b)
}
}
implicit def toJsonNothing: ToJson[Nothing] =
new ToJson[Nothing] {
def toJson(nothing: Nothing): Json = sys.error("This can not happen")
}
letalvoj
Jul 23, 2018
The reason why Dataframes are so much more pleasant to work with in a language like Python is that typing them strictly is a pain... It would be awesome to be able to keep the notion of column types in a Record-like fashion.
The ultimate dream of mine is something like (vague pseudo scala code):
val df: Dataframe[{name:String, date:LocalDateTime}] = ???
def addDayOfWeek[T <: {name:String, date:LocalDateTime}]
(df:Dataframe[T]): Dataframe[T & {dayOfWeek: Int}] =
df append df.map(_.date.getDayOfWeek)
val df2: Dataframe[{name:String, date:LocalDateTime, dayOfWeek: Int}] = addDayOfWeek(df)
where it would have to hold that
{a: A} & {b: B} =:= {a: A, b: B}
I am no expert in this area, but I feel like having first-class support for Record types in the language (compared to having it done by macros in shapeless) might be the way to get closer to that point.
I'd love to contribute. I wish I had these ideas back when I was a student and could do that as a part of some thesis.
I see that Record Types are supposed to be in progress. How far am I from foreseeable reality? Am I breaking any of the assumptions or laws of the dot calculus?
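As a stopgap, part of that & law can already be approximated with structural refinement types in today's Scala, with `with` playing the role of `&` (Named/Dated and the String-typed date column are made-up simplifications; member access here is reflective, which is exactly the overhead first-class records would want to avoid):

```scala
import scala.language.reflectiveCalls

// {name: String} and {date: ...} as structural types; their combination
// behaves like the intersection {name: String, date: ...}.
type Named = { def name: String }
type Dated = { def date: String } // String stands in for LocalDateTime

def describe(r: Named with Dated): String = r.name + " @ " + r.date

val row = new { val name = "Ada"; val date = "2015-11-18" }
```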
odersky commented Nov 18, 2015
This is a rough sketch of a possible design of records for Dotty.
Types
A single-field record type is of the form (a: T), where a is an identifier and T is a type. It expands into an instance of a trait Labelled$a[T]. We assume that for every field name a present in a program such a trait will be automatically generated.
A multi-field record type (a_1: T_1, ..., a_n: T_n) is equivalent to the intersection of single-field record types (a_1: T_1) & ... & (a_n: T_n). Since & is commutative, order of fields does not matter.
A row type is a tuple of label/value pairs. Each label is a string literal. Unlike for record types, order of labels does matter in a row type.
The base trait Record exposes a row. Here, Row is assumed to be a generic base type of HLists. Record values are instances of Record types that refine the type of row.
Values
A record value is of the form (a_1 = v_1, ..., a_n = v_n). Assuming the values v_i have types T_i, this is a shorthand for an instance of the record type (a_1: T_1, ..., a_n: T_n) whose row contains the given label/value pairs.
TODO: Define equality.
TODO: Define how to create a record value from a generic HList representation - on the JDK it seems we can use Java's Proxy mechanism for this.
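Putting the pieces above together, here is a minimal sketch of the intended desugaring in compilable Scala. The bodies of the generated traits and the List-based stand-in for the Row type are assumptions, since the text above does not spell them out, and '_' replaces '$' so the sketch compiles as ordinary code:

```scala
// Assumed shape of the generated trait for a field named "a":
//   trait Labelled$a[T] { val a: T }
trait Labelled_name[T] { val name: T }
trait Labelled_age[T] { val age: T }

// The base Record trait; List[(String, Any)] stands in for the HList Row type.
trait Record { def row: List[(String, Any)] }

// The record value (name = "Ada", age = 36) might then desugar roughly to:
val rec: Record with Labelled_name[String] with Labelled_age[Int] =
  new Record with Labelled_name[String] with Labelled_age[Int] {
    val name = "Ada"
    val age = 36
    def row = List("name" -> name, "age" -> age)
  }
```

Since `with` intersections commute up to subtyping, rec also has type Labelled_age[Int] with Labelled_name[String] with Record, matching the field-order independence claimed for record types.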