Add enum construct #1970

Closed
odersky opened this issue Feb 13, 2017 · 126 comments

@odersky
Contributor

odersky commented Feb 13, 2017

Introduction

This is a proposal to add an enum construct to Scala's syntax. The construct is intended to serve at the same time as a native implementation of enumerations as found in other languages and as a more concise notation for ADTs and GADTs. The proposal affects the Scala definition and its compiler in the following ways:

  • It adds new syntax, including a new keyword, enum.
  • It adds code to the scanner and parser to support the new syntax
  • It adds new rules for desugaring enums.
  • It adds a predefined trait scala.Enum and a predefined runtime class scala.runtime.EnumValues.

This is all that's needed. After desugaring, the resulting programs are expressible as normal Scala code.

Motivation

enums are essentially syntactic sugar. So one should ask whether they are necessary at all. Here are some issues that the proposal addresses:

  1. Enumerations as a lightweight type with a finite number of user-defined elements are not very well supported in Scala. Using integers for this task is tedious and loses type safety. Using case objects is less efficient and gets verbose as the number of values grows. The existing library-based approach in the form of Scala's Enumeration object has been criticized for being hard to use and for lack of interoperability with host-language enumerations. Alternative approaches, such as Enumeratum, fix some of these issues, but have their own tradeoffs.

  2. The standard approach to model an ADT uses a sealed base class with final case classes and objects as children. This works well, but is more verbose than specialized syntactic constructs.

  3. The standard approach keeps the children of ADTs as separate types. For instance, Some(x) has type Some[T], not Option[T]. This gives finer type distinctions but can also confuse type inference. Obtaining the standard ADT behavior is possible, but very tricky. Essentially, one has to make the case class abstract and implement the apply method in the companion object by hand.

  4. Generic programming techniques need to know all the children types of an ADT or a GADT. Furthermore, this information has to be present during type-elaboration, when symbols are first completed. There is currently no robust way to do so. Even if the parent type is sealed, its compilation unit has to be analyzed completely to know its children. Such an analysis can potentially introduce cyclic references or it is not guaranteed to be exhaustive. It seems to be impossible to avoid both problems at the same time.

I think all of these are valid criticisms. In my personal opinion, when taken alone, none of these criticisms is strong enough to warrant introducing a new language feature. But taken together, they could shift the balance.
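
Point 3 can be made concrete in plain Scala. The sketch below is not part of the proposal; the names `Opt`, `Som`, `Non` and `InferenceDemo` are made up for illustration. It shows the hand-written workaround described above: an abstract case class gets no compiler-generated `apply`, so the companion can supply one whose result type is the parent.

```scala
sealed abstract class Opt[+T]

object Opt {
  // `abstract` suppresses the generated `apply` (and `copy`), so the
  // companion can provide a factory that returns the parent type Opt[T].
  sealed abstract case class Som[+T](x: T) extends Opt[T]
  object Som {
    def apply[T](x: T): Opt[T] = new Som(x) {}
  }
  case object Non extends Opt[Nothing]
}

object InferenceDemo {
  // With scala.Some, `foldLeft(Some(0))` infers the accumulator type as
  // Some[Int] (at least in Scala 2), so a branch returning None is rejected.
  // Here the seed already has type Opt[Int], so both branches fit.
  val total: Opt[Int] =
    List(1, 2, 3).foldLeft(Opt.Som(0)) { (acc, n) =>
      acc match {
        case Opt.Som(s) if n > 0 => Opt.Som(s + n)
        case _                   => Opt.Non
      }
    }
}
```

The proposed enum cases would generate this kind of `apply` automatically (see rule 5 of the desugarings).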

Objectives

  1. The new feature should allow the concise expression of enumerations.
  2. Enumerations should be efficient, even if they define many values. In particular, we should avoid defining a new class for every value.
  3. It should be possible to model Java enumerations as Scala enumerations.
  4. The new feature should allow the concise expression of ADTs and GADTs.
  5. It should support all idioms that can be expressed with case classes. In particular, we want to support type and value parameters, arbitrary base traits, self types, and arbitrary statements in a case class and its companion object.
  6. It should lend itself to generic programming

Basic Idea

We define a new kind of enum class. This is essentially a sealed class whose instances are given by cases defined in its companion object. Cases can be simple or parameterized. Simple cases without any parameters map to values. Parameterized cases map to case classes. A shorthand form enum E { Cs } defines both an enum class E and a companion object with cases Cs.

Examples

Here's a simple enumeration

enum Color { 
  case Red
  case Green
  case Blue
}

or, even shorter:

enum Color { case Red, Green, Blue }

Here's a simple ADT:

enum Option[T] {
  case Some[T](x: T)
  case None[T]()
}

Here's Option again, but expressed as a covariant GADT, where None is a value that extends Option[Nothing].

enum Option[+T] {
  case Some[T](x: T)
  case None
}

It is also possible to add fields or methods to an enum class or its companion object, but in this case we need to split the `enum' into a class and an object to make clear what goes where:

enum class Option[+T] extends Serializable {
  def isDefined: Boolean
}
object Option {
  def apply[T](x: T) = if (x != null) Some(x) else None
  case Some[+T](x: T) {
     def isDefined = true
  }
  case None {
     def isDefined = false
  }
}

The canonical Java "Planet" example (https://docs.oracle.com/javase/tutorial/java/javaOO/enum.html) can be expressed
as follows:

enum class Planet(mass: Double, radius: Double) {
  private final val G = 6.67300E-11
  def surfaceGravity = G * mass / (radius * radius)
  def surfaceWeight(otherMass: Double) =  otherMass * surfaceGravity
}
object Planet {
  case MERCURY extends Planet(3.303e+23, 2.4397e6)
  case VENUS   extends Planet(4.869e+24, 6.0518e6)
  case EARTH   extends Planet(5.976e+24, 6.37814e6)
  case MARS    extends Planet(6.421e+23, 3.3972e6)
  case JUPITER extends Planet(1.9e+27,   7.1492e7)
  case SATURN  extends Planet(5.688e+26, 6.0268e7)
  case URANUS  extends Planet(8.686e+25, 2.5559e7)
  case NEPTUNE extends Planet(1.024e+26, 2.4746e7)

  def main(args: Array[String]) = {
    val earthWeight = args(0).toDouble
    val mass = earthWeight/EARTH.surfaceGravity
    for (p <- enumValues)
      println(s"Your weight on $p is ${p.surfaceWeight(mass)}")
  }
}
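
For comparison, here is a hand-written approximation in plain Scala of what the Planet enum might desugar to. This is only a sketch under assumptions: the helper `make` and the explicit `enumValues` list stand in for compiler-generated code, only three planets are listed to keep it short, and the constructor parameters are made public `val`s.

```scala
sealed abstract class Planet(val mass: Double, val radius: Double) {
  def enumTag: Int
  def surfaceGravity: Double = Planet.G * mass / (radius * radius)
  def surfaceWeight(otherMass: Double): Double = otherMass * surfaceGravity
}

object Planet {
  private final val G = 6.67300E-11

  // Hypothetical helper standing in for the compiler-generated value cases.
  private def make(tag: Int, name: String, m: Double, r: Double): Planet =
    new Planet(m, r) {
      def enumTag: Int = tag
      override def toString: String = name
    }

  val MERCURY: Planet = make(0, "MERCURY", 3.303e+23, 2.4397e6)
  val VENUS: Planet   = make(1, "VENUS",   4.869e+24, 6.0518e6)
  val EARTH: Planet   = make(2, "EARTH",   5.976e+24, 6.37814e6)

  // Stand-in for `enumValues`; the proposal fills it via `$values.register`.
  val enumValues: List[Planet] = List(MERCURY, VENUS, EARTH)
}
```

No class is defined per planet by the user; each value is an instance of the single sealed class and differs only in its constructor arguments, tag and name.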

Syntax Extensions

Changes to the syntax fall in two categories: enum classes and cases inside enums.

The changes are specified below as deltas with respect to the Scala syntax given here

  1. Enum definitions and enum classes are defined as follows:

    TmplDef ::=  `enum' `class' ClassDef
             |   `enum' EnumDef
    EnumDef ::=  id ClassConstr [`extends' [ConstrApps]] 
                 [nl] `{' EnumCaseStat {semi EnumCaseStat} `}'
    
  2. Cases of enums are defined as follows:

     EnumCaseStat  ::=  {Annotation [nl]} {Modifier} EnumCase
     EnumCase      ::=  `case' (EnumClassDef | ObjectDef | ids)
     EnumClassDef  ::=  id [ClsTpeParamClause | ClsParamClause] 
                        ClsParamClauses TemplateOpt
     TemplateStat  ::=  ... | EnumCaseStat
    

Desugarings

Enum classes and cases expand via syntactic desugarings to code that can be expressed in existing Scala. First, some terminology and notational conventions:

  • We use E as a name of an enum class, and C as a name of an enum case that appears in the companion object of E.

  • We use <...> for syntactic constructs that in some circumstances might be empty. For instance <body> represents either the body of a case between {...} or nothing at all.

  • Enum cases fall into three categories:

    • Class cases are those cases that are parameterized, either with a type parameter section [...] or with one or more (possibly empty) parameter sections (...).
    • Simple cases are cases of a non-generic enum class that have neither parameters nor an extends clause or body. That is, they consist of a name only.
    • Value cases are all cases that do not have a parameter section but that do have a (possibly generated) extends clause and/or a body.

Simple cases and value cases are called collectively singleton cases.

The desugaring rules imply that class cases are mapped to case classes, and singleton cases are mapped to val definitions.

There are eight desugaring rules. Rules (1) and (2) desugar enums and enum classes. Rules (3) and (4) define extends clauses for cases that are missing them. Rules (5) to (7) define how such expanded cases map into case classes, case objects or vals. Finally, rule (8) expands comma-separated simple cases into a sequence of cases.

  1. An enum definition

     enum E ... { <cases> }
    

    expands to an enum class and a companion object

    enum class E ...
    object E { <cases> }
    
  2. An enum class definition

     enum class E ... extends <parents> ...
    

    expands to a sealed abstract class that extends the scala.Enum trait:

    sealed abstract class E ... extends <parents> with scala.Enum ...
    
  3. If E is an enum class without type parameters, then a case in its companion object without an extends clause

     case C <params> <body>
    

    expands to

     case C <params> extends E <body>
    
  4. If E is an enum class with type parameters Ts, then a case in its companion object without an extends clause

     case C <params> <body>
    

    expands according to two alternatives, depending on whether C has type parameters or not. If C has type parameters, they must have the same names and appear in the same order as the enum type parameters Ts (variances may be different, however). In this case

     case C [Ts] <params> <body>
    

    expands to

     case C[Ts] <params> extends E[Ts] <body>
    

    For the case where C does not have type parameters, assume E's type parameters are

     V1 T1 >: L1 <: U1 ,   ... ,    Vn Tn >: Ln <: Un      (n > 0)
    

    where each of the variances Vi is either '+' or '-'. Then the case expands to

     case C <params> extends E[B1, ..., Bn] <body>
    

    where Bi is Li if Vi = '+' and Ui if Vi = '-'. It is an error if Bi refers to some other type parameter Tj (j = 1,..,n). It is also an error if E has type parameters that are non-variant.

  5. A class case

     case C <params> ...
    

    expands analogously to a case class:

     final case class C <params> ...
    

    However, unlike for a regular case class, the return type of the associated apply method is a fully parameterized type instance of the enum class E itself instead of C. Also the enum case defines an enumTag method of the form

     def enumTag = n
    

    where n is the ordinal number of the case in the companion object, starting from 0.

  6. A value case

     case C extends <parents> <body>
    

    expands to a value definition

     val C = new <parents> { <body>; def enumTag = n; $values.register(this) }
    

    where n is the ordinal number of the case in the companion object, starting from 0.
    The statement $values.register(this) registers the value as one of the enumValues of the
    enumeration (see below). $values is a compiler-defined private value in
    the companion object.

  7. A simple case

     case C
    

    of an enum class E that does not take type parameters expands to

     val C = $new(n, "C")
    

    Here, $new is a private method that creates an instance of E (see below).

  8. A simple case consisting of a comma-separated list of enum names

    case C_1, ..., C_n
    

    expands to

     case C_1; ...; case C_n
    

    Any modifiers or annotations on the original case extend to all expanded cases.
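
The choice of bounds in rule (4) can be checked with ordinary Scala. The sketch below uses made-up names (`Box`, `Sink`, `VarianceDemo`) and writes out by hand what the rule would generate: a covariant parameter that the case does not mention is instantiated to its lower bound (here `Nothing`), a contravariant one to its upper bound (here `Any`).

```scala
// Covariant: the parameterless case extends Box[Nothing].
sealed abstract class Box[+A]
object Box {
  final case class Full[+A](value: A) extends Box[A]
  val Empty: Box[Nothing] = new Box[Nothing] {}
}

// Contravariant: the parameterless case extends Sink[Any].
sealed abstract class Sink[-A] { def accept(a: A): String }
object Sink {
  val Discard: Sink[Any] = new Sink[Any] {
    def accept(a: Any): String = "discarded"
  }
}

object VarianceDemo {
  // Both assignments type-check because the extremal type makes the
  // singleton value a subtype of every instance of the enum class.
  val noInts: Box[Int]         = Box.Empty
  val stringSink: Sink[String] = Sink.Discard
}
```

For an invariant parameter no such universally compatible instance exists, which is consistent with the rule treating that situation as an error.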

Enumerations

Non-generic enum classes E that define one or more singleton cases are called enumerations. Companion objects of enumerations define the following additional members.

  • A method enumValue of type scala.collection.immutable.Map[Int, E]. enumValue(n) returns the singleton case value with ordinal number n.
  • A method enumValueNamed of type scala.collection.immutable.Map[String, E]. enumValueNamed(s) returns the singleton case value whose toString representation is s.
  • A method enumValues which returns an Iterable[E] of all singleton case values in E, in the order of their definitions.

Companion objects that contain at least one simple case define in addition:

  • A private method $new which defines a new simple case value with given ordinal number and name. This method can be thought of as being defined as follows.

       def $new(tag: Int, name: String): E = new E {
          def enumTag = tag
          override def toString = name
          $values.register(this)   // register enum value so that `enumValue` and `enumValues` can return it.
       }
    

Examples

The Color enumeration

enum Color { 
  case Red, Green, Blue
}

expands to

sealed abstract class Color extends scala.Enum
object Color {
  private val $values = new scala.runtime.EnumValues[Color]
  def enumValue: Map[Int, Color] = $values.fromInt
  def enumValueNamed: Map[String, Color] = $values.fromName
  def enumValues: Iterable[Color] = $values.values

  def $new(tag: Int, name: String): Color = new Color {
    def enumTag: Int = tag
    override def toString: String = name
    $values.register(this)
  }

  final case val Red: Color = $new(0, "Red")
  final case val Green: Color = $new(1, "Green")
  final case val Blue: Color = $new(2, "Blue")
}
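
The expansion above relies on scala.Enum and scala.runtime.EnumValues, which the proposal names but does not spell out. The following is a minimal runnable sketch of how such a registry might work. `EnumLike`, `EnumRegistry`, `Hue` and `make` are hypothetical names, and the sketch registers each value after construction rather than inside the anonymous class body.

```scala
import scala.collection.mutable

// Local stand-in for the proposed scala.Enum trait.
trait EnumLike { def enumTag: Int }

// Hypothetical sketch of scala.runtime.EnumValues.
final class EnumRegistry[E <: EnumLike] {
  private val byTag  = mutable.LinkedHashMap.empty[Int, E]
  private val byName = mutable.LinkedHashMap.empty[String, E]

  def register(v: E): Unit = {
    require(!byTag.contains(v.enumTag), s"duplicate tag ${v.enumTag}")
    byTag(v.enumTag) = v
    byName(v.toString) = v
  }

  def fromInt: Map[Int, E]     = byTag.toMap
  def fromName: Map[String, E] = byName.toMap
  def values: Iterable[E]      = byTag.values.toList // insertion order
}

sealed abstract class Hue extends EnumLike

object Hue {
  // Must be initialized before the values below are created.
  private val registry = new EnumRegistry[Hue]

  private def make(tag: Int, name: String): Hue = {
    val h: Hue = new Hue {
      def enumTag: Int = tag
      override def toString: String = name
    }
    registry.register(h)
    h
  }

  val Red: Hue   = make(0, "Red")
  val Green: Hue = make(1, "Green")
  val Blue: Hue  = make(2, "Blue")

  def enumValue: Map[Int, Hue]         = registry.fromInt
  def enumValueNamed: Map[String, Hue] = registry.fromName
  def enumValues: Iterable[Hue]        = registry.values
}
```

Because the lookups are plain immutable maps, callers can choose between `Hue.enumValue(1)` and the non-throwing `Hue.enumValue.get(1)`.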

The Option GADT

enum Option[+T] {
  case Some[+T](x: T)
  case None
}

expands to

sealed abstract class Option[+T] extends Enum
object Option {
  final case class Some[+T](x: T) extends Option[T] {
     def enumTag = 0
  }
  object Some {
    def apply[T](x: T): Option[T] = new Some(x)
  }
  val None = new Option[Nothing] {
    def enumTag = 1
    override def toString = "None"
    $values.register(this)
  }
} 

Note: We have added the apply method of the case class expansion because
its return type differs from the one generated for normal case classes.

Implementation Status

An implementation of the proposal is in #1958.

Interoperability with Java Enums

On the Java platform, an enum class may extend java.lang.Enum. In that case, the enum as a whole is implemented as a Java enum. The compiler will enforce the necessary restrictions on the enum to make such an implementation possible. The precise mapping scheme and associated restrictions remain to be defined.

Open Issue: Generic Programming

One advantage of the proposal is that it offers a reliable way to enumerate all cases of an enum class before any typechecking is done. This makes enums a good basis for generic programming. One could envisage compiler-generated hooks that map enums to their "shapes", i.e. typelevel sums of products. An example of what could be done is elaborated in a test in the dotty repo.

@liufengyun
Contributor

A very nice explanation of the new feature 👍

There seems to be an inconsistency between the desugaring Rule 5 and the following code example:

enum Option[+T] {
  case Some(x: T)
  case None extends Option[Nothing]
}

If I understand correctly, the desugaring Rule 5 says that for the case None, it is an error for Option to take type parameters.

  1. A case without explicitly given type or value parameters but with an explicit extends clause or body

case C extends |parents| |body|

expands to a value definition

val C = new |parents| { |body|; def enumTag = n }

where n is the ordinal number of the case in the companion object, starting from 0. It is an error in this case if the enum class E takes type parameters.

Another minor question is, it seems the following code in the example expansions does not type check:

  object Some extends T => Option[T] {
    def apply[T](x: T): Option[T] = new Some(x)
  }

We need to remove the part extends T => Option[T]?

@odersky
Contributor Author

odersky commented Feb 13, 2017

@liufengyun

If I understand correctly, the desugaring Rule 5 says that for the case None, it is an error for Option to take type parameters.

Well spotted. This clause should go to rule 6. I fixed it.

Another minor question is, it seems the following code in the example expansions does not type check

You are right. We need to drop the extends clause.

@julienrf
Collaborator

In the following introductory example:

enum class Option[+T] extends Serializable {
  def isDefined: Boolean
}
object Option {
  def apply[T](x: T) = if (x != null) Some(x) else None
  case Some(x: T) {
     def isDefined = true
  }
  case None extends Option[Nothing] {
     def isDefined = false
  }
}

I find it a little bit confusing that in the case Some(x: T) definition the type parameter T is bound to the one defined in enum class Option[+T]. I think it is the first time that symbol binding crosses lexical scopes.

Also, how would that interact with additional type parameters?

case Some[A](x: T, a: A)

@odersky
Contributor Author

odersky commented Feb 13, 2017

Also, how would that interact with additional type parameters?

We have to disallow that.

@szeiger
Member

szeiger commented Feb 13, 2017

Keeping type parameters undefined looks more like an artifact of desugaring and Dotty's type system than a feature to me. Are there any cases where this would actually be useful?

enum Option[T] {
  case Some(x: T)
  case None()
}

OTOH, covariant type parameters look very useful and are common in immutable data structures. Could this case be simplified?

enum Option[+T] {
  case Some(x: T)
  case None extends Option[Nothing]
}

How about automatically filling in unused type parameters in cases as their lower (covariant) or upper (contravariant) bounds and only leaving invariant type parameters undefined?

  1. It should be possible to model Java enumerations as Scala enumerations.

Instead of only exposing Java enums to Scala in this way, is there a well-defined subset of Scala enumerations that can be compiled to proper Java enums for the best efficiency and Java interop on the JVM?

@DarkDimius
Member

DarkDimius commented Feb 13, 2017

I'm proposing a modification to the longer syntax:

enum class Option[+T] extends Serializable {
  def isDefined: Boolean
}
object Option {
  def apply[T](x: T) = if (x != null) Some(x) else None
  case Some[T](x: T) { // <-- changed
     def isDefined = true
  }
  case None extends Option[Nothing] {
     def isDefined = false
  }
}

In this case the T is obviously bound in the scope. It still desugars to the same thing, but I feel it's more regular, and it allows renaming the type argument:

enum class Option[+T] extends Serializable {
  def isDefined: Boolean
}
object Option {
  def apply[T](x: T) = if (x != null) Some(x) else None
  case Some[U](x: U) extends Option[U] { // <-- changed
     def isDefined = true
  }
  case None extends Option[Nothing] {
     def isDefined = false
  }
}

@DarkDimius
Member

DarkDimius commented Feb 13, 2017

At the meeting, we also proposed an additional rule:
require that all extends clauses in cases list the enum superclass.
This would make the code below invalid:

enum class Option[+T] extends Serializable {
  def isDefined: Boolean
}
object Option {
  def apply[T](x: T) = if (x != null) Some(x) else None
  case Some(x: Int) extends AnyRef { // <-- Not part of enum
     def isDefined = true
  }
  case None extends Option[Nothing] {
     def isDefined = false
  }
}

@julienrf
Collaborator

@DarkDimius I think this is still insufficient because it is still (a little bit) confusing that the T type parameter of case Some[T] is automatically applied to the T type parameter of the parent Option[+T] class.

Despite these inconveniences, I think that the shorter syntax is a huge benefit, so I find it acceptable to have just case Some(x: T) as a shorthand for case class Some[T](x: T) extends Option[T]. It is always possible to fall back to the usual sealed traits and case classes when we need more fine-grained control (e.g. case class Flipped[A, B]() extends Parent[B, A] cannot be expressed with case enums).

@DarkDimius
Member

One more point discussed at the dotty meeting:

there should be an additional limitation that no other class can extend an abstract case class. Otherwise the superclass isn't a sum of its children, and serialization/patmat won't be able to enumerate all children.

@DarkDimius
Member

DarkDimius commented Feb 13, 2017

It is always possible to fall back to the usual sealed traits and case classes when we need more fine-grained control

Sealed classes give fewer guarantees. The point of this addition is that you cannot get equivalent guarantees from sealed classes.

e.g. case class Flipped[A, B]() extends Parent[B, A] cannot be expressed with case enums).

Given the currently proposed rules it can be expressed; you simply need to write it explicitly using the longer version.

@AlecZorab

AlecZorab commented Feb 13, 2017

However, unlike for a regular case class, the return type of the associated apply and copy methods is a fully parameterized type instance of the enum class E itself instead of C

Am I understanding correctly that the following occurs?

enum IntWrapper {
 case W(i:Int)
 case N
}
val i = IntWrapper(1)
some match {
  case (w:W) => 
    w.copy(i = 2)
     .copy(i = 3) //this line won't compile because the previous copy returned an IntWrapper
  case N => ???
}

If so then it seems like copy should still return C
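
For reference, this is how the chained call behaves with ordinary case classes in current Scala, where `copy` returns the case class type itself. The encoding below is a plain sealed hierarchy written by hand, not the proposed enum syntax, and `CopyDemo` is a made-up name.

```scala
sealed abstract class IntWrapper

object IntWrapper {
  final case class W(i: Int) extends IntWrapper
  case object N extends IntWrapper
}

object CopyDemo {
  val w: IntWrapper.W = IntWrapper.W(1)

  // `copy` returns W, so a second `copy` can be chained. If `copy`
  // returned IntWrapper instead, the second call would not compile.
  val chained: IntWrapper.W = w.copy(i = 2).copy(i = 3)
}
```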

@odersky
Contributor Author

odersky commented Feb 13, 2017

If so then it seems like copy should still return C

That's a good argument. I dropped copy from the description.

@sjrd
Member

sjrd commented Feb 13, 2017

Instead of only exposing Java enums to Scala in this way, is there a well-defined subset of Scala enumerations that can be compiled to proper Java enums for the best efficiency and Java interop on the JVM?

AFAICT, any enum containing only simple cases (i.e., without ()) can be compiled to Java enums, and exposed as an enum to Java for interop. This even includes enum classes with cases that redefine members.

@wogan

wogan commented Feb 13, 2017

For enumerations I would love to see a valueOf method with String => E type as well, to look values up by name as well as ordinal.

@ritschwumm

ritschwumm commented Feb 13, 2017

i'd probably never use a naked Int=>E valueOf method for fear of exceptions; i'd very much prefer Int=>Option[E]. or maybe (if that's still a thing in dotty) something like paulp's structural pseudo-Option used in pattern matching.

oh, and if there is a String=>E (Option preferred, of course), then why not an E=>String, too?

@Ichoran

Ichoran commented Feb 13, 2017

This looks great!

I don't think the long form is an improvement, though. The case keyword is all you need to disambiguate the members of the ADT from other stuff.

enum Either[+L, +R] {
  def fold[Z](f: L => Z, g: R => Z): Z
  case Left(value: L) {
    def fold[Z](f: L => Z, g: Nothing => Z) = f(value)
  }
  case Right(value: R) {
    def fold[Z](f: Nothing => Z, g: R => Z) = g(value)
  }
}

I don't see any issues here. I agree with Stefan that generics should be handled automatically by default, and have the type parameter missing and filled in as Nothing if the type is not referenced. If you want something else, you can do it explicitly.

  case Right[+L, +R](value: R) extends Either[L, R]

@lloydmeta
Contributor

Currently this is looking great! I wrote Enumeratum and would be happy to see something like this baked into the language :)

Just a few thoughts/questions based on feedback I've received in the past:

  • It might be nice to make valueOf non-throwing (returns Option) by default. Slightly easier to reason about and might even be faster
  • A withName method might also be nice to have
  • Would it be possible to customise the enumTag for a given enum member so that users can control the resolution valueOf? If so, it might be nice to have the compiler check for uniqueness too :)

@retronym
Member

retronym commented Feb 14, 2017

AFAICT, any enum containing only simple cases (i.e., without ()) can be compiled to Java enums, and exposed as an enum to Java for interop. This even includes enum classes with cases that redefine members.

Compiling to Java enums has some downsides:

  • The base enum type cannot have a user-selected subclass, as the one-and-only subclass slot would be taken by extending java.lang.Enum
  • Methods inherited from java.lang.Enum might not be desired (e.g. case Person(name: Name) would not be allowed because java.lang.Enum.name()String is final, so the accessor method for name would clash).
  • Crossing the threshold of "compilable to platform Enum" in either direction (by adding the first, or removing the last, case with params) would likely be a binary incompatible change.

This suggests to me that we need an opt-in (or maybe an opt-out) annotation for this compilation strategy.

@retronym
Member

retronym commented Feb 14, 2017

Java enums are exposed to the Scala typechecker as though they were constant value definitions:

scala> symbolOf[java.lang.annotation.RetentionPolicy].companionModule.info.decls.toList.take(3).map(_.initialize.defString).mkString("\n")
res21: String =
final val SOURCE: java.lang.annotation.RetentionPolicy(SOURCE)
final val CLASS: java.lang.annotation.RetentionPolicy(CLASS)
final val RUNTIME: java.lang.annotation.RetentionPolicy(RUNTIME)

scala> showRaw(symbolOf[java.lang.annotation.RetentionPolicy].companionModule.info.decls.toList.head.info.resultType)
res24: String = ConstantType(Constant(TermName("SOURCE")))

This is something of an implementation detail, but is needed:

  • to allow references in (platform) annotation arguments, which only admit constant values
  • to keep track of the cases long enough for the pattern matcher to analyse exhaustivity/reachability.

The enums from this proposal will need a similar approach, and I think that should be specced.

@odersky
Contributor Author

odersky commented Feb 14, 2017

@Ichoran The long form is intended to allow for

  • class members
  • companion members
  • parent types of the companion

I played with various variants but found none that was clearer than what was eventually proposed. If one is worried about scoping of the type parameter, one could specify that the long form is a single syntactic construct

enum <ident> <params> extends <parents> <body> 
[object <ident> extends <parents> <body>]

and specify that any type parameters in <params> are visible in the whole construct. That would be an option.

@odersky
Contributor Author

odersky commented Feb 14, 2017

@ritschwumm

I'd probably never use a naked Int=>E valueOf method for fear of exceptions; i'd very much prefer Int=>Option[E]. or maybe (if that's still a thing in dotty) something like paulp's structural pseudo-Option used in pattern matching.

What about making valueOf an immutable map?
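
A small illustration of why a map covers both preferences: `scala.collection.immutable.Map` offers a throwing `apply` and a non-throwing `get` on the same value. The object name and the string values below are placeholders, not part of the proposal.

```scala
object ValueOfDemo {
  // Hypothetical stand-in for the generated lookup table.
  val valueOf: Map[Int, String] = Map(0 -> "Red", 1 -> "Green", 2 -> "Blue")

  val hit: String          = valueOf(1)                // throws NoSuchElementException on a missing key
  val miss: Option[String] = valueOf.get(7)            // safe lookup, yields an Option
  val fallback: String     = valueOf.getOrElse(7, "Red")
}
```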

@odersky
Contributor Author

odersky commented Feb 14, 2017

@retronym Thanks for the analysis wrt Java enums. It seems like an opt-in is the best way to do it. How about we take inheritance from java.lang.Enum as our cue? I.e.

enum JavaColor extends java.lang.Enum {
   case Red
   case Green
   case Blue
}

Then there would be no surprise that we cannot redefine name because it is final in java.lang.Enum.

Also, can you suggest spec language for the constantness part?

@julienrf
Collaborator

A withName method might also be nice to have

I agree. This would also be required for most of the useful generic programming stuff we want to do (e.g. automatically generate serializers/deserializers for enumerations based on their name rather than their ordinal).

@odersky
Contributor Author

odersky commented Feb 14, 2017

@szeiger I agree it would be nice if we could fill in extremal types of co/contravariant enum types, i.e. expand

case None

to

case None extends Option[Nothing]

But maybe it's too much magic? Have to think about it some more.

@odersky
Contributor Author

odersky commented Feb 14, 2017

A withName method might also be nice to have

I agree. This would also be required for most of the useful generic programming stuff we want to do (e.g. automatically generate serializers/deserializers for enumerations based on their name rather than their ordinal).

Agreed. But that means we'd have to design that feature together with the generic programming stuff, because it would likely end up on the type level? Not sure about this point.

@odersky
Contributor Author

odersky commented Mar 4, 2017

as a type parameter of foo

Sorry, that should have been A.

@LPTK
Contributor

LPTK commented Mar 6, 2017

For those interested, I fleshed out some implementation of my above proposal, as an experiment. See issue #2055.

@ghost

ghost commented Apr 1, 2017

I would love this to be implemented but one thing that has to be supported is the ability to implement Visitor pattern dynamics and have more in the enum than simply the object itself. For explanation look at my post on Stack overflow: http://stackoverflow.com/questions/43152963

This is one of the best ways to leverage enums in a code base to avoid having complex switch logic.

@odersky
Contributor Author

odersky commented Apr 4, 2017

I integrated @szeiger's proposal

How about automatically filling in unused type parameters in cases as their lower (covariant) or upper (contravariant) bounds and only leaving invariant type parameters undefined?

It's now number 3 of the new desugaring rules. The rule is complicated, but it's one stumbling block less for defining simple ADTs.

odersky added a commit to dotty-staging/dotty that referenced this issue Apr 4, 2017
`copy` should always return the type of its rhs. The discussion of
scala#1970 concluded that no special treatment for enums is needed.
odersky added a commit to dotty-staging/dotty that referenced this issue Apr 4, 2017
Based on the discussion in scala#1970, enumeration objects now
have three public members:

 - valueOf: Map[Int, E]
 - withName: Map[String, E]
 - values: Iterable[E]

Also, the variance of case type parameters is now
the same as in the corresponding type parameter of
the enum class.
@scottcarey

Is there a mapping from this java enum to this proposal? This is a purposely convoluted example of what you can do in Java...

  interface Mergeable<A> {
    A mergeWith(A other);
  }

  enum Color implements Mergeable<Color> {
    BLACK("As dark as can be") {
      @Override
      public Color mergeWith(Color other) {
        switch (other) {
        case BLACK:
          return BLACK;
        default:
          return GREY;
        }
      }
    },
    WHITE("Blinding white") {
      @Override
      public Color mergeWith(Color other) {
        switch (other) {
        case WHITE:
          return WHITE;
        default:
          return GREY;
        }
      }
    },
    GREY("Somewhere in between") {
      @Override
      public Color mergeWith(Color other) {
        return GREY;
      }
      public String invisibleButFindMeViaReflectionSeriously() {
        return "you found me out!";
      }
    };

    private final String description;

    Color(String description) {
      this.description = description;
    }

    public String getDescription() {
      return description;
    }
  }

Syntactically, the translation would be the following, but it does not appear to be valid:

trait Mergeable[A] { def mergeWith(a: A): A }

enum Color(description: String) extends Mergeable[Color] {
  case Black("As dark as can be") {
    override def mergeWith(color: Color) = color match {
      case Black => Black
      case _ => Grey
    }
  }
  case White("Blinding White") {
    override def mergeWith(color: Color) = color match {
      case White => White
      case _ => Grey
    }
  }
  case Grey("Somewhere in between") {
    override def mergeWith(color: Color) = Grey
    def structuralOrNotIDontKnow() = "method not on super"
  }
}

What I am highlighting here are a few features of java enums:

  • A Java enum represents a single type (that can implement interfaces)
  • But there can be anonymous subtypes... implementing those interfaces
  • Each enum value can have independent values set inside it, without making a new type (the parent type has a constructor)
  • Java will encode methods not visible on the interfaces, but you can't get to them via Java syntax.

Scanning the examples in this issue, I did not see any examples where the enum had constructor parameters. I could be blind though. There is a lot of talk about encoding GADTs more simply, which I will certainly use, but not much talk about encoding multiple instances of simple stuff, like the canonical Java "Planet" example (https://docs.oracle.com/javase/tutorial/java/javaOO/enum.html), where the only difference is in the values embedded in the enum instances and there is no difference in behavior.
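
For readers who do not have the tutorial at hand, the shape of that example can be sketched roughly as follows. This is an abridged sketch in the spirit of the linked "Planet" example, cut down to two constants; the accessor names and the exact figures are illustrative assumptions, not a copy of the tutorial. The point is that every constant is a plain instance of the one `Planet` class and differs only in its constructor arguments.

```java
// Abridged, illustrative sketch of a data-only Java enum.
// The constants differ only in the values passed to the shared constructor.
enum Planet {
    MERCURY(3.303e+23, 2.4397e6),
    EARTH(5.976e+24, 6.37814e6);

    // Universal gravitational constant (m^3 kg^-1 s^-2).
    private static final double G = 6.67300E-11;

    private final double mass;   // in kilograms
    private final double radius; // in meters

    Planet(double mass, double radius) {
        this.mass = mass;
        this.radius = radius;
    }

    double mass() {
        return mass;
    }

    double radius() {
        return radius;
    }

    // Same behavior for every constant; only the stored data differs.
    double surfaceGravity() {
        return G * mass / (radius * radius);
    }

    double surfaceWeight(double otherMass) {
        return otherMass * surfaceGravity();
    }
}
```

Because no constant has a body of its own, `Planet.EARTH.getClass()` is `Planet` itself, in contrast to the `Color` example above, where each constant with a body becomes an anonymous subclass.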

@odersky
Contributor Author

odersky commented Apr 5, 2017

A description and an implementation of a translation to Java enums still remain to be done. Since I am not very current on the details of Java enums, I would appreciate help from others here.

I don't think it should be a requirement that we can support all features of Java enums, though. Specifically, the example above passes the string associated with a case as a parameter (at least that's how I understand it; I might be wrong here). That's not supported in the proposal; you'd have to override toString instead.

Scanning the examples in this issue, I did not see any examples where the enum had constructor parameters

enum classes can have parameters, but then enum cases need to use the usual extends syntax to pass them. Unlike in Java enums, there is no shorthand for this.

@odersky
Contributor Author

odersky commented Apr 5, 2017

@scottcarey I have added a Scala version of the planet example to the description above. Thanks for pointing me to the Java version!

@odersky
Contributor Author

odersky commented Apr 5, 2017

I made two changes to the proposal.

  1. Rename the methods defined for an enumeration as follows: values -> enumValues, valueOf -> enumValue, withName -> enumValueNamed.

    The reason is that a common use case of enumerations is a wildcard import of the enum object to get all the cases, e.g. import Color._. But by the same import we also get the implicitly defined methods. So it's better that these methods have uncommon names that do not conflict by accident with something the user defined.

  2. Generate the utility methods not just for simple cases but for all singleton cases. The Java planet example could not have been written conveniently without this generalization.

@scottcarey

scottcarey commented Apr 5, 2017

Supporting every possible quirk of Java enums is not necessary, but there is some significant overlap to take advantage of.

The best way to think about Java enums is to consider that the singleton pattern in Java is:

enum ThereCanBe {
  ONLY_ONE;
}

(as recommended by Effective Java for a decade but ignored by bloggers talking about advantages Scala has over Java)

Decorate the enum with whatever compiler-known data (member variables) and behavior (methods) you want.
This extends to more than one instance of the type, all known in advance by the compiler, leading to efficient dispatch over them (via switch) and tools like EnumMap.

With that in mind, this proposal is essentially providing two things that I see:

  • An extension of Scala object to allow for more than one instance of a singleton type, which is the same thing that happens when you go past one instance in a Java enum. This patches the gap that leads people to write Java enums within Scala projects, because they are so much easier to work with for this case.
  • A natural Scala syntax for when we want fixed subtypes but a possibly open-ended instance count, which can be used for GADTs and greatly improves usability in some cases over the patterns used today.

Both cases can lead to improved performance via dispatching pattern matches with a switch over the ordinal, rather than cascaded instanceof checks.
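
As a minimal sketch of what that looks like on the Java side, assuming a stripped-down `Color` enum and a hypothetical helper class `ColorDispatch` (neither is part of the proposal): a `switch` over an enum is typically compiled by javac to a table lookup keyed by `ordinal()`, and `EnumMap` stores its values in an array indexed the same way.

```java
import java.util.EnumMap;
import java.util.Map;

// Stripped-down enum used only for this illustration.
enum Color { BLACK, WHITE, GREY }

// Hypothetical helper showing ordinal-based dispatch and EnumMap.
final class ColorDispatch {
    private ColorDispatch() {}

    // The switch dispatches on the constant itself; no instanceof cascade is needed.
    static String describe(Color c) {
        switch (c) {
            case BLACK: return "As dark as can be";
            case WHITE: return "Blinding white";
            default:    return "Somewhere in between";
        }
    }

    // EnumMap keeps one slot per constant and iterates in declaration order.
    static Map<Color, Integer> countByColor(Color... colors) {
        Map<Color, Integer> counts = new EnumMap<>(Color.class);
        for (Color c : colors) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, `ColorDispatch.countByColor(Color.GREY, Color.BLACK, Color.BLACK)` yields a map that iterates as BLACK then GREY, regardless of insertion order.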

In the first case, where everything is a singleton, this maps neatly to java enums as far as I can see.
Note that in the case where a plain object (not just an enum) is a singleton, it could also be translated to a Java enum in bytecode. This is interesting because enums are particularly useful in the JVM for this case:

  • It is not possible to use reflection or serialization attacks to modify them
  • The classloader does all the nasty initialization work, lazily as needed on first reference.

I have thought that top-level Scala objects (plain singletons) could under the covers all be encoded on the JVM as enums, avoiding all sorts of messiness in the process. It won't work for non-singleton cases, like objects inside of instances.

So as this approaches the time where the bytecode encoding to enums is considered, consider it for simple top-level objects too.

enum ThereCanBe {
  case OnlyOne
}

desugars to roughly the following, if I am reading things right:

sealed abstract class ThereCanBe
object ThereCanBe {
  val OnlyOne = new ThereCanBe {}
}

Which can be encoded as a Java enum:

enum ThereCanBe {
  OnlyOne;
}

which isn't that different from an ordinary object singleton:

object ThereCanBe {
  val stuff = "stuff"
}

Which encodes as:

enum ThereCanBe {
  $Instance;
  private final String stuff = "stuff";
  public String stuff() {
    return stuff;
  }
}

After all, an enum with only one valid value is a singleton, and so is a top level object.

@odersky
Contributor Author

odersky commented Apr 6, 2017

I have made one more tweak: enum utility methods are emitted only if there are some singleton cases and the enum class is not generic. This avoids generation of utility methods for types such as List and Option. The general rationale is that, if the enum class is generic, the utility methods would lose type precision. E.g. List.enumValues would return a Seq[List[_]], which is not that useful.

@odersky
Contributor Author

odersky commented Apr 6, 2017

@scottcarey Interesting idea, to encode singletons as enumerations. Maybe we can use this for the Scala translation, but we'd need a lot of experimentation to find out whether it's beneficial. Note that top-level objects are already heavily optimized.

@retronym
Member

retronym commented Apr 6, 2017

Java (language) enumerations may not have a custom superclass, so class C; object O extends C is not expressible.

Otherwise, the encoding is pretty similar to scala objects.

public enum Test {
    T1
}
public final class p1.Test extends java.lang.Enum<p1.Test> {
  public static final p1.Test T1;

  private static final p1.Test[] $VALUES;

  public static p1.Test[] values();
    Code:
       0: getstatic     #1                  // Field $VALUES:[Lp1/Test;
       3: invokevirtual #2                  // Method "[Lp1/Test;".clone:()Ljava/lang/Object;
       6: checkcast     #3                  // class "[Lp1/Test;"
       9: areturn

  public static p1.Test valueOf(java.lang.String);
    Code:
       0: ldc           #4                  // class p1/Test
       2: aload_0
       3: invokestatic  #5                  // Method java/lang/Enum.valueOf:(Ljava/lang/Class;Ljava/lang/String;)Ljava/lang/Enum;
       6: checkcast     #4                  // class p1/Test
       9: areturn

  private p1.Test();
    Code:
       0: aload_0
       1: aload_1
       2: iload_2
       3: invokespecial #6                  // Method java/lang/Enum."<init>":(Ljava/lang/String;I)V
       6: return

  static {};
    Code:
       0: new           #4                  // class p1/Test
       3: dup
       4: ldc           #7                  // String T1
       6: iconst_0
       7: invokespecial #8                  // Method "<init>":(Ljava/lang/String;I)V
      10: putstatic     #9                  // Field T1:Lp1/Test;
      13: iconst_1
      14: anewarray     #4                  // class p1/Test
      17: dup
      18: iconst_0
      19: getstatic     #9                  // Field T1:Lp1/Test;
      22: aastore
      23: putstatic     #1                  // Field $VALUES:[Lp1/Test;
      26: return
}

@ekrich
Contributor

ekrich commented May 5, 2017

@odersky One really useful feature in Java is EnumSet, which allows grouping enum values into sets. I hope something similar can be considered.
https://docs.oracle.com/javase/7/docs/api/java/util/EnumSet.html
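
To illustrate the kind of grouping meant here, a small sketch using the standard `java.util.EnumSet` factory methods. The `Day` enum and the `DayGroups` holder class are hypothetical names chosen for this example only.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum used only for this illustration.
enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

// Hypothetical holder for a few named groups of Day values.
final class DayGroups {
    private DayGroups() {}

    // Explicitly listed members.
    static Set<Day> weekend() {
        return EnumSet.of(Day.SAT, Day.SUN);
    }

    // Everything that is not in the weekend set.
    static Set<Day> weekdays() {
        return EnumSet.complementOf(EnumSet.of(Day.SAT, Day.SUN));
    }

    // A contiguous range in declaration order, both ends inclusive.
    static Set<Day> midweek() {
        return EnumSet.range(Day.TUE, Day.THU);
    }
}
```

An `EnumSet` iterates in declaration order, so `DayGroups.weekdays()` lists MON through FRI in that order.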

@odersky
Contributor Author

odersky commented May 18, 2017

There was a late change in the proposal. It now demands that all type parameters of cases are given explicitly, following @LPTK's proposal of nominal correspondence in that respect. It's more verbose but also makes it clearer what happens. Example: Previously, you wrote:

enum Option[+T] {
  case Some(x: T)
  case None
}

Now you have to write:

enum Option[+T] {
  case Some[+T](x: T)
  case None
}

@notxcain

@odersky what are the benefits of this?

@DarkDimius
Member

@notxcain, fixing oddities in scoping rules. Like those: #1970 (comment)

@godenji

godenji commented May 21, 2017

Were the significant whitespace proposal to be accepted then it would appear that case could be omitted since case lines would share the same level of indentation:

enum Option[+T]
  Some[+T](x: T)
  None

IOW, the first level of indentation in an enum block would imply case Type. Would love to see the same for match (and pattern matching in general). Basically omit the case requirement from the language entirely.

odersky added a commit that referenced this issue May 22, 2017
Change enum scheme to correspond to new description in issue #1970
@scottcarey

@retronym Also note that the Java encoding of enums sets a special flag on the generated class, ACC_ENUM. This triggers a lot of special handling in the JVM. Constructors can't be called, even with reflection / unsafe. Serialization is automatic and cannot be overridden.

From JLS 8.9

An enum type has no instances other than those defined by its enum constants. It is a compile-time error to attempt to explicitly instantiate an enum type (§15.9.1).

The final clone method in Enum ensures that enum constants can never be cloned, and the special treatment by the serialization mechanism ensures that duplicate instances are never created as a result of deserialization. Reflective instantiation of enum types is prohibited. Together, these four things ensure that no instances of an enum type exist beyond those defined by the enum constants.

However, although there are no standard ways to break the guarantee above, and the well known Unsafe tricks won't work, there are some holes in Oracle's JVM via internal com.sun reflection packages:

http://jqno.nl/post/2015/02/28/hacking-java-enums/

tl;dr -- the bytecode emitted can potentially leverage ACC_ENUM in certain cases to gain a tighter guarantee that singleton enums are actually singletons (per classloader, of course).
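
Two of the guarantees quoted above can be observed directly with standard APIs. The following is a minimal sketch; the `EnumGuarantees` helper class is a hypothetical name, and the `(String, int)` constructor shape is an assumption about what javac emits for a parameterless enum (it matches the javap output earlier in this thread), not something the language specification promises.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Constructor;

enum ThereCanBe { ONLY_ONE }

// Hypothetical helper that probes the enum guarantees.
final class EnumGuarantees {
    private EnumGuarantees() {}

    // Returns true if the runtime refuses to construct a second instance reflectively.
    static boolean reflectiveInstantiationRejected() {
        try {
            // Assumption: javac's synthetic enum constructor takes (name, ordinal).
            Constructor<ThereCanBe> ctor =
                ThereCanBe.class.getDeclaredConstructor(String.class, int.class);
            ctor.setAccessible(true);
            ctor.newInstance("SECOND", 1);
            return false; // a second instance was created
        } catch (IllegalArgumentException e) {
            return true;  // Constructor.newInstance rejects enum classes
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Serializes and deserializes the value through an in-memory buffer.
    static ThereCanBe roundTrip(ThereCanBe value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ThereCanBe) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new AssertionError(e);
        }
    }
}
```

On a standard JVM, `reflectiveInstantiationRejected()` should return true, and `roundTrip(ThereCanBe.ONLY_ONE)` should return the very same object, so an identity comparison with `==` holds after deserialization.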

@ctongfei

This is very exciting! I wonder if we could allow the following syntax (proposed here as implicit case) to make writing Shapeless-style typelevel functions easier:

/** Typelevel function to compute the index of the first occurrence of type [[X]] in [[L]]. */
enum IndexOf[L <: HList, X] extends DepFn0 {

  type Out <: Nat
  type Aux[L <: HList, X, N <: Nat] = IndexOf[L, X] { type Out = N }
  
  implicit case IndexOf0[T <: HList, X] extends Aux[X :: T, X, _0] {
    type Out = _0
    def apply() = Nat._0
  }

  implicit case IndexOfN[H, T <: HList, X, I <: Nat](implicit i: Aux[T, X, I]) 
    extends Aux[H :: T, X, Succ[I]] {
      type Out = Succ[I]
      def apply() = Succ[I]
    }
}

@megri
Contributor

megri commented Jul 13, 2017

I've scanned around this topic looking for a clear rationale on why the suggestion is to have three separate constructs (enum, enum class and [enum] object). What are the benefits of this as compared to something like described here

edit: This would also avoid the "scope crossing" type parameters mentioned here

@smarter
Member

smarter commented Jan 11, 2018

enum was merged a while ago so closing this issue, http://contributors.scala-lang.org/ is a better place to have follow-up discussions.
