New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SI-5918 fixes the ConstantType ugliness #1378
Conversation
review @paulp @gkossakowski |
also review by @dragos, the original author of ConstantType caching |
Wow, that's epic pull request description. :-) Thanks for explaining all of this. Highly illuminating. |
// - sym: the symbol of the actual enumeration value (VAL1) | ||
// - .owner: the ModuleClassSymbol of the enumeration (object E) | ||
// - .linkedClassOfClass: the ClassSymbol of the enumeration (class E) | ||
sym.owner.linkedClassOfClass.tpe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we have some assertion that checks if passed symbol corresponds to enumeration value?
It doesn't look like this method will be too hot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only check I can imagine is the one that verifies that one of the base classes of linkedClassOfClass is, in fact, a JavaLangEnumClass. But as https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1282/consoleText shows, this might lead to cyclic reference errors.
Therefore I suggest we don't add any assertions.
Started jenkins job pr-rangepos at https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/567/ |
Started jenkins job pr-scala-testsuite-linux-opt at https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1275/ |
jenkins job pr-rangepos: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/567/ |
jenkins job pr-scala-testsuite-linux-opt: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1275/ |
PLS REBUILD ALL |
Started jenkins job pr-rangepos at https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/574/ |
Started jenkins job pr-scala-testsuite-linux-opt at https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1282/ |
jenkins job pr-scala-testsuite-linux-opt: Failed - https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1282/ |
jenkins job pr-rangepos: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/574/ |
Java enum values are represented with constants wrapping corresponding Symbols. To find out the underlying type of such a constant one needs to calculate sym.owner.linkedClassOfClass.tpe (where sym represents the wrapped symbol). To quote the source code, given (in java): class A { enum E { VAL1 } } - sym: the symbol of the actual enumeration value (VAL1) - .owner: the ModuleClassSymbol of the enumeration (object E) - .linkedClassOfClass: the ClassSymbol of the enumeration (class E) Back then, as far as I can guess, linkedClassOfClass was flaky and didn't work well late in the compilation pipeline. Therefore a fix to SI-1329 introduced a caching facility. Once a ConstantType representing the type of Constant(sym) was created (I guess, during typer, when linkedClassOfClass was still working), it cached the underlying type and used it in subsequent phases. *** Unfortunately this solution, being fine for enum values, broke another flavor of constants - type wrapping constants that represent classOf (for example, Constant(IntTpe) represents the classOf[Int] constant). Type-wrapping constants are special, because their type (e.g. Class[Int] in the example from the previous paragraph) changes as the compilation progresses. Before erasure it's Class[something], and after erasure it's just Class. Therefore caching types of such constants might lead to incorrect types flying around after erasure, as described in this scala-internals thread: http://groups.google.com/group/scala-internals/browse_thread/thread/45185b341aeb6a30. *** Now when the problem is clear, the question is why didn't it happen before? That's all because of another peculiarity of the compiler. Before erasure package references (e.g. in TypeRef prefixes) are represented as ThisType(sym), where sym stands for a package class symbol. After erasure such references are represented differently, e.g. java.lang package looks like TypeRef(TypeRef(TypeRef(NoPrefix, root, Nil), java, Nil), java.lang, Nil). As described in the aforementioned thread, the incorrect caching strategy employed in UniqueConstantType mixed with other caching mechanisms in compiler effectively established a non-clearable cache that goes from Type instances to types that represent their classOfs, e.g. from String to Class[String]. So if anyone tried to typecheck a classOf after erasure, he/she would get Class[String] instead of the correct Class, and compiler would crash. Right? Nope. Before erasure String is TypeRef(ThisType(java.lang), StringSymbol, Nil), and after erasure it's TypeRef(TypeRef(...), StringSymbol, Nil), as explained above. Therefore the foul cache would contain two String types: one pre-erasure going to a pre-erasure Class[String], and another one post-erasure going to a post-erasure Class. *** This shaky balance was broken when I tried to implement class tag generation with shiny Type.erasure method that Martin just exposed in the reflection API. The erasure method partially invoked the Erasure phase, and for a String it returned its post-erasure representation (with java.lang prefix represented as TypeRef, not as ThisType). And after that I used the result of erasure to build a classOf for a class tag. Since I did it in a macro, it was typer, a pre-erasure phase. Now you understand why things broke. That classOf created a Constant wrapping a post-erasure representation of String, which cached the incorrect non-erased Class[String] type for a post-erasure type, and things exploded. You can imagine my panic! The ScalaDays deadline was near, I still had to do finishing touches to implicit macros (which I actually never had time to do), and such a fundamental thing exploded. Actually I figured out the hashing problem, but in the limited time I had I failed to understand why exactly it's happening, so I introduced the dirty workaround praised in SI-5918 and moved on. *** The story doesn't end here. Some time has passed, and I learned a lot about the compiler. I independently discovered the ThisType -> TypeRef transform that erasure applies to package references and patched Type.erasure to undo it. After all, Type.erasure is a user-facing API, and users don't need to know about post-typer implementation details. You can read more about this here: http://groups.google.com/group/scala-internals/browse_thread/thread/6d3277ae21b6d581 From what we've learned above, we can see that this Type.erasure fix made the UniqueConstantType workaround unnecessary. But I didn't know that. So imagine my surprise when I tried to remove that workaround and ran the tests only to see that nothing fails. I went back in time to April when the problem first manifested, extracted a minimized crasher and tried to use it on trunk. Again, nothing crashed. And only with the help of showRaw, I finally understood that types printed as "String" can be wildly different. The rest was a piece of cake. *** The irony is that the original reason for ConstantType caching is no longer valid. linkedClassOfClass now works fine (and files/jvm/outerEnum.scala agrees with me), so we can remove the cache altogether. So why all this story about erasure and package references? Well, I don't know. I enjoyed uncovering this mystery, so I wanted to share it with you :)
PLS REBUILD ALL |
Started jenkins job pr-rangepos at https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/577/ |
Started jenkins job pr-scala-testsuite-linux-opt at https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1285/ |
jenkins job pr-rangepos: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/577/ |
Guys can I have an LGTM please? This pull request is a dependency for #1385, so it'd be great to get it in asap. |
s: $r.intp.global.Type = String | ||
|
||
scala> { println(afterPhase(currentRun.erasurePhase)(ConstantType(Constant(s)))) } | ||
Class[String](classOf[java.lang.String]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to utilize "printAfterEachPhase" for this sort of thing in the future. It makes the checkfile both testy and educational.
Looks good to me, go ahead. |
SI-5918 fixes the ConstantType ugliness
jenkins job pr-scala-testsuite-linux-opt: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1285/ |
Java enum values are represented with constants wrapping corresponding Symbols.
To find out the underlying type of such a constant one needs to calculate
sym.owner.linkedClassOfClass.tpe (where sym represents the wrapped symbol).
To quote the source code, given (in java): class A { enum E { VAL1 } }
Back then, as far as I can guess, linkedClassOfClass was flaky and didn't
work well late in the compilation pipeline.
Therefore a fix to SI-1329 introduced a caching facility. Once a ConstantType
representing the type of Constant(sym) was created (I guess, during typer, when
linkedClassOfClass was still working), it cached the underlying type and used
it in subsequent phases.
Unfortunately this solution, being fine for enum values, broke another flavor
of constants - type wrapping constants that represent classOf (for example,
Constant(IntTpe) represents the classOf[Int] constant).
Type-wrapping constants are special, because their type (e.g. Class[Int] in the
example from the previous paragraph) changes as the compilation progresses.
Before erasure it's Class[something], and after erasure it's just Class.
Therefore caching types of such constants might lead to incorrect types
flying around after erasure, as described in this scala-internals thread:
http://groups.google.com/group/scala-internals/browse_thread/thread/45185b341aeb6a30.
Now when the problem is clear, the question is why didn't it happen before?
That's all because of another peculiarity of the compiler.
Before erasure package references (e.g. in TypeRef prefixes) are represented
as ThisType(sym), where sym stands for a package class symbol. After erasure
such references are represented differently, e.g. java.lang package looks like
TypeRef(TypeRef(TypeRef(NoPrefix, root, Nil), java, Nil), java.lang, Nil).
As described in the aforementioned thread, the incorrect caching strategy
employed in UniqueConstantType mixed with other caching mechanisms in compiler
effectively established a non-clearable cache that goes from Type instances to
types that represent their classOfs, e.g. from String to Class[String].
So if anyone tried to typecheck a classOf after erasure, he/she would get
Class[String] instead of the correct Class, and compiler would crash. Right?
Nope. Before erasure String is TypeRef(ThisType(java.lang), StringSymbol, Nil),
and after erasure it's TypeRef(TypeRef(...), StringSymbol, Nil), as explained
above. Therefore the foul cache would contain two String types: one pre-erasure
going to a pre-erasure Class[String], and another one post-erasure going to
a post-erasure Class.
This shaky balance was broken when I tried to implement class tag generation
with shiny Type.erasure method that Martin just exposed in the reflection API.
The erasure method partially invoked the Erasure phase, and for a String
it returned its post-erasure representation (with java.lang prefix represented
as TypeRef, not as ThisType). And after that I used the result of erasure
to build a classOf for a class tag. Since I did it in a macro, it was
typer, a pre-erasure phase.
Now you understand why things broke. That classOf created a Constant
wrapping a post-erasure representation of String, which cached the incorrect
non-erased Class[String] type for a post-erasure type, and things exploded.
You can imagine my panic! The ScalaDays deadline was near, I still had to do
finishing touches to implicit macros (which I actually never had time to do),
and such a fundamental thing exploded.
Actually I figured out the hashing problem, but in the limited time I had
I failed to understand why exactly it's happening, so I introduced the dirty
workaround praised in SI-5918 and moved on.
The story doesn't end here.
Some time has passed, and I learned a lot about the compiler. I independently
discovered the ThisType -> TypeRef transform that erasure applies to package
references and patched Type.erasure to undo it. After all, Type.erasure is a
user-facing API, and users don't need to know about post-typer implementation
details. You can read more about this here:
http://groups.google.com/group/scala-internals/browse_thread/thread/6d3277ae21b6d581
From what we've learned above, we can see that this Type.erasure fix made
the UniqueConstantType workaround unnecessary. But I didn't know that.
So imagine my surprise when I tried to remove that workaround and ran the tests
only to see that nothing fails. I went back in time to April when the problem
first manifested, extracted a minimized crasher and tried to use it on trunk.
Again, nothing crashed.
And only with the help of showRaw, I finally understood that types printed as
"String" can be wildly different. The rest was a piece of cake.
The irony is that the original reason for ConstantType caching is no longer
valid. linkedClassOfClass now works fine (and files/jvm/outerEnum.scala
agrees with me), so we can remove the cache altogether.
So why all this story about erasure and package references? Well, I don't know.
I enjoyed uncovering this mystery, so I wanted to share it with you :)