Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SI-5918 fixes the ConstantType ugliness #1378

Merged
merged 1 commit into from Sep 23, 2012

Conversation

xeno-by
Copy link
Member

@xeno-by xeno-by commented Sep 22, 2012

Java enum values are represented with constants wrapping corresponding Symbols.
To find out the underlying type of such a constant one needs to calculate
sym.owner.linkedClassOfClass.tpe (where sym represents the wrapped symbol).

To quote the source code, given (in java): class A { enum E { VAL1 } }

  • sym: the symbol of the actual enumeration value (VAL1)
  • .owner: the ModuleClassSymbol of the enumeration (object E)
  • .linkedClassOfClass: the ClassSymbol of the enumeration (class E)

Back then, as far as I can guess, linkedClassOfClass was flaky and didn't
work well late in the compilation pipeline.

Therefore a fix to SI-1329 introduced a caching facility. Once a ConstantType
representing the type of Constant(sym) was created (I guess, during typer, when
linkedClassOfClass was still working), it cached the underlying type and used
it in subsequent phases.


Unfortunately this solution, being fine for enum values, broke another flavor
of constants - type wrapping constants that represent classOf (for example,
Constant(IntTpe) represents the classOf[Int] constant).

Type-wrapping constants are special, because their type (e.g. Class[Int] in the
example from the previous paragraph) changes as the compilation progresses.
Before erasure it's Class[something], and after erasure it's just Class.

Therefore caching types of such constants might lead to incorrect types
flying around after erasure, as described in this scala-internals thread:
http://groups.google.com/group/scala-internals/browse_thread/thread/45185b341aeb6a30.


Now when the problem is clear, the question is why didn't it happen before?
That's all because of another peculiarity of the compiler.

Before erasure package references (e.g. in TypeRef prefixes) are represented
as ThisType(sym), where sym stands for a package class symbol. After erasure
such references are represented differently, e.g. java.lang package looks like
TypeRef(TypeRef(TypeRef(NoPrefix, root, Nil), java, Nil), java.lang, Nil).

As described in the aforementioned thread, the incorrect caching strategy
employed in UniqueConstantType mixed with other caching mechanisms in compiler
effectively established a non-clearable cache that goes from Type instances to
types that represent their classOfs, e.g. from String to Class[String].

So if anyone tried to typecheck a classOf after erasure, he/she would get
Class[String] instead of the correct Class, and compiler would crash. Right?

Nope. Before erasure String is TypeRef(ThisType(java.lang), StringSymbol, Nil),
and after erasure it's TypeRef(TypeRef(...), StringSymbol, Nil), as explained
above. Therefore the foul cache would contain two String types: one pre-erasure
going to a pre-erasure Class[String], and another one post-erasure going to
a post-erasure Class.


This shaky balance was broken when I tried to implement class tag generation
with shiny Type.erasure method that Martin just exposed in the reflection API.

The erasure method partially invoked the Erasure phase, and for a String
it returned its post-erasure representation (with java.lang prefix represented
as TypeRef, not as ThisType). And after that I used the result of erasure
to build a classOf for a class tag. Since I did it in a macro, it was
typer, a pre-erasure phase.

Now you understand why things broke. That classOf created a Constant
wrapping a post-erasure representation of String, which cached the incorrect
non-erased Class[String] type for a post-erasure type, and things exploded.

You can imagine my panic! The ScalaDays deadline was near, I still had to do
finishing touches to implicit macros (which I actually never had time to do),
and such a fundamental thing exploded.

Actually I figured out the hashing problem, but in the limited time I had
I failed to understand why exactly it's happening, so I introduced the dirty
workaround praised in SI-5918 and moved on.


The story doesn't end here.

Some time has passed, and I learned a lot about the compiler. I independently
discovered the ThisType -> TypeRef transform that erasure applies to package
references and patched Type.erasure to undo it. After all, Type.erasure is a
user-facing API, and users don't need to know about post-typer implementation
details. You can read more about this here:
http://groups.google.com/group/scala-internals/browse_thread/thread/6d3277ae21b6d581

From what we've learned above, we can see that this Type.erasure fix made
the UniqueConstantType workaround unnecessary. But I didn't know that.

So imagine my surprise when I tried to remove that workaround and ran the tests
only to see that nothing fails. I went back in time to April when the problem
first manifested, extracted a minimized crasher and tried to use it on trunk.
Again, nothing crashed.

And only with the help of showRaw, I finally understood that types printed as
"String" can be wildly different. The rest was a piece of cake.


The irony is that the original reason for ConstantType caching is no longer
valid. linkedClassOfClass now works fine (and files/jvm/outerEnum.scala
agrees with me), so we can remove the cache altogether.

So why all this story about erasure and package references? Well, I don't know.
I enjoyed uncovering this mystery, so I wanted to share it with you :)

@xeno-by
Copy link
Member Author

xeno-by commented Sep 22, 2012

review @paulp @gkossakowski

@xeno-by
Copy link
Member Author

xeno-by commented Sep 22, 2012

also review by @dragos, the original author of ConstantType caching

@gkossakowski
Copy link
Member

Wow, that's epic pull request description. :-)
Tweeted it here: https://twitter.com/gkossakowski/status/249483507255095296

Thanks for explaining all of this. Highly illuminating.

// - sym: the symbol of the actual enumeration value (VAL1)
// - .owner: the ModuleClassSymbol of the enumeration (object E)
// - .linkedClassOfClass: the ClassSymbol of the enumeration (class E)
sym.owner.linkedClassOfClass.tpe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have some assertion that checks if passed symbol corresponds to enumeration value?

It doesn't look like this method will be too hot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only check I can imagine is the one that verifies that one of the base classes of linkedClassOfClass is, in fact, a JavaLangEnumClass. But as https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1282/consoleText shows, this might lead to cyclic reference errors.

Therefore I suggest we don't add any assertions.

@scala-jenkins
Copy link

Started jenkins job pr-rangepos at https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/567/

@scala-jenkins
Copy link

Started jenkins job pr-scala-testsuite-linux-opt at https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1275/

@scala-jenkins
Copy link

jenkins job pr-rangepos: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/567/

@scala-jenkins
Copy link

jenkins job pr-scala-testsuite-linux-opt: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1275/

@xeno-by
Copy link
Member Author

xeno-by commented Sep 23, 2012

PLS REBUILD ALL

@scala-jenkins
Copy link

Started jenkins job pr-rangepos at https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/574/

@scala-jenkins
Copy link

Started jenkins job pr-scala-testsuite-linux-opt at https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1282/

@scala-jenkins
Copy link

jenkins job pr-scala-testsuite-linux-opt: Failed - https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1282/

@scala-jenkins
Copy link

jenkins job pr-rangepos: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/574/

Java enum values are represented with constants wrapping corresponding Symbols.
To find out the underlying type of such a constant one needs to calculate
sym.owner.linkedClassOfClass.tpe (where sym represents the wrapped symbol).

To quote the source code, given (in java): class A { enum E { VAL1 } }
- sym: the symbol of the actual enumeration value (VAL1)
- .owner: the ModuleClassSymbol of the enumeration (object E)
- .linkedClassOfClass: the ClassSymbol of the enumeration (class E)

Back then, as far as I can guess, linkedClassOfClass was flaky and didn't
work well late in the compilation pipeline.

Therefore a fix to SI-1329 introduced a caching facility. Once a ConstantType
representing the type of Constant(sym) was created (I guess, during typer, when
linkedClassOfClass was still working), it cached the underlying type and used
it in subsequent phases.

***

Unfortunately this solution, being fine for enum values, broke another flavor
of constants - type wrapping constants that represent classOf (for example,
Constant(IntTpe) represents the classOf[Int] constant).

Type-wrapping constants are special, because their type (e.g. Class[Int] in the
example from the previous paragraph) changes as the compilation progresses.
Before erasure it's Class[something], and after erasure it's just Class.

Therefore caching types of such constants might lead to incorrect types
flying around after erasure, as described in this scala-internals thread:
http://groups.google.com/group/scala-internals/browse_thread/thread/45185b341aeb6a30.

***

Now when the problem is clear, the question is why didn't it happen before?
That's all because of another peculiarity of the compiler.

Before erasure package references (e.g. in TypeRef prefixes) are represented
as ThisType(sym), where sym stands for a package class symbol. After erasure
such references are represented differently, e.g. java.lang package looks like
TypeRef(TypeRef(TypeRef(NoPrefix, root, Nil), java, Nil), java.lang, Nil).

As described in the aforementioned thread, the incorrect caching strategy
employed in UniqueConstantType mixed with other caching mechanisms in compiler
effectively established a non-clearable cache that goes from Type instances to
types that represent their classOfs, e.g. from String to Class[String].

So if anyone tried to typecheck a classOf after erasure, he/she would get
Class[String] instead of the correct Class, and compiler would crash. Right?

Nope. Before erasure String is TypeRef(ThisType(java.lang), StringSymbol, Nil),
and after erasure it's TypeRef(TypeRef(...), StringSymbol, Nil), as explained
above. Therefore the foul cache would contain two String types: one pre-erasure
going to a pre-erasure Class[String], and another one post-erasure going to
a post-erasure Class.

***

This shaky balance was broken when I tried to implement class tag generation
with shiny Type.erasure method that Martin just exposed in the reflection API.

The erasure method partially invoked the Erasure phase, and for a String
it returned its post-erasure representation (with java.lang prefix represented
as TypeRef, not as ThisType). And after that I used the result of erasure
to build a classOf for a class tag. Since I did it in a macro, it was
typer, a pre-erasure phase.

Now you understand why things broke. That classOf created a Constant
wrapping a post-erasure representation of String, which cached the incorrect
non-erased Class[String] type for a post-erasure type, and things exploded.

You can imagine my panic! The ScalaDays deadline was near, I still had to do
finishing touches to implicit macros (which I actually never had time to do),
and such a fundamental thing exploded.

Actually I figured out the hashing problem, but in the limited time I had
I failed to understand why exactly it's happening, so I introduced the dirty
workaround praised in SI-5918 and moved on.

***

The story doesn't end here.

Some time has passed, and I learned a lot about the compiler. I independently
discovered the ThisType -> TypeRef transform that erasure applies to package
references and patched Type.erasure to undo it. After all, Type.erasure is a
user-facing API, and users don't need to know about post-typer implementation
details. You can read more about this here:
http://groups.google.com/group/scala-internals/browse_thread/thread/6d3277ae21b6d581

From what we've learned above, we can see that this Type.erasure fix made
the UniqueConstantType workaround unnecessary. But I didn't know that.

So imagine my surprise when I tried to remove that workaround and ran the tests
only to see that nothing fails. I went back in time to April when the problem
first manifested, extracted a minimized crasher and tried to use it on trunk.
Again, nothing crashed.

And only with the help of showRaw, I finally understood that types printed as
"String" can be wildly different. The rest was a piece of cake.

***

The irony is that the original reason for ConstantType caching is no longer
valid. linkedClassOfClass now works fine (and files/jvm/outerEnum.scala
agrees with me), so we can remove the cache altogether.

So why all this story about erasure and package references? Well, I don't know.
I enjoyed uncovering this mystery, so I wanted to share it with you :)
@xeno-by
Copy link
Member Author

xeno-by commented Sep 23, 2012

PLS REBUILD ALL

@scala-jenkins
Copy link

Started jenkins job pr-rangepos at https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/577/

@scala-jenkins
Copy link

Started jenkins job pr-scala-testsuite-linux-opt at https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1285/

@scala-jenkins
Copy link

jenkins job pr-rangepos: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-rangepos/577/

@xeno-by
Copy link
Member Author

xeno-by commented Sep 23, 2012

Guys can I have an LGTM please? This pull request is a dependency for #1385, so it'd be great to get it in asap.

s: $r.intp.global.Type = String

scala> { println(afterPhase(currentRun.erasurePhase)(ConstantType(Constant(s)))) }
Class[String](classOf[java.lang.String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to utilize "printAfterEachPhase" for this sort of thing in the future. It makes the checkfile both testy and educational.

@paulp
Copy link
Contributor

paulp commented Sep 23, 2012

Looks good to me, go ahead.

xeno-by added a commit that referenced this pull request Sep 23, 2012
SI-5918 fixes the ConstantType ugliness
@xeno-by xeno-by merged commit 6ba05fd into scala:2.10.x Sep 23, 2012
@scala-jenkins
Copy link

jenkins job pr-scala-testsuite-linux-opt: Success - https://scala-webapps.epfl.ch/jenkins/job/pr-scala-testsuite-linux-opt/1285/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants