Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate the zinc compiler bridge into this codebase [ci: last-only] #8531

Closed
wants to merge 452 commits into from

Conversation

lrytz
Copy link
Member

@lrytz lrytz commented Nov 7, 2019

Zinc is the library that sbt and other build tools use to handle incremental compilation of Scala projects.

Zinc does more than just new scala.compiler.Main().run(args):

  • it changes the compiler's reporter to route messages through zinc's reporter
  • it adds compiler phases to extract information about the code being compiled (API structure, dependencies between source files), which is used to decide what needs to be re-compiled when a source changes

The Scala compiler does not have a stable binary interface to allow these extensions - in fact, Scala compilers are generally binary incompatible, even minor versions.

To make zinc work with a wide range of Scala compilers (2.10 - 2.13), it compiles a set of source files (the "compiler bridge", mistakingly called "compiler-interface" in the message below) against every Scala version:

[info] 'compiler-interface' not yet compiled for Scala 2.13.1. Compiling...

This requires the Scala compiler to remain source compatible with zinc's compiler bridge. This solution works, but is brittle because the accessed interface of the compiler is not specified:

  • it's difficult to find out if zinc still works when refactoring the compiler
  • shims are needed to maintain compatibility

The compiler bridge's source code in zinc is also difficult to maintain because it needs to be source compatible with a large range of Scala compilers.

The solution (also implemented in dotty) is to integrate the compiler bridge into the Scala compiler, which means every Scala release ships with the binary compiler bridge.

Compiler Interface and Compiler Bridge

The compiler interface is a Java API that defines how zinc calls a Scala compiler, and how the compiler talks back to zinc. This interface is not new, it exists in the zinc repository. For example, AnalysisCallback defines the methods that the compiler invokes to pass the API and dependency information back to zinc.

The compiler bridge is the code that zinc compiles on the fly to interact with a Scala compiler, it uses the compiler interface as a dependency.

In order to move the compiler bridge into the compiler, the Sscala compiler needs a dependency on the compiler interface. Currently, the compiler-interface.jar is released together with zinc.

Long-term, the plan is to migrate the compiler-interface source code to a separate repository (a first version is in https://github.com/sbt/compiler-interface) and publish releases from there. This would allow evolving the compiler interface independently from zinc.

This PR pulls in the compiler bridge source code and splits it up in two parts:

  1. Code that extends the compiler (compiler phases, Global subclasses, Reporter subclasses) is added to src/compiler, in scala.tools.nsc.interactive
  2. Code that invokes the compiler (CompilerInterface, ConsoleInterface, ScaladocInterface) is added to a new project src/compiler-bridge

We could put everything into src/compiler-bridge, I would be fine with this solution. On the upside, it would avoid adding a dependency (on the compiler-interface) to scala-compiler.

We cannot put the CompilerInterface / ConsoleInterface / ScaladocInterface into src/compiler for two reasons

  • it depends on more than just the compiler, namely the repl and scaladoc, which have their own sbt projects
  • it defines classes in a different package than scala.tools.nsc (currently in xsbt, but we should change this, discussion below)

As of this PR, a separate scala-compiler-bridge artifact is published with the release. We could merge the classfiles into scala-compiler.jar, like we do for the repl, scaladoc, etc. An advantage of keeping it separate is that one could back-release the bridge for old compilers, or release a new bridge version for some existing Scala release.

Testing

Tested with sbt setupPublishCoreNonOpt publishLocal, then sbt -Dstarr.version=2.13.2-bin-5f36246-SNAPSHOT dist/mkQuick library/doc. The binary bridge is being used (find ~/.sbt -name '*5f36246*' is empty).

Credits

None of the creative work here is my doing:

  • @eed3si9n crated a branch pulling in the compiler bridge from zinc in spring 2019
    • I re-did the branch to catch more of the zinc history (directory renames), also because his branch includes the history of dotty's compiler bridge, which is not needed.
    • The commit messages in 45d9e79 and d864e3e show the commands that were used to pull in the zinc history
  • @smarter figured out the necessary classloader hack (Add support for incremental compilation smarter/dotty-bridge#1) which we pulled in from dotty
  • @adriaanm's spring 2016 branch which integrates the sbt compiler phases into the copiler was the basis for dotty's compiler bridge implementation

Follow-ups

There's probably more, but here's a shortlist:

  • @adriaanm's branch combines zinc's API and dependency extraction phases into a single traversal for efficiency, we should do that too
  • The entry points that zinc invokes to run the compiler are xsbt.CompilerInterface, xsbt.ConsoleInterface and xsbt.ScaladocInterface. I think zinc itself also defines classes in the xsbt package, which would lead to a split package. We should probably define a new package name for the bridge.
  • Zinc's DualLoader causes issues that have to be worked around. This should be fixed in zinc instead.

Duhemm and others added 30 commits January 17, 2016 22:02
…hods-before-and-after-erasure-in-extractapi

FPORT: Consider signatures of methods before and after erasure in ExtractAPI
Rewritten from sbt/zinc@e7aee52
... not exitingPostErasure, as this phase-travel crashes the compile
(it's only really meant for going back in time, right?)

Rewritten from sbt/zinc@ad50437
For example, when the `--sourcepath` option is provided
and the refchecks phase compiles an annotation found
on a referenced symbol from the sourcepath.

`compileLate` assumes that all non-sentinel compiler
phases can be down cast to `GlobalPhase`.

This commit changes the two phases in SBT to extend
this instead of `Phase`. This has the knock on benefit
of simplifying the phases by letting the `GlobalPhase.run`
iterator over the list of compilation units and feed them
to us one by one.

I checked that the test case failed before making each
change.

Rewritten from sbt/zinc@2f9a27f
Call `initialize` in case symbol's `info` hadn't been completed
during normal compilation.

Also, normalize to the class symbol immediately.

Add a TODO regarding only looking at class symbols,
and thus ignoring the term symbol for objects,
as the corresponding class symbol has all the relevant info.

Rewritten from sbt/zinc@f10bcb6
The only aspect of the self variable that's relevant for
incremental compilation is its explicitly declared type,
and only when it's different from the type of the class that declares it.

Technically, any self type that's a super type of the class could be ignored,
as it cannot affect external use (instantiation/subclassing) of the class.

Rewritten from sbt/zinc@cc0e66e
Motivated because we want to make it more robust & configurable.
Original motivation was to diagnose a cyclic type representation,
likely due to an f-bounded existential type, as illustrated by the following:

```
class Dep {
  // The API representation for `bla`'s result type contains a cycle
  // (an existential's type variable's bound is the existential type itself)
  // This results in a stack overflow while showing the API diff.
  // Note that the actual result type in the compiler is not cyclic
  // (the f-bounded existential for Comparable is truncated)
  def bla(c: Boolean) = if (c) new Value else "bla"
}

class Value extends java.lang.Comparable[Value] { def compareTo(that: Value): Int = 1 }
```

Limit nesting (`-Dsbt.inc.apidiff.depth=N`, where N defaults to `2`),
and number of declarations shown for a class/structural type
(via `sbt.inc.apidiff.decls`, which defaults to `0` -- no limit).

Limiting nesting is crucial in keeping the size of api diffs of large programs
within a reasonable amount of RAM...

For example, compiling the Scala library, the API diff with nesting at `4`
exhausts 4G of RAM...

Rewritten from sbt/zinc@f382cf1
For refinement types, the Structure was already restricted
to declarations (and not inherited members), but all base types
were still included for a refinement's parents, which would
create unwieldy, and even erroneous (cyclic) types by expanding
all constituents of an intersection type to add all base types.

Since the logic already disregarded inherited members, it seems
logical to only include direct parents, and not all ancestor types.

```
class Dep {
  def bla(c: Boolean) = if (c) new Value else "bla"
}

class Value extends java.lang.Comparable[Value] { def compareTo(that: Value): Int = 1 }
```

Rewritten from sbt/zinc@f739386
Specialize two implementations for each value of the `inherit` boolean argument.
Also use a more direct way of distinguishing declared and inherited members.

backwards compat for source-dependencies/inherited-dependencies

Rewritten from sbt/zinc@3a2c558
Also a bit more complete: handle SelectFromTypeTree,
consider the self type an inheritance dependency,
and flatten any refinement types in inherited types,
to get to the symbols of their parents, instead of the
useless symbol of the refinement class.

Include inheritance dependencies in regular ones

Also, update test to reflect the self type is now seen as an inheritance dependency.

self types are local, so don't treat them like inherited types

note inheritanceSymbols dealiases, where allSymbols is constructed differently

fix NPE in source-dependencies/macro-annotation

Rewritten from sbt/zinc@8b14801
Since pickled annotated types and symbols only mention
static annotations, whereas compilation from source
sees all annotations, we must explicitly filter annotations
in the API representation using the same criteria as the pickler,
so that we generate the same API when compiling from source
as when we're loading classfiles.

Rewritten from sbt/zinc@0f77be9
This is a fixup of 0f61629.
That commit assumed that dealiasing is being done for types referred in
self type. It was changed to not do that but the test wasn't updated.
Unfortunately, that mistake slipped by during PR review because unit tests
of compileInterface were not ran (see scala#2358).

Rewritten from sbt/zinc@3f0201a
Before this commit, we did not do the invalidation for methods with
multiple parameter list, the comment above `hasValueClassAsReturnType`
said:

  Note: We only inspect the "outermost type" (i.e. no recursion) because
  we don't need to inspect after erasure a function that would, for
  instance, return a function that returns a subtype of AnyVal.

But this is wrong: a method with signature:
  def foo(a: A)(b: B): C
is erased to:
  def foo(a: A, b: B): C
and not, as the comment in the code suggest, to:
  def foo(a: A): B => C
so we do need to inspect the final result type of methods, because they
can be value classes that will be erased to their underlying value.

Rewritten from sbt/zinc@c3cd903
If a method's type contains a non-primitive value class then it has two
signatures: one before erasure and one after erasure. Before this
commit, we checked if this was the case using `isAnyValSubtype`, but
this is too crude since primitive value classes are also subtypes of
`AnyVal` but do not change signature after erasure.

This commit replaces `isAnyValSubtype` by `isDerivedValueClass` which
excludes primitive value classes.

In practice, for an empty class, this reduces the size of the output of
`DefaultShowAPI` from 65 lines to 25 lines.
Before:
https://gist.github.com/smarter/cf1d6fe58efda88d6ee6#file-old-api
After:
https://gist.github.com/smarter/cf1d6fe58efda88d6ee6#file-new-api

Rewritten from sbt/zinc@5300c77
FPORT: Avoid CCE when scalac internally uses compileLate
Rewritten from sbt/zinc@768a3a8
FPORT: Include ONLY nonpublic vars in API hash of traits
Rewritten from sbt/zinc@d23fb27
Introduce a new relation declaredClasses in Relations. It tracks names
of classes declared in given source file. Names of classes are fully
qualified names as seen at pickler phase (i.e. before flattening).
Objects are represented as object name with dollar sign appended to it.

The `declaredClasses` relation will be needed to map invalidated classes
back to source files where they are defined. It's worth mentioning that
there's a relation called `classes` defined already. However, that relation
tracks names of classes as seen by the Scala compiler backend. These are
names that will later appear in bytecode. For our purposes, we want to
have names of classes as they are declared in source code.

Declared classes are tracked properly for both Scala and Java files.

While updating ScalaCompilerForUnitTesting I flipped the nameHashing flag
to be true by default. That's in sync with sbt's default value for that
flag now.

Rewritten from sbt/zinc@64f700c
Everything compiles but tests are failing. This commit should be split into two parts:
  - refactoring of Relations, TextAnalysisFormat, etc. that introduces support for relations where first element in their
     pair is not a file. At the moment, there're many places where code assumes that all relations are Relation[File, T].
   - the rest of the code

Rewritten from sbt/zinc@a1b5449
@scala-jenkins scala-jenkins added this to the 2.13.2 milestone Nov 7, 2019
We originally used 1.3.0-beta1, which is a preview version released from
the https://github.com/sbt/compiler-interface repository. However, we
can as well use the official 1.3.1 that was released with zinc 1.3.1
until the new compiler-interface repos is fleshed out and its artifacts
are supported by zinc.
@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Nov 11, 2019
Use more composition and less inheritance. Remove a few unused details.
The ZincAnalyzer phase iterates through all class symbols that the
backend generates classes for, and calls callback.generatedLocalClass
for local classes.

Instead, add a hook in global when a class is generated, override
it in ZincGlobal, and invoke the callback.
@SethTisue
Copy link
Member

is this a blocker for 2.13.2?

@eed3si9n
Copy link
Member

is this a blocker for 2.13.2?

I don't think it is, since there's more stuff cooking for Zinc.

@SethTisue SethTisue modified the milestones: 2.13.2, 2.13.3 Feb 11, 2020
@SethTisue SethTisue modified the milestones: 2.13.3, 2.13.4 May 12, 2020
@dwijnand
Copy link
Member

dwijnand commented Oct 5, 2020

@lrytz happy to close this one, right?

@lrytz lrytz closed this Oct 6, 2020
@SethTisue SethTisue removed this from the 2.13.4 milestone Oct 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes worth highlighting in next release notes
Projects
None yet