Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend Refactoring #6012

Merged
merged 12 commits into from
Sep 1, 2017
Merged

Backend Refactoring #6012

merged 12 commits into from
Sep 1, 2017

Conversation

lrytz
Copy link
Member

@lrytz lrytz commented Jul 28, 2017

Isolate backend components to run parallel from the compiler frontend

Refactor the compiler backend to isolate components that are planned to
run in parallel (local optimizations, classfile writing) from the
compiler frontend (i.e., no access to the Global instance).

The long inheritance stack (BCodeIdiomatic - BCodeHelpers - ... -
BCodeSyncAndTry) is now isolated and simply a component of the new
CodeGen class. CodeGen itself is one component of GenBCode, the
backend class.

GenBCode has some other components
- BTypes, used in CodeGen and PostProcessor. Many of the backend
components that used to be in BTypes (inliner, inlinerHeuristics,
localOpt) now moved to the PostProcessor.

  • PostProcessor implements global optimization (inlining), local
    optimizations and classfile writing. It has itself quite a number
    of components (inliner, callGraph, classfileWriter, ...)
  • PostProcessorFrontendAccess is an interface of functionality
    required in the PostProcessor that is implemented by accessing
    the frontend (Global). It synchronizes on the single frontendLock
    object.

The PostProcessor doesn't have a Global. It can access certain operations that depend on the frontend through PostProcessorFrontendAccess, which synchronizes on the frontendLock. Also the types in CoreBTypes are computed lazily by looking at symbols, so access is synchronized on the same lock.

The BTypes component is passed into CodeGen and the PostProcessor. Note that GodeGen has a global, so it can (and does) access various parts of the PostProcessor through global.genBCode.postProcessor, for example it updates genBCode.postProcessor.callGraph.inlineAnnotatedCallsites during code gen.

Quite a few cleanups all around.

@lrytz lrytz added the WIP label Jul 28, 2017
@scala-jenkins scala-jenkins added this to the 2.12.4 milestone Jul 28, 2017
Removes the three work queues in the backend.

Splits up the backend in two main components
  - CodeGen, which has a Global
  - PostProcessor, which has a BTypes (but no Global)

CodeGen generates asm.ClassNodes and stores them in postProcessor.generatedClasses.
The code generator is invoketd through BCodePhase.apply.

The postProcessor then runs the optimizer, computes the InnerClass table and
adds the lambdaDeserialize method if necessary. It finally serializes the
classes into a byte array and writes them to disk.

The implementation of classfile writing still depends on Global. It is passed
in as an argument to the postProcessor. A later commit will move it to a context
without Global and make it thread-safe.
@lrytz lrytz mentioned this pull request Aug 7, 2017
7 tasks
@lrytz lrytz force-pushed the backendRefactor branch 4 times, most recently from 36a5cbd to cd861fd Compare August 10, 2017 20:53
BTypes is the component that's shared between CodeGen and PostProcessor.
Remove implicit conversion from LazyVar[T] to T
@lrytz lrytz force-pushed the backendRefactor branch 2 times, most recently from d468f87 to 9dd1933 Compare August 11, 2017 12:25
@lrytz lrytz changed the title [WIP] Backend Refactoring Backend Refactoring Aug 11, 2017
@lrytz lrytz removed the WIP label Aug 11, 2017
@lrytz
Copy link
Member Author

lrytz commented Aug 11, 2017

This is ready for review, I'm not planning to add more right now. There's more that could be done, but that doesn't interfere with @mkeskells's plans started in #5815.

Review by @retronym. Start looking at GenBCode and follow components from there. The structure should be much more easy to understand now.

The PostProcessor doesn't have a global. It can access certain operations that depend on the frontend through PostProcessorFrontendAccess, which synchronizes on the frontendLock. Also the types in CoreBTypes are computed lazily by looking at symbols, so access is synchronized on the same lock.

The BTypes component is passed into CodeGen and the PostProcessor. Note that GodeGen has a global, so it can (and does) access various parts of the PostProcessor through global.genBCode.postProcessor, for example it updates genBCode.postProcessor.callGraph.inlineAnnotatedCallsites during code gen.

I started a benchmark for the last commit, should be coming in here: https://scala-ci.typesafe.com/grafana/dashboard/db/pr-validation?var-scalaSha=6ac6da8b61%7C58cfe9e76c&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@&from=1500736130730&to=1502458163200

Some notes for the TODOs on making local optimizations / classfile writing work in parallel:

  • maybe use Java's concurrent data structures (instead of scala.collection.concurrent.TrieMap)
  • ByteCodeRepository: probably needs synchronization to make sure a class is not parsed twice
  • BTypesFromClassfile.classBTypeFromParsedClassfile/classBTypeFromClassNode: probalby need to make sure that the same CalssBType is not created twice concurrently

Data structures to make concurrent

  • BackendUtils.maxLocalsMaxStackComputed/indyLambdaImplMethods
  • ByteCodeRepository.javaDefinedClasses
  • LocalOpt.unreachableCodeEliminated
  • CallGraph.inlineAnnotatedCallsites/noInlineAnnotatedCallsites

@lrytz lrytz requested a review from retronym August 11, 2017 13:43
@lrytz
Copy link
Member Author

lrytz commented Aug 11, 2017

Performance is flat, build is green :)

@lrytz
Copy link
Member Author

lrytz commented Aug 11, 2017

The jardiff is non-empty (https://gist.github.com/lrytz/2079b2f9f933da06a3a55beb48c7b5fa): classes with a $deserializeLambda$ method now get an InnerClass entry for MethodHandles$Lookup. This is an accidental fix, I'll add a test for it and will figure out what fixed it.

By the way, @mkeskells, if you don't want to wait until this PR is merged you can also start working on top of my branch (https://github.com/lrytz/scala/tree/backendRefactor).

import bTypes._
import callGraph.ClosureInstantiation
import coreBTypes._
import frontendAccess.compilerSettings

// unused objects created by these constructors are eliminated by pushPop
private[this] lazy val sideEffectFreeConstructors: LazyVar[Set[(String, String)]] = perRunLazy {
private[this] lazy val sideEffectFreeConstructors: LazyVar[Set[(String, String)]] = perRunLazy(this) {
Copy link
Member

@retronym retronym Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good idea for this to be both a lazy val and a LazyVar? The implementation of PerRunInit is not currently threasafe (no sync around mutation/access of inits), so I guess the invariant is that we expect that sideEffectFreeConstructors will have been called prior to initialize and any multi-threading.

Copy link
Member

@retronym retronym Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, the blend of locks based on lazy vals and frontEndLock needs to be considered carefully for potential deadlocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably too many implicit invariants, but here they are:

  • initialize is called in each compiler run at the beginning when the GenBCode phase starts running. There's no multithreading at this point.
  • I kept the values here as lazy vals so that only those actually accessed are ever added to the PerRunInit.this.inits buffer (and therefore re-initializes in the next run).
  • I think there can't be a deadlock here. Initializing a lazy val here will create the LazyVar and add it to the this.inits buffer, and this happens synchronized on this.
  • Completing a LazyVar synchronizes on the frontendLock.

trait PerRunInit {
private val inits = ListBuffer.empty[() => Unit]

def perRunInit(init: => Unit): Unit = inits += (() => init)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an error to call this after initialize has been called? Can we add an assertion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that it's OK to call this after initialize, like the lazy vals in CoreBTypes.


def perRunInit(init: => Unit): Unit = inits += (() => init)

def initialize(): Unit = inits.foreach(_.apply())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we discard the closures in init after forcing them, perhaps to allow some references to be GC-ed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep them around to re-initialize the LazyVars in the beginning of the next run.

@retronym
Copy link
Member

retronym commented Aug 14, 2017

(More) data structures to make concurrent:

  • PostProcessor.generatedClasses
  • originalClosureInit.inlinedClones += newClosureInit (not sure if naturally protected or not by the structure of closureInstantiations
  • InlineSuccess.downstreamLog (again, might not need it if we can reason that that parallel threads naturally work on different instances)

plain: SubItem3,
bean: SubItem3,
outFolder: scala.tools.nsc.io.AbstractFile) {
postProcessor.postProcessAndSendToDisk()
Copy link
Member

@retronym retronym Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The architecture of generating all the ClassNodes in memory is well suited to compiling with global optimization enabled, but for non-optimized compile runs imposes an unnecessary memory burden. In that mode, I'd like to see code gen and classfile writing happen within the loop over compilation units, and make sure that we release the ClassNode instances in between.

I've sketched out what I'd like in a commit in https://github.com/scala/scala/compare/2.12.x...retronym:review/6012?expand=1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion. We can also run local optimizations in this mode. Can we do that in a follow-up PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, so long as we don't forget before 2.12.4.

@retronym
Copy link
Member

I'd prefer to have explicit types for the component wiring, as in https://github.com/scala/scala/compare/2.12.x...retronym:review/6012?expand=1

@retronym
Copy link
Member

Could you please update the PR description with the important parts of your comment above and the motivation for the change?

@lrytz
Copy link
Member Author

lrytz commented Aug 14, 2017

(More) data structures to make concurrent:

PostProcessor.generatedClasses

It's read-only after code gen, but maybe that's not safe for a ListBuffer? Probably better be safe.

originalClosureInit.inlinedClones += newClosureInit (not sure if naturally protected or not by the structure of closureInstantiations
InlineSuccess.downstreamLog (again, might not need it if we can reason that that parallel threads naturally work on different instances)

I think in a first iteration the inliner will only run single-threaded, so these two should be OK. Inlining in parallel requires synchronization.

@lrytz
Copy link
Member Author

lrytz commented Aug 14, 2017

Will take in your explicit type annotations and update the PR descriptions, probably also improve the commit messages.

@retronym
Copy link
Member

It's read-only after code gen, but maybe that's not safe for a ListBuffer? Probably better be safe.

Depending how we implement interspersed code gen and postprocessing we might need it. Probably best to encapsulate access to it rather than exposing the buffer and synchronize defensively.

@retronym
Copy link
Member

/synch

In 2.12.3, if `$deserializeLambda$` has the only reference to
`MethodHandles$Lookup`, the corresponding entry in the InnerClass
table is missing due to an ordering issue (the method is added only
after the inner classes are visited). This was fixed in the recent
refactoring.
@lrytz
Copy link
Member Author

lrytz commented Aug 29, 2017

classes with a $deserializeLambda$ method now get an InnerClass entry for MethodHandles$Lookup.

Added a test: 5f83d78

In 2.12.3, we do it in the wrong order (https://github.com/scala/scala/blob/v2.12.3/src/compiler/scala/tools/nsc/backend/jvm/GenBCode.scala#L268-L271)

@lrytz lrytz merged commit 76b2d64 into scala:2.12.x Sep 1, 2017
sjrd added a commit to scala/scala3 that referenced this pull request Mar 28, 2023
This PR ports JVM backend refactor from Scala 2 as part of the
#14912 thread.

It squashes changes based on the PRs: 

- scala/scala#6012
- scala/scala#6057

The last refactor introducing backend parallelism
scala/scala#6124 is left for later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants