Backend Refactoring #6012

lrytz · 2017-07-28T11:31:06Z

Isolate backend components to run parallel from the compiler frontend

Refactor the compiler backend to isolate components that are planned to
run in parallel (local optimizations, classfile writing) from the
compiler frontend (i.e., no access to the Global instance).

The long inheritance stack (BCodeIdiomatic - BCodeHelpers - ... -
BCodeSyncAndTry) is now isolated and simply a component of the new
CodeGen class. CodeGen itself is one component of GenBCode, the
backend class.

GenBCode has some other components:
- BTypes, used in CodeGen and PostProcessor. Many of the backend
  components that used to be in BTypes (inliner, inlinerHeuristics,
  localOpt) have now moved to the PostProcessor.
- PostProcessor implements global optimization (inlining), local
  optimizations and classfile writing. It has itself quite a number
  of components (inliner, callGraph, classfileWriter, ...).
- PostProcessorFrontendAccess is an interface of functionality
  required in the PostProcessor that is implemented by accessing
  the frontend (Global). It synchronizes on the single frontendLock
  object.

The PostProcessor doesn't have a Global. It can access certain operations that depend on the frontend through PostProcessorFrontendAccess, which synchronizes on the frontendLock. Also the types in CoreBTypes are computed lazily by looking at symbols, so access is synchronized on the same lock.
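The access pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Frontend`, `FrontendAccess`, `FrontendAccessImpl` and the helper `frontendSynch` are hypothetical names chosen for the example, and the "frontend" is reduced to a warning list that is not thread-safe on its own.

```scala
// Hypothetical stand-in for the compiler frontend (Global); not thread-safe.
final class Frontend {
  private var warnings = List.empty[String]
  def warn(msg: String): Unit = warnings ::= msg
  def allWarnings: List[String] = warnings.reverse
}

// The only view of the frontend that post-processing code is given.
trait FrontendAccess {
  def warning(msg: String): Unit
  def warnings: List[String]
}

// Every call into the frontend goes through the same single lock object.
final class FrontendAccessImpl(frontend: Frontend) extends FrontendAccess {
  private val frontendLock = new Object
  private def frontendSynch[T](op: => T): T = frontendLock.synchronized(op)

  def warning(msg: String): Unit = frontendSynch(frontend.warn(msg))
  def warnings: List[String] = frontendSynch(frontend.allWarnings)
}

object FrontendAccessDemo {
  // Four threads report 100 warnings each through the synchronized interface.
  def run(): Int = {
    val access: FrontendAccess = new FrontendAccessImpl(new Frontend)
    val threads = (1 to 4).map { i =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 100).foreach(j => access.warning("w" + i + "-" + j))
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    access.warnings.size
  }
}
```

Because the worker threads only ever see `FrontendAccess`, they cannot reach the unsynchronized `Frontend` directly, which is the point of keeping the PostProcessor free of a Global.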

The BTypes component is passed into CodeGen and the PostProcessor. Note that CodeGen has a Global, so it can (and does) access various parts of the PostProcessor through global.genBCode.postProcessor; for example, it updates genBCode.postProcessor.callGraph.inlineAnnotatedCallsites during code gen.

Quite a few cleanups all around.

Removes the three work queues in the backend. Splits up the backend into two main components:
- CodeGen, which has a Global
- PostProcessor, which has a BTypes (but no Global)

CodeGen generates asm.ClassNodes and stores them in postProcessor.generatedClasses. The code generator is invoked through BCodePhase.apply.

The postProcessor then runs the optimizer, computes the InnerClass table and adds the lambdaDeserialize method if necessary. It finally serializes the classes into a byte array and writes them to disk.

The implementation of classfile writing still depends on Global. It is passed in as an argument to the postProcessor. A later commit will move it to a context without Global and make it thread-safe.

BTypes is the component that's shared between CodeGen and PostProcessor.
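To make the component structure concrete, here is a heavily simplified wiring sketch. All class bodies are placeholders invented for illustration, and unlike the PR (where CodeGen reaches the post-processor through global.genBCode.postProcessor) the sketch passes the PostProcessor into CodeGen directly.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins; the real classes carry far more state.
final class Global
final class BTypes

// Has a BTypes but no Global.
final class PostProcessor(val bTypes: BTypes) {
  val generatedClasses: ListBuffer[String] = ListBuffer.empty[String]

  // Placeholder for optimizing, serializing and writing the classes.
  def postProcessAndSendToDisk(): List[String] = {
    val written = generatedClasses.toList.map(name => name + ".class")
    generatedClasses.clear()
    written
  }
}

// Has a Global, and hands its output over to the PostProcessor.
final class CodeGen(val global: Global, val bTypes: BTypes, postProcessor: PostProcessor) {
  def genClass(name: String): Unit = {
    postProcessor.generatedClasses += name
    ()
  }
}

// Wires the components together, with explicit types on each member.
final class GenBCode(val global: Global) {
  val bTypes: BTypes = new BTypes
  val postProcessor: PostProcessor = new PostProcessor(bTypes)
  val codeGen: CodeGen = new CodeGen(global, bTypes, postProcessor)
}
```

The single `BTypes` instance created in `GenBCode` is the only state shared between the two halves of the backend in this sketch.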

Remove implicit conversion from LazyVar[T] to T

lrytz · 2017-08-11T13:43:24Z

This is ready for review, I'm not planning to add more right now. There's more that could be done, but that doesn't interfere with @mkeskells's plans started in #5815.

Review by @retronym. Start looking at GenBCode and follow components from there. The structure should be much easier to understand now.

The PostProcessor doesn't have a global. It can access certain operations that depend on the frontend through PostProcessorFrontendAccess, which synchronizes on the frontendLock. Also the types in CoreBTypes are computed lazily by looking at symbols, so access is synchronized on the same lock.

The BTypes component is passed into CodeGen and the PostProcessor. Note that CodeGen has a Global, so it can (and does) access various parts of the PostProcessor through global.genBCode.postProcessor; for example, it updates genBCode.postProcessor.callGraph.inlineAnnotatedCallsites during code gen.

I started a benchmark for the last commit, should be coming in here: https://scala-ci.typesafe.com/grafana/dashboard/db/pr-validation?var-scalaSha=6ac6da8b61%7C58cfe9e76c&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@&from=1500736130730&to=1502458163200

Some notes for the TODOs on making local optimizations / classfile writing work in parallel:

maybe use Java's concurrent data structures (instead of scala.collection.concurrent.TrieMap)
ByteCodeRepository: probably needs synchronization to make sure a class is not parsed twice
BTypesFromClassfile.classBTypeFromParsedClassfile/classBTypeFromClassNode: probably need to make sure that the same ClassBType is not created twice concurrently
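One possible way to get the "not parsed twice" guarantee from the notes above is `java.util.concurrent.ConcurrentHashMap.computeIfAbsent`, which applies its function at most once per key. The sketch below is an assumption about how that could look, not what the backend does; `ParsedClass` and `ParseOnceCache` are made-up names.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a parsed classfile.
final case class ParsedClass(internalName: String)

final class ParseOnceCache {
  private val cache = new ConcurrentHashMap[String, ParsedClass]()

  // Counts how often the (expensive) parse step actually runs.
  val parseCount = new AtomicInteger(0)

  private def parse(name: String): ParsedClass = {
    parseCount.incrementAndGet()
    ParsedClass(name)
  }

  // computeIfAbsent runs the function atomically, at most once per key,
  // so concurrent callers asking for the same class share one result.
  def classNode(name: String): ParsedClass =
    cache.computeIfAbsent(name, new java.util.function.Function[String, ParsedClass] {
      def apply(n: String): ParsedClass = parse(n)
    })
}
```

The trade-off is that other threads asking for the same key block while the first one parses, which is acceptable here because they would need the result anyway.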

Data structures to make concurrent

BackendUtils.maxLocalsMaxStackComputed/indyLambdaImplMethods
ByteCodeRepository.javaDefinedClasses
LocalOpt.unreachableCodeEliminated
CallGraph.inlineAnnotatedCallsites/noInlineAnnotatedCallsites

lrytz · 2017-08-11T16:28:47Z

Performance is flat, build is green :)

lrytz · 2017-08-11T20:07:06Z

The jardiff is non-empty (https://gist.github.com/lrytz/2079b2f9f933da06a3a55beb48c7b5fa): classes with a $deserializeLambda$ method now get an InnerClass entry for MethodHandles$Lookup. This is an accidental fix, I'll add a test for it and will figure out what fixed it.

By the way, @mkeskells, if you don't want to wait until this PR is merged you can also start working on top of my branch (https://github.com/lrytz/scala/tree/backendRefactor).

retronym · 2017-08-14T04:37:13Z

src/compiler/scala/tools/nsc/backend/jvm/analysis/BackendUtils.scala

  import bTypes._
  import callGraph.ClosureInstantiation
  import coreBTypes._
  import frontendAccess.compilerSettings

  // unused objects created by these constructors are eliminated by pushPop
-  private[this] lazy val sideEffectFreeConstructors: LazyVar[Set[(String, String)]] = perRunLazy {
+  private[this] lazy val sideEffectFreeConstructors: LazyVar[Set[(String, String)]] = perRunLazy(this) {


Is it a good idea for this to be both a lazy val and a LazyVar? The implementation of PerRunInit is not currently thread-safe (no sync around mutation/access of inits), so I guess the invariant is that we expect that sideEffectFreeConstructors will have been called prior to initialize and any multi-threading.

Additionally, the blend of locks based on lazy vals and frontendLock needs to be considered carefully for potential deadlocks.

There are probably too many implicit invariants, but here they are:

initialize is called in each compiler run at the beginning when the GenBCode phase starts running. There's no multithreading at this point.

I kept the values here as lazy vals so that only those actually accessed are ever added to the PerRunInit.this.inits buffer (and therefore re-initialized in the next run).

I think there can't be a deadlock here. Initializing a lazy val here will create the LazyVar and add it to the this.inits buffer, and this happens synchronized on this.

Completing a LazyVar synchronizes on the frontendLock.
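The invariants listed above can be illustrated with a reduced model. This is a sketch under assumptions: `LazyVar` here is a simplified re-implementation, `perRunLazy` drops the component argument seen in the diff, and `BackendSketch` and `computations` exist only for the demonstration. Like the reviewed code, the `inits` buffer is unsynchronized and relies on `initialize` being called while single-threaded.

```scala
import scala.collection.mutable.ListBuffer

trait PerRunInit {
  private val inits = ListBuffer.empty[() => Unit]

  // Registers an action to run at the start of every compiler run.
  def perRunInit(init: => Unit): Unit = { inits += (() => init); () }

  // Called once per run, before any multi-threading starts.
  def initialize(): Unit = inits.foreach(_.apply())
}

// Simplified LazyVar: computed on first access, recomputed after a reset.
final class LazyVar[T](init: () => T, lock: AnyRef) {
  private var value: Option[T] = None

  def get: T = lock.synchronized {
    if (value.isEmpty) value = Some(init())
    value.get
  }

  def reset(): Unit = lock.synchronized { value = None }
}

object BackendSketch extends PerRunInit {
  private val frontendLock = new Object

  // Only used to observe how often the initializer runs.
  var computations = 0

  def perRunLazy[T](init: => T): LazyVar[T] = {
    val v = new LazyVar[T](() => init, frontendLock)
    perRunInit(v.reset())
    v
  }

  // The outer lazy val delays registration until first access, so values
  // that are never used are never added to the inits buffer.
  lazy val sideEffectFree: LazyVar[Set[String]] = perRunLazy {
    computations += 1
    Set("java/lang/Object")
  }
}
```

Calling `initialize()` again at the start of a second run resets the registered `LazyVar`, so the next `get` recomputes the value; this is also why the registered closures have to be kept rather than discarded after the first run.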

retronym · 2017-08-14T04:38:36Z

src/compiler/scala/tools/nsc/backend/jvm/PerRunInit.scala

+trait PerRunInit {
+  private val inits = ListBuffer.empty[() => Unit]
+
+  def perRunInit(init: => Unit): Unit = inits += (() => init)


Is it an error to call this after initialize has been called? Can we add an assertion?

My thinking was that it's OK to call this after initialize, like the lazy vals in CoreBTypes.

retronym · 2017-08-14T04:39:24Z

src/compiler/scala/tools/nsc/backend/jvm/PerRunInit.scala

+
+  def perRunInit(init: => Unit): Unit = inits += (() => init)
+
+  def initialize(): Unit = inits.foreach(_.apply())


Can we discard the closures in inits after forcing them, perhaps to allow some references to be GC-ed?

We need to keep them around to re-initialize the LazyVars in the beginning of the next run.

retronym · 2017-08-14T04:49:52Z

(More) data structures to make concurrent:

PostProcessor.generatedClasses
originalClosureInit.inlinedClones += newClosureInit (not sure if naturally protected or not by the structure of closureInstantiations)
InlineSuccess.downstreamLog (again, might not need it if we can reason that parallel threads naturally work on different instances)

retronym · 2017-08-14T05:20:14Z

src/compiler/scala/tools/nsc/backend/jvm/GenBCode.scala

-                     plain:      SubItem3,
-                     bean:       SubItem3,
-                     outFolder:  scala.tools.nsc.io.AbstractFile) {
+      postProcessor.postProcessAndSendToDisk()


The architecture of generating all the ClassNodes in memory is well suited to compiling with global optimization enabled, but for non-optimized compile runs it imposes an unnecessary memory burden. In that mode, I'd like to see code gen and classfile writing happen within the loop over compilation units, and make sure that we release the ClassNode instances in between.

I've sketched out what I'd like in a commit in https://github.com/scala/scala/compare/2.12.x...retronym:review/6012?expand=1

That's a good suggestion. We can also run local optimizations in this mode. Can we do that in a follow-up PR?

Yep, so long as we don't forget before 2.12.4.
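The difference between the two modes discussed in this thread can be shown with a toy model. Everything here is hypothetical (`CompilationUnit`, `ClassNode` and `Pipeline` are simplified stand-ins, and "peak" just counts class nodes held at once); it is meant only to show why interleaving reduces the number of live ClassNodes.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-ins for the compiler's data types.
final case class CompilationUnit(name: String, classNames: List[String])
final case class ClassNode(name: String)

object Pipeline {
  private def genClasses(unit: CompilationUnit): List[ClassNode] =
    unit.classNames.map(n => ClassNode(n))

  private def serialize(node: ClassNode): Array[Byte] =
    node.name.getBytes("UTF-8")

  // Batch mode: generate everything first, then write. Suits global
  // optimization, but all class nodes are alive at the same time.
  def runBatch(units: List[CompilationUnit], write: (String, Array[Byte]) => Unit): Int = {
    val generated = ListBuffer.empty[ClassNode]
    units.foreach(u => generated ++= genClasses(u))
    val peak = generated.size
    generated.foreach(n => write(n.name, serialize(n)))
    peak
  }

  // Interleaved mode: generate and write per compilation unit, so the
  // nodes of one unit become garbage before the next unit is processed.
  def runInterleaved(units: List[CompilationUnit], write: (String, Array[Byte]) => Unit): Int = {
    var peak = 0
    units.foreach { u =>
      val nodes = genClasses(u)
      peak = math.max(peak, nodes.size)
      nodes.foreach(n => write(n.name, serialize(n)))
    }
    peak
  }
}
```

Both methods return the largest number of class nodes held simultaneously, which makes the memory argument easy to check on a small input.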

retronym · 2017-08-14T05:46:21Z

I'd prefer to have explicit types for the component wiring, as in https://github.com/scala/scala/compare/2.12.x...retronym:review/6012?expand=1

retronym · 2017-08-14T05:47:42Z

Could you please update the PR description with the important parts of your comment above and the motivation for the change?

lrytz · 2017-08-14T14:05:59Z

(More) data structures to make concurrent:

PostProcessor.generatedClasses

It's read-only after code gen, but maybe that's not safe for a ListBuffer? Probably better be safe.

originalClosureInit.inlinedClones += newClosureInit (not sure if naturally protected or not by the structure of closureInstantiations)
InlineSuccess.downstreamLog (again, might not need it if we can reason that parallel threads naturally work on different instances)

I think in a first iteration the inliner will only run single-threaded, so these two should be OK. Inlining in parallel requires synchronization.

lrytz · 2017-08-14T14:07:32Z

Will take in your explicit type annotations and update the PR description, probably also improve the commit messages.

retronym · 2017-08-15T05:24:25Z

It's read-only after code gen, but maybe that's not safe for a ListBuffer? Probably better be safe.

Depending on how we implement interspersed code gen and postprocessing, we might need it. Probably best to encapsulate access to it rather than exposing the buffer, and synchronize defensively.
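A minimal sketch of the encapsulation suggested here, assuming a hypothetical `GeneratedClasses` wrapper (the name and its `add`/`drain` methods are not from the PR): the buffer stays private and every access is synchronized on it.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the backend's generated-class record.
final case class GeneratedClass(name: String)

final class GeneratedClasses {
  // Never exposed; all access goes through the synchronized methods below.
  private val buffer = ListBuffer.empty[GeneratedClass]

  def add(c: GeneratedClass): Unit = buffer.synchronized {
    buffer += c
    ()
  }

  // Hands over everything generated so far as an immutable list
  // and clears the buffer so the entries can be released.
  def drain(): List[GeneratedClass] = buffer.synchronized {
    val result = buffer.toList
    buffer.clear()
    result
  }
}
```

Returning an immutable snapshot from `drain()` means the consumer never iterates a buffer that another thread might still be appending to.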

retronym · 2017-08-20T23:14:36Z

/synch

In 2.12.3, if `$deserializeLambda$` has the only reference to `MethodHandles$Lookup`, the corresponding entry in the InnerClass table is missing due to an ordering issue (the method is added only after the inner classes are visited). This was fixed in the recent refactoring.
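The ordering issue can be modelled in a few lines. This is a toy illustration only: `Method`, `ClassFile` and `InnerClassOrdering` are invented for the example, and "nested class" is approximated as "internal name contains a dollar sign", which is not how the backend decides it.

```scala
// Toy model: a method and the class names its body refers to.
final case class Method(name: String, referencedClasses: Set[String])
final case class ClassFile(methods: List[Method], innerClasses: Set[String])

object InnerClassOrdering {
  val lookup = "java/lang/invoke/MethodHandles$Lookup"
  val deserializeLambda: Method = Method("$deserializeLambda$", Set(lookup))

  // Toy rule (an assumption): a referenced class is nested if its name contains '$'.
  private def innerClassEntries(methods: List[Method]): Set[String] =
    methods.flatMap(_.referencedClasses).filter(_.contains("$")).toSet

  // 2.12.3 behaviour as described: entries are collected first and the
  // method is added afterwards, so its references are never visited.
  def wrongOrder(methods: List[Method]): ClassFile = {
    val entries = innerClassEntries(methods)
    ClassFile(methods :+ deserializeLambda, entries)
  }

  // After the refactoring: the method is added before the entries are collected.
  def fixedOrder(methods: List[Method]): ClassFile = {
    val all = methods :+ deserializeLambda
    ClassFile(all, innerClassEntries(all))
  }
}
```

When no other method refers to `MethodHandles$Lookup`, `wrongOrder` yields a class whose InnerClass set lacks the entry, while `fixedOrder` includes it.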

lrytz · 2017-08-29T16:26:33Z

classes with a $deserializeLambda$ method now get an InnerClass entry for MethodHandles$Lookup.

Added a test: 5f83d78

In 2.12.3, we do it in the wrong order (https://github.com/scala/scala/blob/v2.12.3/src/compiler/scala/tools/nsc/backend/jvm/GenBCode.scala#L268-L271)

This PR ports JVM backend refactor from Scala 2 as part of the #14912 thread. It squashes changes based on the PRs:
- scala/scala#6012
- scala/scala#6057

The last refactor introducing backend parallelism scala/scala#6124 is left for later.

lrytz added the WIP label Jul 28, 2017

scala-jenkins added this to the 2.12.4 milestone Jul 28, 2017

lrytz force-pushed the backendRefactor branch from 8204b75 to 8b179cd Compare July 28, 2017 11:44

lrytz added 2 commits July 28, 2017 14:07

new CodeGen component, move bTypes to GenBCode

edea96f

lrytz force-pushed the backendRefactor branch from 8b179cd to dd1abd9 Compare July 28, 2017 12:09

move classfile writing code to context without global

5f5d525

lrytz mentioned this pull request Aug 7, 2017

Compiler performance scala/scala-dev#322

Closed

7 tasks

Fix -Ygen-asmp, minor cleanups

948fb88

lrytz force-pushed the backendRefactor branch 4 times, most recently from 36a5cbd to cd861fd Compare August 10, 2017 20:53

lrytz added 2 commits August 11, 2017 09:07

Move components from BTypes to PostProcessor

1532dcc

BTypes is the component that's shared between CodeGen and PostProcessor.

Use LazyVar for CoreBTypes

67a1693

Remove implicit conversion from LazyVar[T] to T

lrytz force-pushed the backendRefactor branch from cd861fd to 67a1693 Compare August 11, 2017 07:08

move PostProcessorFrontendAccess to a separate file

233231d

lrytz force-pushed the backendRefactor branch 2 times, most recently from d468f87 to 9dd1933 Compare August 11, 2017 12:25

Move LazyVar to BTypes, synchronize on frontendLock

1349e54

lrytz force-pushed the backendRefactor branch from 9dd1933 to 1349e54 Compare August 11, 2017 12:25

move backend state from BTypes to components where it belongs

58cfe9e

lrytz changed the title ~~[WIP] Backend Refactoring~~ Backend Refactoring Aug 11, 2017

lrytz removed the WIP label Aug 11, 2017

lrytz requested a review from retronym August 11, 2017 13:43

retronym reviewed Aug 14, 2017

retronym mentioned this pull request Aug 21, 2017

By default, only run DCE on methods with an ATHROW. #6044

Closed

retronym approved these changes Aug 30, 2017

lrytz and others added 2 commits August 31, 2017 22:25

Additional comment about LazyVar

342973b

Explicitly annotate return types in GenBCode/PostProcessor

a4842aa

lrytz force-pushed the backendRefactor branch from cf6adac to a4842aa Compare September 1, 2017 07:46

lrytz merged commit 76b2d64 into scala:2.12.x Sep 1, 2017

allanrenucci mentioned this pull request Sep 13, 2017

Meta-issue: Port Backend improvements scala/scala3#3113

Closed

10 tasks

nicolasstucki mentioned this pull request Oct 26, 2017

Support -d with .jar paths scala/scala3#3382

Merged

smarter mentioned this pull request Apr 12, 2022

Port missing changes from the Scala 2 backend (bug fixes for inner class generation, refactorings, optimizations ...) scala/scala3#14912

Closed

6 tasks

WojciechMazur mentioned this pull request May 30, 2022

Port JVM backend refactor from Scala 2 scala/scala3#15322

Merged

2 tasks
