
[WIP] Parallelize the compiler via two-pass compilation #4767

Closed
Wants to merge 19 commits.

Conversation

@smarter (Member) commented Jul 6, 2018

How do you parallelize a compiler? A first approach might be: given a list of source files, split it into N groups, then compile each group with its own compiler instance. If each file can be compiled independently this works, but usually files refer to symbols defined in other files, so we need to know in advance about these symbols and their types. If we had previously compiled these files, this would be easy: we could just read the compiler output (.class and .tasty files) to find every symbol and its type; this is how incremental compilation already works. Of course, we can't assume that the files we're trying to compile have already been compiled, but we don't need the full compiler output: we're only interested in the names and types of symbols. From this, we can sketch a simple way to parallelize the compiler:

  1. Run a first pass on the input files that computes the type of every (non-private) definition, store the result in memory. This needs to be significantly faster than running the full compiler for this approach to be viable.
  2. Run a second pass that splits the input files into N groups and compiles each group independently, using the information from the first pass to handle inter-group dependencies.
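The two steps above can be sketched as a small driver. This is an illustrative Python sketch, not code from this PR (the real implementation is Scala inside dotty); `outline_compile` and `full_compile` are hypothetical stand-ins for the outline pass and the regular compiler:

```python
from concurrent.futures import ThreadPoolExecutor

def outline_compile(files):
    # First pass (sequential): compute the signature of every
    # non-private definition. Stand-in: map each file to a fake entry.
    return {f: f"sig({f})" for f in files}

def full_compile(group, signatures):
    # Second pass: compile one group, resolving cross-group references
    # through the shared signature table from the first pass.
    return [(f, signatures[f]) for f in group]

def compile_parallel(files, n):
    signatures = outline_compile(files)        # sequential first pass
    groups = [files[i::n] for i in range(n)]   # naive split into N groups
    with ThreadPoolExecutor(max_workers=n) as ex:
        results = ex.map(lambda g: full_compile(g, signatures), groups)
    return [r for group in results for r in group]
```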

This is what this PR implements using the flag -parallelism N. The first pass works by running the compiler in a special mode where:

  • The body of a definition (def, val or var) is not typechecked and is replaced by ??? (unless its result type needs to be inferred, or it's a special case; see Typer#canDropBody).
  • Statements in a class body are dropped.
  • .tasty files are emitted for Java source files too.
  • Compilation is stopped after the Pickler phase (which will emit both .tasty files as well as empty .class files).
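The drop-body decision can be sketched as follows. This is a drastically simplified Python illustration with a hypothetical `defn` record; the real Typer#canDropBody handles more special cases than just the presence of an explicit result type:

```python
def can_drop_body(defn):
    # The body can be replaced by `???` only when the result type is
    # declared explicitly, so the outline pass never needs to infer it.
    return defn.get("result_type") is not None

def outline(defn):
    if can_drop_body(defn):
        return {**defn, "body": "???"}
    return defn  # body kept: the result type must be inferred from it
```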

When compiling Dotty itself, the first pass takes about 20% of the time of a regular non-parallel compilation (it would probably be quite a bit less if we added explicit result types to every definition; I haven't tried that yet).

The second pass just runs N instances of the regular compiler, then combines the results.

The implementation complexity of this approach is very low (we only need to add a few lines of the code to the typechecker), and already gives interesting results. I haven't done any rigorous benchmarking yet, but here's what I get on my laptop (quad core, with hyper-threading enabled) when compiling Dotty itself:

|                               | `-parallelism 1` (normal single-threaded compilation) | `-parallelism 2` | `-parallelism 4` | `-parallelism 8` |
|-------------------------------|-------------------------------------------------------|------------------|------------------|------------------|
| Compile time                  | 18.7 seconds                                          | 12.8 seconds     | 10.7 seconds     | 9.4 seconds      |
| Speed-up compared to baseline | 1x                                                    | 1.46x            | 1.74x            | 1.99x            |
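Since the first pass is sequential and takes roughly 20% of a full compilation, Amdahl's law gives a rough upper bound on the achievable speed-up. This is a back-of-envelope model, not a measurement from this PR:

```python
def amdahl_bound(n, sequential=0.2):
    # Best possible speed-up when a `sequential` fraction of the work
    # cannot be parallelized and the rest scales perfectly over n workers.
    return 1 / (sequential + (1 - sequential) / n)
```

For N = 2, 4, 8 this gives bounds of about 1.67x, 2.5x and 3.33x, so the measured 1.46x/1.74x/1.99x leave some headroom, consistent with the load-balancing improvements suggested below.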

I suspect we could improve this significantly by implementing a work-stealing algorithm to decide which file will be compiled by which thread instead of simply dividing the list of files into N groups, since threads may sit idle if some groups end up being faster to compile than others. An intermediate solution would be to have each thread compile approximately the same number of lines of code instead of the same number of files.
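The "same number of lines per group" idea could be approximated with a greedy longest-processing-time heuristic: sort files by line count and always assign the next file to the currently lightest group. An illustrative Python sketch, not code from this PR:

```python
import heapq

def balance_by_lines(files, n):
    """Greedily assign each file to the currently lightest group so that
    groups end up with roughly equal total line counts.
    `files` is a list of (name, line_count) pairs."""
    heap = [(0, i) for i in range(n)]   # (total lines so far, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n)]
    for name, lines in sorted(files, key=lambda f: -f[1]):
        total, i = heapq.heappop(heap)  # lightest group so far
        groups[i].append(name)
        heapq.heappush(heap, (total + lines, i))
    return groups
```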

An interesting property of the two-pass approach is that it can be combined with other parallelization techniques if they bear fruit.

Note that this PR is based on #4467, so the first 5 commits can be skipped. Note also that this is still a work in progress, but it should already work well enough for people to experiment with it. Enjoy!

nicolasstucki and others added 19 commits July 6, 2018 01:12
Also simplify the logic to write the .tasty files.
Useful when generating tasty outline files, see the next commit.
This avoids a cycle when unpickling scala.Predef

This change uncovered a bug when using -Ythrough-tasty: some trees were
unpickled at the wrong phase because we use
`withPhaseNoLater(ctx.picklerPhase)` in TreeUnpickler but the
TASTYCompiler previously dropped the Pickler phase, so the phase change
was a silent no-op. To avoid this issue, we change TASTYCompiler to not
drop the Pickler phase, instead we change Pickler#run to not do anything
when running with -from-tasty. We should also change how the
ctx.xxxPhase methods work to avoid this kind of silent issue.
This led to cycles when unpickling the standard library from Tasty.
Previously the parameter of a dummy constructor was emitted without the
flag "Param" instead of the flag "ParamAccessor"; this isn't meaningful
and led to compilation errors when unpickled from tasty outline files.
This should be replaced by flags or tags in Tasty that actually
represent the semantics of each Java construct we need to encode.
When this flag is enabled:
- The body of a definition (def, val or var) is not typechecked and its
  body is replaced by `???` (unless its result type needs to be inferred,
  or it's a special case, see Typer#canDropBody).
- Statements in a class body are dropped.
- .tasty files are emitted for Java source files too.
- Compilation is stopped after the Pickler phase (which will emit
  both .tasty files as well as empty .class files).
Ideally, all -*path options would work with lists of virtual or
non-virtual directories, but that's not needed to get the
proof-of-concept working. So instead we just reuse the same logic that
is used to make "-d" work.
When enabled, compilation will proceed in two passes:
- The first pass is sequential and generates tasty outline files; these
  files are not written to disk but stored in memory.
- The second pass splits the list of input files into N groups and
  compiles each group in parallel. The tasty outline files from the
  first pass are available on the classpath of each of these compilers;
  they contain the type signatures needed for the separate compilation of
  each group to succeed.

TODO: Instead of splitting the input into N groups, implement
work-stealing to avoid leaving some threads idle.
I'm working on fixing this in another branch.
@smarter (Member, Author) commented Jul 6, 2018

It's also worth noting that in a project with two subprojects A and B, where B depends on A, the output of the first-pass compilation for A could be used as input for B to start its compilation earlier instead of waiting for A to finish compiling, @jvican has been experimenting with something like this in the context of scalac.
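This pipelining can be sketched with futures: B's full compilation starts as soon as A's outline is ready, overlapping with A's own full compilation. An illustrative Python sketch with hypothetical `compile_outline`/`compile_full` stand-ins, not code from this PR or Bloop:

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_build(compile_outline, compile_full, a_files, b_files):
    # A's outline pass runs first; then A's full compilation and B's
    # compilation (against A's outline signatures) proceed in parallel.
    a_sigs = compile_outline(a_files)
    with ThreadPoolExecutor(max_workers=2) as ex:
        a_done = ex.submit(compile_full, a_files, {})
        b_done = ex.submit(compile_full, b_files, a_sigs)
        return a_done.result(), b_done.result()
```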

@jvican (Member) commented Jul 6, 2018

Yes, it will be available in Bloop 1.1.0, together with implementation notes and detailed explanation of how everything works.

@xeno-by (Member) commented Jul 7, 2018

Super exciting results! I like how you've been able to use compilation from Tasty to leverage the signatures produced in the first phase.

Here's our take on this, based on a similar architecture but compatible with Scalac: twitter/rsc#85. We have implemented support for a subset of Scala to produce signatures for automatically rewritten core of Twitter Util, and we will be working on adding support for more Scala features according to the roadmap.

@smarter (Member, Author) commented Jul 7, 2018

Here's our take on this, based on a similar architecture but compatible with Scalac

Interesting! I think that there's one significant difference with the approach I'm taking here though, in https://github.com/twitter/rsc/blob/master/docs/language.md you state:

The biggest limitation is the lack of type inference for result types of vals, vars and defs, as well as type arguments for super constructor calls. We don't plan to address this limitation in the near future. Instead, we rely on Scalafix to automatically rewrite Scala sources to be compatible with Rsc.

By contrast, the parallelism that this PR enables works fine with code bases that do not have explicit result types everywhere; the first pass will just be slower the more type inference it needs to do. As an example, I've tweaked the core of Twitter Util to make it compile with Dotty (I should add it to the dotty-community-build and report the bugs I worked around too). On my laptop, hot compilation with -parallelism 1 takes approximately 5 seconds and hot compilation with -parallelism 8 takes approximately 3 seconds (I'm running this through sbt, which only reports crude time estimates).

@@ -46,6 +46,7 @@ class ScalaSettings extends Settings.SettingGroup {
val rewrite = OptionSetting[Rewrites]("-rewrite", "When used in conjunction with -language:Scala2 rewrites sources to migrate to new syntax")
val silentWarnings = BooleanSetting("-nowarn", "Silence all warnings.")
val fromTasty = BooleanSetting("-from-tasty", "Compile classes from tasty in classpath. The arguments are used as class names.")
val parallelism = IntSetting("-parallelism", "Number of parallel threads, 0 to use all cores.", 0)
Contributor:

Should it be all cores or all threads?

Member Author:

When parallelism is set to 0, we create one thread per "core" (for some definition of core) on your computer (to be more precise, we create `Runtime.getRuntime().availableProcessors()` threads).

@gkossakowski (Member) commented:

This is really cool! I agree that dividing the typechecking into symbol table typechecking (signatures) and method body typechecking is the way to go. The key variable in this approach is how quickly one can compute the whole symbol table. This is the question I was researching in Kentucky Mule, and I became convinced the symbol table can be computed really quickly with careful effort. The symbol table calculation itself can be parallelized, which came as a surprise to me. I gave a talk on this subject at the SF Scala meetup two months ago.

@smarter out of curiosity, do you plan to work more on parallelization of dotty?

PS. I'm in Basel for the summer and happy to chat about this subject.

@smarter (Member, Author) commented Jul 11, 2018

out of curiosity, do you plan to work more on parallelization of dotty?

Yes, though I have other, higher priorities, so I'm not sure how much time I'll spend on it.

PS. I'm in Basel for the summer and happy to chat about this subject.

You should come to EPFL to give a talk :)

@odersky (Contributor) commented Jan 12, 2019

@smarter This has been inactive for 6 months. Should we keep it open?

@smarter (Member, Author) commented Jan 12, 2019

Yes, I'm still on it; I think keeping PRs marked WIP open is OK.

@bishabosha bishabosha marked this pull request as draft July 8, 2021 18:36
@anatoliykmetyuk (Contributor) commented:

There was no activity on this one for a long while, so let's close it.
