
[WIP] Parallelize the compiler via two-pass compilation #4767

Closed
Wants to merge 19 commits.

Conversation

@smarter (Member) commented Jul 6, 2018

How do you parallelize a compiler? A first approach might be: given a list of source files, split it into N groups, then compile each group with its own compiler instance. If each file can be compiled independently this works, but usually files refer to symbols defined in other files, so we need to know in advance about these symbols and their types. If we had previously compiled these files, this would be easy: we could just read the compiler output (.class and .tasty files) to find every symbol and its type; this is how incremental compilation already works. Of course, we can't assume that the files we're trying to compile have already been compiled, but we don't need the full compiler output: we're only interested in the names and types of symbols. From this, we can sketch a simple way to parallelize the compiler:

  1. Run a first pass on the input files that computes the type of every (non-private) definition, store the result in memory. This needs to be significantly faster than running the full compiler for this approach to be viable.
  2. Run a second pass that splits the input files into N groups and compiles each group independently, using the information from the first pass to handle inter-group dependencies.
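The two steps above can be sketched as a small driver. This is an illustrative Python sketch, not code from this PR (the real implementation is Scala inside dotty); `outline_compile` and `full_compile` are hypothetical stand-ins for the outline pass and the regular compiler:

```python
from concurrent.futures import ThreadPoolExecutor

def outline_compile(files):
    # First pass (sequential): compute the signature of every
    # non-private definition. Stand-in: map each file to a fake entry.
    return {f: f"sig({f})" for f in files}

def full_compile(group, signatures):
    # Second pass: compile one group, resolving cross-group references
    # through the shared signature table from the first pass.
    return [(f, signatures[f]) for f in group]

def compile_parallel(files, n):
    signatures = outline_compile(files)        # sequential first pass
    groups = [files[i::n] for i in range(n)]   # naive split into N groups
    with ThreadPoolExecutor(max_workers=n) as ex:
        results = ex.map(lambda g: full_compile(g, signatures), groups)
    return [r for group in results for r in group]
```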

This is what this PR implements using the flag -parallelism N. The first pass works by running the compiler in a special mode where:

  • The body of a definition (def, val or var) is not typechecked and is replaced by ??? (unless its result type needs to be inferred, or it's a special case; see Typer#canDropBody).
  • Statements in a class body are dropped.
  • .tasty files are emitted for Java source files too.
  • Compilation is stopped after the Pickler phase (which will emit both .tasty files as well as empty .class files).
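The drop-body decision can be sketched as follows. This is a drastically simplified Python illustration with a hypothetical `defn` record; the real Typer#canDropBody handles more special cases than just the presence of an explicit result type:

```python
def can_drop_body(defn):
    # The body can be replaced by `???` only when the result type is
    # declared explicitly, so the outline pass never needs to infer it.
    return defn.get("result_type") is not None

def outline(defn):
    if can_drop_body(defn):
        return {**defn, "body": "???"}
    return defn  # body kept: the result type must be inferred from it
```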

When compiling Dotty itself, the first pass takes about 20% of the time of a regular non-parallel compilation (it would probably be quite a bit less if we added explicit result types to every definition; I haven't tried that yet).

The second pass just runs N instances of the regular compiler, then combines the results.

The implementation complexity of this approach is very low (we only need to add a few lines of the code to the typechecker), and already gives interesting results. I haven't done any rigorous benchmarking yet, but here's what I get on my laptop (quad core, with hyper-threading enabled) when compiling Dotty itself:

|                               | `-parallelism 1` (normal single-threaded compilation) | `-parallelism 2` | `-parallelism 4` | `-parallelism 8` |
|-------------------------------|-------------------------------------------------------|------------------|------------------|------------------|
| Compile time                  | 18.7 seconds                                          | 12.8 seconds     | 10.7 seconds     | 9.4 seconds      |
| Speed-up compared to baseline | 1x                                                    | 1.46x            | 1.74x            | 1.99x            |
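Since the first pass is sequential and takes roughly 20% of a full compilation, Amdahl's law gives a rough upper bound on the achievable speed-up. This is a back-of-envelope model, not a measurement from this PR:

```python
def amdahl_bound(n, sequential=0.2):
    # Best possible speed-up when a `sequential` fraction of the work
    # cannot be parallelized and the rest scales perfectly over n workers.
    return 1 / (sequential + (1 - sequential) / n)
```

For N = 2, 4, 8 this gives bounds of about 1.67x, 2.5x and 3.33x, so the measured 1.46x/1.74x/1.99x leave some headroom, consistent with the load-balancing improvements suggested below.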

I suspect we could improve this significantly by implementing a work-stealing algorithm to decide which file will be compiled by which thread instead of simply dividing the list of files into N groups, since threads may sit idle if some groups end up being faster to compile than others. An intermediate solution would be to have each thread compile approximately the same number of lines of code instead of the same number of files.
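The "same number of lines per group" idea could be approximated with a greedy longest-processing-time heuristic: sort files by line count and always assign the next file to the currently lightest group. An illustrative Python sketch, not code from this PR:

```python
import heapq

def balance_by_lines(files, n):
    """Greedily assign each file to the currently lightest group so that
    groups end up with roughly equal total line counts.
    `files` is a list of (name, line_count) pairs."""
    heap = [(0, i) for i in range(n)]   # (total lines so far, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n)]
    for name, lines in sorted(files, key=lambda f: -f[1]):
        total, i = heapq.heappop(heap)  # lightest group so far
        groups[i].append(name)
        heapq.heappush(heap, (total + lines, i))
    return groups
```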

An interesting property of the two-pass approach is that it can be combined with other parallelization techniques if they bear fruit.

Note that this PR is based on #4467, so the first 5 commits can be skipped. Note also that this is still a work in progress, but it should already work well enough for people to experiment with it. Enjoy!

nicolasstucki and others added 19 commits July 6, 2018 01:12
Also simplify the logic to write the .tasty files.
Useful when generating tasty outline files, see the next commit.
This avoids a cycle when unpickling scala.Predef

This change uncovered a bug when using -Ythrough-tasty: some trees were
unpickled at the wrong phase because we use
`withPhaseNoLater(ctx.picklerPhase)` in TreeUnpickler but the
TASTYCompiler previously dropped the Pickler phase, so the phase change
was a silent no-op. To avoid this issue, we change TASTYCompiler to not
drop the Pickler phase, instead we change Pickler#run to not do anything
when running with -from-tasty. We should also change how the
ctx.xxxPhase methods work to avoid this kind of silent issue.
This led to cycles when unpickling the standard library from Tasty.
Previously the parameter of a dummy constructor was emitted without the
flag "Param" instead of the flag "ParamAccessor"; this isn't meaningful
and led to compilation errors when unpickled from tasty outline files.
This should be replaced by flags or tags in Tasty that actually
represent the semantics of each Java construct we need to encode.
When this flag is enabled:
- The body of a definition (def, val or var) is not typechecked and its
  body is replaced by `???` (unless its result type needs to be inferred,
  or it's a special case, see Typer#canDropBody).
- Statements in a class body are dropped.
- .tasty files are emitted for Java source files too.
- Compilation is stopped after the Pickler phase (which will emit
  both .tasty files as well as empty .class files).
Ideally, all -*path options would work with lists of virtual or
non-virtual directories, but that's not needed to get the
proof-of-concept working. So instead we just reuse the same logic that
is used to make "-d" work.
When enabled, compilation will proceed in two passes:
- The first pass is sequential and generates tasty outline files; these
  files are not written to disk but stored in memory.
- The second pass splits the list of input files into N groups and
  compiles each group in parallel. The tasty outline files from the
  first pass are available on the classpath of each of these compilers;
  they contain the type signatures needed for the separate compilation of
  each group to succeed.

TODO: Instead of splitting the input into N groups, implement
work-stealing to avoid leaving some threads idle.
I'm working on fixing this in another branch.
@smarter (Member, Author) commented Jul 6, 2018

It's also worth noting that in a project with two subprojects A and B, where B depends on A, the output of the first-pass compilation for A could be used as input for B to start its compilation earlier instead of waiting for A to finish compiling, @jvican has been experimenting with something like this in the context of scalac.
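This pipelining can be sketched with futures: B's full compilation starts as soon as A's outline is ready, overlapping with A's own full compilation. An illustrative Python sketch with hypothetical `compile_outline`/`compile_full` stand-ins, not code from this PR or Bloop:

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_build(compile_outline, compile_full, a_files, b_files):
    # A's outline pass runs first; then A's full compilation and B's
    # compilation (against A's outline signatures) proceed in parallel.
    a_sigs = compile_outline(a_files)
    with ThreadPoolExecutor(max_workers=2) as ex:
        a_done = ex.submit(compile_full, a_files, {})
        b_done = ex.submit(compile_full, b_files, a_sigs)
        return a_done.result(), b_done.result()
```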

@jvican (Member) commented Jul 6, 2018

Yes, it will be available in Bloop 1.1.0, together with implementation notes and detailed explanation of how everything works.

@xeno-by (Member) commented Jul 7, 2018

Super exciting results! I like how you've been able to use compilation from Tasty to leverage the signatures produced in the first phase.

Here's our take on this, based on a similar architecture but compatible with Scalac: twitter/rsc#85. We have implemented support for a subset of Scala to produce signatures for automatically rewritten core of Twitter Util, and we will be working on adding support for more Scala features according to the roadmap.

@smarter (Member, Author) commented Jul 7, 2018

Here's our take on this, based on a similar architecture but compatible with Scalac

Interesting! I think that there's one significant difference with the approach I'm taking here though, in https://github.com/twitter/rsc/blob/master/docs/language.md you state:

The biggest limitation is the lack of type inference for result types of vals, vars and defs, as well as type arguments for super constructor calls. We don't plan to address this limitation in the near future. Instead, we rely on Scalafix to automatically rewrite Scala sources to be compatible with Rsc.

By contrast, the parallelism that this PR enables works fine with code bases that do not have explicit result types everywhere; the first pass will just be slower the more type inference it needs to do. As an example, I've tweaked the core of Twitter Util to make it compile with Dotty (I should add it to the dotty-community-build and report the bugs I worked around too). On my laptop, hot compilation with -parallelism 1 takes approximately 5 seconds and hot compilation with -parallelism 8 takes approximately 3 seconds (I'm running this through sbt, which only reports crude time estimates).

@@ -46,6 +46,7 @@ class ScalaSettings extends Settings.SettingGroup {
val rewrite = OptionSetting[Rewrites]("-rewrite", "When used in conjunction with -language:Scala2 rewrites sources to migrate to new syntax")
val silentWarnings = BooleanSetting("-nowarn", "Silence all warnings.")
val fromTasty = BooleanSetting("-from-tasty", "Compile classes from tasty in classpath. The arguments are used as class names.")
val parallelism = IntSetting("-parallelism", "Number of parallel threads, 0 to use all cores.", 0)
Contributor:

Should it be all cores or all threads?

Member Author:

When parallelism is set to 0, we create one thread per "core" (for some definition of core) on your computer (to be more precise, we create `Runtime.getRuntime().availableProcessors()` threads).

@gkossakowski (Member) commented:

This is really cool! I agree that dividing the typechecking into symbol table typechecking (signatures) and method body typechecking is the way to go. The key variable in this approach is how quickly one can compute the whole symbol table. This is the question I was researching in Kentucky Mule, and I became convinced the symbol table can be computed really quickly with careful effort. The symbol table calculation itself can be parallelized, which came as a surprise to me. I gave a talk on this subject at the SF Scala meetup two months ago.

@smarter out of curiosity, do you plan to work more on parallelization of dotty?

PS. I'm in Basel for the summer and happy to chat about this subject.

@smarter (Member, Author) commented Jul 11, 2018

out of curiosity, do you plan to work more on parallelization of dotty?

Yes, though I have other, higher priorities, so I'm not sure how much time I'll spend on it.

PS. I'm in Basel for the summer and happy to chat about this subject.

You should come to EPFL to give a talk :)

@odersky (Contributor) commented Jan 12, 2019

@smarter This has been inactive for 6 months. Should we keep it open?

@smarter (Member, Author) commented Jan 12, 2019

Yes, I'm still on it; I think keeping PRs marked WIP open is OK.

@bishabosha bishabosha marked this pull request as draft July 8, 2021 18:36
@anatoliykmetyuk (Contributor) commented:

There was no activity on this one for a long while, so let's close it.
