Reduce compilation speed overhead introduced by incremental compiler #1078

gkossakowski opened this Issue Jan 13, 2014 · 4 comments


None yet
2 participants

gkossakowski commented Jan 13, 2014

UPDATE from April 25th: The numbers in this ticket are wrong. See the comment below the ticket description for correct numbers. In the light of finding that the overhead is much lower than it was originally suspected, there are no plans on improving performance overhead of incremental compiler and this ticket is closed.

The incremental compiler adds significant overhead to clean builds of Scala projects. Normally, it's around 15-20% but it can be as high as 30% when compiling Scala compiler code base.

I ran some simple benchmarks to see where the time is spent. The theory was that significant portion of the time incremental compiler adds to overall compilation time is spent on extracting API structures. Let's see timings for compiling Scala compiler using the original algorithm (not the name hashing that's been recently introduced):

// original algorithm
[info] [Extracts the public API from source files. in 18 436ms]
[info] [Extracts dependency information in 2 105ms]
[info] [total in 93 391ms]

// original algorithm, skips extraction of inherited members
[info] [Extracts the public API from source files. in 1 398ms]
[info] [Extracts dependency information in 1 326ms]
[info] [total in 67 885ms]

As we can see, we do spend a lot of time in API extraction and, in particular, extracting inherited members. The API extraction logic extracts inherited members for each class separately so you end up with multiple copies of the same member inherited by different classes. This introduces O(n^2) complexity in the API extraction algorithm. Scala compiler has a very deep inheritance hierarchies with lots of members defined. This code structure highlights the O(n^2) particularly well hence the overhead introduced by incremental compiler is close to 30% in this case.

I also tried to see whether name hashing introduces any significant overhead on top of what's already introduced by the original algorithm. The numbers I've got do not suggest that:

// name hashing enabled but skips *hashing* of inherited members
[info] [Extracts the public API from source files. in 24 877ms]
[info] [Extracts dependency information in 1901ms]
[info] [total in 96 797ms]

// name hashing enabled
[info] [Extracts the public API from source files. in 20 771ms]
[info] [Extracts dependency information in 1 834ms]
[info] [total in 87 762ms]

One weird thing is that when we do more work (we do have inherited members) the time is lower. This seems to be due to JIT non-determinism.

In any case, it's clear that we should be smarter when extracting API structures by not extracting inherited members over and over again. However, this will require quite deep refactoring. I'll try to outline a plan for such refactoring some other time.


gkossakowski commented Apr 25, 2014

The original numbers I got back in January were wrong because I was using hacked version of incremental compiler for benchmarks. I gathered more numbers using sbt 0.13.2 and the overhead of incremental compiler in each project is:

  • spire-core: 2%
  • scalatest: 6-9%
  • specs2-core: 7-11%
  • scala-library: 25%
  • scala-reflect: 17%
  • scala-compiler: 20%
  • akka-actor: 7%
  • Play 2 core: 5%

As you can see, for all projects apart from Scala itself, the overhead is below 10%. Scala suffers from larger overhead because it makes heavy use of inheritance and the problem of O(n^2) behavior in incremental compiler is still there. This problem is less visible in all other projects which make less use of inheritance in their code bases.

In light of this findings, I'm closing this ticket as Invalid.

purefn commented May 1, 2014

@gkossakowski How did you determine the amount of overhead? I'd like to seem what the overhead is for a large project I work on and see if we might benefit from contributing a patch or not. I suspect not because we don't use much inheritance, but I'd like to see real numbers. We also have lots of branches and builds going at the same time and builds take several minutes, so even we could only reduce the overhead by 5-10% that might still be worth it for us.


gkossakowski commented May 1, 2014

@purefn: I got the number by adding -verbose flag to scalacOptions and looking at how much time is spent in each phase.

In your build, add scalacOptions += "-verbose" and then run compile. You should see the output which looks like:

[info] [typer in 16ms]
[info] [patmat in 0ms]
[info] [superaccessors in 1ms]
[info] [extmethods in 1ms]
[info] [pickler in 2ms]
[info] API phase took : 0.044 s
[info] [Extracts the public API from source files. in 45ms]
[info] [total in 274ms]

The line which tells you how much time (45ms) is spent in api extraction phase is relevant. If you compare it to total time, you see that in case of a very trivial project I just tested it takes 16% of the total time.

purefn commented May 1, 2014

Ok, on our project the API extraction only accounts for ~2.5% of the total build time. I'll look elsewhere to try and improve our compilation speed, if possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment