Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JMH benchmarks and improve performance of Zinc phases #225

Merged
merged 12 commits into from
Feb 17, 2017

Conversation

jvican
Copy link
Member

@jvican jvican commented Feb 8, 2017

This pull request applies some minor fixes and improves the performance of the
Dependency phase by merging the two-step implementation into one step. For an
explanation of the applied changes and its effect to performance, please check
the commit messages.

This work is done by the Scala Center.

@eed3si9n
Copy link
Member

eed3si9n commented Feb 8, 2017

Do you have benchmark to compare before & after? Preferably using shapeless or akka.

}

private class DependencyProcessor(unit: CompilationUnit) {
private def firstClassOrModuleDef(tree: Tree): Option[Tree] = {
Copy link
Member Author

@jvican jvican Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, note that this code has been moved from the end of the file to the place where it's used. I decided not to change this implementation because there's not a collectFirst or find primitive in the reflect tree API and this is probably the most efficient implementation. I don't think it's worth it to reimplement it in a more functional style that doesn't leverage return.

* class/trait/object declared in the compilation unit. Otherwise, issue warning.
*/
def processTopLevelImportDependency(dep: Symbol): Unit = {
if (!orphanImportsReported) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only report the orphan error once.

// The dependency comes from a class file
binaryDependency(pf.file, binaryClassName)
case _ =>
// TODO: If this happens, scala internals have changed. Log error.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is very unlikely to happen, but if it does it should be more than just devWarning or warning. That's why it's a todo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we at least throw an exception for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that it is not a new logic so we can live without exception or error.

@jvican
Copy link
Member Author

jvican commented Feb 8, 2017

@eed3si9n Will do that between today and tomorrow.

@jvican
Copy link
Member Author

jvican commented Feb 8, 2017

By the way, I would be happy if @romanowski has a look at this diff too.

@jvican
Copy link
Member Author

jvican commented Feb 9, 2017

I've just finished running this on shapeless, and it spares two seconds in the incremental compilation.

Before:

[info] Updating {file:/home/jvican/shapeless/}core... (out-46)
[info] Resolving jline#jline;2.12.1 ... (out-46)
[info] Done updating. (out-46)
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-56)
[info] Resolving org.scala-lang.modules#scala-parser-combinators_2.11;1.0.4 ... (out-56)
[info] Compiling 79 Scala sources to /home/jvican/shapeless/core/target/scala-2.11/classes... (out-60)
[info] Resolving jline#jline;2.12.1 ... (out-56)
[info] Done updating. (out-56)
[info] Updating {file:/home/jvican/shapeless/}root... (out-64)
[info] Resolving org.scala-lang#scala-library;2.11.8 ... (out-64)
[info] 'compiler-bridge_2.11' not yet compiled for Scala 2.11.8. Compiling... (out-60)
[info] Resolving jline#jline;2.12.1 ... (out-64)
[info] Done updating. (out-64)
[info]   Compilation completed in 8.594 s (out-60)
[info] Done compiling. (out-60)
[info] Total time: 44 s, completed Feb 9, 2017 1:06:58 PM (out-23)

After:

> compile
[info] Updating {file:/home/jvican/shapeless/}core... (out-126)
[info] Resolving jline#jline;2.12.1 ... (out-126)
[info] Done updating. (out-126)
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-140)
[info] Resolving jline#jline;2.12.1 ... (out-140)
[info] Compiling 79 Scala sources to /home/jvican/shapeless/core/target/scala-2.11/classes... (out-138)
[info] Done updating. (out-140)
[info] Updating {file:/home/jvican/shapeless/}root... (out-144)
[info] Resolving jline#jline;2.12.1 ... (out-144)
[info] Done updating. (out-144)
[info] 'compiler-bridge_2.11' not yet compiled for Scala 2.11.8. Compiling... (out-138)
[info]   Compilation completed in 8.657 s (out-138)
[info] Done compiling. (out-138)
[info] Total time: 42 s, completed Feb 9, 2017 11:52:55 AM (out-103)

As it's a micro-optimization, I think the results are good. #206 had a bigger impact (my initial guess), and in my opinion the changes of this PR bring us two things:

  • Readability. Code is more intuitive and readable now.
  • Small performance gain.

@eed3si9n
Copy link
Member

eed3si9n commented Feb 9, 2017

Nice. Thanks for testing.

@eed3si9n
Copy link
Member

eed3si9n commented Feb 9, 2017

I'll keep this open for another day so @romanowski can review.

@romanowski
Copy link
Contributor

Hmm I am a little bit skeptical about this 2 seconds since this is almost 5% of time spend in full build command (that contains 8 s spent on compiling compiler interface and I don't know how many on resolving dependencies). IIRC pure compilation time was around 20 s. 2 sec out of that means that compilation time is reduced by 10%. After my changes (on my machine) zinc generates around 5% of of overhead so it would mean that this fix speeds up non-zinc phases.

@jvican can you print times spent in zinc phases? I suggest do following routine

  • start sbt
  • compile then clean then update
  • shout down sbt and start again
  • run compile

Moreover I suggest using this changes: romanowski@afd5c04 -time spent in Dependency and API phase will be printed.

Copy link
Contributor

@romanowski romanowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM with small suggestions.

I really like that code now becomes much clearer.

responsibleOfImports match {
case Some(classOrModuleDef) =>
val sym = classOrModuleDef.symbol
val firstClassSymbol = if (sym.isModule) sym.moduleClass else sym
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can should be computed only once

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

// The dependency comes from a class file
binaryDependency(pf.file, binaryClassName)
case _ =>
// TODO: If this happens, scala internals have changed. Log error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we at least throw an exception for now?

// The dependency comes from a class file
binaryDependency(pf.file, binaryClassName)
case _ =>
// TODO: If this happens, scala internals have changed. Log error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that it is not a new logic so we can live without exception or error.

@jvican
Copy link
Member Author

jvican commented Feb 10, 2017

IIRC pure compilation time was around 20 s. 2 sec out of that means that compilation time is reduced by 10%.

No, the 2 seconds are out of the full build, not the compilation time. You can see that in the logs I attached with the benchmarking results.

Benchmarking logs

I've benchmarked this PR again and these are the results:

Before:

jvican in ~/shapeless                                                                                                                                           [9:14:24] 
> $ sbt                                                                                                                                             [±sbt-1.0-stripped ●]
[info] Loading project definition from /home/jvican/shapeless/project (out-3)
[info] Set current project to root (in build file:/home/jvican/shapeless/) (out-3)
> clean
[info] Total time: 0 s, completed Feb 10, 2017 9:14:39 AM (out-23)
> update
[info] Updating {file:/home/jvican/shapeless/}core... (out-60)
[info] Resolving jline#jline;2.12.1 ... (out-60)
[info] Done updating. (out-60)
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-70)
[info] Resolving jline#jline;2.12.1 ... (out-70)
[info] Done updating. (out-70)
[info] Updating {file:/home/jvican/shapeless/}root... (out-72)
[info] Resolving jline#jline;2.12.1 ... (out-72)
[info] Done updating. (out-72)
[info] Total time: 2 s, completed Feb 10, 2017 9:14:41 AM (out-49)
> compile
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-108)
[info] Resolving org.scala-lang.modules#scala-parser-combinators_2.11;1.0.4 ... (out-108)
[info] Compiling 79 Scala sources to /home/jvican/shapeless/core/target/scala-2.11/classes... (out-112)
[info] Resolving jline#jline;2.12.1 ... (out-108)
[info] Done updating. (out-108)
[info] Updating {file:/home/jvican/shapeless/}root... (out-115)
[info] Resolving jline#jline;2.12.1 ... (out-115)
[info] Done updating. (out-115)
[info] API phase took : 2.354 s (out-78)
[info] Dependency phase took : 1.96 s (out-78)
[info] Done compiling (in 44.897 s (out-112)
[info] Total time: 47 s, completed Feb 10, 2017 9:15:29 AM (out-75)
> clean
[info] Total time: 1 s, completed Feb 10, 2017 9:15:36 AM (out-129)
> compile
[info] Updating {file:/home/jvican/shapeless/}core... (out-178)
[info] Resolving jline#jline;2.12.1 ... (out-178)
[info] Done updating. (out-178)
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-190)
[info] Resolving jline#jline;2.12.1 ... (out-190)
[info] Done updating. (out-190)
[info] Compiling 79 Scala sources to /home/jvican/shapeless/core/target/scala-2.11/classes... (out-192)
[info] Updating {file:/home/jvican/shapeless/}root... (out-196)
[info] Resolving jline#jline;2.12.1 ... (out-196)
[info] Done updating. (out-196)
[info] API phase took : 1.774 s (out-156)
[info] Dependency phase took : 0.447 s (out-156)
[info] Done compiling (in 26.497 s (out-192)
[info] Total time: 28 s, completed Feb 10, 2017 9:16:05 AM (out-155)

After:

jvican in ~/shapeless                                                                                                                                           [8:45:07] 
> $ sbt                                                                                                                                             [±sbt-1.0-stripped ●]
[info] Loading project definition from /home/jvican/shapeless/project (out-3)
[info] Set current project to root (in build file:/home/jvican/shapeless/) (out-3)
> clean
[info] Total time: 1 s, completed Feb 10, 2017 8:45:24 AM (out-23)
> update
[info] Updating {file:/home/jvican/shapeless/}core... (out-60)
[info] Resolving jline#jline;2.12.1 ... (out-60)
[info] Done updating. (out-60)
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-70)
[info] Resolving jline#jline;2.12.1 ... (out-70)
[info] Done updating. (out-70)
[info] Updating {file:/home/jvican/shapeless/}root... (out-72)
[info] Resolving jline#jline;2.12.1 ... (out-72)
[info] Done updating. (out-72)
[info] Total time: 2 s, completed Feb 10, 2017 8:45:27 AM (out-49)
> compile
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-108)
[info] Resolving org.scala-lang.modules#scala-parser-combinators_2.11;1.0.4 ... (out-108)
[info] Compiling 79 Scala sources to /home/jvican/shapeless/core/target/scala-2.11/classes... (out-112)
[info] Resolving jline#jline;2.12.1 ... (out-108)
[info] Done updating. (out-108)
[info] Updating {file:/home/jvican/shapeless/}root... (out-116)
[info] Resolving jline#jline;2.12.1 ... (out-116)
[info] Done updating. (out-116)
[info] API phase took : 2.554 s (out-78)
[info] Dependency phase took : 0.572 s (out-78)
[info] Done compiling (in 44.033 s (out-112)
[info] Total time: 46 s, completed Feb 10, 2017 8:46:15 AM (out-75)
> clean
[info] Total time: 0 s, completed Feb 10, 2017 8:46:17 AM (out-129)
> compile
[info] Updating {file:/home/jvican/shapeless/}core... (out-178)
[info] Resolving jline#jline;2.12.1 ... (out-178)
[info] Done updating. (out-178)
[info] Updating {file:/home/jvican/shapeless/}scratch... (out-188)
[info] Resolving jline#jline;2.12.1 ... (out-188)
[info] Done updating. (out-188)
[info] Updating {file:/home/jvican/shapeless/}root... (out-196)
[info] Resolving core#shapeless_2.11;2.3.3-SNAPSHOT ... (out-196)
[info] Compiling 79 Scala sources to /home/jvican/shapeless/core/target/scala-2.11/classes... (out-192)
[info] Resolving jline#jline;2.12.1 ... (out-196)
[info] Done updating. (out-196)
[info] API phase took : 1.58 s (out-156)
[info] Dependency phase took : 0.481 s (out-156)
[info] Done compiling (in 25.316 s (out-192)
[info] Total time: 27 s, completed Feb 10, 2017 8:46:46 AM (out-155)

Hardware

The machine used for the "benchmark" is Linux tribox 4.9.6-1-ARCH #1 SMP PREEMPT Thu Jan 26 09:22:26 CET 2017 x86_64 GNU/Linux. It has Turbo Boost disabled on purpose to get more confident results (note that the previous benchmarking has Turbo Boost enabled).

Analysis of the results

The logs were achieved with the same methodology that @romanowski proposed. The first time compile was run, this PR decreased the running time of Dependency by 3/4. The second time it's run, performance is on par (this is very weird considering that I'm cleaning the directory before running compile again).

My guess is that the first time is run Zinc is cold, and the second one is already hot and has already been optimized by the JVM. Anyhow, I think that this produces a significant speedup in cold starts and minor or zero speedups while hot. However, I prefer not to confirm this hypothesis and focus on other tasks of Zinc, since there's still some work more to do in the ExtractAPI and Dependency algorithm that I'm planning to change in my next PR.

Note: this "benchmark" uses millis to compute the time of the phases, so please don't take it very seriously, this is just an experiment that shows measurable improvement.

@dwijnand
Copy link
Member

dwijnand commented Feb 10, 2017 via email

@dwijnand
Copy link
Member

dwijnand commented Feb 10, 2017 via email

@jvican
Copy link
Member Author

jvican commented Feb 10, 2017

I forgot to mention that I got very minor speedups in other runs that I did (the posted one is only the first one). Obviously, this is not the case for the posted results.

@jvican
Copy link
Member Author

jvican commented Feb 10, 2017

@dwijnand As I mention in my comment, I did not intend to make a rigorous, repeatable, scientifically-provable benchmark that shows the real impact of this change. What I tried is to provide a faint measurable impact that people can use to discern whether this is a valuable change or not regarding performance and readability. This benchmark is missing a lot of stuff (GC logs, debug and optimization information, proper warm up, etc).

Honestly, if I were to benchmark this seriously, I would use jmh and would also profile it to see the memory footprint. But, I don't think this is something worth it now -- creating all the scaffolding to perform such experiment and analyzing all the benchmarking data is very time-consuming.

@dwijnand
Copy link
Member

Don't get me wrong, I agree 1 datapoint is better than none.

But for a project as established and important as zinc I would consider setting up a suite of JMH benchmarks to be a pre-requisite to working on and accepting any performance-related code changes.

Otherwise we're kind of just shooting in the dark; just guessing at how reorganising the code impacts its performance when running on the JVM and by the JIT.

@jvican
Copy link
Member Author

jvican commented Feb 10, 2017

I agree with you that such a suite is required, and I wish we already had that. It's true that Zinc is a very established project in the community but no performance fixes (e.g. https://github.com/sbt/zinc/pulls?q=is:pr+performance+is:closed and more issues in sbt/sbt) have been blocked by the need of such a suite.

Even though I would really like to have it, I don't have the time to do it right now. The time I can invest on Zinc is fairly constrained -- I'm focusing on other use cases of the incremental compilation that are more significant to the community (correctness) so that we can have a beta soon; then I will focus on java compatibility and benchmarking.

My main purpose while creating this PR was enhancing readability of the code and doing an algorithm that is more intuitive and easier to follow. I then realised that it would also be more performant, and that's why I chose the title of this PR. What I propose is that we merge this and when I have the time to do proper benchmarking, I will get back to it and prove that these changes don't affect performance (though the speedup in cold start is significant).

@dwijnand
Copy link
Member

Btw I would consider a JMH benchmark suite to have been a pre-requisite to those other performance-related changes too.

@romanowski
Copy link
Contributor

@jvican thanks for the results - over 60% speedup on cold machine is really nice (but TBH it is strange that Dependecy phase in 1st compilation is so slow).

@dwijnand I think having JMH would be really nice but no one working only partially (or in the free time) on zinc don't have time for that. E.g. in my case I just saw that compilation with zinc was way slower that was without so in order to use zinc I needed to fix that (and fixing took quite a while) so there was no time left for JMH.

@jvican
Copy link
Member Author

jvican commented Feb 13, 2017

After sleeping on this, I've decided to implement a JMH benchmark suite for Zinc. I'll update this PR once it's done. I think that this is becoming even more key after the changes I've done to Dependency that will help us fix #174. More details to follow.

@jvican
Copy link
Member Author

jvican commented Feb 15, 2017

After some days of work, I already have a working implementation of JMH benchmarks. I'll update this PR with it and the results. I may also add more benchmarks.

jvican added a commit to scalacenter/zinc that referenced this pull request Feb 15, 2017
This commit introduces JMH benchmarks as requested by sbt#225.

The strategy for the JMH benchmarks is the following:

1. Clone repository to be compiled and checkout at concrete commit (for
   reproducibility).
2. Generate sbt task for every subproject we want to test. This task
   will tell us information about the compiler options: sources and
   classpath (scalac options are integrated in next commit).
3. Execute the sbt tasks and get the output.
4. Instantiate a compiler fully integrated with the compiler bridge that
   has all the parameters given by the sbt output.
5. Run and enjoy.

The JMH benchmarks run this for every iteration, for now. However, they
only measure the running time of the **whole** compiler pipeline. There
is no reliable way to do it only for our Zinc phases since they are
tightly coupled to the compiler, so this is the best we can do.

The JMH benchmarks are only 2.12 compatible. This commit introduces a
benchmark for Shapeless and shows how easy it is to extend for other
codebases. It also adds some tests to make sure that the required
scaffolding works.

For that, we have to modify the accessibility of some methods in our
infrastructure, particularly `ScalaCompilerForUnitTesting`.
@jvican jvican changed the title Improve performance of Dependency phase Add JMH benchmarks and improve performance of Zinc phases Feb 15, 2017
@jvican
Copy link
Member Author

jvican commented Feb 17, 2017

Yes, this way we don't benchmark sbt. What I like about this approach is that it's pretty scalable, you can benchmark any project in the community that uses sbt and compiles with 2.12.1. In fact, I was thinking of running these results with scalac instead of shapeless since they are completely different codebases and the results could differ. I'll do that deep study when I PR some changes to Dependency and ExtractUsedNames that I'm working right now.

@jvican
Copy link
Member Author

jvican commented Feb 17, 2017

Okay, here's some more info regarding memory consumption.

Running JMH with gc profiler

Before

[info] 
[info] Benchmark                                                                   (_tempDir)    Mode  Cnt           Score           Error   Units
[info] HotShapelessBenchmark.compile                                        /tmp/sbt_fc51a8a7  sample   18       12041.381 ±       449.960   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.00                          /tmp/sbt_fc51a8a7  sample            11442.061                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.50                          /tmp/sbt_fc51a8a7  sample            11936.989                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.90                          /tmp/sbt_fc51a8a7  sample            12908.390                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.95                          /tmp/sbt_fc51a8a7  sample            13270.778                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.99                          /tmp/sbt_fc51a8a7  sample            13270.778                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.999                         /tmp/sbt_fc51a8a7  sample            13270.778                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.9999                        /tmp/sbt_fc51a8a7  sample            13270.778                   ms/op
[info] HotShapelessBenchmark.compile:compile·p1.00                          /tmp/sbt_fc51a8a7  sample            13270.778                   ms/op
[info] HotShapelessBenchmark.compile:·gc.alloc.rate                         /tmp/sbt_fc51a8a7  sample   18         276.220 ±         9.252  MB/sec
[info] HotShapelessBenchmark.compile:·gc.alloc.rate.norm                    /tmp/sbt_fc51a8a7  sample   18  3631095811.111 ±   9551724.939    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space       /tmp/sbt_fc51a8a7  sample   18           0.006 ±         0.010  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space.norm  /tmp/sbt_fc51a8a7  sample   18       89657.778 ±    145543.632    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace                    /tmp/sbt_fc51a8a7  sample   18           0.047 ±         0.076  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace.norm               /tmp/sbt_fc51a8a7  sample   18      651851.111 ±   1058359.022    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space                /tmp/sbt_fc51a8a7  sample   18         273.222 ±        35.202  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm           /tmp/sbt_fc51a8a7  sample   18  3594218369.333 ± 465027547.723    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen                   /tmp/sbt_fc51a8a7  sample   18          10.564 ±        17.143  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen.norm              /tmp/sbt_fc51a8a7  sample   18   143772908.000 ± 231209910.701    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space            /tmp/sbt_fc51a8a7  sample   18           8.002 ±         4.726  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm       /tmp/sbt_fc51a8a7  sample   18   105298880.000 ±  61733223.185    B/op
[info] HotShapelessBenchmark.compile:·gc.count                              /tmp/sbt_fc51a8a7  sample   18          59.000                  counts
[info] HotShapelessBenchmark.compile:·gc.time                               /tmp/sbt_fc51a8a7  sample   18       10780.000                      ms
[success] Total time: 598 s, completed Feb 17, 2017 5:14:36 PM
[success] Total time: 0 s, completed Feb 17, 2017 5:14:36 PM

After

[info] Benchmark                                                                   (_tempDir)    Mode  Cnt           Score           Error   Units
[info] HotShapelessBenchmark.compile                                        /tmp/sbt_e21d0ada  sample   18       12119.674 ±       400.522   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.00                          /tmp/sbt_e21d0ada  sample            11408.507                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.50                          /tmp/sbt_e21d0ada  sample            12071.207                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.90                          /tmp/sbt_e21d0ada  sample            12846.314                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.95                          /tmp/sbt_e21d0ada  sample            12952.011                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.99                          /tmp/sbt_e21d0ada  sample            12952.011                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.999                         /tmp/sbt_e21d0ada  sample            12952.011                   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.9999                        /tmp/sbt_e21d0ada  sample            12952.011                   ms/op
[info] HotShapelessBenchmark.compile:compile·p1.00                          /tmp/sbt_e21d0ada  sample            12952.011                   ms/op
[info] HotShapelessBenchmark.compile:·gc.alloc.rate                         /tmp/sbt_e21d0ada  sample   18         274.153 ±         8.612  MB/sec
[info] HotShapelessBenchmark.compile:·gc.alloc.rate.norm                    /tmp/sbt_e21d0ada  sample   18  3626667393.778 ±  18956657.954    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space       /tmp/sbt_e21d0ada  sample   18           0.005 ±         0.009  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space.norm  /tmp/sbt_e21d0ada  sample   18       64198.222 ±    118385.765    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace                    /tmp/sbt_e21d0ada  sample   18           0.034 ±         0.063  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace.norm               /tmp/sbt_e21d0ada  sample   18      467534.667 ±    862294.189    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space                /tmp/sbt_e21d0ada  sample   18         268.816 ±        28.283  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm           /tmp/sbt_e21d0ada  sample   18  3557585351.111 ± 367550320.494    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen                   /tmp/sbt_e21d0ada  sample   18           8.998 ±        16.580  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen.norm              /tmp/sbt_e21d0ada  sample   18   122109626.667 ± 223902086.350    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space            /tmp/sbt_e21d0ada  sample   18           8.098 ±         5.239  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm       /tmp/sbt_e21d0ada  sample   18   108134766.222 ±  71266025.283    B/op
[info] HotShapelessBenchmark.compile:·gc.count                              /tmp/sbt_e21d0ada  sample   18          59.000                  counts
[info] HotShapelessBenchmark.compile:·gc.time                               /tmp/sbt_e21d0ada  sample   18       10329.000                      ms
[success] Total time: 609 s, completed Feb 17, 2017 5:02:18 PM
[success] Total time: 0 s, completed Feb 17, 2017 5:02:18 PM

The following data confirms that this PR produces 20% less memory in old generation, as expected. This saves around 400ms in GC running time. Not a lot, but something.

`zincBenchmarks` was a 2.12.x project because it was using the
right-biased `Either`. This commit makes it compile with 2.11.8.
@jvican
Copy link
Member Author

jvican commented Feb 17, 2017

CI is green.

@eed3si9n eed3si9n merged commit c026d5e into sbt:1.0 Feb 17, 2017
@eed3si9n
Copy link
Member

Merged. Thanks for the contribution!

@smarter
Copy link
Contributor

smarter commented Feb 17, 2017

Note that cold Zinc is faster than warm Zinc because C1 has 2 threads instead of 1 (check CICompilerCount JVM option).

That's pretty concerning. Have you done run with different values of CICompilerCount to verify this hypothesis?

@jvican
Copy link
Member Author

jvican commented Feb 17, 2017

Yes, I verified that's true. @smarter I copied that line from here.

@retronym
Copy link
Member

I don't know how (or if) limiting CICompilerCount is in the benchmarks. As we get more experience with the numbers in scala/compiler-benchmark, we might have more to say...

@retronym
Copy link
Member

As we see, these changes don't affect running time as I previously claimed. I've made sure this time that all the setup is correct and predictable. The results that I gave before could have been a result of frequency variation, since the cpu governor was powersave and tlp was running.

Perhaps shapeless isn't a great example. Depending on which parts you're compiling, it might have proportionally more time spent in the typer phase than a typical project, so improvements to SBT's phases don't have as much impact. Having a mode in this benchmark tool to turn the SBT phases into no-ops (or remove them completely), would help give us a number that we could discuss easily (e.g "the overhead of SBT in a full build was reduced from 18% to 15%").

@retronym
Copy link
Member

Performance of SBT's phases is worst in deep hierarchies, as it calls .members as each level of the hierarchy. Scala collections are a poster child for these sort of deep hierarchies.

@retronym
Copy link
Member

Finally, I'm a little bit nervous about making structural changes to Zinc in this codebase, because we sacrifice the ability to run the new version through the community build and check for crashes.

Would it be feasible to take inspiration for 4a09b04 and make an sbt 0.13 plugin that augments the standard compile task to also recompile with Zinc 1.0?

jvican added a commit to scalacenter/zinc that referenced this pull request Feb 21, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
jvican added a commit to scalacenter/zinc that referenced this pull request Feb 21, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
jvican added a commit to scalacenter/zinc that referenced this pull request Feb 21, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
@jvican
Copy link
Member Author

jvican commented Feb 21, 2017

Performance of SBT's phases is worst in deep hierarchies, as it calls .members as each level of the hierarchy. Scala collections are a poster child for these sort of deep hierarchies.

From now on, I'll use the Scala standard libray for those changes. In fact, my upcoming PR fixing ASF has such information in the commit messages.

Would it be feasible to take inspiration for 4a09b04 and make an sbt 0.13 plugin that augments the standard compile task to also recompile with Zinc 1.0?

This is a marvelous idea. Thanks, I would very much like to do this so that people can benefit automatically from the improvements in latest Zinc.

@dwijnand
Copy link
Member

Belated many thanks @jvican for bootstrapping JMH in zinc! (I'm still catching up)

@jvican
Copy link
Member Author

jvican commented Feb 25, 2017

@retronym:

Would it be feasible to take inspiration for 4a09b04 and make an sbt 0.13 plugin that augments the standard compile task to also recompile with Zinc 1.0?

I've been thinking about how to do this, but it does not seem easy. Zinc 1.0 is compiled for 2.11+ and sbt runs in 2.10, so to enable this we'd need to have a client (sbt 0.13) and a server (Zinc 1.0) communicating via files/messages. The server would need to keep a cache of hot compilers too. Is this how you envisioned the sbt plugin when you proposed the idea or am I missing something?

@dwijnand
Copy link
Member

@jvican There's some more recent prior art of a better way from scalafmt: you call Zinc like a main, passing arguments to it, using classloader isolation (no forking). Great idea @retronym!

@jvican
Copy link
Member Author

jvican commented Feb 25, 2017

You're right, classloader isolation is the way to go. I forgot this is the way the Scala compiler is actually run by sbt. Thanks for the pointer @dwijnand!

jvican added a commit to scalacenter/zinc that referenced this pull request Feb 26, 2017
This commit introduces JMH benchmarks as requested by sbt#225.

The strategy for the JMH benchmarks is the following:

1. Clone repository to be compiled and checkout at concrete commit (for
   reproducibility).
2. Generate sbt task for every subproject we want to test. This task
   will tell us information about the compiler options: sources and
   classpath (scalac options are integrated in next commit).
3. Execute the sbt tasks and get the output.
4. Instantiate a compiler fully integrated with the compiler bridge that
   has all the parameters given by the sbt output.
5. Run and enjoy.

The JMH benchmarks run this for every iteration, for now. However, they
only measure the running time of the **whole** compiler pipeline. There
is no reliable way to do it only for our Zinc phases since they are
tightly coupled to the compiler, so this is the best we can do.

The JMH benchmarks are only 2.12 compatible. This commit introduces a
benchmark for Shapeless and shows how easy it is to extend for other
codebases. It also adds some tests to make sure that the required
scaffolding works.

For that, we have to modify the accessibility of some methods in our
infrastructure, particularly `ScalaCompilerForUnitTesting`.
jvican added a commit to scalacenter/zinc that referenced this pull request Feb 26, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
eed3si9n pushed a commit to eed3si9n/scala that referenced this pull request May 14, 2019
This commit introduces JMH benchmarks as requested by sbt/zinc#225.

The strategy for the JMH benchmarks is the following:

1. Clone repository to be compiled and checkout at concrete commit (for
   reproducibility).
2. Generate sbt task for every subproject we want to test. This task
   will tell us information about the compiler options: sources and
   classpath (scalac options are integrated in next commit).
3. Execute the sbt tasks and get the output.
4. Instantiate a compiler fully integrated with the compiler bridge that
   has all the parameters given by the sbt output.
5. Run and enjoy.

The JMH benchmarks run this for every iteration, for now. However, they
only measure the running time of the **whole** compiler pipeline. There
is no reliable way to do it only for our Zinc phases since they are
tightly coupled to the compiler, so this is the best we can do.

The JMH benchmarks are only 2.12 compatible. This commit introduces a
benchmark for Shapeless and shows how easy it is to extend for other
codebases. It also adds some tests to make sure that the required
scaffolding works.

For that, we have to modify the accessibility of some methods in our
infrastructure, particularly `ScalaCompilerForUnitTesting`.
eed3si9n pushed a commit to eed3si9n/scala that referenced this pull request May 14, 2019
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt/zinc#225
* sbt/zinc#216
* sbt/zinc#206
* sbt/zinc#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
lrytz pushed a commit to lrytz/scala that referenced this pull request Nov 5, 2019
This commit introduces JMH benchmarks as requested by sbt/zinc#225.

The strategy for the JMH benchmarks is the following:

1. Clone repository to be compiled and checkout at concrete commit (for
   reproducibility).
2. Generate sbt task for every subproject we want to test. This task
   will tell us information about the compiler options: sources and
   classpath (scalac options are integrated in next commit).
3. Execute the sbt tasks and get the output.
4. Instantiate a compiler fully integrated with the compiler bridge that
   has all the parameters given by the sbt output.
5. Run and enjoy.

The JMH benchmarks run this for every iteration, for now. However, they
only measure the running time of the **whole** compiler pipeline. There
is no reliable way to do it only for our Zinc phases since they are
tightly coupled to the compiler, so this is the best we can do.

The JMH benchmarks are only 2.12 compatible. This commit introduces a
benchmark for Shapeless and shows how easy it is to extend for other
codebases. It also adds some tests to make sure that the required
scaffolding works.

For that, we have to modify the accessibility of some methods in our
infrastructure, particularly `ScalaCompilerForUnitTesting`.

Rewritten from sbt/zinc@4a09b04
lrytz pushed a commit to lrytz/scala that referenced this pull request Nov 5, 2019
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt/zinc#225
* sbt/zinc#216
* sbt/zinc#206
* sbt/zinc#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.

Rewritten from sbt/zinc@5d46c1b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants