Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
281 lines (228 sloc) 14.8 KB
layout post-type by title
blog-detail
blog
Jorge Vicente Cantero
Speed up compile times with Zinc 1.0

Zinc is the incremental compiler for Scala. Most Scala developers use it daily without noticing -- it's embedded in key build tools like sbt, pants, CBT, IntelliJ, and Scala IDE.

The incremental compiler has one goal: to make your compilation times faster without sacrificing correctness. When you change a source file, Zinc analyses the dependencies of your code and compiles a subset of source files affected by your change, such that the generated code is identical to the output of a clean compile.

Scala Center's involvement in Zinc 1.0

Zinc 1.0 has been under development for several years in a GitHub repo independent from sbt. The Lightbend Tooling team split up Zinc to its own repo for the purpose of providing a common incremental compiler that could benefit all build tools, and developing it together.

A key feature of 1.0 is class-based dependency analysis, contributed by Grzegorz Kossakowski and commissioned by Lightbend. Class-based dependency analysis was designed to handle Scala dependencies in a much finer-grained scale than the previous one, pruning common cases of overcompilation. Back then, benchmarks showed that such analysis provided speedups of up to 40x.

However, these changes never managed to make it into a Zinc release despite being merged on March 2016. The project required more work before an experimental release: a Java-friendly polished API, improvements to the correctness of the incremental compilation, faster cold compilations, intensive benchmarking, bugfixing, and enhanced Java support, among others.

At the Scala Center, we thought it was a pity that Grzegorz's work was not in a production-ready release that Scala developers could benefit from. No one was actively working on that Zinc to-do list and progress was slow, so we decided to research how we could facilitate a stable 1.0 release.

That's why, in February 2017, the Center asked the Community whether we should invest our time into the development of Zinc. Seeing the wide participation in this survey, with 97% in favor of it, we became convinced it was a good idea. A stable release would put a faster incremental compiler into everyone's hands. That and other Scala Center efforts we embarked on would help speed up the release of sbt 1.0.

POLL Would you like the Scala Center to finish up @gkossakowski improvements in Scala's incremental compiler (~40x)? https://t.co/OxKccd5hCv

— Jorge (@jvican) 26 January 2017
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

A preview of Zinc 1.0: an example of a 7x speedup

How big are the improvements that Zinc 1.x brings to the table and how do they impact your day-to-day developer workflow?

To answer this question, we're going to use both Zinc 0.13.x and 1.x to compile a big real-world project: ScalaTest, whose core module has 40.377 lines of Scala code without counting comments and blank lines.

We illustrate our example with a common and relatively unintrusive operation in every codebase: a method addition. In our case, we add a new method to a widely dependent class like AndHaveWord (in Matcher.scala) after we've correctly warmed up the compiler (note that it's not fully hot) and compiled the project. How fast can Zinc 0.13.x recompile the new method?

scastie

The incremental compilation in Zinc 0.13.x yields a compilation time of 21 seconds. Let's try it with Zinc 1.0.0.

scastie

Zinc 1.0.0 terminates compilation in only 3 seconds. For this particular example, the class-based dependency in the new version does 7x better incremental compilation than the source-based old version.

Such experiments yield varying results depending on the Scala features and architecture used, but the general idea is that we can expect nice speedups in common day-to-day situations. For example, Scala projects using the cake pattern or intensive type-level programming are likely to benefit from these improvements. In my experience testing and using Zinc 1.0, I have observed speedups up to 22x.

Contributions to Zinc 1.0

Zinc 1.0 has been a long time in the making, so it has a long track of contributions. Following the changes previously committed by Grzegorz Kossakowski and Eugene Yokota, the Scala Center synchronized with the sbt team and started to work on the project.

Our contributions in the past month have been numerous and spanned several quarters at irregular intervals. In the following paragraphs, I provide a high-level overview of my work. If you're looking for the nitty-gritty details, I suggest you go through the list of issues and PR's labeled with 'Scala Center'; they all contain a detailed analysis and discussion of the proposed changes.

Statistics (as of August)

Highlighted contributions

With no previous experience with Zinc, the Scala Center started to hack on the most urgent to-do list: the missing tasks in the class-based dependency algorithm front. Finalizing these tasks was instrumental to measure how production-ready the algorithm was, and when a Zinc 1.0 release could happen.

As new issues started to be reported and more deficiencies to show up, I worked on other issues to stabilize the project. During this period I collaborated with the Pants team at Twitter and other users building upon Zinc. This collaboration revolved around guaranteeing that Zinc satisfied other people's use cases aside from sbt's, and that end users would get the best incremental compilation experience possible.

All in all, this is a high-level summary of our contributions to Zinc 1.0:

  • Improve correctness of the incremental algorithm in cases where it undercompiled.
  • Fixing bugs both in the Scala bridge and the Java incremental compilation.
  • Handle type-level programming more gracefully (for example, changes in type members).
  • Improve old Zinc APIs and create friendly APIs to complement missing public functionality.
  • Finish off the migration to Scala 2.12 and making Zinc ready for JDK8.
  • Add a protobuf-based analysis file with long term binary support.
  • Add features to Zinc that were missing and were considered important for Zinc's users (improvements to loggers and hooks for IDEs, for example).
  • Add JMH benchmarks to reason about the performance impact of incremental compiles.
  • Improve infrastructure to attract contributors:
    • Speed up CI by 7x to have a fast turnaround time,
    • Add more tests to the project,
    • Add more documentation to the APIs, a README and a CONTRIBUTING guide; and,
    • Add similar high-quality improvements to the build and scripted tests.

Find the list of PRs here.

External contributions

Zinc 1.0.0 also has had significant contributions from the Scala open-source community, and I thank everyone involved to make it possible! The contributors since our involvement in the project have been Stu Hood, Peiyu Wang, Krzysztof Romanowski, Wieslaw Popielarski, Peter Pan, Gregor Heine, Łukasz Indykiewicz, Allan Timothy Leong, Guillaume Martres, Wojciech Langiewicz, Krzysztof Borowski, Wojtek Pitula, knirski, Thierry Treyer, Grzegorz Kossakowski, Dale Wijnand and Eugene Yokota.

Martin Duhem and I are happy that we met some of these contributors during our Scala Center sprees all over Europe and that they dared to submit pull requests to Zinc; while contributing to the project may at first sound daunting, you didn't give up and at the end your work had a clear impact on Scala developers. Congratulations!

A look at the new algorithm

In our previous example, we showcase the impact that the new dependency detection algorithm has in real-world codebases. However, it may be surprising to some users that the previous version was so slow in the first place; 21 seconds to compile a method addition seems bad even for a codebase of so many lines of code and such complexity. Why was that the case?

Zinc 0.13.x has always had an underlying dependency detection operating at the source rather than the class level. In practice, this means that the invalidation algorithm collected more recompilation candidates than it should: all classes defined in a source file that were "affected" by a change would be checked for invalidation transitively. As it is common to define more than one class per source file, the number of source files to recompile grew up quickly and resulted in worse incremental compiles.

Let's show it with an example. Consider the following structure:

| Source file | Content | | =========== | ======= | | A.scala | class A0; class A1 | | B.scala | class B extends A1 | | C.scala | class C extends B |

In this scenario, class B depends on class A1, and class C depends on class B. Because these dependencies are introduced by inheritance, any API change to class A1 will trigger the recompilation of class B and class C. Changing class A0 should not invalidate any source file (no source files depend on it). However, because Zinc 0.13.x tracks dependencies at the source file level, any API change to class A0 triggers the recompilation of class B and class C.

In contrast, Zinc 1.x's dependency granularity is at the class level. With the new detection, the incremental compiler is smart enough to understand that nothing has to be recompiled in our previous example because it can tell apart class A0 from class A1. For that, it has a mapping of source files with their defined classes, class names with their generated class files and class names with their dependent class names. This information, complemented with other mappings, allows it to be much more selective when picking candidates for invalidation.

In general, Zinc 1.0 is faster and more correct than 0.13.x (thanks to some fundamental improvements to the Scala bridges implementation). This high-level explanation of the incremental compiler is in no way exhaustive; so if you'd like to know more about the inner workings of the Scala incremental compiler, subscribe to the following ticket. If there's enough interest, I'd like to write up a technical document for future contributors in the coming months to complement the already existing documentation.

Try it out!

Zinc is already available in sbt 1.0, and it will soon be available in Pants. If you're an sbt user, I recommend you to upgrade to latest sbt 1.0.3 and give it a spin!

If you're a library author, add Zinc 1.0.3 to your build.sbt with libraryDependencies += "org.scala-sbt" % "zinc_2.11" % "1.0.3". To add it to other builds tools like Maven, click here.

Special acknowledgements

I'd like to thank the sbt team, and concretely Eugene Yokota, for reviewing all our work and always being open for improvements. I'm happy to see such cross-team collaboration (sbt team at Lightbend <-> Scala Center) work in a complex real-world project like Zinc.

Lastly, I'd like to thank Grzegorz Kossakowski for his great work and leaving so much documentation behind, enabling others to finish off the project he started off. I send you my best wishes to recover from your hamstring injury.

Conclusion

In this announcement, I explained the work that the Scala Center has put into the already released Zinc 1.0 and motivated it.

While happy with the results, I still believe that there's more room for improvement, especially when it comes to faster incremental compiles and reproducibility. In the next month and a half, we will zoom in on these problems to continue improving the productivity of Scala developers in what it's known as SCP 15 (Scala Center Advisory Board Proposal).

I invite build tools to catch up with our changes and get in touch with the Zinc team to keep improving the project. I hope that in the following weeks, all build tools alike make Zinc 1.0 accessible to their users.

If you'd like to ask questions about Zinc, get involved in its development or comment on our progress, ask me (@jvican) in our Gitter channel.