Make incremental compilation artifacts reusable (aka cached compilation) #216

romanowski · 2017-01-31T20:33:23Z

The goal is achieved by providing custom mappers to TextAnalysisFormat.

Using the given mappers, a client can create CacheProviders that migrate incremental compilation artifacts between workspaces.

This can be done in two steps:

export the cache to an intermediate format (e.g. map paths to be relative to abstract roots)
import the cache from the intermediate format (e.g. replace abstract roots with concrete values).
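The two steps above can be sketched roughly as follows. This is only an illustration of the path-rebasing idea, not the mapper API added in this PR: the `PathRebase` object, its method names, and the `@ROOT@` placeholder are all hypothetical.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper (not part of Zinc): rewrites paths between a concrete
// workspace root and an abstract placeholder root.
object PathRebase {
  // Placeholder token standing in for the workspace root in the exported form.
  val AbstractRoot = "@ROOT@"

  // Export step: a path under `root` becomes "@ROOT@/<relative path>".
  // Paths outside `root` are returned unchanged (as absolute path strings).
  def exportPath(root: Path, path: Path): String = {
    val normRoot = root.toAbsolutePath.normalize
    val normPath = path.toAbsolutePath.normalize
    if (normPath.startsWith(normRoot)) {
      // Use '/' in the exported form so it does not depend on the OS separator.
      val rel = normRoot.relativize(normPath).toString.replace('\\', '/')
      AbstractRoot + "/" + rel
    } else normPath.toString
  }

  // Import step: replace the placeholder with the root of the new workspace.
  def importPath(newRoot: Path, exported: String): Path = {
    val prefix = AbstractRoot + "/"
    if (exported.startsWith(prefix))
      newRoot.toAbsolutePath.normalize.resolve(exported.stripPrefix(prefix))
    else Paths.get(exported)
  }
}
```

A path exported from one workspace and imported into another keeps its position relative to the root, which is the property the cache relies on.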

Zinc is not responsible for maintaining caches (e.g. there is no verification that the correct cache provider is used) or for finding caches for a build; this is the client's responsibility (Zinc only provides utilities).

This PR also includes two basic cache implementations based on making all paths relative to the project root. They are used to test the whole mechanism internally and serve as a base for any custom caches.

project rebased cache (copies artifacts and classes from a different project)
exportable cache (first exports the cache as a pair of zipped classes and incremental compilation metadata, which can later be imported into another workspace)

Internally, we are just about to beta test this with a huge Scala codebase (way over 100K lines of Scala code). Results so far are promising:

currently we observe a ~4-5x improvement on a full build (when relatively close to the cache)
most of the time is spent extracting zips with classfiles, and with a few fixes we should get speedups of around 10x.
Of course, after a cache import the workspace behaves as if it had been compiled locally.

Our integration is not open source, but in the (hopefully near) future we will release at least part of it to the public domain.

I am working on open source plugins for sbt (both 0.13.x and 1.0.x versions) based on this PR (deep WIP so far): https://github.com/romanowski/hoarder

For certain reasons, this time I have to paste the disclaimer below:
THIS PROGRAM IS SUBJECT TO THE TERMS OF THE BSD 3-CLAUSE LICENSE.

THE FOLLOWING DISCLAIMER APPLIES TO ALL SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE:
THIS SOFTWARE IS LICENSED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OF NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THIS SOFTWARE MAY BE REDISTRIBUTED TO OTHERS ONLY BY EFFECTIVELY USING THIS OR
ANOTHER EQUIVALENT DISCLAIMER IN ADDITION TO ANY OTHER REQUIRED LICENSE TERMS.
ONLY THE SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE, IF ANY, THAT ARE ATTACHED TO (OR OTHERWISE ACCOMPANY) THIS SUBMISSION (AND ORDINARY
COURSE CONTRIBUTIONS OF FUTURES PATCHES THERETO) ARE TO BE CONSIDERED A CONTRIBUTION. NO OTHER SOFTWARE CODE OR MATERIALS ARE A CONTRIBUTION.

jvican

Hello @romanowski, great job!

I've left some minor comments/questions. Hope they are useful. Otherwise, LGTM.

jvican · 2017-02-02T19:36:04Z

.gitignore

@@ -1 +1,3 @@
 target/
+.idea


Are these additions intended? IMO, these ignores should be local. See how to set them up globally on your computer in https://help.github.com/articles/ignoring-files/.

This is an option, but most projects I work with have .idea added directly to the local .gitignore (IntelliJ stores the whole project config in this directory).

I can remove .idea if needed.

zinc/src/test/resources/bin stores generated jars/classesdirs

zinc/src/test/resources/bin is fine and correct to ignore, imo. .idea not.

If zinc/src/test/resources/bin is added, then it's because it has generated jars/classesdirs. But I don't see any of those jars in git history. If they are present, they should also be removed. Otherwise, I don't see why we should special case this directory, it hasn't generated anything for me at least.

Ok, I will remove .idea then

jvican · 2017-02-02T19:42:26Z

internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalNameHashing.scala

@@ -32,8 +32,10 @@ private final class IncrementalNameHashing(log: sbt.util.Logger, options: IncOpt
      val aNameHashes = a.nameHashes
      val bNameHashes = b.nameHashes
      val modifiedNames = ModifiedNames.compareTwoNameHashes(aNameHashes, bNameHashes)
-      val apiChange = NamesChange(className, modifiedNames)
-      Some(apiChange)
+      if (modifiedNames.regularNames.nonEmpty || modifiedNames.implicitNames.nonEmpty) {


I think I'm missing context here. Why is this check required?

I've encountered situations where, for some reason, a class got a different API with no names changed (during 'normal' incremental compilation, not 'cached' compilation).

Debugging problems with API hashes is really hard, so I gave up investigating why the class's API changed without any name changes and just introduced this check.

But this check is very weird. You're checking against something that NamesChange was already checking. This is how NamesChange is currently defined:

final case class NamesChange(modified0: String, modifiedNames: ModifiedNames) extends APIChange(modified0) {
  assert(
    modifiedNames.regularNames.nonEmpty || modifiedNames.implicitNames.nonEmpty,
    s"Modified names for $modified0 is empty"
  )
}

What you're doing here is no different than what we were doing before. We never return None for the same reason that we never got an AssertionError before.
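To make the argument concrete, here is a minimal sketch with simplified stand-ins for the Zinc types (the real `ModifiedNames` and `NamesChange` carry more structure, and `NamesChangeDemo.guarded` is a hypothetical name). It assumes the `else` branch of the new check returns `None`, which the diff excerpt does not show.

```scala
// Simplified stand-ins for Zinc's types, for illustration only.
final case class ModifiedNames(regularNames: Set[String], implicitNames: Set[String])

final case class NamesChange(modified0: String, modifiedNames: ModifiedNames) {
  // Same condition as the assertion quoted above.
  assert(
    modifiedNames.regularNames.nonEmpty || modifiedNames.implicitNames.nonEmpty,
    s"Modified names for $modified0 is empty"
  )
}

object NamesChangeDemo {
  // Guarded construction in the style of the diff: the condition repeats the
  // assertion, so it only changes behaviour for inputs the assertion rejects.
  def guarded(className: String, names: ModifiedNames): Option[NamesChange] =
    if (names.regularNames.nonEmpty || names.implicitNames.nonEmpty)
      Some(NamesChange(className, names))
    else None
}
```

With empty name sets, `guarded` yields `None`, while constructing `NamesChange` directly throws an `AssertionError` (unless assertions are disabled). If no `AssertionError` was ever observed before, the guard never takes its `None` branch either.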

Good catch. I recall removing that assertion; the removal was probably lost in the rebasing process.

I will remove the new check, and if I face another problem with it I will create a new PR.

Great @romanowski!

jvican · 2017-02-02T19:43:04Z

internal/zinc-core/src/test/scala/sbt/inc/TestAnalysisCallback.scala

@@ -58,40 +58,7 @@ class TestAnalysisCallback(
  def hashFile(f: File): Array[Byte] = Stamp.hash(f).asInstanceOf[Hash].value

  def get: TestAnalysis = {
-


Why is all of this gone? Are we not interested in testing this?

Tests for those utils are commented out: https://github.com/sbt/zinc/blob/1.0/internal/zinc-core/src/test/scala/sbt/inc/IncrementalTest.scala

jvican · 2017-02-02T19:57:59Z

internal/compiler-bridge/src/main/scala/xsbt/Dependency.scala

@@ -162,13 +162,13 @@ final class Dependency(val global: CallbackGlobal) extends LocateClassFile with
        case _ => newOne()
      }
    }
-    private def addClassDependency(deps: HashSet[ClassDependency], fromClass: Symbol, dep: Symbol): Unit = {
+    private def addClassDependency(deps: HashSet[ClassDependency], fromClass: Symbol, dep: Symbol): Unit = if (dep != NoSymbol) {


We should also guard against null, just to be safer since scalac can unexpectedly give us a null from time to time. Perhaps it's a good idea to create a utility called ignoreSymbol that will guard against NoSymbol and null at the same time? It's a very recurrent logic in Zinc.

This is not new logic. I just noticed that we keep an empty name for multiple classes, so I removed it. I have never seen any nulls here (the call sites of addClassDependency take care of that).

Perfect, even better. But I think we should guard against all of this in the whole Zinc. I'll do that in a PR.

jvican · 2017-02-02T20:00:34Z

internal/zinc-persist/src/main/scala/sbt/internal/inc/RelationsTextFormat.scala

+  private def stringsDescriptor(header: String, rels: Relations => Relation[String, String]) =
+    Descriptor(header, rels, Mapper.forString, Mapper.forString)
+
+  private val allRelations: List[Descriptor[_, _]] = {


This should probably be well documented and linked with the original source Relations since any change there may affect behaviour here. I would do the same for the rest of the methods in this class. In fact, it's a little bit unfortunate that this is generated manually and not automatically.

I agree on the documentation part, but IMO we first need to clean up the whole logic around relation construction. This code (from below):

relations match { case p :: bin :: lcn :: mri :: mre :: ii :: ie :: lii :: lie :: cn :: un :: bcn :: Nil =>

is IMO one of the hackiest things in the whole zinc codebase, and at some point it needs to be cleaned up and refactored.
I tried to do so in this PR, but after some time I looked at the diff and it was way too big for a single PR (and I needed to fit in the changes related to cached compilation).

I agree, and also it's better to keep PRs short, otherwise it's very difficult to reason about the changes and we should be a little bit conservative and test everything carefully before a merge 😄.

I'll take care of cleaning up the relations logic and maybe then we can document this code.

eed3si9n · 2017-02-02T20:29:08Z

I'm out in Canada this week for Lightbend meetings, but I'll take a closer look at this when I get back.

jvican · 2017-02-03T14:12:07Z

Have you tested these changes thoroughly @romanowski ? I wonder because the diffs are a little bit scary, and even though you've added several tests I'd like to make sure that real-world behaviour is stable.

romanowski · 2017-02-03T14:29:44Z

Do you know any repository that uses sbt 1.0? I've created mine based on shapeless (https://github.com/romanowski/shapeless).

I tested this a lot with my custom (not yet open source) integration, and so far everything works fine (including cached compilation based on this PR).

jvican · 2017-02-03T14:37:33Z

No, I do not, but since we'd like to stabilize Zinc for a beta release, we better start thinking about this stuff as soon as possible 😄. I didn't find the commit that sets up 1.0 in that shapeless fork, maybe you can describe how to try 1.0 out?

It's good that this PR has been tested in such an intense codebase as Shapeless. So thanks for that!

romanowski · 2017-02-03T14:42:43Z

I put it wrong: I have a shapeless repo, but I didn't test the new zinc there (I tested zinc on a non-open-source codebase).
Of course I will now (more likely tomorrow).

romanowski · 2017-02-03T14:46:29Z

Here is 1.0 based branch https://github.com/romanowski/shapeless/tree/profiling/problem
Beware! By default it uses sbt 1.0.0-M4, which has a problem with extracting names (it takes ages for shapeless), but this is fixed in the 1.0 branch.

romanowski · 2017-02-05T10:21:18Z

I did basic tests with the shapeless repo, and except for #220, which IMO is not related to this PR, zinc worked fine.

jvican · 2017-02-06T10:30:05Z

I'm happy to see this merged @romanowski. I agree that #220 is not strictly related to this, though it is along the same lines. Could you have a look at it? I'll comment on the strategy in the ticket.

romanowski · 2017-02-06T10:39:17Z

I fear I won't be able to look at #220 soon (maybe at the end of Feb). If I had enough time I would have created a PR instead of an issue :)

Add cache verifier. Add classpath mappers. Add mapper for whole MiniSetup after setup is loaded. Fix small problems with the dependencies phase (e.g. reduce the number of NoSymbol checks) and do not treat a refinement class as a top-level class (since it does not have a runtime representation).

romanowski · 2017-02-08T09:06:41Z

Branch is rebased and headers are added for new files.

romanowski · 2017-02-09T20:52:58Z

@eed3si9n is there anything I need to change/explain? Since the zinc repo has become much more active now, I don't want to rebase this (quite big) PR multiple times :)

eed3si9n · 2017-02-09T20:55:01Z

I'm wondering if we need this change if we implement #218 instead. The purpose of transformation is to make it machine-independent, correct?

romanowski · 2017-02-11T23:16:33Z

@eed3si9n

The purpose of transformation is to make it machine-independent, correct?
This is only one option. Generally, with this PR clients can easily modify the analysis and make it independent not only of the machine but also of the system or even the build tool.

#218 can be based on this PR. As commented in #218, I don't think we can make the zinc analysis 100% machine independent without full knowledge from the build tool. E.g. how can you handle sbt's unmanaged jars or sources?

A naive implementation of #218 is also part of this PR (ProjectRebasedCache).

Is there anyone working on #218? Is there any plan how to implement it?

Moreover, the general mechanics of zinc are not changed (except for changing Compilation to compilationTimestamp), and I can remove all optional logic (so that this PR consists only of the changes around TextFormat).

eed3si9n · 2017-02-13T19:45:47Z

Is there anyone working on #218? Is there any plan how to implement it?

I can certainly work on #218. Not 100% sure if it can be done, but my idea was to essentially strip the information until it is machine-independent and have some sort of a tag that can be substituted per machine.

romanowski · 2017-02-13T20:15:16Z

Did you know that even an option passed to scalac can be machine dependent? E.g. the path to a compiler plugin or the parameters passed to it. The same goes for almost every kind of entry in Analysis, so I am interested in how you want to detect and tag all those cases.

I think using my Mappers is a good start for #218. Since you can implement it as a specific mapper, can we merge this PR and start another one based on my work? Or merge this one, and if we can (somehow) implement #218 inside zinc, then remove all unused logic?

eed3si9n · 2017-02-13T21:04:52Z

ok. Let's merge this first then.

gabro · 2017-02-14T09:19:09Z

@romanowski thanks for this! Just to understand, will this be available in sbt directly, or should we wait for hoarder to be ready?

romanowski · 2017-02-14T17:23:46Z

@gabro this will be part of sbt 1.0 and AFAIK there is almost no project using it.

Hoarder will use this PR for the 1.0 branch and will implement a similar solution for 0.13 (I am working on that).

gabro · 2017-02-14T17:38:15Z

Awesome, thanks

stuhood · 2017-02-15T18:36:07Z

In the case of scalac parameters + plugins, etc: those must be part of the cache key. Therefore, there is no need to strip them from the file. It's only stuff that has no effect on the outcome of the build that should be stripped.
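As a rough sketch of what "part of the cache key" could look like: the `CacheKey` object below is hypothetical and not something Zinc or this PR provides. It assumes plugins are identified by a machine-independent fingerprint (for example a content hash) rather than by their path, and that option order is significant while plugin order is not.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper (not part of Zinc): derives a key from inputs that
// affect the outcome of compilation.
object CacheKey {
  def of(scalacOptions: Seq[String], pluginFingerprints: Seq[String]): String = {
    val digest = MessageDigest.getInstance("SHA-256")

    // Length-prefix every entry so that Seq("ab", "c") and Seq("a", "bc")
    // produce different keys.
    def feed(tag: String, values: Seq[String]): Unit = {
      digest.update(s"$tag:${values.size};".getBytes(StandardCharsets.UTF_8))
      values.foreach { v =>
        val bytes = v.getBytes(StandardCharsets.UTF_8)
        digest.update(s"${bytes.length}:".getBytes(StandardCharsets.UTF_8))
        digest.update(bytes)
      }
    }

    feed("options", scalacOptions)             // order kept (assumption)
    feed("plugins", pluginFingerprints.sorted) // order ignored (assumption)

    // Render the 32-byte digest as 64 lowercase hex characters.
    digest.digest().map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Two builds then share a cache entry only if their keys match, so anything folded into the key does not need to be stripped from the analysis file.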

jvican · 2017-02-16T12:11:59Z

internal/compiler-bridge/src/main/scala/xsbt/Dependency.scala

      assert(fromClass.isClass, Feedback.expectedClassSymbol(fromClass))
      val depClass = enclOrModuleClass(dep)
-      if (fromClass.associatedFile != depClass.associatedFile) {
+      if (fromClass.associatedFile != depClass.associatedFile && !depClass.isRefinementClass) {


@romanowski Why are you doing this? This looks wrong.

@jvican Refinement classes do not have a bytecode representation, and it fails later on (we still depend on the parents of the refinement).

Right, in that case this is correct. I double checked with SLS.

This commit introduces changes to Scala 2.10 sources from the following PRs: * sbt#225 * sbt#216 * sbt#206 * sbt#221 It also removes a stub for 2.8 compatibility in `DelegatingReporter`. Scala 2.8 compatibility is no longer maintained.

jvican · 2017-02-21T19:40:26Z

@romanowski Is there something we can do to follow up on @stuhood's remark? This is related to pantsbuild/pants#3923.

romanowski · 2017-02-21T21:23:22Z

I mentioned scalac plugins, and I don't see how this is related to a pants plugin written in Java (but I don't know pants, so I might be missing something).

I generally agree with @stuhood, but we have a case where a compiler plugin is part of the same project, so it should use a relative path in that case.

Generally, as said before, this PR opens up multiple possibilities, such as cached compilation or #218.

eed3si9n added the waiting for review label Jan 31, 2017

jvican approved these changes Feb 2, 2017

romanowski force-pushed the cached-compilaiton branch from c5823a1 to 1d1e752 Compare February 3, 2017 11:45

eed3si9n mentioned this pull request Feb 4, 2017

Current status of Zinc, or what are the "must have's" before stable release #212

Closed

3 tasks

This was referenced Feb 4, 2017

Make Analysis content machine-independent #218

Closed

EOF after incremental compilation in sbt #220

Closed

romanowski and others added 10 commits February 8, 2017 09:40

Analisis text format might now map given values.

849e402

Fix scala checks for zinc-persist and add test for mapped Analysis.

Analyzed class now store only compilation time

4cc0d28

Cached compilation basics.

fbed61b

Create cache aware store that can load Analysis from given cache provider. Implement simplest (based on project 'rebase') cache. Create intergration test for mechanism above.

Add exportable cache.

afc3fb2

Exportable cache can be exported as pair: zipped classfiles, zipped analysis. Exportable cache exports paths as relative paths to project location.

Do not log messages about empty changes and better failure messages.

fa4277c

Exportable cache now maps modification dates in analysis for cached c…

b5f7452

…lassfiles. Also fixed failing tests.

Fix classes dir

7a4d034

Add javac and scalac options mapper

9c52239

Add missing headers.

4b4e5b2

romanowski force-pushed the cached-compilaiton branch from 1d1e752 to 4b4e5b2 Compare February 8, 2017 09:05

eed3si9n merged commit b5c2817 into sbt:1.0 Feb 13, 2017

eed3si9n removed the waiting for review label Feb 13, 2017

jvican reviewed Feb 16, 2017

jvican mentioned this pull request Feb 21, 2017

Backport changes from several previous PRs #232

Merged

cunei pushed a commit to cunei/zinc that referenced this pull request Oct 25, 2017

Merge pull request sbt#216 from lloydmeta/feature/calendar-gen-with-e…

893c84b

…dge-cases sbt#215 Target Date edge cases
