Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make incremental compilation artifacts reusable (aka cached compilation) #216

Merged
merged 10 commits into from
Feb 13, 2017

Conversation

romanowski
Copy link
Contributor

Goal is achieved by providing custom mappers to TextAnalysisFormat.

Based on client can create CacheProviders that using given mappers can migrate incremental compilation artifacts between workspaces.

This is can be done in two steps:

  • export cache to intermediate format (e.g. map path as a relative to abstract roots)
  • import cache from intermediate format (e.g. replace abstract roots with concrete values).

Zinc is not responsible for maintaining caches (e.g. there is no verification if correct cache provider is used) or searching caches for build, this is client responsibility (zinc only provides utilities).

This PR also include two basic cache implementation based on making all paths relative to project root. They are used to test whole mechanism internally and are base for any custom caches.

  • project rebased cache (copy artifacts and classes from different project)
  • exportable cache (first export cache as pair or zipped classes and incremental compilation metadata) that can be later imported to another workspace

Internally, we are just about to beta tests this with huge scala codebase (way over 100K lines of scala code). Results so far are promising:

  • currently we can observe ~ 4-5 times improvement on full build (when relatively close to cache)
  • most of time is spent on extracting zips with classfiles and with few fixes we should get speedups around x10 times.
    Of course workspace after cache import behaves as it was compiled locally.

Our integration is not open source but in future (hopefully close) we will release at least part of it to public domain.

I am working on open source plugins for sbt (both 0.13.x and 1.0.x versions) based on that PR (deep WIP so far): https://github.com/romanowski/hoarder

For certain reasons this time I have to paste the below disclaimer:
THIS PROGRAM IS SUBJECT TO THE TERMS OF THE BSD 3-CLAUSE LICENSE.

THE FOLLOWING DISCLAIMER APPLIES TO ALL SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE:
THIS SOFTWARE IS LICENSED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OF NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THIS SOFTWARE MAY BE REDISTRIBUTED TO OTHERS ONLY BY EFFECTIVELY USING THIS OR
ANOTHER EQUIVALENT DISCLAIMER IN ADDITION TO ANY OTHER REQUIRED LICENSE TERMS.
ONLY THE SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE, IF ANY, THAT ARE ATTACHED TO (OR OTHERWISE ACCOMPANY) THIS SUBMISSION (AND ORDINARY
COURSE CONTRIBUTIONS OF FUTURES PATCHES THERETO) ARE TO BE CONSIDERED A CONTRIBUTION. NO OTHER SOFTWARE CODE OR MATERIALS ARE A CONTRIBUTION.

Copy link
Member

@jvican jvican left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @romanowski, great job!

I've left some minor comments/questions. Hope they are useful. Otherwise, LGTM.

.gitignore Outdated
@@ -1 +1,3 @@
target/
.idea
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these additions intended? IMO, these ignores should be local. See how to implement them globally in your computer in https://help.github.com/articles/ignoring-files/.

Copy link
Contributor Author

@romanowski romanowski Feb 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an option but most project I work with has .idea added directly in in local gitignore (in this dir intellij stores whole project config).

I can remove .idea if needed.

zinc/src/test/resources/bin stores generated jars/classesdirs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zinc/src/test/resources/bin is fine and correct to ignore, imo. .idea not.

Copy link
Member

@jvican jvican Feb 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If zinc/src/test/resources/bin is added, then it's because it has generated jars/classesdirs. But I don't see any of those jars in git history. If they are present, they should also be removed. Otherwise, I don't see why we should special case this directory, it hasn't generated anything for me at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will remove .idea then

@@ -32,8 +32,10 @@ private final class IncrementalNameHashing(log: sbt.util.Logger, options: IncOpt
val aNameHashes = a.nameHashes
val bNameHashes = b.nameHashes
val modifiedNames = ModifiedNames.compareTwoNameHashes(aNameHashes, bNameHashes)
val apiChange = NamesChange(className, modifiedNames)
Some(apiChange)
if (modifiedNames.regularNames.nonEmpty || modifiedNames.implicitNames.nonEmpty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing context here. Why is this check required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've encountered situations where for some reason class got different API with no names changed (on 'normal' incremental compilation - not 'cached' ones).

Debugging problems with API hashes is really hard so I gave up looking why class has changed API without any names change and just introduce this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this check is very weird. You're checking against something that NamesChange was already checking. This is how NamesChange is currently defined:

final case class NamesChange(modified0: String, modifiedNames: ModifiedNames) extends APIChange(modified0) {
  assert(
    modifiedNames.regularNames.nonEmpty || modifiedNames.implicitNames.nonEmpty,
    s"Modified names for $modified0 is empty"
  )
}

What you're doing here is no different than what we were doing before. We never return None for the same reason that we never got an AssertionError before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - I recall removing that assertion probably it was lost in rebasing process.

I will remove that new check and if I face another problem with it I will create new PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @romanowski!

@@ -58,40 +58,7 @@ class TestAnalysisCallback(
def hashFile(f: File): Array[Byte] = Stamp.hash(f).asInstanceOf[Hash].value

def get: TestAnalysis = {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is all of this gone? Are we not interested in testing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -162,13 +162,13 @@ final class Dependency(val global: CallbackGlobal) extends LocateClassFile with
case _ => newOne()
}
}
private def addClassDependency(deps: HashSet[ClassDependency], fromClass: Symbol, dep: Symbol): Unit = {
private def addClassDependency(deps: HashSet[ClassDependency], fromClass: Symbol, dep: Symbol): Unit = if (dep != NoSymbol) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also guard against null, just to be safer since scalac can unexpectedly give us a null from time to time. Perhaps it's a good idea to create a utility called ignoreSymbol that will guard against NoSymbol and null at the same time? It's a very recurrent logic in Zinc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a new logic - I just noticed that we keep empty name for multiple classes so I just remove it. I never see any nulls here (call sites of addClassDependency taking care of it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, even better. But I think we should guard against all of this in the whole Zinc. I'll do that in a PR.

private def stringsDescriptor(header: String, rels: Relations => Relation[String, String]) =
Descriptor(header, rels, Mapper.forString, Mapper.forString)

private val allRelations: List[Descriptor[_, _]] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be well documented and linked with the original source Relations since any change there may affect behaviour here. I would do the same for the rest of the methods in this class. In fact, it's a little bit unfortunate that this is generated manually and not automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on documentation part but IMO first we need to clean up whole logic around relation construction. Code (from below):

relations match {
      case p :: bin :: lcn :: mri :: mre :: ii :: ie :: lii :: lie :: cn :: un :: bcn :: Nil =>

Is IMO one of the most hacky thing in whole zinc codebase and at some point it needs to be cleanup and refactored.
I tried to do so in this PR but after some time I look at diff and I was way to big for single PR (and I needed to fit cached compilation related changes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and also it's better to keep PRs short, otherwise it's very difficult to reason about the changes and we should be a little bit conservative and test everything carefully before a merge 😄.

I'll take care of cleaning up the relations logic and maybe then we can document this code.

@eed3si9n
Copy link
Member

eed3si9n commented Feb 2, 2017

I'm out in Canada this week for Lightbend meetings, but I'll take closer look at this when I get back.

@jvican
Copy link
Member

jvican commented Feb 3, 2017

Have you tested these changes thoroughly @romanowski ? I wonder because the diffs are a little bit scary, and even though you've added several tests I'd like to make sure that real-world behaviour is stable.

@romanowski
Copy link
Contributor Author

Do you know any repository that uses sbt 1.0? I've created mine based on shapless (https://github.com/romanowski/shapeless).

I tested this a lot with my custom (not open source yet) integration and so far all works fine (including cached compilation based on this PR)

@jvican
Copy link
Member

jvican commented Feb 3, 2017

No, I do not, but since we'd like to stabilize Zinc for a beta release, we better start thinking about this stuff as soon as possible 😄. I didn't find the commit that sets up 1.0 in that shapeless fork, maybe you can describe how to try 1.0 out?

It's good that this PR has been tested in such an intense codebase as Shapeless. So thanks for that!

@romanowski
Copy link
Contributor Author

I put it wrong: I have shapless repo but I didn't test new zinc there (I tested zinc on non-open source codebase).
Of course now I will (rather tomorrow).

@romanowski
Copy link
Contributor Author

Here is 1.0 based branch https://github.com/romanowski/shapeless/tree/profiling/problem
Beware! By default it uses sbt 1.0.0-M4 that has a problem with extracting names (it takes ages for shapless) but it is fixed in 1.0 branch.

@romanowski
Copy link
Contributor Author

I did basic tests with shapeless repo and except #220 that is IMO not related to this PR zinc worked fine.

@jvican
Copy link
Member

jvican commented Feb 6, 2017

I'm happy to see this merged @romanowski. I agree that #220 is not strictly related to this, though in the same lines. Could you have a look at it? I'll comment on the strategy in the ticket.

@romanowski
Copy link
Contributor Author

I fear I won't be able to look at #220 soon (maybe end of Feb). If I have enough time I would create PR instead of issue :)

romanowski and others added 10 commits February 8, 2017 09:40
Fix scala checks for zinc-persist and add test for mapped Analysis.
Create cache aware store that can load Analysis from given cache provider.

Implement simplest (based on project 'rebase') cache.
Create intergration test for mechanism above.
Exportable cache can be exported as pair: zipped classfiles, zipped analysis.
Exportable cache exports paths as relative paths to project location.
Add cache verifier. Add classpath mappers. Add mapper for whole MiniSetup after setup os loaded.
Fixes small problems with dependencies phase (e.g. reduce numbers of NoSymbol checked)  and do treat refinement class as top-level class (since it does not have runtime representation.
@romanowski
Copy link
Contributor Author

Branch is rebased and headers are added for new files.

@romanowski
Copy link
Contributor Author

@eed3si9n is there anything I need to change/explain? Since zinc repo become much more alive now I don't want to rebase this (quite big) PR multiple times :)

@eed3si9n
Copy link
Member

eed3si9n commented Feb 9, 2017

I'm wondering if we need this change if we implement #218 instead. The purpose of transformation is to make it machine-independent, correct?

@romanowski
Copy link
Contributor Author

romanowski commented Feb 11, 2017

@eed3si9n

The purpose of transformation is to make it machine-independent, correct?
This is only one option. Generally with this PR clients can easy modify analysis and make it not only machine independent but also system or even build tool.

#218 can be based on this PR. As commented in #218 I don't think we can make zinc analysis 100% machine independent without full knowledge from build tool. E.g. how you can handle sbt's unmannaged jars or sources?

Naive implementation of #218 is also part of this PR ( ProjectRebasedCache ).

Is there anyone working on #218? Is there any plan how to implement it?

Moreover general mechanics of zinc generally is not changed (except change Comlilation to compilationTimestamp) and I can remove all optional logic (so this PR will consist only around TextFormat),

@eed3si9n
Copy link
Member

Is there anyone working on #218? Is there any plan how to implement it?

I can certainly work on #218. Not 100% sure if it can be done, but my idea was to essentially strip the information until it is machine-independent and have some sort of a tag that can be substituted per machine.

@romanowski
Copy link
Contributor Author

Did you know that even option to scalac can be machine depended? E.g. path to compiler plugin or parameters passed to it. Same goes for almost every entry kind in Analysis so I am interested how do you want to detect and tag all those cases?

I think using my Mappers is a good start for #218. Since you can implement it as specific mapper can merge this PR and stat another one based on my work? Or merge this one and if we can (somehow) implement #218 inside zinc then we can remove all not-used logic?

@eed3si9n
Copy link
Member

ok. Let's merge this first then.

@eed3si9n eed3si9n merged commit b5c2817 into sbt:1.0 Feb 13, 2017
@gabro
Copy link

gabro commented Feb 14, 2017

@romanowski thanks for this! Just to understand, will this be available in sbt directly, or should we wait for hoarder to be ready?

@romanowski
Copy link
Contributor Author

@gabro this will be part of sbt 1.0 and AFAIK there is almost no project using it.

Hoarder will use this PR for 1.0 branch and will implement similar solution for 0.13 (I am working on that).

@gabro
Copy link

gabro commented Feb 14, 2017

Awesome, thanks

@stuhood
Copy link
Contributor

stuhood commented Feb 15, 2017 via email

assert(fromClass.isClass, Feedback.expectedClassSymbol(fromClass))
val depClass = enclOrModuleClass(dep)
if (fromClass.associatedFile != depClass.associatedFile) {
if (fromClass.associatedFile != depClass.associatedFile && !depClass.isRefinementClass) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanowski Why are you doing this? This looks wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvican Refinement classes do not have byte code representation and it fails later on (we still depends on parents of refinement)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in that case this is correct. I double checked with SLS.

jvican added a commit to scalacenter/zinc that referenced this pull request Feb 21, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
jvican added a commit to scalacenter/zinc that referenced this pull request Feb 21, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
jvican added a commit to scalacenter/zinc that referenced this pull request Feb 21, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
@jvican
Copy link
Member

jvican commented Feb 21, 2017

@romanowski Is there something we can do to follow up on @stuhood's remark? This is related pantsbuild/pants#3923.

@romanowski
Copy link
Contributor Author

I mentioned scalac plugins and I don't see how this is related to pant's plugin written in Java (but I don't know pants so I might miss something).

I generally agree with @stuhood's but we've got a case where compiler plugin is a part of same project so it should use relative path in such case.

Generally as said before this PR open multiple possibilities such as cached compilation or #218

jvican added a commit to scalacenter/zinc that referenced this pull request Feb 26, 2017
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt#225
* sbt#216
* sbt#206
* sbt#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
cunei pushed a commit to cunei/zinc that referenced this pull request Oct 25, 2017
eed3si9n pushed a commit to eed3si9n/scala that referenced this pull request May 14, 2019
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt/zinc#225
* sbt/zinc#216
* sbt/zinc#206
* sbt/zinc#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.
lrytz pushed a commit to lrytz/scala that referenced this pull request Nov 5, 2019
This commit introduces changes to Scala 2.10 sources from the following
PRs:

* sbt/zinc#225
* sbt/zinc#216
* sbt/zinc#206
* sbt/zinc#221

It also removes a stub for 2.8 compatibility in `DelegatingReporter`.
Support for Scala 2.8 compatibility is not already maintained it.

Rewritten from sbt/zinc@5d46c1b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants