Class-based dependency tracking in name hashing algorithm #1104

Closed
gkossakowski opened this Issue Feb 6, 2014 · 48 comments

@gkossakowski
Contributor

gkossakowski commented Feb 6, 2014

This is a meta-issue that provides a quick overview of what needs to be done in order to change dependency tracking from source file-based to class name-based.

Tasks

Class-based dependency tracking can be implemented in two phases. The third phase is dedicated to testing on large projects.

Invalidate by inheritance based on class dependencies (phase 1)

Tasks marked as completed are completed only in a prototype implementation. No changes have been merged into sbt yet.

  • add tracking of declared classes in source files; we want to invalidate classes, but at the end we need a className -> srcFile relation because we can only recompile source files
  • add dependency tracking of top-level (not owned by any other class) imports; we'll assign dependencies introduced by those imports to the first top level class/object/trait declared in the source file (see notes below for details)
  • track dependencies at class level (and use declaredClasses to immediately map them back to files until the rest of the algorithm is implemented)
  • add tracking of children of sealed parents; we should track this in the API object corresponding to the sealed parent; this will enable us to perform proper invalidation when a new case (class) is introduced to a sealed hierarchy
  • add per (top level) class tracking of api hashes; tracking of api hashes per each (even nested) class separately will be done in phase 2
  • handle dependencies coming from local and anonymous classes (see discussion below)
  • switch invalidation of dependencies by inheritance to invalidate at class level instead of at source level (would fix #2320)
  • distinguish between source and binary class names. Introduce a new relation binaryClassName: Relation[String, String] and use it whenever the conversion is needed.

Track name hashes per class (phase 2)

  • refactor tracking of APIs to be done at class level instead of at source level
  • extract APIs of all classes (including inner ones) separately
  • track name hashes at class level instead of being at source level (see also: #2319)
  • implement the rest of invalidation logic (member reference, name hashing) based on class names (get rid of mapping through declaredClasses in almost entire algorithm) (would fix #2320)

Only once the last bullet is merged will we see an improvement to incremental compilation when a big number of nested classes is involved (e.g. in the Scala compiler's case).

Testing, bug fixing and benchmarking (phase 3)

  • test with the Scalatest repo
  • test with the scala repo using the sbt build
  • test with the specs2 repo
  • fix handling of anonymous and local classes defined in Java (similar logic will have to be implemented as for handling local Scala classes) (sbt/zinc#192)
  • index classes only declared (not inherited) in source files (sbt/zinc#174)
  • benchmark ;clean;compile performance
  • simplify classpath and Analysis instance lookup in the incremental compiler (I think the number of classpath lookups can be reduced now)

Merge changes upstream, prepare for shipping (phase 4)

  • determine the location where this work will be merged into

Targeted sbt version

This most likely is going to be shipped with sbt 1.0-M1.

Benefits

The improved dependency tracking delivers up to 40x speedups of incremental compilation in scenarios tested. Check benchmarking results here: #1104 (comment)

The speedups are caused by fixing two main issues:

  1. #2319 would be fixed once name hashes are tracked per class. This way, the introduction of a new class and the members coming with it would not affect source files that depend (by member reference) on already existing classes.
  2. The effects of adding members (like methods) to a class would affect only classes that inherit from that class. At the moment, adding a member to a class that nobody inherits from can trigger invalidation of all descendants of all classes defined in the same source file (see #2320 for a specific scenario).

Issue 1 is likely to be triggered by any code base that defines more than one class in a single source file. Issue 2 affects code bases with a big number of nested classes that are inherited.
One example is the Scala compiler itself. Even with name hashing, we invalidate too much and the code edit cycle becomes long whenever a new member is introduced.


The work described in this issue is funded by Lightbend. I'm working on it as a contractor.

gkossakowski added a commit to gkossakowski/website that referenced this issue Apr 9, 2015

Mention issue on class-level dependency tracking.
The sbt/sbt#1104 discusses in depth what's needed to implement tracking of
dependencies at class level.

@gkossakowski
Contributor

gkossakowski commented Jan 4, 2016
I'd like to share notes on issues and design choices I encountered when prototyping phase 1.

Referring (naming classes)

The existing incremental compiler uses class names to refer to the targets of external dependencies. External dependencies are the ones where the target of a dependency is in a source file outside of a given project. In such a case, the dependency is expressed as src file -> class name. The name of a class is stored as a string. The string is created from a symbol seen after the flatten phase and it includes its package and its owners. For example, if we have:

package x.y
class A {
  object B {
    class C
  }
}

Then we have the names x.y.A, x.y.A$B, x.y.A$B$C for A, B and C. Things to note:

  • there's no distinction between a class name and an object name (in particular, there's no way to distinguish companions by their names)
  • after the flatten phase, all classes are nested in a top-level class but their names include the names of all their owners
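The naming scheme above can be sketched as a small helper. This is an illustration of how the post-flatten names in the example are formed (package parts joined with '.', nested class/object owners joined with '$'), not scalac's actual implementation:

```scala
// Sketch of how the post-flatten ("binary") names in the example are
// formed; an illustration only, not scalac's implementation.
def binaryName(packageParts: List[String], classOwners: List[String]): String = {
  val pkg = packageParts.mkString(".")
  val cls = classOwners.mkString("$")
  if (pkg.isEmpty) cls else s"$pkg.$cls"
}
```

For the example above, binaryName(List("x", "y"), List("A", "B", "C")) yields "x.y.A$B$C".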

I was wondering if the lack of distinction between companions would lead to a problem of two distinct classes clashing with each other:

class A {
  class X
}
object A {
  class X
}

but this is disallowed by the Scala compiler.

In my old prototype, I used names of symbols as seen by the typer phase (before flattening) and I tracked all classes declared in a given source file via the declaredClasses relation. However, the existing incremental compiler tracks the classes produced by a given source file in the classes relation, which uses names after the flatten phase. I checked the history: e8eae7d explicitly switches to names after the flatten phase but doesn't say why this is the correct thing to do.

I can't see why this couldn't work at the moment, so I'll try sticking to the existing classes relation and see what happens.

Preserving the old incremental compilation algorithm

I looked into preserving the old incremental compilation algorithm. It's a lot of work because relations in the old algorithm are expected to be mostly of type Relation[File, T] for T being either String or File, but in class-based dependency tracking that type becomes Relation[String, T]. A lot of places have to be adapted. In my new prototype I simply removed all the code that supported turning off name hashing.

Switching to class-based invalidation

Switching to class-based invalidation is not enough to fix #2320. In addition to tracking dependencies at class level, we also need to know which classes have changed. The full implementation of that tracking is the topic of phase 2. In phase 1, I'll just add separate tracking of hashes for each top-level class defined in a source file. This is the minimal change needed to fix #2320 and prove that phase 1 is mostly complete.


@gkossakowski
Contributor

gkossakowski commented Jan 5, 2016
I'm adding more notes taken while prototyping class-based dependency tracking.

Naming classes

I discussed how to refer to (name) classes in the comment above. I was wondering if naming classes as seen after the flatten phase is OK. However, the API phase that extracts class structures (and their names) uses names as seen before the pickler phase (and the flatten phase), so we run into an inconsistency. I think either choice (pickler or after flatten) is fine, but we must be consistent throughout the entire incremental compiler algorithm. I'm leaning towards sticking to the names of Symbols at the pickler phase to be consistent with the rest of the API extraction logic, which sees trees before the pickler phase.

I'll try to switch everything in the incremental compiler to names seen at the pickler phase and report back.

Local and anonymous classes

Consider the example:

class A
class A2

class B {
  def foo = {
     class Foo(a: A)
  }
  {
    class Bar extends A2
  }
  class Inner(a: A)
}

B doesn't have any dependencies. Inner has A as its member reference dependency. What do we do with Foo and Bar? Nobody outside of B can depend on them, but those local classes depend on other classes and we have to record that dependency somehow. The challenge is that there's no obvious way to give a full name to either Foo or Bar. It would be even more tricky to do that for anonymous classes and keep the names both unique and stable (across compilations).

I think what we should do is attribute those dependencies to the enclosing class of Foo and Bar. In this case, we would record a dependency by inheritance on A2 and a member reference dependency on A. There's one caveat: regular dependencies by inheritance are invalidated transitively. That means we invalidate the entire DAG of classes that inherit from a given one. The idea is that we want to recompile everything that will be recompiled anyway in one step. However, if we attribute the dependency on A2 to B, we'll invalidate all classes inheriting from B when A2 changes. That's not the behaviour we want. We cannot record that dependency as a member reference because inheritance dependencies are not filtered by name hashing (for correctness). To solve that, we need a new kind of dependency: local inheritance.

My example above was a bit simplistic because the local classes were defined within a top-level class. Let's look at what we do in the general case of arbitrary nesting. We have a symbol local for a local or an anonymous class that introduces some dependencies, and we want to find a symbol outer that will receive those dependencies. We'll look at the owner chain of local and look for the longest prefix such that all elements in the prefix are symbols representing packages, classes (traits) or modules. Once we find such a prefix, we'll take its last element as outer.

The intuition behind this algorithm is that we want to attribute dependencies to the innermost class or object that we can make a full name for.
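The prefix search can be sketched as follows. This is a hypothetical illustration: Symbol, PackageSym, ClassSym and MethodSym are simplified stand-ins for the compiler's symbol API (they are not scalac types), and isLocal marks classes that are local or anonymous.

```scala
// Simplified stand-in for the compiler's symbol hierarchy.
sealed trait Symbol { def name: String; def owner: Option[Symbol] }
final case class PackageSym(name: String, owner: Option[Symbol]) extends Symbol
final case class ClassSym(name: String, owner: Option[Symbol], isLocal: Boolean) extends Symbol
final case class MethodSym(name: String, owner: Option[Symbol]) extends Symbol

object EnclosingNonLocal {
  // Owner chain from the root package down to the symbol itself.
  def ownerChain(sym: Symbol): List[Symbol] = {
    def loop(s: Symbol, acc: List[Symbol]): List[Symbol] =
      s.owner match {
        case Some(o) => loop(o, s :: acc)
        case None    => s :: acc
      }
    loop(sym, Nil)
  }

  // Longest prefix consisting only of packages and non-local classes
  // (modules would be handled the same way); its last element is the
  // `outer` that receives the local class's dependencies.
  def outerFor(local: Symbol): Option[Symbol] =
    ownerChain(local).takeWhile {
      case PackageSym(_, _)        => true
      case ClassSym(_, _, isLocal) => !isLocal
      case _                       => false
    }.lastOption
}
```

For a class Foo defined inside def foo of a class B, the chain is package -> B -> foo -> Foo; the prefix stops before foo, so B receives Foo's dependencies.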

I haven't done any work to implement it. I'll add a task to the list in this ticket.


gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 5, 2016

Invalidate dependencies at class level.
This commit implements invalidation of dependencies at class level both
for inheritance and memberRef dependencies. However, information that
determines api changes is still tracked mostly at source file level.

Name hashes are tracked at source level (classes considered for
invalidation are mapped back to source files before name hash lookup).
For that reason, #2319 is still not fixed.

Api hashes are tracked per-class but only for top-level classes. It's not
a final design but a quick hack to prove that class-based invalidation is
working correctly. It was enough to get the scenario described in #2320 fixed.
This is shown in `class-based-inheritance` scripted test.
The final design of api hashing will be that all publicly accessible
classes will get their own separate hash.

The sealed test is marked pending as described in notes of #1104.

Another thing to note is that we're getting close to being able to strip
abstractions that deal with the fact that dependencies were tracked either
as Relation[File, File] (for internal dependencies) or
Relation[File, String] (for external dependencies). We track all
dependencies uniformly as Relation[String, String] now.
@gkossakowski
Contributor

gkossakowski commented Jan 6, 2016
I've done a bit more research on naming classes. Here are my notes.

Naming classes and class files

In the implementation of the incremental compiler, the recording of produced class files, their class names, and binary dependencies are all mixed together.

The incremental compiler needs a function (classFile: File) => (className: String) (and its inverse) to support invalidation between projects and upon classpath changes. This function was introduced in 0d5814e along with the recording of a class name for a binary dependency. I haven't worked out whether this commit fixes any bugs or is just a refactoring. It seems to me that it makes the incremental compiler more robust against changes to the classpath.

The incremental compiler also needs a function (className: String) => (srcFile: String) to support exportJars (see #108 and 0b3ec05). This is again related to binary dependencies and the classpath lookup.

For both of those functions, the className was a name seen after the flatten phase. It's important that the name is consistent with the name of its corresponding class file because conversion between names and class files is done in several places. Once we have a srcFile, we can grab the corresponding api object and figure out whether external (from a different project) changes affect us. This is done in IncrementalCommon.currentExternalAPI. However, once we switch fully to class-based dependency tracking, we'll track an api per class and not per source file. We'll need a (className: String) => (api: ClassLike). The api extraction happens in a phase run around the pickler phase (before flatten), so we see names that do not resemble the names of class files. Oops.

Let's see how we can solve this. My current thinking is that I'll stick to flat names for everything binary-related (class files, binary dependencies, classpath lookups, etc.) and stick to pickler names for everything we're interested in reasoning about using the Scala model (declared classes, source dependencies). For external dependencies (from other projects) we need a mapping binaryClassName (after flatten phase) => srcClassName (at the pickler phase). We'll need a new relation binaryClassName: Relation[String, String]. We'll track only class names of classes that are non-local (see the discussion of local classes to understand what that means). That's OK, because we need the mapping only for classes that other classes from a different compilation unit can depend on.
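The proposed relation can be illustrated with a small two-way map between binary (post-flatten) names like "x.y.A$B$C" and source (pickler-time) names like "x.y.A.B.C". NameMapping and its methods below are made-up names, a simplified stand-in for sbt's Relation[String, String], not the real API:

```scala
// Hypothetical two-way map standing in for the binaryClassName relation.
final class NameMapping private (
    binaryToSrc: Map[String, String],
    srcToBinary: Map[String, String]
) {
  def add(binaryName: String, srcName: String): NameMapping =
    new NameMapping(
      binaryToSrc + (binaryName -> srcName),
      srcToBinary + (srcName -> binaryName)
    )

  // Used when resolving an external binary dependency back to the
  // source-level class whose api we track.
  def srcNameFor(binaryName: String): Option[String] = binaryToSrc.get(binaryName)
  def binaryNameFor(srcName: String): Option[String] = srcToBinary.get(srcName)
}

object NameMapping {
  val empty: NameMapping = new NameMapping(Map.empty, Map.empty)
}
```

Only non-local classes would ever be added to this mapping, which keeps it small and well-defined.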

This change might be tricky enough to implement that I'm creating a new task for it. My guess is that I'll have to do some work to introduce a clear distinction between binary and src names for classes.


gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 14, 2016

Invalidate dependencies at class level.
This commit implements invalidation of dependencies at class level both
for inheritance and memberRef dependencies. However, information that
determines api changes is still tracked mostly at source file level.

Name hashes are tracked at source level (classes considered for
invalidation are mapped back to source files before name hash lookup).
For that reason, #2319 is still not fixed.

Api hashes are tracked per-class but only for top-level classes. It's not
a final design but a quick hack to prove that class-based invalidation is
working correctly. It was enough to get the scenario described in #2320 fixed.
This is shown in `class-based-inheritance` scripted test.
The final design of api hashing will be that all publicly accessible
classes will get their own separate hash.

The sealed test is marked pending as described in notes of #1104.

Another thing to note is that we're getting close to being able to strip
abstractions that deal with the fact that dependencies were tracked either
as Relation[File, File] (for internal dependencies) or
Relation[File, String] (for external dependencies). We track all
dependencies uniformly as Relation[String, String] now.
@gkossakowski
Contributor

gkossakowski commented Jan 23, 2016
More notes on names.

Distinction of objects and classes in names

In my original comment on naming classes I said there would be no distinction between names of objects and classes. If you have a pair of companions:

package abc
class A
object A

then there's just one name for both of them: abc.A. This choice has a consequence for dependency tracking. Since names do not distinguish between classes and objects, you will always have an implicit dependency on both companions at the same time. It means that if a given piece of code depends just on a companion object, it will still be affected by changes to the api of the companion class.

Couldn't we just append $ to the simple names of objects and track dependencies on objects and classes separately? Changing the naming scheme is easy. However, it turns out that you have to depend on the pair of companions for correctness. The reason is that a companion object contributes to the implicit scope of its companion class, where implicits are searched for. An example of a scenario where it matters is provided by a scripted test: https://github.com/sbt/sbt/tree/1.0.x/sbt/src/sbt-test/source-dependencies/implicit-search-companion-scope
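A minimal example of the correctness argument, in the spirit of that scripted test (Money and Client are made-up names, not taken from the test): a client that only ever mentions the class still silently picks up implicits from the companion object, so a change to the object's api can affect the client.

```scala
class Money(val amount: Int)

object Money {
  // Removing or changing this implicit changes how Client compiles,
  // even though Client never mentions `object Money` by name.
  implicit val ordering: Ordering[Money] = Ordering.by((m: Money) => m.amount)
}

object Client {
  // Resolves Money.ordering via the companion object's contribution
  // to the implicit scope of the class Money.
  def max(a: Money, b: Money): Money =
    if (implicitly[Ordering[Money]].gteq(a, b)) a else b
}
```

This is why a dependency recorded on the shared name abc.A must cover both companions.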

This test settles the question whether we should distinguish between dependencies on objects and classes. The answer is no.


@dwijnand
Member

dwijnand commented Jan 23, 2016
Ah, that makes sense. Very interesting, thanks.



@gkossakowski gkossakowski self-assigned this Jan 23, 2016

@gkossakowski gkossakowski added this to the 1.0 milestone Jan 23, 2016

@gkossakowski
Contributor

gkossakowski commented Jan 23, 2016
I've spent some time thinking about and prototyping the handling of top-level imports. This is one of the tasks outlined in the description of this ticket (for phase 1).

Handling of top level imports

Let's start with an example motivating the problem:

// A.scala
package foo
class A

// B.scala
import foo.A
class B

At the moment, the dependency introduced by the import node is: B.scala -> A.scala.

Once we switch to tracking dependencies at class level, what do we put instead of B.scala in our dependency? The solution seems to be: we keep B.scala. We would have B.scala -> A as the dependency introduced by the import node. Since the rest of the dependency graph is class name -> class name (the nodes are classes), we would need a new relation src file -> class name. That's what I proposed as a task. At the outset, it seems easy: we would need just one more relation and we'd take it into account in the invalidation logic.

However, once you start working on the implementation, you realize you have to touch more places than you might expect. The complex logic that maps symbols to the correct kind of dependency (internal src, external src, external binary) has to be able to map symbols to both class names and source files. The implementation involves mapping between source and binary names, looking up the classpath and working with multiple Analysis instances (for multi-project setups). This functionality would have to be either somewhat duplicated (and specialized for one case) or generalized to deal with both class names and source files as starting nodes for dependencies. This part of the incremental compiler is already complex and would become even more complex if it had to handle two different kinds of nodes in the dependency graph.

And all of that effort would go into supporting unused imports. If the import is used, the dependency will be attributed to the class that uses the imported type or term. I think the return we would get on the investment is not great.

What can we do? I think we could cheat a little bit and pretend that top level imports are actually imports defined inside a top level class. Our example would be treated as if it were:

// A.scala
package foo
class A

// B.scala
class B {
  import foo.A
}

And this way we would just have a B -> A dependency. Obviously, scoping rules are violated, so this would not type check:

// A.scala
package foo
class A

// B.scala
// we don't see imported A here
class B(a: A) {
  import foo.A
}

That's why it's cheating. However, it doesn't matter for the dependency tracking. It's enough that we record the dependency anywhere that will cause B.scala to be recompiled when, for example, A is renamed.

My proposal is that we collect all dependencies introduced by top level imports and assign them to the first top level class/object/trait declared in a source file. What if the source file contains just top level imports and no class/object/trait declarations? We issue a warning that the incremental compiler doesn't track dependencies correctly in such cases.

This solution will be straightforward to implement (it's a simple tweak to the dependency extraction logic) and will handle the majority of cases, where a file contains both unused imports and class declarations. If this turns out to be problematic, the more principled (and much more complex to implement) solution can be introduced later on.
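The proposed strategy can be sketched as follows. This is a hypothetical illustration with invented names (`ClassDep`, `assignTopLevelImportDeps`), not sbt's actual extraction code:

```scala
// A class-level dependency edge: `from` depends on `on`.
final case class ClassDep(from: String, on: String)

// Assign dependencies introduced by top-level imports to the first
// top-level class/object/trait declared in the compilation unit; if there
// is none, report that the dependencies cannot be tracked.
def assignTopLevelImportDeps(
    topLevelClasses: List[String],  // e.g. List("B")
    importedClasses: List[String]   // e.g. List("foo.A")
): Either[String, List[ClassDep]] =
  topLevelClasses match {
    case firstTopLevel :: _ =>
      Right(importedClasses.map(imported => ClassDep(firstTopLevel, imported)))
    case Nil =>
      Left("warning: top-level imports in a unit with no top-level " +
        "class/object/trait are not tracked by the incremental compiler")
  }
```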


gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 26, 2016

Invalidate dependencies at class level.
This commit implements invalidation of dependencies at class level both
for inheritance and memberRef dependencies. However, information that
determines api changes is still tracked mostly at source file level.

Name hashes are tracked at source level (classes considered for
invalidation are mapped back to source files before name hash lookup).
For that reason, #2319 is still not fixed.

Api hashes are tracked per-class but only for top-level classes. It's not
a final design but a quick hack to prove that class-based invalidation is
working correctly. It was enough to get the scenario described in #2320
fixed. This is shown in the `class-based-inheritance` scripted test.
The final design of api hashing will be that all publicly accessible
classes will get their own separate hash.

The sealed test is marked pending as described in notes of #1104.

Another thing to note is that we're getting close to being able to strip
abstractions that deal with the fact that dependencies were tracked either
as Relation[File, File] (for internal dependencies) or
Relation[File, String] (for external dependencies). We track all
dependencies uniformly as Relation[String, String] now.

gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 26, 2016

Introduce `binaryClassName` relation.
The `binaryClassName` relation maintains mapping between source and binary
class names. This mapping is needed to map binary dependencies back to
source dependencies in case of separate compilation (where we see
dependencies on class files). You can see that mapping being used in
`binaryDependency` method implementation of Analysis callback. Previously,
we would map a class file to the source file it was produced from and then
assume that the dependency was on any (all) of the classes declared in that
source file. Introducing `binaryClassName` lets us map the dependency back
to the source class name directly and removes that imprecision in
dependency tracking.

We maintain mapping between source and binary class names just for
non-local classes. Check this
sbt#1104 (comment) for the
discussion of local and non-local classes.

We also rework tracking of products in Analysis by introducing explicitly
the concept of local and non-local products corresponding to local and
non-local classes. This helps us to clarify for which classes we track
source and binary class names.
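As an illustration of the mapping such a relation maintains, here is a sketch of the standard JVM naming convention for non-local nested classes. The function name is invented; the real relation is populated from the compiler, not computed like this:

```scala
// For a non-local nested class, the source name joins enclosing class names
// with '.', while the binary (class file) name joins them with '$'.
// E.g. source name foo.A.B corresponds to binary name foo.A$B.
def sourceToBinaryName(packageName: String, enclosingClasses: List[String]): String = {
  val mangled = enclosingClasses.mkString("$")
  if (packageName.isEmpty) mangled else packageName + "." + mangled
}
```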
adriaanm (Contributor) commented Jan 28, 2016

It's a good idea to cheat a bit for import tracking, I agree. I wonder about putting them in some existing top-level class. Maybe we should have a synthetic $compilationUnit class in each compilation unit for these things?


gkossakowski (Contributor) commented Jan 29, 2016

I thought about introducing a synthetic class. However, I suspect that implementation would run into subtle issues. For example, the current implementation assumes that there's a 1-n relation between a class declared in a source file and its class files. I'd have to chase down the places where it matters and patch them. On the other hand, the proposal I described above was a simple patch to the Dependency extraction logic.
I hope I won't need to revisit this but if I have to then I'll give synthetic class idea another shot.


adriaanm (Contributor) commented Jan 29, 2016

OK, I was thinking it could be confusing to use an arbitrary class (from potentially many) in the compilation unit as the one that has the "compilation unit stuff".



gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 30, 2016

Add handling of unused top level imports
Implements a strategy for recording dependencies introduced by top level
imports by assigning those dependencies to the first top level class.
In case there are top level imports but no top level class/trait/object
defined in a compilation unit, a warning is issued. The rationale for this
strategy can be found at:
sbt#1104 (comment)

Add a unit test covering different cases of top level imports (e.g.
imports defined in nested packages).

Mark the scripted test source-dependencies/import-class as passing after
a small modification of adding a top level class.

gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 30, 2016

Add a pending test for inheritance by a local class
Add a pending scripted test that checks whether dependencies introduced by
inheritance from local classes are handled properly. Check the comment in
the test and this discussion:
sbt#1104 (comment) for more
details.

gkossakowski added a commit to gkossakowski/sbt that referenced this issue Jan 30, 2016

Handle dependencies coming from local classes.
Dependencies introduced by local classes require special handling because
we track only non-local classes (those that can be referred to from other
files) in dependency relations. To overcome this problem, dependencies
introduced by a local class are recorded as introduced by an outer class
that is non-local. However, this introduces a new problem with dependencies
introduced by inheritance. We don't want local inheritance dependencies
to cause a transitive invalidation of all classes that inherit from the
outer class containing the local class. Check the comment in
Relations.scala this patch introduces, or follow the discussion of this
problem at: sbt#1104 (comment)

To capture the subtlety of inheritance dependencies from local classes,
we introduce `LocalDependencyByInheritance` case to `DependencyContext`
enum.

TestCallback has been modified to return extracted local inheritance
dependencies and a test in DependencySpecification has been updated
accordingly.

The Dependency phase has been reworked to handle local classes properly
by mapping dependencies to outer, non-local classes. Check the
implementation for details of the mapping. It's worth mentioning that
the mapping is implemented as an amortized O(1) lookup, so this change
doesn't affect the performance of the extraction phase negatively.

The invalidation logic has been modified to take into account inheritance
dependencies introduced by local classes. The patch is small because
the invalidation logic is very straightforward: we invalidate local
inheritance dependencies non-transitively and we do not apply name hashing
dependency pruning.

Lastly, we mark local-class-inheritance scripted test as passing.
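The core idea of the commit above, mapping a local class to the nearest non-local class that encloses it, can be sketched like this. All names (`Cls`, `nonLocalEnclosing`) are invented for illustration; the real code works on compiler symbols:

```scala
// A toy model of a class with its owner chain.
final case class Cls(name: String, isLocal: Boolean, owner: Option[Cls])

// Walk up the owner chain until we reach a non-local class; dependencies of
// a local class are attributed to that enclosing class.
@annotation.tailrec
def nonLocalEnclosing(c: Cls): Cls =
  if (!c.isLocal) c
  else c.owner match {
    case Some(o) => nonLocalEnclosing(o)
    case None    => c // shouldn't happen: top-level classes are never local
  }
```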
gkossakowski (Contributor) commented Feb 2, 2016

I have great news! Phase 1 of the class-based dependency tracking is implemented. All tasks are marked as completed. I pushed all of my changes and they are available for viewing here: https://github.com/sbt/sbt/compare/0.13...gkossakowski:class-based-dependencies?expand=1

I'm going to test the phase 1 changes on a few real projects like scala/scala, scalatest and specs2 to catch some bugs. In the next few days, I'll add more notes on issues encountered while implementing the phase 1 changes.


adriaanm (Contributor) commented Feb 9, 2016

Cool! Could you provide a quick update?


gkossakowski (Contributor) commented Feb 16, 2016

I'm adding a note on an implementation challenge related to the notion of local classes.

Local classes (implementation challenge)

The notion of local classes has been discussed before in this ticket. I'm going to discuss an issue with implementing that notion that makes the code of the incremental compiler more complicated: two different compiler phases have to communicate with each other.

Whether a class is local or not can only be determined in phases that run after typechecking and before lambdalift, which moves classes around and destroys the original nesting structure. E.g. the Dependency phase needs to determine whether a dependency by inheritance is coming from a local or a non-local class. This is required to properly classify the dependency (see the discussion above).

We need to know whether a class is local or non-local for another reason: to find out how its class files should be recorded. We have a notion of both local and non-local products. The information about produced class files (e.g. their names) is available at the end of the compilation pipeline and is extracted by the Analyze phase. That phase needs to know whether a product (a class file) is coming from a local or a non-local class. It cannot determine this simply by looking at the owner chain, because owner chains have been rewritten by then. We have to introduce a communication mechanism between the Dependency phase and the Analyze phase. We do that by introducing a Map from a class to its non-local enclosing class (which can be the class itself). The map is stored as a field of a subclass of Global so both phases have access to it.

This solution works but is ugly. Now we have to worry about the lifecycle of the Map and make sure that it's properly populated so it can serve both the Dependency and Analyze phases. It's ugly for another reason: there's already a similar map maintained for the backend so it can generate proper EnclosingClass attributes in class files. However, there's no good way to access that information so I had to duplicate it.


I decided to add the note on this implementation detail to show that destructive changes like rewriting owner chains can be problematic in surprising contexts. It's good to keep this in mind when new design ideas are floated around (e.g. Dotty).
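The mechanism described above could be sketched as follows. Every name here (`CompilerState`, `localToNonLocalClass`, `isLocalProduct`) is invented for illustration; the real field lives on a subclass of `Global` and operates on compiler symbols:

```scala
import scala.collection.mutable

// Shared state visible to both phases.
class CompilerState {
  // Populated by the Dependency phase while the original owner chains are
  // still intact: each class maps to its nearest non-local enclosing class
  // (a non-local class maps to itself).
  val localToNonLocalClass: mutable.Map[String, String] = mutable.Map.empty
}

// The Analyze phase can then classify a product without inspecting the
// (by now rewritten) owner chain.
def isLocalProduct(state: CompilerState, className: String): Boolean =
  state.localToNonLocalClass.get(className).exists(_ != className)
```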


gkossakowski added a commit to gkossakowski/sbt that referenced this issue Feb 18, 2016

Track API at class level instead of source file level
This commit changes how tracking of API data structures is done within
the incremental compiler. It changes how APIs are passed around and stored
but doesn't change the behavior of the incremental compiler.

Here's a summary of what has changed and what's still being tracked at the
source file level:

  - APIs are tracked per class name in a newly introduced Companions data
    structure; incremental compiler always considers the pair of companions
    from now on
  - only APIs for top level classes are extracted at the moment;
    invalidation is still imprecise
  - Used names are tracked per source file
  - Name hashes are tracked per top-level class (they're part of
    AnalyzedClass data structure)

Companion class and object have to be considered as a pair because they're
given exactly the same name in the incremental compiler. The idea of
naming classes and objects separately has been discussed and rejected
here: sbt#1104 (comment)
APIs of companions are linked together in AnalysisCallback. The ExtractAPI
compiler phase continues to extract apis of classes and objects separately.
More on those changes below.

Most changes in this patch are direct consequences of the changes in the
`interface/other` file. The `Source` has been replaced by the
`AnalyzedClass`. The AnalyzedClass carries an extracted api of the
(class, companion object) pair (stored as `Companions`) plus some metadata
about the pair (e.g. the hash sum of its api representation). Source used to
carry both a hash sum of the source file contents (a hash of the text) and a
hash of the api. The hash of the source file was introduced to shortcut
equality checking before api hashing was introduced. Now it's redundant,
so it's removed.

The `SourceAPI` used to carry information about packages declared in a
source file but this information wasn't used in the incremental compiler
so its tracking is removed.

I also removed the ugly-looking `_internalOnly_` prefixes. The changes in
this branch are not binary or source compatible so it's a good opportunity
to perform some cleanups.

AnalysisCallback has a new method `startSource`. It's needed because we
want to track sources that do not declare any classes and for which there's
no `api` call. The `api` method takes a `ClassLike` as an argument and can
be called multiple times for a single source file.

The implementation of the AnalysisCallback has been affected by the switch
from Source to AnalyzedClass tracking the most. The main change here is
that AnalysisCallback is responsible for pairing apis of a companion class
and a companion object. It stores apis of classes and objects separately
and does the same of their name hashes. Right before persisting all that
information it puts both apis into Companions data structure and merges
name hashes for a class and its companion object. Merging is performed
only when both class and object with the same name are declared.
It's worth noting why we cannot just compute name hashes once we have a
class and its companion object available, instead of merging precomputed
name hashes. We can't do that because APIs are minimized and only their
hashes are stored, so names and signatures of members are lost at this
point. We have to compute name hashes before minimization, but then we
don't have both companions available yet.

For that reason, a `NameHashing.merge` operation has been introduced that
performs a straightforward merge. NameHashingSpecification provides a basic
test for its properties.
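A merge of precomputed name hashes for a companion class and its object could look roughly like this. The types and the hash-combining function are invented for illustration; sbt's actual `NameHashing.merge` may combine hashes differently:

```scala
// A precomputed hash for one member name.
final case class NameHash(name: String, hash: Int)

// Merge the name hashes of a companion class and its companion object.
// Hashes for the same member name are combined rather than recomputed,
// because full member signatures are no longer available after api
// minimization.
def merge(classHashes: List[NameHash], objectHashes: List[NameHash]): List[NameHash] =
  (classHashes ++ objectHashes)
    .groupBy(_.name)
    .map { case (name, hashes) =>
      // XOR is a stand-in for whatever combination the real code uses
      NameHash(name, hashes.map(_.hash).foldLeft(0)(_ ^ _))
    }
    .toList
```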

The incremental invalidation algorithm has been affected by switching to
class level api tracking. As a consequence, most of invalidation is
happening at class level now and only right before compilation class names
are mapped to source files that declare them. To emphasize that point,
recompilation of classes has been refactored to its own method
`recompileClasses` in the IncrementalCommon. However, we had to make two
exceptions to invalidation performed at the class level:

  1. Sources that have just been added: we don't know their declared
     classes yet, so we have to schedule them for compilation as is.
  2. Sources that do not declare any classes have to be scheduled for
     recompilation directly too.

This is the reason why `cycle` takes both invalidated classes and modified
srcs as inputs and why `invalidateInitial` computes both. After the first
iteration of `cycle`, the set of modified sources becomes empty and the
rest of the invalidation is performed at the class level only.
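The mapping step performed right before each compilation cycle, including the two exceptions above, can be sketched as follows. The signature is invented; the real logic lives in `cycle`/`recompileClasses`:

```scala
// Map invalidated classes back to the source files that declare them, and
// add the sources that must be recompiled directly: newly added sources
// (declared classes unknown yet) and sources declaring no classes.
def toRecompile(
    invalidatedClasses: Set[String],
    modifiedSrcs: Set[String],
    declaringSource: String => Option[String] // from the declaredClasses relation
): Set[String] =
  invalidatedClasses.flatMap(c => declaringSource(c)) ++ modifiedSrcs
```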

Here's a list of changes I think are worth highlighting either for clarity
or to make a point:

  - SameAPI dropped some old, unused code from TopLevel and NameChanges
    classes
  - APIs do not have any reference to `java.io.File` now, this data
    structure operates purely on class names now
  - helper methods for looking up dependency information from Relations
    have been changed to work at the class level
  - version number in TextAnalysisFormat has been bumped
  - the `inherited_type_params` scripted test has been removed as it does
    not appear to test anything useful and breaks due to changes to the
    `APIs` interface
  - Analyze doesn't store empty apis for Java source files that do not
    declare any classes; we use `AnalysisCallback.startSource` for tracking
  - The Scala 2.8-specific test has been dropped.
  - The test for Ant-style compilation is marked as pending. Supporting
    Ant-style compilation is tricky because invalidation is happening at
    the class level now.
gkossakowski (Contributor) commented Feb 25, 2016

I've done some preliminary testing of class-based dependency tracking to see how it fares.

Performance improvements

Scalatest (master)

Add a method to AndHaveWord in Matchers.scala
Run compile with name hashing:

[info] Compiling 1 Scala source to /Users/grek/scala/scalatest/scalatest/target/scala-2.11/classes...
[warn] there were four deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Compiling 45 Scala sources to /Users/grek/scala/scalatest/scalatest/target/scala-2.11/classes…
[success] Total time: 49 s, completed Feb 24, 2016 12:15:01 AM

Run compile with name hashing + class-based dependency tracking:

[info] Compiling 1 Scala source to /Users/grek/scala/scalatest/scalatest/target/scala-2.11/classes...
[warn] there were four deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[success] Total time: 4 s, completed Feb 24, 2016 12:21:12 AM

Specs2 (master)

Add a method to OptionResultMatcher class in OptionMatchers.scala.

Run compile with name hashing:

[info] Compiling 1 Scala source to /Users/grek/tmp/specs2/matcher/target/scala-2.11/classes...
[info] Compiling 5 Scala sources to /Users/grek/tmp/specs2/matcher/target/scala-2.11/classes...
[info] Compiling 6 Scala sources to /Users/grek/tmp/specs2/core/target/scala-2.11/classes...
[info] Compiling 3 Scala sources to /Users/grek/tmp/specs2/junit/target/scala-2.11/classes...
[info] Compiling 99 Scala sources to /Users/grek/tmp/specs2/core/target/scala-2.11/test-classes...
[info] Compiling 1 Scala source to /Users/grek/tmp/specs2/form/target/scala-2.11/classes...
[success] Total time: 48 s, completed Feb 25, 2016 12:48:38 AM

Run compile with name hashing + class-based dependency tracking:

[info] Compiling 1 Scala source to /Users/grek/tmp/specs2/matcher/target/scala-2.11/classes...
[success] Total time: 1 s, completed Feb 25, 2016 12:58:27 AM
jakozaur commented Feb 25, 2016

@gkossakowski great work! Would love to use it.


gkossakowski (Contributor) commented Feb 25, 2016

@jakozaur, if you would like to test it you can do it now:

  1. Clone https://github.com/gkossakowski/sbt/tree/class-based-dependencies
  2. In your checkout, run sbt publishLocal
  3. In your sbt project change version to 0.13.12-SNAPSHOT and run sbt normally.
jakozaur commented Feb 26, 2016

@gkossakowski how can I try it with zinc? Do I get zinc, change its sbt dependency version, and hope that it will work?

@gkossakowski how can I try it with zinc. Also get zinc, change the sbt dependency number and hope that it will work?

rockjam commented Feb 26, 2016

@gkossakowski is this improvement likely to appear in the 0.13.12 release?

etorreborre commented Feb 26, 2016

@gkossakowski I'm sorry to hear about the difficulties in the implementation but the first results are AWESOME!

Contributor

gkossakowski commented Feb 26, 2016

@jakozaur, yes. Keep in mind that zinc sets two sbt versions: the sbt version used to build zinc and the sbt version used as a dependency of zinc. You want to change the latter.

Contributor

gkossakowski commented Feb 26, 2016

@rockjam, no decision on the targeted sbt version has been made yet. Merging into the 0.13.x series is tricky because class-based dependency tracking breaks source compatibility of the Analysis object returned by the compile task. This might break sbt plugins that rely on information stored in Analysis. For example, I know that Play's sbt plugin relies on that API and would need to be adapted.

Contributor

gkossakowski commented Feb 26, 2016

@etorreborre thank you! The implementation difficulties explain why this hasn't been done before.

Member

retronym commented Feb 28, 2016

> It's ugly for another reason: there's already a similar map maintained for the backend so it can generate proper EnclosingClass attributes in class files. However, there's no good way to access that information so I had to duplicate it.

Symbol#originalOwner is public. Was this insufficient?

Contributor

gkossakowski commented Feb 29, 2016

Symbol#originalOwner was introduced in scala/scala@0ccdb15, which was first included in Scala 2.11.3:

$ git tag --contains 0ccdb151ffe9caa9eae7d01a4f2eacc87fa8f5ff
v2.11.3
v2.11.4
v2.11.5
v2.11.6
v2.11.7
v2.12.0-M1
v2.12.0-M2
v2.12.0-M3

I need something that works across 2.10.x, 2.11.x and 2.12.x.

Contributor

Duhemm commented Feb 29, 2016

@gkossakowski If you're targeting sbt 1.0, note that the compiler bridge now has separate sources for 2.11 and 2.10.

Member

retronym commented Mar 3, 2016

You mean, you would depend on the implicit part of the API for both companions, and then for regular members you could distinguish between classes and objects?

Not sure about the details, just food for thought at this stage.

Contributor

gkossakowski commented Mar 6, 2016

I've collected performance numbers on the Scala compiler's codebase. Class-based dependency tracking performs really well, yielding 4-22x improvements. I'm particularly interested in this code base because the current implementation of the incremental compiler is known to handle the cake pattern suboptimally. The new implementation eats the cake.

Below are the performance numbers I gathered. I'm excited to make these improvements available to the entire Scala community.

Performance on scala/scala (2.12.x branch)

Add a method foo to Platform in ./scala/tools/nsc/backend/Platform.scala

Name hashing
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 49 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 2 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/interactive...
[info] Compiling 3 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/scaladoc...
[info] Compiling 6 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/repl...
[info] Compiling 11 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/junit...
[success] Total time: 59 s, completed Mar 6, 2016 3:01:25 AM

Name hashing + class-based deps (4x speedup)
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 3 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/repl...
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/junit...
[success] Total time: 15 s, completed Mar 6, 2016 10:37:22 AM

Add a method foo to compiler/scala/reflect/macros/contexts/Reifiers.scala

Name hashing
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 50 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 70 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[success] Total time: 68 s, completed Mar 6, 2016 6:14:42 PM

Name hashing + class-based deps (22x speedup)
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala-for-class-based/build-sbt/quick/classes/compiler...
[info] Compiling 2 Scala sources to /Users/grek/scala/scala-for-class-based/build-sbt/quick/classes/compiler...
[success] Total time: 3 s, completed Mar 6, 2016 6:50:08 PM

Add a method foo to MatchCodeGen in ./scala/tools/nsc/transform/patmat/MatchCodeGen.scala

Name hashing
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 47 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[success] Total time: 48 s, completed Mar 6, 2016 7:10:00 PM

Name hashing + class-based deps (2.8x speedup)
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala-for-class-based/build-sbt/quick/classes/compiler...
[info] Compiling 6 Scala sources to /Users/grek/scala/scala-for-class-based/build-sbt/quick/classes/compiler...
[success] Total time: 17 s, completed Mar 6, 2016 7:11:23 PM

Add a method foo to CompilationUnit in ./scala/tools/nsc/CompilationUnits.scala

Name hashing
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 48 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[success] Total time: 28 s, completed Mar 6, 2016 7:44:08 PM

Name hashing + class-based deps (14x speedup)
> compile
[info] Compiling 1 Scala source to /Users/grek/scala/scala-for-class-based/build-sbt/quick/classes/compiler...
[success] Total time: 2 s, completed Mar 6, 2016 7:45:12 PM

cvogt commented Mar 7, 2016

exciting stuff :)!

smarter referenced this issue in adriaanm/scala Mar 8, 2016

integrate sbt's api & dependency extraction phase
quick hack to experiment with stability/perf of api extraction

I tried to stay as close to the way it's done in sbt,
EXCEPT for fusing all traversals into one tree walk
(there's still some local redundant tree.foreach'ing, but should be ok)

I do believe integrating sbt's compiler interface into the compiler is the way forward
Contributor

gkossakowski commented Mar 9, 2016

I updated the status of this work. Check out the issue description for progress on implementation and remaining tasks.

I think this work is most likely to ship with sbt 1.0-M1.

@dwijnand, I discussed different options for shipping this work with @eed3si9n and @Duhemm in person during Lightbend's engineering meeting. We determined that I could merge my class-based-dependencies branch into the incrementalcompiler repo and ship this with sbt 1.0-M1. My changes remove both the old incremental implementation (predating name hashing) and name hashing. Class-based dependency tracking would be the only implementation available. Do you have any concerns with that?

Also, there's a discussion about folding the incrementalcompiler repo into zinc. No decision has been made but I'd be in favor of that.

Member

dwijnand commented Mar 9, 2016

I've no concerns, sounds great to me.

I'm also in favour of merging zinc and incrementalcompiler, and I think zinc is a better name and more known than "sbt's incremental compiler".

Contributor

SethTisue commented Mar 9, 2016

> I'm also in favour of merging zinc and incrementalcompiler, and I think zinc is a better name and more known than "sbt's incremental compiler"

yeah, +1, I had been wondering about this as well

Contributor

kiritsuku commented Mar 9, 2016

On 03/09/2016 06:29 PM, Seth Tisue wrote:

> I'm also in favour of merging zinc and incrementalcompiler, and I think zinc is a better name and more known than "sbt's incremental compiler"
>
> yeah, +1, I had been wondering about this as well

We IDE folks would also welcome this decision. It would significantly simplify our build if we no longer have to depend on sbt.

Contributor

gkossakowski commented Mar 14, 2016

The zinc/incrementalcompiler repo has been renamed to sbt/zinc. The intention is to spend all energy on polishing sbt/zinc and retiring typesafehub/zinc. If you're interested in using the incremental compiler outside of sbt, chime in at sbt/zinc#80.

gkossakowski added a commit to sbt/zinc that referenced this issue Mar 16, 2016

Introduce `binaryClassName` relation.
The `binaryClassName` relation maintains mapping between source and binary
class names. This mapping is needed to map binary dependencies back to
source dependencies in case of separate compilation (where we see
dependencies on class files). You can see that mapping being used in
`binaryDependency` method implementation of Analysis callback. Previously,
we would map a class file to the source file it was produced from and
then assume that the dependency is on any (all) of the classes declared
in that source file. Introduction of `binaryClassName` lets us map the
dependency back to a source class name directly and removes that
imprecision of dependency tracking.

We maintain mapping between source and binary class names just for
non-local classes. Check this
sbt/sbt#1104 (comment) for the
discussion of local and non-local classes.

We also rework tracking of products in Analysis by introducing explicitly
the concept of local and non-local products corresponding to local and
non-local classes. This helps us to clarify for which classes we track
source and binary class names.
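
The distinction the commit draws can be made concrete with a small standalone sketch. Here a plain `Map` stands in for zinc's `Relation[String, String]`, and the example class names are hypothetical, not taken from any real project:

```scala
object BinaryClassNameSketch {
  // A nested object Baz inside class Bar in package foo has
  //   source class name: "foo.Bar.Baz"  (the name seen at the source level)
  //   binary class name: "foo.Bar$Baz"  (the name of the emitted class file)
  // The binaryClassName relation records (sourceName, binaryName) pairs so a
  // dependency observed on a class *file* can be mapped back to the source
  // class name, instead of to all classes declared in the producing file.
  val binaryClassName: Map[String, String] = Map(
    "foo.Bar"     -> "foo.Bar",
    "foo.Bar.Baz" -> "foo.Bar$Baz",
    "foo.Qux"     -> "foo.Qux$"      // a companion object's class file
  )

  // Reverse lookup: recover the source class name of a binary dependency.
  def sourceNameOf(binaryName: String): Option[String] =
    binaryClassName.collectFirst { case (src, bin) if bin == binaryName => src }
}
```

The sketch only illustrates the direction of the mapping; the real relation is maintained incrementally by the compiler phases described in the commit.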

gkossakowski added a commit to sbt/zinc that referenced this issue Mar 16, 2016

Add handling of unused top level imports
Implements a strategy for recording dependencies introduced by top level
imports by assigning those dependencies to the first top level class.
In case there are top level imports but no top level class/trait/object
defined in a compilation unit, a warning is issued. The rationale for this
strategy can be found at:
sbt/sbt#1104 (comment)

Add a unit test covering different cases of top level imports (e.g.
defined in nested packages).

Mark the scripted test source-dependencies/import-class as passing after
a small modification of adding a top level class.
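
The assignment rule above can be sketched as follows. The `CompilationUnit` case class and the helper are hypothetical simplifications for illustration; the real implementation works on compiler trees, not strings:

```scala
object TopLevelImportsSketch {
  // A compilation unit reduced to the two pieces relevant here.
  final case class CompilationUnit(
      topLevelImports: List[String], // dependencies introduced by top level imports
      topLevelClasses: List[String]) // classes/traits/objects declared at the top level

  // Assign import-introduced dependencies to the first top level class.
  // If the unit has top level imports but no top level class, there is
  // nothing to attribute them to, so a warning is produced (Left).
  def assignImportDeps(unit: CompilationUnit): Either[String, Map[String, List[String]]] =
    unit.topLevelClasses match {
      case first :: _ => Right(Map(first -> unit.topLevelImports))
      case Nil if unit.topLevelImports.nonEmpty =>
        Left("warning: top level imports but no top level class to assign them to")
      case Nil => Right(Map.empty)
    }
}
```

This mirrors the trade-off discussed in the linked comment: the attribution is slightly imprecise, but it keeps dependency tracking purely class-based.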

gkossakowski added a commit to sbt/zinc that referenced this issue Mar 16, 2016

Handle dependencies coming from local classes.
Dependencies introduced by local classes require special handling because
we track only non local classes (that can be referred from other files)
in dependency relations. To overcome this problem, dependencies introduced
by a local class are recorded as introduced by an outer class that is
non local. However, this introduces a new problem with dependencies
introduced by inheritance. We don't want local inheritance dependencies
to cause a transitive invalidation of all classes that inherit from the
outer class containing the local class. Check the comment in
Relations.scala that this patch introduces, or follow the discussion of
this problem at: sbt/sbt#1104 (comment)

To capture the subtlety of inheritance dependencies from local classes,
we introduce `LocalDependencyByInheritance` case to `DependencyContext`
enum.

TestCallback has been modified to return extracted local inheritance
dependencies and a test in DependencySpecification has been updated
accordingly.

The Dependency phase has been reworked to handle local classes properly
by mapping dependencies to outer, non-local classes. Check the
implementation for details of the mapping. It's worth mentioning that
mapping is implemented as an amortized O(1) lookup so this change
doesn't affect performance of extraction phase negatively.

The invalidation logic has been modified to take into account inheritance
dependencies introduced by local classes. The patch is small because
the invalidation logic is very straightforward: we invalidate local
inheritance dependencies non-transitively and we do not apply name hashing
dependency pruning.

Lastly, we mark local-class-inheritance scripted test as passing.
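
The owner-chain walk at the heart of this mapping can be sketched with a toy symbol model (not zinc's actual types; the real implementation additionally caches the lookup, which is how it achieves the amortized O(1) behaviour mentioned above):

```scala
object LocalClassMappingSketch {
  // A toy symbol: a class with an optional owner and a "local" flag
  // (local = defined inside a method or block, so it cannot be referred
  // to from other source files).
  final case class Sym(name: String, owner: Option[Sym], isLocal: Boolean)

  // Dependencies introduced by a local class are recorded as if introduced
  // by its closest non-local enclosing class.
  @annotation.tailrec
  def nonLocalEnclosing(sym: Sym): Sym =
    if (!sym.isLocal) sym
    else sym.owner match {
      case Some(o) => nonLocalEnclosing(o)
      case None    => sym // degenerate chain; shouldn't happen in practice
    }
}
```

Under this mapping an anonymous class defined inside a method of `Outer` contributes its dependencies as if they were dependencies of `Outer` itself, which is exactly why its inheritance dependencies need the special non-transitive treatment described above.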

gkossakowski added a commit to sbt/zinc that referenced this issue Mar 16, 2016

Track API at class level instead of source file level
This commit changes how tracking of API data structures is done within
the incremental compiler. It changes how APIs are passed around and stored
but doesn't change the behavior of the incremental compiler.

Here's a summary of what has changed and what's still being tracked at the
source file level:

  - APIs are tracked per class name in a newly introduced Companions data
    structure; incremental compiler always considers the pair of companions
    from now on
  - only APIs for top level classes are extracted at the moment;
    invalidation is still imprecise
  - Used names are tracked per source file
  - Name hashes are tracked per top-level class (they're part of
    AnalyzedClass data structure)

Companion class and object have to be considered as a pair because they're
given exactly the same name in the incremental compiler. The idea of
naming classes and objects separately has been discussed and rejected
here: sbt/sbt#1104 (comment)
APIs of companions are linked together in AnalysisCallback. The ExtractAPI
compiler phase continues to extract apis of classes and objects separately.
More on those changes below.

Most changes in this patch are direct consequences of the changes in the
`interface/other` file. The `Source` has been replaced by the
`AnalyzedClass`. The AnalyzedClass carries an extracted api of the
(class, companion object) pair (stored as `Companions`) plus some meta data
about the pair (e.g. hash sum of its api representation). Source used to
carry both hash sum of the source file contents (hash of the text) and a
hash of the api. The hash of the source file had been introduced to
shortcut equality checking before api hashing was introduced. Now it's
redundant, so it's removed.

The `SourceAPI` used to carry information about packages declared in a
source file but this information wasn't used in the incremental compiler
so its tracking is removed.

I also removed an ugly looking `_internalOnly_` prefixes. The changes in
this branch are not binary or source compatible so it's a good opportunity
to perform some cleanups.

AnalysisCallback has a new method `startSource`. It's needed because we
want to track sources that do not declare any classes and for which there's
no `api` call. The `api` method takes `ClassLike` as an argument and can
be called multiple times per a single source file.

The implementation of the AnalysisCallback has been affected by the switch
from Source to AnalyzedClass tracking the most. The main change here is
that AnalysisCallback is responsible for pairing apis of a companion class
and a companion object. It stores apis of classes and objects separately
and does the same for their name hashes. Right before persisting all that
information it puts both apis into Companions data structure and merges
name hashes for a class and its companion object. Merging is performed
only when both class and object with the same name are declared.
It's worth noting why we cannot just compute name hashes once we have a
class and its companion object available instead of merging precomputed
name hashes. We can't do that because APIs are minimized and only their
hashes are stored so names and signatures of members are lost at this
point. We have to compute name hashes before minimization, but then we
don't have both companions available yet.

For that reason `NameHashing.merge` operation has been introduced that
performs a straightforward merge. NameHashingSpecification provides a basic
test for its properties.

The incremental invalidation algorithm has been affected by switching to
class level api tracking. As a consequence, most of invalidation is
happening at class level now and only right before compilation class names
are mapped to source files that declare them. To emphasize that point,
recompilation of classes has been refactored to its own method
`recompileClasses` in IncrementalCommon. However, we had to make two
exceptions to invalidation performed on class level:

  1. Sources that have just been added: we don't know their declared
     classes yet, so we have to schedule them for compilation as is.
  2. Sources that do not declare any classes have to be scheduled for
     recompilation directly, too.

This is the reason why `cycle` takes both invalidated classes and modified
srcs as inputs and why `invalidateInitial` computes both. After the first
iteration of `cycle`, the set of modified sources becomes empty and the
remaining of invalidation is performed at the class level only.

Here's a list of changes I think are worth highlighting either for clarity
or to make a point:

  - SameAPI dropped some old, unused code from TopLevel and NameChanges
    classes
  - APIs do not have any reference to `java.io.File` now, this data
    structure operates purely on class names now
  - helper methods for looking up dependency information from Relations
    have been changed to work at the class level
  - version number in TextAnalysisFormat has been bumped
  - the `inherited_type_params` scripted test has been removed as it looks
    like not testing anything useful and breaks due to changes to the
    `APIs` interface
  - Analyze doesn't store empty apis for Java source files that do not
    declare any classes; we use `AnalysisCallback.startSource` for tracking
  - The Scala 2.8-specific test has been dropped.
  - The test for Ant-style compilation is marked as pending. Supporting of
    Ant-style compilation is tricky because invalidation is happening at
    the class level now.
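
The merge of precomputed name hashes described above can be sketched like this. Combining colliding hashes with `^` is an assumption of this sketch, not necessarily what `NameHashing.merge` actually does:

```scala
object NameHashMergeSketch {
  // A name hash: member name -> hash over all signatures with that name.
  type NameHashes = Map[String, Int]

  // Merge the precomputed name hashes of a companion class and object.
  // This is done on hashes rather than on APIs because APIs are minimized
  // before storage, so member names and signatures are no longer available.
  def merge(classHashes: NameHashes, objectHashes: NameHashes): NameHashes =
    (classHashes.keySet ++ objectHashes.keySet).map { name =>
      val h = (classHashes.get(name), objectHashes.get(name)) match {
        case (Some(c), Some(o)) => c ^ o // name declared in both companions
        case (Some(c), None)    => c
        case (None, Some(o))    => o
        case (None, None)       => 0    // unreachable: name came from one of the sets
      }
      name -> h
    }.toMap
}
```

The merge is only performed when both a class and an object with the same name are declared, matching the behaviour the commit message describes.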

Contributor

smarter commented Mar 21, 2016

@gkossakowski: using https://github.com/gkossakowski/sbt/commits/class-based-dependencies and scala 2.12 master (scala/scala@ad48e59 right now) I was able to reproduce most of your performance numbers but not the one for Reifiers.scala:

[info] Compiling 1 Scala source to /home/smarter/opt/scala/build-sbt/quick/classes/compiler...
[info] Compiling 2 Scala sources to /home/smarter/opt/scala/build-sbt/quick/classes/compiler...
[info] Compiling 2 Scala sources to /home/smarter/opt/scala/build-sbt/quick/classes/compiler...
[info] Compiling 295 Scala sources to /home/smarter/opt/scala/build-sbt/quick/classes/compiler...
[warn] there were 67 deprecation warnings; re-run with -deprecation for details
[warn] there were 32 unchecked warnings; re-run with -unchecked for details
[success] Total time: 80 s, completed 21 mars 2016 14:02:11

I don't know if this is a regression or if something changed in scala master since you did your tests (the biggest change is probably that the new trait encoding was merged: scala/scala#5003).

Contributor

smarter commented Mar 21, 2016

I did some very primitive benchmarking of non-incremental compilation:

  1. Start sbt, run ;clean;compile;clean
  2. Run compile

My first instinct was to blame the poor results in scala/scala on some weird pattern in the standard library (like the *ViewLike classes, which contain many inner classes that extend an outer class; sbt deals poorly with that), but there's still a significant slowdown when compiling just scalac:

  1. Start sbt, run ;clean;compile;compiler/clean
  2. Run compiler/compile

smarter added a commit to smarter/zinc that referenced this issue Mar 21, 2016

Add class dependencies based on types of trees
This is necessary to get #87 to work on top of #86

Compared to #86, this branch:
- Does not seem to impact the performance gains in
  incremental compilation
- Makes scala/scala non-incremental compilation 15% slower,
  but #86 is 60% slower than 0.13.11 (see
  sbt/sbt#1104 (comment)),
  so that needs to be fixed first before we analyze this.

TODO:
- More tests similar to inv-subtyping
  - Abstract types
  - Singleton types
  - Tests to verify if we need to use `symbolsInType` or if `typeSymbol`
    is enough.
  - ...
Contributor

gkossakowski commented Mar 22, 2016

Thanks for testing it out! I'll check what's going on with the Reifiers.scala case soon.

Regarding the ;clean;compile performance: does dotty have a flatter structure, with fewer inner classes being inherited as members of other classes?

Contributor

smarter commented Mar 22, 2016

Yes, dotty is pretty flat.

Contributor

gkossakowski commented Mar 22, 2016

I reproduced the problem with Reifiers.scala. It looks like a regression. I'll dig into it tomorrow.

Contributor

gkossakowski commented Mar 24, 2016

I debugged the problem with Reifiers.scala. I had a bug in invalidation of local inheritance dependencies. I fixed that bug (I'll push changes to my branch tomorrow) but that uncovered a much more intricate bug: sbt/zinc#88. I think it's the new winner for the most elusive bug in the incremental compiler. It involves self types, inheritance, refinements, inferred types and repeated parameters. If you remove any of those, the bug goes away.

Contributor

gkossakowski commented Mar 25, 2016

I pushed fixes for local dependencies to my branch. As a work-around for sbt/zinc#88 I applied the patch included at the bottom. Here's the updated result I get with the Reifiers.scala scenario:

[info] Compiling 1 Scala source to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...
[info] Compiling 3 Scala sources to /Users/grek/scala/scala/build-sbt/quick/classes/compiler...

The work-around for sbt/zinc#88:

diff --git a/src/compiler/scala/tools/nsc/typechecker/Macros.scala b/src/compiler/scala/tools/nsc/typechecker/Macros.scala
index bcf9e01..e6b14bd 100644
--- a/src/compiler/scala/tools/nsc/typechecker/Macros.scala
+++ b/src/compiler/scala/tools/nsc/typechecker/Macros.scala
@@ -496,7 +496,7 @@ trait Macros extends MacroRuntimes with Traces with Helpers {
   /** Keeps track of macros in-flight.
    *  See more informations in comments to `openMacros` in `scala.reflect.macros.whitebox.Context`.
    */
-  var _openMacros = List[MacroContext]()
+  var _openMacros: List[MacroContext] = Nil
   def openMacros = _openMacros
   def pushMacroContext(c: MacroContext) = _openMacros ::= c
   def popMacroContext() = _openMacros = _openMacros.tail

eed3si9n added a commit to eed3si9n/sbt that referenced this issue Apr 28, 2016

Invalidate dependencies at class level.
This commit implements invalidation of dependencies at class level both
for inheritance and memberRef dependencies. However, information that
determines api changes is still tracked mostly at source file level.

Name hashes are tracked at source level (classes considered for
invalidation are mapped back to source files before name hash lookup).
For that reason, #2319 is still not fixed.

Api hashes are tracked per-class but only for top-level classes. It's not
a final design but a quick hack to prove that class-based invalidation is
working correctly. It was enough to get the scenario described in #2320 fixed.
This is shown in `class-based-inheritance` scripted test.
The final design of api hashing will be that all publicly accessible
classes will get their own separate hash.

The sealed test is marked pending as described in notes of #1104.

Another thing to note is that we're getting close to being able to strip
abstractions that deal with the fact that dependencies were tracked either
as Relation[File, File] (for internal dependencies) or
Relation[File, String] (for external dependencies). We track all
dependencies uniformly as Relation[String, String] now.
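
To make the uniform tracking concrete, here is a minimal sketch of a bidirectional relation in the spirit of `Relation[String, String]` (the type and method names below are hypothetical simplifications, not sbt's actual `Relation` API):

```scala
// Minimal sketch of a bidirectional relation, a simplified stand-in for
// sbt's Relation[A, B]; names and shape here are illustrative only.
final case class Rel[A, B](forwardMap: Map[A, Set[B]], reverseMap: Map[B, Set[A]]) {
  // Record that `a` depends on `b`, keeping both directions in sync.
  def add(a: A, b: B): Rel[A, B] =
    Rel(
      forwardMap.updated(a, forwardMap.getOrElse(a, Set.empty[B]) + b),
      reverseMap.updated(b, reverseMap.getOrElse(b, Set.empty[A]) + a)
    )
  // All dependencies of `a` (e.g. classes that `a` refers to).
  def forward(a: A): Set[B] = forwardMap.getOrElse(a, Set.empty[B])
  // All dependents of `b` (e.g. classes to invalidate when `b` changes).
  def reverse(b: B): Set[A] = reverseMap.getOrElse(b, Set.empty[A])
}

object Rel {
  def empty[A, B]: Rel[A, B] = Rel(Map.empty, Map.empty)
}
```

With class-based tracking, internal and external dependencies both fit this shape with `String` class names on each side, and invalidation queries the reverse direction.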

eed3si9n added a commit to eed3si9n/sbt that referenced this issue Apr 28, 2016

Introduce `binaryClassName` relation.
The `binaryClassName` relation maintains mapping between source and binary
class names. This mapping is needed to map binary dependencies back to
source dependencies in case of separate compilation (where we see
dependencies on class files). You can see that mapping being used in
`binaryDependency` method implementation of Analysis callback. Previously,
we would map a class file to the source file it was produced from and then
assume that the dependency was on all of the classes declared in that file.
Introduction of `binaryClassName` lets us map dependency back to source
class name directly and remove that imprecision of dependency tracking.

We maintain mapping between source and binary class names just for
non-local classes. Check this
sbt#1104 (comment) for the
discussion of local and non-local classes.

We also rework tracking of products in Analysis by introducing explicitly
the concept of local and non-local products corresponding to local and
non-local classes. This helps us to clarify for which classes we track
source and binary class names.
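
As an illustration of the distinction: a source class name like `foo.Bar.Baz` corresponds to the binary class name `foo.Bar$Baz` (and `foo.Bar$` for the companion object). A naive sketch of the reverse mapping, assuming the standard `$` nesting encoding and identifiers without literal `$` characters (this is not zinc's implementation, and only makes sense for the non-local classes tracked here):

```scala
// Naive sketch (not zinc's code): recover a source class name from a
// binary class name for non-local classes. Assumes the standard `$`
// nesting encoding and that identifiers contain no literal `$`.
def sourceNameOf(binaryName: String): String =
  binaryName.stripSuffix("$").replace('$', '.')
```

Note that a class and its companion object map back to the same source class name, which is why the incremental compiler can keep tracking them as one pair.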

eed3si9n added a commit to eed3si9n/sbt that referenced this issue Apr 28, 2016

Add handling of unused top level imports
Implements a strategy for recording dependencies introduced by top level
imports by assigning those dependencies to the first top level class.
In case there are top level imports but no top level class/trait/object
defined in a compilation unit, a warning is issued. The rationale for this
strategy can be found at:
sbt#1104 (comment)

Add a unit test covering different cases of top level imports (e.g.
defined in nested packages).

Mark the scripted test source-dependencies/import-class as passing after
a small modification of adding a top level class.
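
The strategy above can be summarized in a few lines; this is a hypothetical helper for illustration, not sbt's code:

```scala
// Hypothetical sketch of the strategy described above: attribute
// dependencies from top level imports to the first top level class in the
// compilation unit, or report a warning if the unit declares none.
def topLevelImportsOwner(topLevelClasses: List[String]): Either[String, String] =
  topLevelClasses.headOption
    .toRight("top level imports with no top level class: dependencies cannot be attributed")
```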

eed3si9n added a commit to eed3si9n/sbt that referenced this issue Apr 28, 2016

Add a pending test for inheritance by a local class
Add a pending scripted test that checks if dependencies introduced by
inheritance by local classes are handled properly. Check the comment in
the test and this discussion:
sbt#1104 (comment) for more
details.

eed3si9n added a commit to eed3si9n/sbt that referenced this issue Apr 28, 2016

Handle dependencies coming from local classes.
Dependencies introduced by local classes require special handling because
we track only non local classes (that can be referred to from other files)
in dependency relations. To overcome this problem, dependencies introduced
by a local class are recorded as introduced by an outer class that is
non local. However, this introduces a new problem with dependencies
introduced by inheritance. We don't want local inheritance dependencies
to cause a transitive invalidation of all classes that inherit from the
outer class containing the local class. Check the comment in
Relations.scala this patch introduces, or follow the discussion of this
problem at: sbt#1104 (comment)

To capture the subtlety of inheritance dependencies from local classes,
we introduce `LocalDependencyByInheritance` case to `DependencyContext`
enum.

TestCallback has been modified to return extracted local inheritance
dependencies and a test in DependencySpecification has been updated
accordingly.

The Dependency phase has been reworked to handle local classes properly
by mapping dependencies to outer, non local classes. Check the
implementation for details of the mapping. It's worth mentioning that
mapping is implemented as an amortized O(1) lookup so this change
doesn't affect the performance of the extraction phase negatively.

The invalidation logic has been modified to take into account inheritance
dependencies introduced by local classes. The patch is small because
the invalidation logic is very straightforward: we invalidate local
inheritance dependencies non-transitively and we do not apply name hashing
dependency pruning.

Lastly, we mark local-class-inheritance scripted test as passing.
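
The owner-chain mapping described above can be sketched as follows (`Cls` and its fields are hypothetical stand-ins for compiler symbols, not the actual Dependency phase code):

```scala
// Hypothetical sketch: map a (possibly local) class to the closest
// enclosing non-local class, which is the one recorded in the
// dependency relations.
final case class Cls(name: String, owner: Option[Cls], isLocal: Boolean)

@annotation.tailrec
def nonLocalOwner(c: Cls): Cls =
  if (!c.isLocal) c
  else c.owner match {
    case Some(o) => nonLocalOwner(o)
    case None    => c // unreachable: top level classes are never local
  }
```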

eed3si9n added a commit to eed3si9n/sbt that referenced this issue Apr 28, 2016

Track API at class level instead of source file level
This commit changes how tracking of API data structures is done within
the incremental compiler. It changes how APIs are passed around and stored
but doesn't change the behavior of the incremental compiler.

Here's a summary of what has changed and what's still being tracked at the
source file level:

  - APIs are tracked per class name in a newly introduced Companions data
    structure; incremental compiler always considers the pair of companions
    from now on
  - only APIs for top level classes are extracted at the moment;
    invalidation is still imprecise
  - Used names are tracked per source file
  - Name hashes are tracked per top-level class (they're part of
    AnalyzedClass data structure)

Companion class and object have to be considered as a pair because they're
given exactly the same name in the incremental compiler. The idea of
naming classes and objects separately has been discussed and rejected
here: sbt#1104 (comment)
APIs of companions are linked together in AnalysisCallback. The ExtractAPI
compiler phase continues to extract apis of classes and objects separately.
More on those changes below.

Most changes in this patch are direct consequences of the changes in the
`interface/other` file. The `Source` has been replaced by the
`AnalyzedClass`. The AnalyzedClass carries an extracted api of the
(class, companion object) pair (stored as `Companions`) plus some metadata
about the pair (e.g. hash sum of its api representation). Source used to
carry both hash sum of the source file contents (hash of the text) and a
hash of the api. The hash of the source file had been introduced to
short-circuit equality checking before api hashing was introduced. Now
it's redundant, so it's removed.

The `SourceAPI` used to carry information about packages declared in a
source file but this information wasn't used in the incremental compiler
so its tracking is removed.

I also removed an ugly looking `_internalOnly_` prefixes. The changes in
this branch are not binary or source compatible so it's a good opportunity
to perform some cleanups.

AnalysisCallback has a new method `startSource`. It's needed because we
want to track sources that do not declare any classes and for which there's
no `api` call. The `api` method takes `ClassLike` as an argument and can
be called multiple times per source file.

The implementation of the AnalysisCallback has been affected by the switch
from Source to AnalyzedClass tracking the most. The main change here is
that AnalysisCallback is responsible for pairing apis of a companion class
and a companion object. It stores apis of classes and objects separately
and does the same for their name hashes. Right before persisting all that
information it puts both apis into Companions data structure and merges
name hashes for a class and its companion object. Merging is performed
only when both class and object with the same name are declared.
It's worth noting why we cannot just compute name hashes once we have a
class and its companion object available instead of merging precomputed
name hashes. We can't do that because APIs are minimized and only their
hashes are stored so names and signatures of members are lost at this
point. We have to compute name hashes before minimization, but then we
don't have both companions available yet.

For that reason `NameHashing.merge` operation has been introduced that
performs a straightforward merge. NameHashingSpecification provides a basic
test for its properties.
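
A sketch of such a merge follows; the per-name combining function for names present in both companions is an illustrative assumption, not zinc's actual hashing:

```scala
// Hedged sketch of merging per-name hashes of a companion class and its
// companion object. Names declared in only one of the pair keep their
// hash; the combining function for shared names is assumed.
def mergeNameHashes(
    classHashes: Map[String, Int],
    objectHashes: Map[String, Int]
): Map[String, Int] =
  (classHashes.keySet ++ objectHashes.keySet).iterator.map { name =>
    val combined = (classHashes.get(name), objectHashes.get(name)) match {
      case (Some(c), Some(o)) => (c, o).hashCode // name declared in both companions
      case (Some(c), None)    => c
      case (None, Some(o))    => o
      case (None, None)       => 0 // unreachable: name came from one of the key sets
    }
    name -> combined
  }.toMap
```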

The incremental invalidation algorithm has been affected by switching to
class level api tracking. As a consequence, most of invalidation is
happening at class level now and only right before compilation class names
are mapped to source files that declare them. To emphasize that point,
recompilation of classes has been refactored to its own method
`recompileClasses` in the IncrementalCommon. However, we had to make two
exceptions to invalidation performed on class level:

  1. Sources that have just been added: we don't know their declared
     classes yet, so we have to schedule them for compilation as is.
  2. Sources that do not declare any classes have to be scheduled for
     recompilation directly, too.

This is the reason why `cycle` takes both invalidated classes and modified
srcs as inputs and why `invalidateInitial` computes both. After the first
iteration of `cycle`, the set of modified sources becomes empty and the
rest of the invalidation is performed at the class level only.
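
The mapping step right before compilation can be sketched like this (hypothetical names; `declaredIn` stands for the className -> srcFile lookup from Relations):

```scala
// Hypothetical sketch: invalidation runs on class names, but the compiler
// can only recompile files, so invalidated classes are mapped back to the
// sources declaring them; directly modified sources are added as-is.
def sourcesToRecompile(
    invalidatedClasses: Set[String],
    modifiedSrcs: Set[String],
    declaredIn: String => String
): Set[String] =
  invalidatedClasses.map(declaredIn) ++ modifiedSrcs
```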

Here's a list of changes I think are worth highlighting either for clarity
or to make a point:

  - SameAPI dropped some old, unused code from TopLevel and NameChanges
    classes
  - APIs do not have any reference to `java.io.File` now, this data
    structure operates purely on class names now
  - helper methods for looking up dependency information from Relations
    have been changed to work at the class level
  - version number in TextAnalysisFormat has been bumped
  - the `inherited_type_params` scripted test has been removed, as it does
    not appear to test anything useful and breaks due to changes to the
    `APIs` interface
  - Analyze doesn't store empty apis for Java source files that do not
    declare any classes; we use `AnalysisCallback.startSource` for tracking
  - The Scala 2.8-specific test has been dropped.
  - The test for Ant-style compilation is marked as pending. Supporting of
    Ant-style compilation is tricky because invalidation is happening at
    the class level now.
Contributor

paulp commented Aug 29, 2016

Couldn't we just append $ to simple names of objects and track dependencies on objects and classes separately? Changing the naming scheme is easy.

Ha ha, easy. Maybe this ticket will eventually make sense to someone other than me.

Member

jvican commented Dec 22, 2016

This looks awesome. It is a major improvement for Scala development and I'm excited to see it coming in sbt 1.0. To the best of my knowledge, @gkossakowski is not working on this anymore. Is anyone else taking care of finishing this ticket off? Is this blocked by something?

@dwijnand dwijnand closed this Mar 27, 2017

@gkossakowski gkossakowski referenced this issue in gkossakowski/kentuckymule Oct 14, 2017

Open

Inch toward the finish line: As Seen From & Scala std lib processing #6

14 of 21 tasks complete