Improve performance of loading used names from persisted Analysis file #995

dwijnand · 2021-07-14T16:16:19Z

internal/zinc-core/src/main/scala/sbt/internal/inc/MemberRefInvalidator.scala

And as a side-effect of working in the area: make it more efficient.

eed3si9n

Thanks!

internal/zinc-core/src/main/scala/sbt/internal/inc/MemberRefInvalidator.scala

This is easier to reason about for Hydra which runs this phase in parallel. Now that we're not creating a full Relation we have some performance budget to spend.

retronym · 2021-07-27T06:32:02Z

internal/zinc-core/src/main/scala/sbt/internal/inc/Incremental.scala

@@ -626,7 +626,7 @@ private final class AnalysisCallback(
  private[this] val objectApis = new TrieMap[String, ApiInfo]
  private[this] val classPublicNameHashes = new TrieMap[String, Array[NameHash]]
  private[this] val objectPublicNameHashes = new TrieMap[String, Array[NameHash]]
-  private[this] val usedNames = new RelationBuilder[String, UsedName]
+  private[this] val usedNames = new TrieMap[String, ConcurrentSet[UsedName]]


This is a very late pong to this ping from @dotta
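For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of a concurrent map of concurrent sets. `UsedNamesCollectorSketch`, `Name`, `record`, and `namesOf` are hypothetical names for illustration, and `java.util.Set` backed by `ConcurrentHashMap.newKeySet` is an assumed stand-in for whatever `ConcurrentSet` aliases in zinc; this is not the actual `AnalysisCallback` code.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.concurrent.TrieMap

object UsedNamesCollectorSketch {
  // Hypothetical stand-in for zinc's UsedName.
  final case class Name(value: String)

  // Class name -> thread-safe set of names used by that class.
  private val usedNames = new TrieMap[String, java.util.Set[Name]]

  // Safe to call from several compiler threads at once.
  def record(className: String, name: Name): Unit = {
    // On a race the freshly created set may be discarded, but every caller
    // receives the set that actually ended up in the map.
    val names = usedNames.getOrElseUpdate(className, ConcurrentHashMap.newKeySet[Name]())
    names.add(name)
    ()
  }

  // Immutable snapshot of the names recorded for one class.
  def namesOf(className: String): Set[Name] =
    usedNames.get(className) match {
      case Some(set) =>
        val builder = Set.newBuilder[Name]
        val it = set.iterator()
        while (it.hasNext) builder += it.next()
        builder.result()
      case None => Set.empty
    }
}
```

Compared with building a full bidirectional relation, this only maintains the forward direction (class to used names), which is all the callback needs while names are being collected.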

internal/zinc-core/src/main/scala/sbt/internal/inc/Incremental.scala

retronym · 2021-07-28T03:12:06Z

Running this benchmark prior:

package sbt.internal.inc

import org.openjdk.jmh.annotations.{
  Benchmark,
  BenchmarkMode,
  Fork,
  Measurement,
  Mode,
  Param,
  Scope,
  Setup,
  State,
  Warmup
}
import xsbti.UseScope

import java.io.File
import java.util.concurrent.TimeUnit

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3)
class AnalysisSerializationBenchmark {
  @Param(Array("/Users/jz/code/scala/target/compiler/zinc/inc_compile.zip"))
  var analysisFile: String = _
  var firstClassName: String = _

  @Setup
  def setup(): Unit = {
    val store = FileAnalysisStore.binary(new File(analysisFile))
    val analysis = store.get().get().getAnalysis.asInstanceOf[Analysis]
    firstClassName = analysis.relations.classes._2s.head
  }

  @Benchmark def deserialize() = {
    val store = FileAnalysisStore.binary(new File(analysisFile))
    val analysis = store.get().get().getAnalysis.asInstanceOf[Analysis]
    val usedNames = analysis.relations.names
    val mod = ModifiedNames(
      Set(
        UsedName.make(
          "name_that_does_not_exist_QWERTY",
          java.util.EnumSet.allOf[UseScope](classOf[UseScope])
        )
      )
    )
    usedNames.forward(firstClassName).iterator.exists(mod.isModified)
  }
}

And the slightly modified version post:

  @Benchmark def deserialize() = {
    val store = FileAnalysisStore.binary(new File(analysisFile))
    val analysis = store.get().get().getAnalysis.asInstanceOf[Analysis]
    val usedNames = analysis.relations.names
    usedNames.hasAffectedNames(
      ModifiedNames(
        Set(
          UsedName.make(
            "name_that_does_not_exist_QWERTY",
            java.util.EnumSet.allOf[UseScope](classOf[UseScope])
          )
        )
      ),
      firstClassName
    )
  }

Shows:

Baseline

[info] Benchmark                                                                                                               (analysisFile)  Mode  Cnt          Score         Error   Units
[info] AnalysisSerializationBenchmark.deserialize                                   /Users/jz/code/scala/target/compiler/zinc/inc_compile.zip  avgt   20          0.329 ±       0.020    s/op
[info] AnalysisSerializationBenchmark.deserialize:·gc.alloc.rate.norm               /Users/jz/code/scala/target/compiler/zinc/inc_compile.zip  avgt   20  205411175.467 ± 3736502.161    B/op

Post

[info] Benchmark                                                                                                               (analysisFile)  Mode  Cnt          Score          Error   Units
[info] AnalysisSerializationBenchmark.deserialize                                   /Users/jz/code/scala/target/compiler/zinc/inc_compile.zip  avgt   20          0.227 ±        0.010    s/op
[info] AnalysisSerializationBenchmark.deserialize:·gc.alloc.rate.norm               /Users/jz/code/scala/target/compiler/zinc/inc_compile.zip  avgt   20  170022246.320 ±   526307.356    B/op

Post "only intern used names locally ..."

[info] Benchmark                                                                              (analysisFile)  Mode  Cnt  Score   Error  Units
[info] AnalysisSerializationBenchmark.deserialize  /Users/jz/code/scala/target/compiler/zinc/inc_compile.zip  avgt   30  0.186 ± 0.008   s/op
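To put the three result sets above side by side, the relative improvements can be computed from the reported averages. The sketch below only does that arithmetic; `BenchmarkDeltaSketch` and `reductionPercent` are hypothetical helper names, and the inputs are copied from the JMH output above.

```scala
object BenchmarkDeltaSketch {
  // Percentage by which `after` is lower than `before`.
  def reductionPercent(before: Double, after: Double): Double =
    (before - after) / before * 100.0

  def main(args: Array[String]): Unit = {
    // Average time per operation, in seconds, from the three runs above.
    val baselineTime = 0.329
    val postTime = 0.227
    val postLocalInternTime = 0.186
    // Normalized allocation per operation, in bytes (reported for the first two runs only).
    val baselineAlloc = 205411175.467
    val postAlloc = 170022246.320

    println(f"time, post vs baseline:              ${reductionPercent(baselineTime, postTime)}%.1f%%")
    println(f"time, local interning vs baseline:   ${reductionPercent(baselineTime, postLocalInternTime)}%.1f%%")
    println(f"allocation, post vs baseline:        ${reductionPercent(baselineAlloc, postAlloc)}%.1f%%")
  }
}
```

That works out to about 31% less time after the first change, about 43% less once used names are interned locally, and about 17% less allocation per operation.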

retronym · 2021-07-29T02:25:33Z

Test failure, I think unrelated:

[info] - should not compile Java for no-op *** FAILED ***
[info]   java.lang.RuntimeException: java.lang.IllegalArgumentException: MALFORMED
[info]   at com.sun.tools.javac.main.Main.compile(Main.java:559)
[info]   at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
[info]   at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
[info]   at sbt.internal.inc.javac.LocalJavaCompiler.run(LocalJava.scala:345)
[info]   at sbt.internal.inc.javac.AnalyzingJavaCompiler.$anonfun$compile$12(AnalyzingJavaCompiler.scala:172)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at sbt.internal.inc.javac.AnalyzingJavaCompiler.timed(AnalyzingJavaCompiler.scala:262)
[info]   at sbt.internal.inc.javac.AnalyzingJavaCompiler.compile(AnalyzingJavaCompiler.scala:161)
[info]   at sbt.internal.inc.MixedAnalyzingCompiler.$anonfun$compileJava$2(MixedAnalyzingCompiler.scala:103)
[info]   at sbt.internal.inc.MixedAnalyzingCompiler.$anonfun$compileJava$2$adapted(MixedAnalyzingCompiler.scala:91)
[info]   ...
[info]   Cause: java.lang.IllegalArgumentException: MALFORMED
[info]   at java.util.zip.ZipCoder.toString(ZipCoder.java:58)
[info]   at java.util.zip.ZipFile.getZipEntry(ZipFile.java:599)
[info]   at java.util.zip.ZipFile.access$900(ZipFile.java:60)
[info]   at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:539)
[info]   at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:514)
[info]   at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:495)
[info]   at com.sun.tools.javac.file.ZipArchive.initMap(ZipArchive.java:77)
[info]   at com.sun.tools.javac.file.ZipArchive.<init>(ZipArchive.java:70)
[info]   at com.sun.tools.javac.file.ZipArchive.<init>(ZipArchive.java:62)
[info]   at com.sun.tools.javac.file.JavacFileManager.openArchive(JavacFileManager.java:526)
[info]   ...

retronym · 2021-07-29T07:39:27Z

I did a little more analysis of variations of used-names interning to convince myself that the new, faster version is "good enough" for footprint reduction.

package sbt.inc.binary

import sbt.internal.inc.FileAnalysisStore

import java.io.File

object Scratch {
  def main(args: Array[String]): Unit = {
    val files = List(
      "/Users/jz/code/scala/target/partest/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/scaladoc/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/replFrontend/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/reflect/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/repl/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/library/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/testkit/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/scalap/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/interactive/zinc/inc_compile.zip",
      "/Users/jz/code/scala/target/compiler/zinc/inc_compile.zip"
    )
    val analysisList = // comment out this line for the "ZERO" variation
      files.map(f => FileAnalysisStore.binary(new File(f)).get().get().getAnalysis)
    println("Loaded all analysis files")
    while (true) {
      Thread.sleep(1000) // run: jcmd $PID GC.heap_dump /tmp/dump.hprof and open with IntelliJ or another tool
    }
  }
}

  // Each measurement enables exactly one of the three variants marked below.
  private class StringTable {
    private val strings = new JHashMap[String, String]()
    def lookupOrEnter(string: String): String = {
      string.intern() // STRING INTERN, or
      string          // NO INTERN, or
      strings.putIfAbsent(string, string) match { // LOCAL INTERN
        case null => string
        case v    => v
      }
    }
  }

Variation	Description	Footprint (bytes)
ZERO	Measure the baseline footprint by loading then discarding the Analysis files	5,012,612
NO INTERN	No interning	277,412,693
STRING INTERN	Use j.l.String.intern (status quo)	231,665,921
LOCAL INTERN	This PR, intern within the scope of each Analysis	235,342,817

In summary, I think this PR makes the right tradeoff.

The compressed protobuf files amount to 9.5M on disk, so we have a 25x higher footprint as Java objects. This is worth some more investigation to see if we can be more compact in memory.
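As a runnable illustration of the LOCAL INTERN variant measured above, here is a minimal sketch. `LocalInternSketch` is a hypothetical wrapper, and its `StringTable` mirrors the snippet from this comment without being the code that was merged. It shows that two equal but distinct `String` instances collapse to one canonical instance per table.

```scala
import java.util.{ HashMap => JHashMap }

object LocalInternSketch {
  // One table per Analysis: duplicates are shared within it,
  // and the whole table can be garbage collected along with it.
  final class StringTable {
    private val strings = new JHashMap[String, String]()

    // Returns the first instance seen for this string value.
    def lookupOrEnter(string: String): String = {
      val previous = strings.putIfAbsent(string, string)
      if (previous == null) string else previous
    }
  }

  def main(args: Array[String]): Unit = {
    val table = new StringTable
    // `new String` forces two distinct objects with equal contents.
    val first = new String("scala.Option")
    val second = new String("scala.Option")

    val a = table.lookupOrEnter(first)
    val b = table.lookupOrEnter(second)
    // Reference equality: both lookups yield the first instance.
    println(a eq b)
  }
}
```

Unlike `String.intern`, nothing here touches the JVM-wide string pool, so deduplication is limited to names loaded through the same table.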

retronym · 2021-07-30T02:09:28Z

We can reduce that footprint by 20% by avoiding the ArrayList[Schema.UseScope] which comes from the protobuf bindings over:

message UsedName {
    string name = 1;
    repeated UseScope scopes = 2;
}

enum UseScope {
    DEFAULT = 0;
    IMPLICIT = 1;
    PATMAT = 2;
}

It would be preferable if the bindings automatically represented this with an EnumSet or just a raw bitmask.

We can manually do the bitmasking ourselves though:

retronym#5
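The general idea of packing the scopes into a bitmask can be sketched as follows. This is an illustration under assumptions, not the change in retronym#5: `Scope` and its three cases are hypothetical Scala stand-ins for the `UseScope` values, numbered as in the protobuf enum above.

```scala
object UseScopeBitmaskSketch {
  // Hypothetical stand-ins for UseScope; ordinals follow the protobuf enum.
  sealed abstract class Scope(val ordinal: Int)
  case object Default extends Scope(0)
  case object Implicit extends Scope(1)
  case object PatMat extends Scope(2)

  val allScopes: List[Scope] = List(Default, Implicit, PatMat)

  // One bit per scope, so any combination fits in a single Int.
  def encode(scopes: Set[Scope]): Int =
    scopes.foldLeft(0)((mask, scope) => mask | (1 << scope.ordinal))

  // Recover the set by testing each scope's bit.
  def decode(mask: Int): Set[Scope] =
    allScopes.filter(scope => (mask & (1 << scope.ordinal)) != 0).toSet
}
```

A single `Int` per used name replaces a list object plus its boxed enum entries, which is where the footprint saving would come from.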

The overhead would also be eliminated if we went ahead with another idea we considered:

message UsedNames {
    repeated string regular = 1;
    repeated string implicit = 2;
    repeated string patmat = 3;
}

That would also get rid of a further 8% overhead of the Schema.UsedName instances.
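The alternative layout stores names per scope rather than scopes per name. A minimal sketch of that regrouping, where `Scope`, `UsedName`, `UsedNames`, and `group` are hypothetical Scala stand-ins and not the generated protobuf classes:

```scala
object UsedNamesByScopeSketch {
  sealed trait Scope
  case object Default extends Scope
  case object Implicit extends Scope
  case object PatMat extends Scope

  // Current shape: each name carries its own collection of scopes.
  final case class UsedName(name: String, scopes: Set[Scope])

  // Proposed shape: three flat lists of plain strings, one per scope.
  // (`implicits` is used as the field name because `implicit` is a Scala keyword.)
  final case class UsedNames(
      regular: Vector[String],
      implicits: Vector[String],
      patmat: Vector[String]
  )

  def group(names: Seq[UsedName]): UsedNames = {
    def in(scope: Scope): Vector[String] =
      names.iterator.filter(_.scopes(scope)).map(_.name).toVector
    UsedNames(in(Default), in(Implicit), in(PatMat))
  }
}
```

A name used in several scopes appears in several lists, but there is no per-name wrapper object and no per-name scope collection.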

Switch used names to a single-direction Map

538ddae

eed3si9n reviewed Jul 14, 2021

internal/zinc-core/src/main/scala/sbt/internal/inc/MemberRefInvalidator.scala

retronym force-pushed the cheaper-usedNames branch 2 times, most recently from ddbae9e to d53f2e1 Compare July 15, 2021 04:02

Avoid re-using empty iterator in debug logging

57bde9e

retronym force-pushed the cheaper-usedNames branch from d53f2e1 to 57bde9e Compare July 15, 2021 04:03

Fix multi-map size calculation in RelationsTextFormat

7185814

And as a side-effect of working in the area: make it more efficient.

eed3si9n approved these changes Jul 16, 2021

Define UseNames, wrapping the Protobuf java map

aa41144

dwijnand requested a review from retronym July 26, 2021 13:33

retronym reviewed Jul 27, 2021

internal/zinc-core/src/main/scala/sbt/internal/inc/MemberRefInvalidator.scala

retronym added 2 commits July 27, 2021 16:21

Revert to concurrent map for used names collection

92c17c2

This is easier to reason about for Hydra which runs this phase in parallel. Now that we're not creating a full Relation we have some performance budget to spend.

Also avoid creating debug string when logging isn't enabled.

1a01ef4

retronym reviewed Jul 27, 2021

dwijnand commented Jul 27, 2021

internal/zinc-core/src/main/scala/sbt/internal/inc/Incremental.scala

2.12/2.13 cross compatible version of Incremental.usedNames

d581aa6

retronym approved these changes Jul 28, 2021

retronym added 2 commits July 28, 2021 13:46

Only intern used name strings locally within an Analysis

a267805

Refactor string deduplication

1df0f65

retronym merged commit 9ae025d into sbt:develop Jul 29, 2021

retronym changed the title ~~Switch used names to a single-direction Map~~ Improve performance of loading used names from persisted Analysis file Jul 29, 2021

dwijnand deleted the cheaper-usedNames branch July 29, 2021 07:24

eed3si9n added this to the 1.6.0 milestone Sep 19, 2021
