Replace use of Scala collections with Java's #253
Conversation
I don't know if my comments will provide any speedup in the end, so they can all be ignored (apart from them, the code LGTM).
cache += owner -> (handler -> newVisited)
visited = newVisited
visitedOwner = owner
if (cache.containsKey(owner)) {
Could we always call get and check whether the result is null? I know this is ugly, but it saves us a second map lookup.
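The pattern suggested here could look roughly like the sketch below (with hypothetical helper names; not the actual PR code). `containsKey` followed by `get` hashes the key twice, while a single `get` plus null check hashes it once, which is safe as long as null is never stored as a value:

```scala
import java.util.{HashMap => JHashMap}

val cache = new JHashMap[String, String]()
cache.put("owner", "handler")

// Two hash lookups: containsKey hashes the key, then get hashes it again.
def lookupTwice(key: String): Option[String] =
  if (cache.containsKey(key)) Some(cache.get(key)) else None

// One hash lookup: get returns null for an absent key
// (only safe because we never store null values).
def lookupOnce(key: String): Option[String] =
  Option(cache.get(key))
```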
usedNamesFromClasses.put(className, emptySet)
emptySet
def usedNamesFromClass(className: Name): util.HashSet[Name] = {
if (usedNamesFromClasses.containsKey(className)) {
same as above :)
val newVisited = mutable.HashSet.empty[Type]
visited = newVisited
ownersCache += owner -> newVisited
if (ownersCache.containsKey(owner)) {
and here
This work is really good, thank you @wlk.
This PR shaves ~700 ms off compiling Shapeless, which I consider an excellent improvement for such a small micro-optimization. I propose that we merge as soon as CI is green.
case (className: String, names: Iterable[String]) =>
  names foreach { (name: String) => callback.usedName(className, name) }
debuglog("The " + sourceFile + " contains the following used names:\n " + debugOutput(allUsedNames))
Since we're changing this code, a minor suggestion: could we turn this into interpolated strings? As in s"The $sourceFile ...".
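For reference, the two forms produce the same string; a minimal sketch of the suggested change (the values below are made up for illustration):

```scala
val sourceFile = "Foo.scala"
val names = "A, B"

// before: manual concatenation
val msg1 = "The " + sourceFile + " contains the following used names:\n " + names

// after: interpolated string, as suggested in the review
val msg2 = s"The $sourceFile contains the following used names:\n $names"
```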
fixed
@@ -49,29 +52,39 @@ import Compat._
class ExtractUsedNames[GlobalType <: CallbackGlobal](val global: GlobalType) extends Compat with ClassName with GlobalHelpers {
  import global._

-  def extract(unit: CompilationUnit): Iterable[(String, Iterable[String])] = {
+  def extract(unit: CompilationUnit): HashMap[String, ArrayList[String]] = {
What do you think about keeping these as Java iterables? It could help us hide how they are actually implemented.
Fixed, partially: I can replace the HashMap with Map, but replacing ArrayList with Iterable gives me this:
Error:(87, 5) type mismatch;
found : java.util.HashMap[String,java.util.ArrayList[String]]
required: java.util.Map[String,Iterable[String]]
Note: java.util.ArrayList[String] <: Iterable[String], but Java-defined trait Map is invariant in type V.
You may wish to investigate a wildcard type such as `_ <: Iterable[String]`. (SLS 3.2.10)
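The error above comes from Java generics being invariant: a `HashMap[String, ArrayList[String]]` is not a `Map[String, Iterable[String]]` even though `ArrayList[String]` is an `Iterable[String]`. A sketch of the wildcard workaround the compiler hints at (hypothetical values, not the PR's code):

```scala
import java.util.{ArrayList, HashMap => JHashMap, Map => JMap}

val result = new JHashMap[String, ArrayList[String]]()
val names = new ArrayList[String]()
names.add("A")
result.put("Foo", names)

// Does not compile: Java's Map is invariant in its value type V.
// val bad: JMap[String, java.lang.Iterable[String]] = result

// Compiles: the wildcard upper-bounds V without fixing it.
// (On Scala 2.x this is the existential type the error message suggests;
// `import scala.language.existentials` may be needed in some forms.)
val ok: JMap[String, _ <: java.lang.Iterable[String]] = result
```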
result
Update: fixed this one, but it requires using scala.language.existentials
I had to fall back to using concrete types because I was unable to make this work with Java's forEach
Yeah, the issue here is that in Scala 2.11 you don't have SAM types, so you cannot pass Scala functions where `forEach` expects its Java functional interface. Don't worry about this!
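Concretely, `forEach` takes a `java.util.function.Consumer`, and before Scala 2.12's SAM conversion you have to spell out the anonymous class. A minimal sketch (hypothetical data, not the PR's code):

```scala
import java.util.ArrayList
import java.util.function.Consumer

val names = new ArrayList[String]()
names.add("a")
names.add("b")

val seen = new ArrayList[String]()

// On Scala 2.11 a Scala lambda is not accepted where a Java SAM type
// (here java.util.function.Consumer) is expected, so we write it out:
names.forEach(new Consumer[String] {
  def accept(name: String): Unit = { seen.add(name); () }
})
```

On Scala 2.12+ the same call compiles as `names.forEach(n => seen.add(n))`, since lambdas convert to Java functional interfaces.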
val usedName = it.next()
val usedNameKey = usedName.getKey.toString.trim
val usedNameValues = usedName.getValue.iterator()
val uses = new ArrayList[String]()
Interestingly, it seems that we're initializing an array list here every time we iterate through the class name entries. It would be nice if we could remove it.
It happens that HashSet has a constant-time size method, so do you think we could use that size to create an array of that concrete length and then change the return type of extract? This would buy us the fact that we are not paying the cost of expanding the array list when its capacity is full, because we know that size a priori. It can also help us with some spatial cache locality.
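The pre-sizing idea could look roughly like this (a sketch with invented data, not the PR's code): since `HashSet.size()` is O(1), the list can be allocated at its final capacity up front, avoiding the grow-and-copy steps of the default constructor:

```scala
import java.util.{ArrayList, HashSet => JHashSet}

val usedNames = new JHashSet[String]()
usedNames.add("map")
usedNames.add("flatMap")

// HashSet.size() is constant-time, so allocate the list at its final
// capacity instead of starting at the default capacity and resizing.
val uses = new ArrayList[String](usedNames.size())
val it = usedNames.iterator()
while (it.hasNext) uses.add(it.next())
```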
I'm not sure I see what you mean by removing the creation of the ArrayList per iteration; do you have something specific in mind?
The other issue is fixed.
@@ -13,6 +13,8 @@ class ExtractUsedNamesSpecification extends UnitSpec {
  val usedNames = compilerForTesting.extractUsedNamesFromSrc(src)
  val expectedNames = standardNames ++ Set("a", "A", "A2", "b")
  // names used at top level are attributed to the first class defined in a compilation unit
+ println(usedNames)
We can probably remove this 😄.
fixed
I need to take a look at the compilation problem; not sure why it runs OK locally.
"53955.527 ± 89475.604 ms/op": this measurement is unusable. For hot compilation, performance goes from "18956.390 ± 470.882 ms/op" to "18440.025 ± 359.687 ms/op". The difference is about the size of the error margin. Is that good enough? Also, given there have been more changes/commits, we should regenerate the benchmark results before jumping to conclusions about whether this compromise is worth merging. At least that's what I think, but clearly I'm in the minority here.
Yeah, I think this is fine.
I don't think we have to ask for new benchmarks every time a new change is committed, specifically if it does not affect the core logic... However, I'll be running this in a controlled environment and will report more results. But I think these changes are perfect to go.
There was one complication that I didn't see before: Java collections have …
please take a look @jvican
maybe you can advise if this is OK, or maybe there's a better way to do this?
I've just pushed a new commit that solves minor issues, including this one 😉. So you don't have to worry. I think the PR is perfect.
I have a busy agenda today, but I'll make sure that I run the benchmarks before tomorrow so that this can get in as soon as possible.
Again, great work.
@wlk Can you squash all the failing commit messages?
I have very good news! I've run this on the Scala standard library, these are the results. With Scala collections:
With Java collections:
Basically, we have a win of ~2.5 seconds in hot compilation, ~2.8 seconds in warm compilation and ~3 seconds in cold compilation. This confirms my initial guess in #245 that this change would be significant. I would expect these speedups to increase for very big codebases (/cc @romanowski). This PR is good to go. @wlk Could you attach these benchmark results in your squashed commit message and prepend it with "Fix #245: ... bla bla"? This is a GitHub trick that will close the issue whenever the PR is merged.
@jvican I don't mean to be a bore, but could you help me understand your numbers?
Here's a summary of what I see:
I'm no mathematician or statistician (/cc @fommil), but from my understanding, because the delta falls within the error margin this is NOT a win: it is impossible to tell which number is larger, and this change is a statistical tie in terms of performance. Please correct me if I'm wrong, and where. I know that, in general, Java collections are faster than Scala collections. But it's equally important not to take things as fact, and instead to measure and benchmark them. And if we are to benchmark them, we should act according to the benchmark results. In the face of these results (and my analysis of them above) I vote to reject this change, favouring the Scala collections over the Java ones (for many reasons). But I strongly welcome more attempts to make Zinc faster.
@dwijnand I can see your point, but there is another win: the error margin in the version using Java collections is smaller, which means that performance is more stable, and that is itself a win. The price that we pay for this is quite small (a few while loops). I am OK with that. @jvican I won't be able to test this soon (since there have been quite a lot of changes to Zinc recently and my 'big project integration' is a little bit behind Zinc). @rorygraves this may be interesting for your scalac optimization work, especially since switching a small part of the build (~5-6% of Scala compilation) can reduce the difference in build times by ~25% (but I am not a statistician, so please correct me if I am wrong).
private def debugOutput(map: JavaMap[String, Array[String]]): String = {
  val stringBuffer = new StringBuffer()
  val it = map.entrySet().iterator()
added comment in 2 places
I've given my approval above, but in light of the benchmark data I'm leaning towards @dwijnand's opinion. In general, Java style (while loops etc.) is more error-prone to maintain, so unless there's an observable performance improvement we shouldn't be replacing well-tested bridge code.
Hi Dale, you're not looking at the data correctly. Benchmarks are compared in percentiles, not averages. Averages don't matter. In particular, for this scenario, I've used the 99th percentile to compare, hence my conclusions. The thing is that percentiles are the ones that describe the accuracy of a comparison, taking into account the variations of the measurements. The higher the percentile, the more accurate a conclusion is. This is how any data measurement in the JVM world is done, particularly with JMH. With this in mind, we can safely extract that Java collections are 2.5-3x faster than Scala's. Therefore, I consider this to be a very good change that should be merged.
Percentile maximum can give insight into how spread out the sample measurements are, not how accurate. Before (p0.90):
After (p0.90):
In the above, 90% of the 18 sample measurements fit inside 22541.867 ms/op envelope vs 22132.503 ms/op. So there's approximately 409 ms speedup. Another useful point to observe is the median point (p0.50). Before (p0.50):
After (p0.50):
Here also 50% of the 18 sample measurements fit inside 21407.728 ms/op envelope vs 21189.624 ms/op. Here the speedup observed is approximately 218 ms. So there is a consistent trend for speedup. However, if we look at the difference between the median and p0.90, the difference is around 1134ms for Scala, and 943ms for Java. So in that sense, the speedup observed here might not be significant.
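The percentile reading described above can be sketched in a few lines. This is a toy illustration of the nearest-rank method (the sample arrays below are invented; they are NOT the benchmark's raw data):

```scala
// Toy sketch of reading a pN percentile from JMH-style samples
// using the nearest-rank method. Sample values are invented.
def percentile(samples: Seq[Double], p: Double): Double = {
  val sorted = samples.sorted
  // nearest rank: the smallest value at or below which a fraction p
  // of the samples fall
  val idx = math.ceil(p * sorted.length).toInt - 1
  sorted(math.max(idx, 0))
}

val beforeSamples = Seq(19964.0, 21407.0, 21500.0, 22541.0, 24595.0)
val afterSamples  = Seq(20199.0, 21189.0, 21300.0, 22132.0, 22313.0)

// delta between the two runs at the 90th percentile
val deltaP90 = percentile(beforeSamples, 0.90) - percentile(afterSamples, 0.90)
```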
How did you derive "2.5-3x faster"?
This does not tell you anything about the benchmark data. Please read the articles I've linked to you and related literature.
You're comparing averages. This is not playing by the book and it's not giving you any interesting data. The distribution of the results is not represented in its values.
A percentile of 90 is not enough. Common studies in the benchmarking world show that normal users experience the 99th percentile, not the 90th. For more information, I defer you to Gil Tene's talk (https://www.youtube.com/watch?v=lJ8ydIuPFeU) and his slides https://www.azul.com/files/HowNotToMeasureLatency_LLSummit_NYC_12Nov2013. Gil Tene has been benchmarking and improving the JVM and its GC for the last 20 years. I think he knows more than we all do.
Have a look at the 99 percentiles: Before
After
If your intuition fails you, have a look at the raw data of cold compilation. There are no percentiles (there's only one measurement) and you get the 3-second decrease.
p0.50 is not an average, it's the median. In determining whether a particular change to the code resulted in a significant speedup or not, you need to understand the median and the spread. This is why benchmarks are written as 21213.858 ± 627.059 ms/op. For our purpose, I'm fine with using p1.00 (the maximum of all samples) to compare the worst-case scenario. Whether we use p0.99, p0.999, or p0.9999 is insignificant in this case, especially because we're not plotting some graph to hide the outliers.
Again, this is totally useless.
Again, false. Please, take the time to read the benchmarking literature. You're not reading the results correctly.
No, outliers are everything, and whether they are plotted or not doesn't make a difference. Outliers are the ones that give us the accuracy that we want, and they happen more often than we think (hence the error margin and the variations experienced). Computers are really subject to variations, and the only predictable way in which we can study their behaviour is by studying the percentiles, not the median.
By the way, you said something correct in the previous paragraph: "you need to understand the spread" (I guess you mean the distribution of the results). Yes, exactly. As you need to understand the distribution, you always use percentiles of 99 and above. They will give you the correct behaviour of your program 99% of the time by being conservative about the distribution ("spread") of the results. I raise again my point about cold compilation (which uses 10 iterations and brings a decrease of 3 seconds). Reading benchmarks is not something that can be done on intuition alone. I've linked to several experts in this field who use correct and predictable methodologies, developed over the years, to compare the changes they make. I would suggest that we don't ignore their expertise and experience.
Ironically, the evidence presented here, i.e. 48069.209 ± 3267.219 ms/op vs 45129.825 ± 1432.394 ms/op, is a median, which you condemned as useless :)
We are measuring full compilation times here, IIRC (@jvican correct me if I am wrong). Taking into account that Zinc-related phases take ~5% of compilation time, even a 200 ms speed-up on cold compilation gives us a ~10% speed-up (but I think we will see bigger gains). I think a few while loops is not a big price for a 0.5% compiler speed-up.
Getting back to Wojciech's work: if we're seeing a 3 s decrease in cold compilation of Shapeless, I think that's pretty cool.
These benchmarks are from the standard library, but yes, I also think this is pretty cool.
They are useless for scientific, predictable benchmarking. I mentioned them because they give, to some extent, some intuition of how the program behaves, but they cannot be used to compare fairly the running times of several iterations because they don't represent the full reality. This is why I said "If your intuition fails you" before suggesting to have a look at them.
Where did you get this data? My results tell me that they represent more than just 5% of overall compilation.
Totally agreed. The results we have show some variation, of course, because of cache and memory layout, the unpredictability of GC, JVM behaviour and the scheduler. But the fact is that this patch considerably improves the behaviour of Zinc phases in 99% of the cases, which is what we should care about. The compilation of the standard library lasts only ~20 seconds. For a bigger project (talking about projects that compile for minutes or even hours), these changes are huge. We should aim to satisfy the needs of the companies having such big codebases, some of which I represent as a Scala Center employee.
TBH it was quite a big 'floor' to make my numbers round (and even using smaller numbers, the speed-up is quite significant).
I'm of the opinion that all uses of the Scala mutable collections should be removed. They are inefficient, and at the point you decide to use mutable collections you may as well go Premium. They are critical to ensime indexer performance.
Force-pushed from 85d191e to da52498
am I the only one who finds the error column totally confusing? Absolute percentiles are far better... is there a tool to visualise the output?
…scala and related classes

Benchmark results:

With Scala collections:
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_51938d03 sample 18 21437.554 ± 949.595 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_51938d03 sample 19964.887 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_51938d03 sample 21407.728 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_51938d03 sample 22541.867 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_51938d03 sample 18 280.535 ± 11.459 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_51938d03 sample 18 6446299745.333 ± 14785544.668 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_51938d03 sample 18 276.572 ± 22.333 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_51938d03 sample 18 6355643405.333 ± 460709368.527 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_51938d03 sample 18 11.677 ± 12.619 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_51938d03 sample 18 275401288.000 ± 296971980.543 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_51938d03 sample 18 6.754 ± 1.900 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_51938d03 sample 18 155936858.667 ± 45709221.663 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_51938d03 sample 18 98.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_51938d03 sample 18 19073.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_51938d03 sample 3 55118.747 ± 18980.181 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_51938d03 sample 54358.180 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_51938d03 sample 54693.724 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_51938d03 sample 3 119.464 ± 35.856 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_51938d03 sample 3 7018134616.000 ± 329861736.207 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_51938d03 sample 3 109.472 ± 43.096 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_51938d03 sample 3 6431816405.333 ± 2656847732.046 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_51938d03 sample 3 2.692 ± 2.508 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_51938d03 sample 3 158121976.000 ± 114580201.514 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_51938d03 sample 3 76.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_51938d03 sample 3 9389.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_51938d03 ss 10 48069.209 ± 3267.219 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_51938d03 ss 10 140.486 ± 9.407 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_51938d03 ss 10 7201372165.600 ± 51736683.622 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_51938d03 ss 10 133.059 ± 9.610 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_51938d03 ss 10 6824023763.200 ± 383438006.440 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_51938d03 ss 10 3.092 ± 0.517 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_51938d03 ss 10 159124314.400 ± 34649991.106 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_51938d03 ss 10 255.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_51938d03 ss 10 30967.000 ms

With Java collections:
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_55e2f965 sample 18 21213.858 ± 627.059 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_55e2f965 sample 20199.768 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_55e2f965 sample 21189.624 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_55e2f965 sample 22132.503 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_55e2f965 sample 18 283.808 ± 7.726 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_55e2f965 sample 18 6463578940.444 ± 48501608.706 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_55e2f965 sample 18 283.368 ± 18.931 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_55e2f965 sample 18 6458616490.667 ± 464534032.338 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_55e2f965 sample 18 14.097 ± 12.464 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_55e2f965 sample 18 327768881.778 ± 289039158.120 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_55e2f965 sample 18 7.283 ± 3.030 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_55e2f965 sample 18 167634607.111 ± 73095498.785 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_55e2f965 sample 18 106.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_55e2f965 sample 18 22282.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_55e2f965 sample 3 53329.177 ± 5520.744 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_55e2f965 sample 53016.003 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_55e2f965 sample 53351.547 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_55e2f965 sample 3 122.504 ± 12.714 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_55e2f965 sample 3 6970875709.333 ± 68053507.295 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_55e2f965 sample 3 116.331 ± 69.758 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_55e2f965 sample 3 6619001184.000 ± 3344291079.994 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_55e2f965 sample 3 0.001 ± 0.027 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_55e2f965 sample 3 48530.667 ± 1533523.185 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_55e2f965 sample 3 2.630 ± 0.121 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_55e2f965 sample 3 149638445.333 ± 17459243.351 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_55e2f965 sample 3 79.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_55e2f965 sample 3 9138.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_55e2f965 ss 10 45129.825 ± 1432.394 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_55e2f965 ss 10 149.365 ± 3.975 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_55e2f965 ss 10 7202488660.000 ± 86462765.565 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_55e2f965 ss 10 139.420 ± 4.400 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_55e2f965 ss 10 6724772576.000 ± 286711519.947 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_55e2f965 ss 10 3.052 ± 0.295 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_55e2f965 ss 10 147178454.400 ± 13834656.130 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_55e2f965 ss 10 252.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_55e2f965 ss 10 29945.000 ms
[success] Total time: 1575 s, completed Mar 6, 2017 3:47:55 PM
[success] Total time: 0 s, completed Mar 6, 2017 3:47:55 PM
No @fommil, to the best of my knowledge there is no tool to visualize JMH output from the results we got.
Hi all, First, @wlk, I wanted to reassure you that this conversation has nothing to do with your change in particular, but with the topic in general. Please accept my apologies for the hostage takeover of your PR. 😄 To the rest of the Zinc team, thank you for engaging with me on this topic. I'm embarrassingly ignorant on it. I'm happy we're fleshing this topic out, as it's an important one. I believe that on the whole we all want the same thing: a better, more correct and faster Zinc. We agree on more than we disagree on. In particular, thank you @jvican for the numerous references you've provided. Sadly I don't have time to fully understand benchmarking ("I don't want a PhD in benchmarking"); I just need to understand enough to be able to make probably-correct decisions on changes and PRs. With that out of the way, on to the technical:
Here's my reformatted and compressed summary of the change. Same raw numbers as above, with my calculations added. Again, please correct me where I'm wrong. Units are ms/op.
Executive summary: 2-3s faster, 5-9% faster. That's money. Thank you @jvican for creating the ticket. Thank you @wlk for implementing it. And, again, thanks all for conversation. I for one am much better informed. |
Because the JMH benchmarks for cold compilation are one-shot. This means that you don't warm up the JVM beforehand by doing more work. As you only have one sample per iteration, you cannot get the percentiles; you just get the average of the 10 one-shot benchmark runs. The official JMH docs say this about the single-shot option:
I did mean seconds, not "x". Sorry for that typo.
Thanks to you. I'm glad that from now on we know how to read benchmarks and reason about the changes. Hopefully, as we increase the performance of Zinc, we will all learn about this black magic that is benchmarking the JVM.
Thanks for the merge and the discussion; this was a good learning experience because I hadn't used JMH in practice before.
Issue #245
Tested on: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz
Shapeless benchmark on 1.0:
Shapeless benchmark on issue-245: