Replace use of Scala collections by Java's #253

Merged
merged 1 commit into from Mar 7, 2017

Conversation

wlk
Contributor

@wlk wlk commented Mar 3, 2017

issue #245

Tested on Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz

shapeless benchmark on 1.0:

[info] Benchmark                                                                   (_tempDir)    Mode  Cnt           Score            Error   Units
[info] HotShapelessBenchmark.compile                                        /tmp/sbt_50a6156d  sample   18       18956.390 ±        470.882   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.00                          /tmp/sbt_50a6156d  sample            18018.730                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.50                          /tmp/sbt_50a6156d  sample            19142.803                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.90                          /tmp/sbt_50a6156d  sample            19555.523                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.95                          /tmp/sbt_50a6156d  sample            19797.115                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.99                          /tmp/sbt_50a6156d  sample            19797.115                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.999                         /tmp/sbt_50a6156d  sample            19797.115                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.9999                        /tmp/sbt_50a6156d  sample            19797.115                    ms/op
[info] HotShapelessBenchmark.compile:compile·p1.00                          /tmp/sbt_50a6156d  sample            19797.115                    ms/op
[info] HotShapelessBenchmark.compile:·gc.alloc.rate                         /tmp/sbt_50a6156d  sample   18         176.279 ±          4.358  MB/sec
[info] HotShapelessBenchmark.compile:·gc.alloc.rate.norm                    /tmp/sbt_50a6156d  sample   18  3598116903.111 ±    8115563.410    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space       /tmp/sbt_50a6156d  sample   18           0.005 ±          0.006  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space.norm  /tmp/sbt_50a6156d  sample   18      110556.889 ±     121365.225    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace                    /tmp/sbt_50a6156d  sample   18           0.038 ±          0.042  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace.norm               /tmp/sbt_50a6156d  sample   18      796124.000 ±     874132.490    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space                /tmp/sbt_50a6156d  sample   18         176.622 ±         20.857  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm           /tmp/sbt_50a6156d  sample   18  3611208362.667 ±  469187660.582    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen                   /tmp/sbt_50a6156d  sample   18           8.756 ±          9.432  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen.norm              /tmp/sbt_50a6156d  sample   18   181558790.222 ±  195372645.622    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space            /tmp/sbt_50a6156d  sample   18           3.848 ±          2.094  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm       /tmp/sbt_50a6156d  sample   18    79276089.778 ±   44103533.796    B/op
[info] HotShapelessBenchmark.compile:·gc.count                              /tmp/sbt_50a6156d  sample   18          69.000                   counts
[info] HotShapelessBenchmark.compile:·gc.time                               /tmp/sbt_50a6156d  sample   18       11534.000                       ms
[info] WarmShapelessBenchmark.compile                                       /tmp/sbt_50a6156d  sample    3       53955.527 ±      89475.604   ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.00                         /tmp/sbt_50a6156d  sample            49190.797                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.50                         /tmp/sbt_50a6156d  sample            53687.091                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.90                         /tmp/sbt_50a6156d  sample            58988.691                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.95                         /tmp/sbt_50a6156d  sample            58988.691                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.99                         /tmp/sbt_50a6156d  sample            58988.691                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.999                        /tmp/sbt_50a6156d  sample            58988.691                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.9999                       /tmp/sbt_50a6156d  sample            58988.691                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p1.00                         /tmp/sbt_50a6156d  sample            58988.691                    ms/op
[info] WarmShapelessBenchmark.compile:·gc.alloc.rate                        /tmp/sbt_50a6156d  sample    3          69.162 ±        118.303  MB/sec
[info] WarmShapelessBenchmark.compile:·gc.alloc.rate.norm                   /tmp/sbt_50a6156d  sample    3  3964893128.000 ±  464895464.749    B/op
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Eden_Space               /tmp/sbt_50a6156d  sample    3          66.717 ±        172.841  MB/sec
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm          /tmp/sbt_50a6156d  sample    3  3813975968.000 ± 4155442809.041    B/op
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space           /tmp/sbt_50a6156d  sample    3           2.276 ±          3.402  MB/sec
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm      /tmp/sbt_50a6156d  sample    3   130799010.667 ±  172919291.284    B/op
[info] WarmShapelessBenchmark.compile:·gc.count                             /tmp/sbt_50a6156d  sample    3          62.000                   counts
[info] WarmShapelessBenchmark.compile:·gc.time                              /tmp/sbt_50a6156d  sample    3        6208.000                       ms
[info] ColdShapelessBenchmark.compile                                       /tmp/sbt_50a6156d      ss   10       48121.431 ±       4706.772   ms/op
[info] ColdShapelessBenchmark.compile:·gc.alloc.rate                        /tmp/sbt_50a6156d      ss   10          81.244 ±          7.803  MB/sec
[info] ColdShapelessBenchmark.compile:·gc.alloc.rate.norm                   /tmp/sbt_50a6156d      ss   10  4169800629.600 ±   45813074.993    B/op
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Eden_Space               /tmp/sbt_50a6156d      ss   10          76.312 ±          9.530  MB/sec
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm          /tmp/sbt_50a6156d      ss   10  3915401249.600 ±  295384574.736    B/op
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space           /tmp/sbt_50a6156d      ss   10           2.624 ±          0.343  MB/sec
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm      /tmp/sbt_50a6156d      ss   10   134746036.800 ±   13018312.058    B/op
[info] ColdShapelessBenchmark.compile:·gc.count                             /tmp/sbt_50a6156d      ss   10         217.000                   counts
[info] ColdShapelessBenchmark.compile:·gc.time                              /tmp/sbt_50a6156d      ss   10       20536.000                       ms

shapeless benchmark on issue-245:

[info] Benchmark                                                                   (_tempDir)    Mode  Cnt           Score            Error   Units
[info] HotShapelessBenchmark.compile                                        /tmp/sbt_f45f970f  sample   18       18440.025 ±        359.687   ms/op
[info] HotShapelessBenchmark.compile:compile·p0.00                          /tmp/sbt_f45f970f  sample            17750.295                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.50                          /tmp/sbt_f45f970f  sample            18522.046                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.90                          /tmp/sbt_f45f970f  sample            18911.278                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.95                          /tmp/sbt_f45f970f  sample            19092.472                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.99                          /tmp/sbt_f45f970f  sample            19092.472                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.999                         /tmp/sbt_f45f970f  sample            19092.472                    ms/op
[info] HotShapelessBenchmark.compile:compile·p0.9999                        /tmp/sbt_f45f970f  sample            19092.472                    ms/op
[info] HotShapelessBenchmark.compile:compile·p1.00                          /tmp/sbt_f45f970f  sample            19092.472                    ms/op
[info] HotShapelessBenchmark.compile:·gc.alloc.rate                         /tmp/sbt_f45f970f  sample   18         181.299 ±          3.589  MB/sec
[info] HotShapelessBenchmark.compile:·gc.alloc.rate.norm                    /tmp/sbt_f45f970f  sample   18  3603656596.889 ±   13617055.744    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space       /tmp/sbt_f45f970f  sample   18           0.006 ±          0.006  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Compressed_Class_Space.norm  /tmp/sbt_f45f970f  sample   18      123574.667 ±     121374.571    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace                    /tmp/sbt_f45f970f  sample   18           0.044 ±          0.043  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.Metaspace.norm               /tmp/sbt_f45f970f  sample   18      887152.444 ±     871235.644    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space                /tmp/sbt_f45f970f  sample   18         179.641 ±         24.247  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm           /tmp/sbt_f45f970f  sample   18  3568843984.889 ±  463623900.493    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen                   /tmp/sbt_f45f970f  sample   18          10.104 ±          9.729  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Old_Gen.norm              /tmp/sbt_f45f970f  sample   18   204088802.667 ±  196477548.292    B/op
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space            /tmp/sbt_f45f970f  sample   18           4.046 ±          2.169  MB/sec
[info] HotShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm       /tmp/sbt_f45f970f  sample   18    81169088.889 ±   44090551.274    B/op
[info] HotShapelessBenchmark.compile:·gc.count                              /tmp/sbt_f45f970f  sample   18          69.000                   counts
[info] HotShapelessBenchmark.compile:·gc.time                               /tmp/sbt_f45f970f  sample   18       10385.000                       ms
[info] WarmShapelessBenchmark.compile                                       /tmp/sbt_f45f970f  sample    3       47624.924 ±      27981.257   ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.00                         /tmp/sbt_f45f970f  sample            46640.660                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.50                         /tmp/sbt_f45f970f  sample            46841.987                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.90                         /tmp/sbt_f45f970f  sample            49392.124                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.95                         /tmp/sbt_f45f970f  sample            49392.124                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.99                         /tmp/sbt_f45f970f  sample            49392.124                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.999                        /tmp/sbt_f45f970f  sample            49392.124                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p0.9999                       /tmp/sbt_f45f970f  sample            49392.124                    ms/op
[info] WarmShapelessBenchmark.compile:compile·p1.00                         /tmp/sbt_f45f970f  sample            49392.124                    ms/op
[info] WarmShapelessBenchmark.compile:·gc.alloc.rate                        /tmp/sbt_f45f970f  sample    3          78.273 ±         46.764  MB/sec
[info] WarmShapelessBenchmark.compile:·gc.alloc.rate.norm                   /tmp/sbt_f45f970f  sample    3  3992337394.667 ±  248947971.262    B/op
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Eden_Space               /tmp/sbt_f45f970f  sample    3          75.365 ±         83.725  MB/sec
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm          /tmp/sbt_f45f970f  sample    3  3847142618.667 ± 4949482181.092    B/op
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space           /tmp/sbt_f45f970f  sample    3           2.590 ±          6.039  MB/sec
[info] WarmShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm      /tmp/sbt_f45f970f  sample    3   131857816.000 ±  240929409.334    B/op
[info] WarmShapelessBenchmark.compile:·gc.count                             /tmp/sbt_f45f970f  sample    3          61.000                   counts
[info] WarmShapelessBenchmark.compile:·gc.time                              /tmp/sbt_f45f970f  sample    3        5535.000                       ms
[info] ColdShapelessBenchmark.compile                                       /tmp/sbt_f45f970f      ss   10       44379.722 ±       3177.612   ms/op
[info] ColdShapelessBenchmark.compile:·gc.alloc.rate                        /tmp/sbt_f45f970f      ss   10          87.801 ±          5.691  MB/sec
[info] ColdShapelessBenchmark.compile:·gc.alloc.rate.norm                   /tmp/sbt_f45f970f      ss   10  4168487485.600 ±   58228600.899    B/op
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Eden_Space               /tmp/sbt_f45f970f      ss   10          81.738 ±          4.820  MB/sec
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Eden_Space.norm          /tmp/sbt_f45f970f      ss   10  3887614885.600 ±  360562942.522    B/op
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space           /tmp/sbt_f45f970f      ss   10           2.804 ±          0.359  MB/sec
[info] ColdShapelessBenchmark.compile:·gc.churn.PS_Survivor_Space.norm      /tmp/sbt_f45f970f      ss   10   133293604.800 ±   18777124.672    B/op
[info] ColdShapelessBenchmark.compile:·gc.count                             /tmp/sbt_f45f970f      ss   10         211.000                   counts
[info] ColdShapelessBenchmark.compile:·gc.time                              /tmp/sbt_f45f970f      ss   10       18743.000                       ms

Contributor

@romanowski romanowski left a comment


I don't know if my comments will provide any speedup in the end, so they can all be ignored (apart from them, the code LGTM).

cache += owner -> (handler -> newVisited)
visited = newVisited
visitedOwner = owner
if (cache.containsKey(owner)) {
Contributor

Could we always call get and check whether the result is null? I know this is ugly, but it would save an extra map lookup compared to containsKey followed by get.
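A minimal sketch of that suggestion, assuming the cache is a java.util.HashMap whose values are never null (the names below are illustrative, not the PR's actual code):

import java.util.{HashMap => JHashMap}
import scala.collection.mutable

// Hypothetical cache of visited types per owner, mirroring the pattern above.
val cache = new JHashMap[String, mutable.HashSet[String]]()

def visitedFor(owner: String): mutable.HashSet[String] = {
  val cached = cache.get(owner) // a single lookup instead of containsKey followed by get
  if (cached != null) cached
  else {
    val fresh = mutable.HashSet.empty[String]
    cache.put(owner, fresh)
    fresh
  }
}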

usedNamesFromClasses.put(className, emptySet)
emptySet
def usedNamesFromClass(className: Name): util.HashSet[Name] = {
if (usedNamesFromClasses.containsKey(className)) {
Contributor

same as above :)

val newVisited = mutable.HashSet.empty[Type]
visited = newVisited
ownersCache += owner -> newVisited
if (ownersCache.containsKey(owner)) {
Contributor

and here

Member

@jvican jvican left a comment

This work is really good, thank you @wlk.

This PR shaves ~700ms off the Shapeless benchmark, which I consider an excellent improvement for such a small micro-optimization. I propose that we merge as soon as CI is green.

case (className: String, names: Iterable[String]) =>
names foreach { (name: String) => callback.usedName(className, name) }

debuglog("The " + sourceFile + " contains the following used names:\n " + debugOutput(allUsedNames))
Member

Since we're changing this code, minor suggestion: could we turn this into interpolated strings? As in s"The $sourceFile ... ".
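For instance, a sketch using the identifiers from the snippet above:

debuglog(s"The $sourceFile contains the following used names:\n ${debugOutput(allUsedNames)}")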

Contributor Author

fixed

@@ -49,29 +52,39 @@ import Compat._
class ExtractUsedNames[GlobalType <: CallbackGlobal](val global: GlobalType) extends Compat with ClassName with GlobalHelpers {
import global._

def extract(unit: CompilationUnit): Iterable[(String, Iterable[String])] = {
def extract(unit: CompilationUnit): HashMap[String, ArrayList[String]] = {
Member

What do you think about keeping these as Java iterables? It could help us hide how they are actually implemented.

Contributor Author

Partially fixed: I can replace the HashMap with Map, but replacing ArrayList with Iterable gives me this:

Error:(87, 5) type mismatch;
 found   : java.util.HashMap[String,java.util.ArrayList[String]]
 required: java.util.Map[String,Iterable[String]]
Note: java.util.ArrayList[String] <: Iterable[String], but Java-defined trait Map is invariant in type V.
You may wish to investigate a wildcard type such as `_ <: Iterable[String]`. (SLS 3.2.10)
    result
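For reference, a hedged sketch of the variance problem this error describes and the wildcard workaround the compiler hints at (types and names are illustrative, not the PR's code):

import java.util.{ArrayList, HashMap => JHashMap, Map => JMap}

val concrete = new JHashMap[String, ArrayList[String]]()

// Does not compile: java.util.Map is invariant in its value type, so a
// HashMap[String, ArrayList[String]] is not a Map[String, java.lang.Iterable[String]].
// val widened: JMap[String, java.lang.Iterable[String]] = concrete

// The wildcard form suggested by the compiler does compile (and, depending on how it
// leaks into method signatures, may need `import scala.language.existentials`):
val widened: JMap[String, _ <: java.lang.Iterable[String]] = concrete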

Contributor Author

Update: fixed this one, but it requires using scala.language.existentials.

Contributor Author

I had to fall back to using concrete types because I was unable to make this work with Java's forEach

Member

Yeah, the issue here is that Scala 2.11 doesn't have SAM conversion, so you cannot pass Scala functions where forEach expects its Java functional interface. Don't worry about this!

val usedName = it.next()
val usedNameKey = usedName.getKey.toString.trim
val usedNameValues = usedName.getValue.iterator()
val uses = new ArrayList[String]()
Member

Interestingly, it seems that we're initializing an array list here every time we iterate through the class name entries. It would be nice if we could remove it.

It happens that HashSet has a constant-time size method, so do you think we could use that size to create an array of that concrete length and then change the return type of extract? That way we would not pay the cost of growing the array list when its capacity is full, because we know the size a priori. It can also help us with some spatial cache locality.
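A rough sketch of the pre-sizing idea, assuming the per-class used names sit in a java.util.HashSet (names are illustrative):

import java.util.{ArrayList, HashSet => JHashSet}

val names = new JHashSet[String]()
names.add("map")
names.add("apply")

// HashSet.size() is O(1), so the ArrayList can be allocated with its final
// capacity up front and never has to grow while the elements are copied over.
val uses = new ArrayList[String](names.size())
val it = names.iterator()
while (it.hasNext) uses.add(it.next())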

Contributor Author

I'm not sure I see what you mean by removing the creation of an ArrayList per iteration; do you have something specific in mind?

The other issue is fixed

@@ -13,6 +13,8 @@ class ExtractUsedNamesSpecification extends UnitSpec {
val usedNames = compilerForTesting.extractUsedNamesFromSrc(src)
val expectedNames = standardNames ++ Set("a", "A", "A2", "b")
// names used at top level are attributed to the first class defined in a compilation unit

println(usedNames)
Member

We can probably remove this 😄.

Contributor Author

fixed

@wlk
Contributor Author

wlk commented Mar 4, 2017

I need to take a look at the compilation problem; not sure why it runs OK locally.

@dwijnand
Member

dwijnand commented Mar 4, 2017

"53955.527 ± 89475.604 ms/op" this measurement is unusable.

For hot compilation, performance goes from "18956.390 ± 470.882 ms/op" to "18440.025 ± 359.687 ms/op". The difference is about the size of the error margin. Is that good enough?

Also, given there have been more changes/commits, we should regenerate the benchmark results before jumping to conclusions about whether this compromise is worth merging.

At least that's what I think, but clearly I'm in the minority here.

@jvican
Member

jvican commented Mar 4, 2017

For hot compilation, performance goes from "18956.390 ± 470.882 ms/op" to "18440.025 ± 359.687 ms/op". The difference is about the size of the error margin. Is that good enough?

Yeah, I think this is fine.

Also, given there have been more changes/commits, we should regenerate the benchmark results before jumping to conclusions about whether this compromise is worth merging.

I don't think we have to ask for new benchmarks every time a new change is committed, specifically if it does not affect the core logic... However, I'll run this in a controlled environment and report more results. But I think these changes are good to go.

@wlk
Contributor Author

wlk commented Mar 5, 2017

There was one complication that I didn't see before: Java collections have forEach, but the Scala interop is not ideal here, so I had to implement java.util.function.BiConsumer and java.util.function.Consumer in order to use forEach. AFAIK this will work in 2.12. Maybe you can advise if this is OK or if there's a better way to do this?
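A minimal sketch of that workaround, which also compiles on Scala 2.11 where function literals are not converted to Java SAM interfaces (names are illustrative):

import java.util.function.BiConsumer
import java.util.{ArrayList, HashMap => JHashMap}

val usedNames = new JHashMap[String, ArrayList[String]]()

// Explicit SAM implementation: needed on 2.11, where a Scala function literal
// cannot be passed directly where java.util.function.BiConsumer is expected.
usedNames.forEach(new BiConsumer[String, ArrayList[String]] {
  override def accept(className: String, names: ArrayList[String]): Unit =
    println(s"$className -> $names")
})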

@wlk
Contributor Author

wlk commented Mar 5, 2017

please take a look @jvican

Member

@jvican jvican left a comment

Maybe you can advise if this is OK or if there's a better way to do this?

I've just pushed a new commit that solves minor issues, including this one 😉. So you don't have to worry. I think the PR is perfect.

I have a busy agenda today, but I'll make sure that I run the benchmarks before tomorrow so that this can get in as soon as possible.

Again, great work.

@@ -49,29 +52,39 @@ import Compat._
class ExtractUsedNames[GlobalType <: CallbackGlobal](val global: GlobalType) extends Compat with ClassName with GlobalHelpers {
import global._

def extract(unit: CompilationUnit): Iterable[(String, Iterable[String])] = {
def extract(unit: CompilationUnit): HashMap[String, ArrayList[String]] = {
Member

Yeah, the issue here is that Scala 2.11 doesn't have SAM conversion, so you cannot pass Scala functions where forEach expects its Java functional interface. Don't worry about this!

@jvican
Member

jvican commented Mar 6, 2017

@wlk Can you squash all the failing commit messages?

@jvican
Member

jvican commented Mar 6, 2017

I have very good news! I've run this on the Scala standard library; here are the results.

With Scala collections:

[info] Benchmark                                                            (_tempDir)    Mode  Cnt           Score            Error   Units
[info] HotScalacBenchmark.compile                                    /tmp/sbt_51938d03  sample   18       21437.554 ±        949.595   ms/op
[info] HotScalacBenchmark.compile:compile·p0.00                      /tmp/sbt_51938d03  sample            19964.887                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.50                      /tmp/sbt_51938d03  sample            21407.728                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.90                      /tmp/sbt_51938d03  sample            22541.867                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.95                      /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.99                      /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.999                     /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999                    /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] HotScalacBenchmark.compile:compile·p1.00                      /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate                     /tmp/sbt_51938d03  sample   18         280.535 ±         11.459  MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm                /tmp/sbt_51938d03  sample   18  6446299745.333 ±   14785544.668    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space            /tmp/sbt_51938d03  sample   18         276.572 ±         22.333  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm       /tmp/sbt_51938d03  sample   18  6355643405.333 ±  460709368.527    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen               /tmp/sbt_51938d03  sample   18          11.677 ±         12.619  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm          /tmp/sbt_51938d03  sample   18   275401288.000 ±  296971980.543    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space        /tmp/sbt_51938d03  sample   18           6.754 ±          1.900  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm   /tmp/sbt_51938d03  sample   18   155936858.667 ±   45709221.663    B/op
[info] HotScalacBenchmark.compile:·gc.count                          /tmp/sbt_51938d03  sample   18          98.000                   counts
[info] HotScalacBenchmark.compile:·gc.time                           /tmp/sbt_51938d03  sample   18       19073.000                       ms
[info] WarmScalacBenchmark.compile                                   /tmp/sbt_51938d03  sample    3       55118.747 ±      18980.181   ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00                     /tmp/sbt_51938d03  sample            54358.180                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50                     /tmp/sbt_51938d03  sample            54693.724                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90                     /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95                     /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999                    /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999                   /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00                     /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate                    /tmp/sbt_51938d03  sample    3         119.464 ±         35.856  MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm               /tmp/sbt_51938d03  sample    3  7018134616.000 ±  329861736.207    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space           /tmp/sbt_51938d03  sample    3         109.472 ±         43.096  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm      /tmp/sbt_51938d03  sample    3  6431816405.333 ± 2656847732.046    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space       /tmp/sbt_51938d03  sample    3           2.692 ±          2.508  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm  /tmp/sbt_51938d03  sample    3   158121976.000 ±  114580201.514    B/op
[info] WarmScalacBenchmark.compile:·gc.count                         /tmp/sbt_51938d03  sample    3          76.000                   counts
[info] WarmScalacBenchmark.compile:·gc.time                          /tmp/sbt_51938d03  sample    3        9389.000                       ms
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_51938d03      ss   10       48069.209 ±       3267.219   ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate                    /tmp/sbt_51938d03      ss   10         140.486 ±          9.407  MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm               /tmp/sbt_51938d03      ss   10  7201372165.600 ±   51736683.622    B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space           /tmp/sbt_51938d03      ss   10         133.059 ±          9.610  MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm      /tmp/sbt_51938d03      ss   10  6824023763.200 ±  383438006.440    B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space       /tmp/sbt_51938d03      ss   10           3.092 ±          0.517  MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm  /tmp/sbt_51938d03      ss   10   159124314.400 ±   34649991.106    B/op
[info] ColdScalacBenchmark.compile:·gc.count                         /tmp/sbt_51938d03      ss   10         255.000                   counts
[info] ColdScalacBenchmark.compile:·gc.time                          /tmp/sbt_51938d03      ss   10       30967.000                       ms

With Java collections:

[info] Benchmark                                                            (_tempDir)    Mode  Cnt           Score            Error   Units
[info] HotScalacBenchmark.compile                                    /tmp/sbt_55e2f965  sample   18       21213.858 ±        627.059   ms/op
[info] HotScalacBenchmark.compile:compile·p0.00                      /tmp/sbt_55e2f965  sample            20199.768                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.50                      /tmp/sbt_55e2f965  sample            21189.624                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.90                      /tmp/sbt_55e2f965  sample            22132.503                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.95                      /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.99                      /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.999                     /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999                    /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:compile·p1.00                      /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate                     /tmp/sbt_55e2f965  sample   18         283.808 ±          7.726  MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm                /tmp/sbt_55e2f965  sample   18  6463578940.444 ±   48501608.706    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space            /tmp/sbt_55e2f965  sample   18         283.368 ±         18.931  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm       /tmp/sbt_55e2f965  sample   18  6458616490.667 ±  464534032.338    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen               /tmp/sbt_55e2f965  sample   18          14.097 ±         12.464  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm          /tmp/sbt_55e2f965  sample   18   327768881.778 ±  289039158.120    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space        /tmp/sbt_55e2f965  sample   18           7.283 ±          3.030  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm   /tmp/sbt_55e2f965  sample   18   167634607.111 ±   73095498.785    B/op
[info] HotScalacBenchmark.compile:·gc.count                          /tmp/sbt_55e2f965  sample   18         106.000                   counts
[info] HotScalacBenchmark.compile:·gc.time                           /tmp/sbt_55e2f965  sample   18       22282.000                       ms
[info] WarmScalacBenchmark.compile                                   /tmp/sbt_55e2f965  sample    3       53329.177 ±       5520.744   ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00                     /tmp/sbt_55e2f965  sample            53016.003                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50                     /tmp/sbt_55e2f965  sample            53351.547                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999                    /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999                   /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate                    /tmp/sbt_55e2f965  sample    3         122.504 ±         12.714  MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm               /tmp/sbt_55e2f965  sample    3  6970875709.333 ±   68053507.295    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space           /tmp/sbt_55e2f965  sample    3         116.331 ±         69.758  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm      /tmp/sbt_55e2f965  sample    3  6619001184.000 ± 3344291079.994    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen              /tmp/sbt_55e2f965  sample    3           0.001 ±          0.027  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm         /tmp/sbt_55e2f965  sample    3       48530.667 ±    1533523.185    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space       /tmp/sbt_55e2f965  sample    3           2.630 ±          0.121  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm  /tmp/sbt_55e2f965  sample    3   149638445.333 ±   17459243.351    B/op
[info] WarmScalacBenchmark.compile:·gc.count                         /tmp/sbt_55e2f965  sample    3          79.000                   counts
[info] WarmScalacBenchmark.compile:·gc.time                          /tmp/sbt_55e2f965  sample    3        9138.000                       ms
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_55e2f965      ss   10       45129.825 ±       1432.394   ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate                    /tmp/sbt_55e2f965      ss   10         149.365 ±          3.975  MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm               /tmp/sbt_55e2f965      ss   10  7202488660.000 ±   86462765.565    B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space           /tmp/sbt_55e2f965      ss   10         139.420 ±          4.400  MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm      /tmp/sbt_55e2f965      ss   10  6724772576.000 ±  286711519.947    B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space       /tmp/sbt_55e2f965      ss   10           3.052 ±          0.295  MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm  /tmp/sbt_55e2f965      ss   10   147178454.400 ±   13834656.130    B/op
[info] ColdScalacBenchmark.compile:·gc.count                         /tmp/sbt_55e2f965      ss   10         252.000                   counts
[info] ColdScalacBenchmark.compile:·gc.time                          /tmp/sbt_55e2f965      ss   10       29945.000                       ms
[success] Total time: 1575 s, completed Mar 6, 2017 3:47:55 PM
[success] Total time: 0 s, completed Mar 6, 2017 3:47:55 PM

Basically, we have a win of ~2.5 seconds in hot compilation, ~2.8 seconds in warm compilation and 3 seconds in cold compilation. This confirms my initial guess in #245 that this change will be significant. I would expect these speedups to increase for very big codebases (/cc @romanowski).

This PR is good to go. @wlk Could you attach these benchmark results in your squashed commit message and prepend it with "Fix #245: ... bla bla"? This is a GitHub trick that will close the issue whenever the PR is merged.

@dwijnand
Member

dwijnand commented Mar 6, 2017

@jvican I don't mean to be a bore, but could you help me understand your numbers?

Basically, we have a win of ~2.5 seconds in hot compilation, ~2.8 seconds in warm compilation and 3 seconds in cold compilation.

Here's a summary of what I see:

Benchmark                   Units       Score 1                  Score 2         Difference
HotScalacBenchmark.compile  ms/op 21437.554 ±   949.595 -> 21213.858 ±  627.059   -223.696 ms
WarmScalacBenchmark.compile ms/op 55118.747 ± 18980.181 -> 53329.177 ± 5520.744  -1789.570 ms
ColdScalacBenchmark.compile ms/op 48069.209 ±  3267.219 -> 45129.825 ± 1432.394  -2939.384 ms

I'm no mathematician or statistician (/cc @fommil), but from my understanding, because the delta falls within the error margin this is NOT a win: it is impossible to tell which number is larger, and this change is a statistical tie in terms of performance. Please correct me if I'm wrong, and where.

I know that in general Java collections are faster than Scala collections. But we're equally told not to take such things as fact and instead to measure and benchmark them. And if we are to benchmark them, we should act according to the benchmark results.

In the face of these results (and my analysis of them above) I vote to reject this change, favouring the Scala collections over the Java ones (for many reasons). But I strongly welcome more attempts to make Zinc faster.

@romanowski
Contributor

@dwijnand I can see your point, but there is another win: the error margin in the version using Java collections is smaller, which means that performance is more stable, and that is itself a win. The price that we pay for this is quite small (a few while loops). I am OK with that.

@jvican I won't be able to test this soon (since there have been quite a lot of changes to zinc recently and my 'big project integration' is a little bit behind zinc).

@rorygraves this may be interesting for your scalac optimization work, especially since switching a small part (~5-6%) of Scala compilation can reduce the difference in build times by ~25% (but I am not a statistician, so please correct me if I am wrong).


private def debugOutput(map: JavaMap[String, Array[String]]): String = {
val stringBuffer = new StringBuffer()
val it = map.entrySet().iterator()
Contributor

I think we could add commented-out Scala code to document what is going on here (and in similar loops). @jvican @eed3si9n @dwijnand WDYT?
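For example, a documenting comment on the loop above could look something like this (the Scala shown in the comment is illustrative, not necessarily the exact equivalent used in the PR):

import java.util.{Map => JavaMap}

private def debugOutput(map: JavaMap[String, Array[String]]): String = {
  // Roughly equivalent Scala, for readers:
  //   map.asScala.map { case (k, vs) => s"$k => ${vs.mkString(", ")}" }.mkString("\n")
  val stringBuffer = new StringBuffer()
  val it = map.entrySet().iterator()
  while (it.hasNext) {
    val entry = it.next()
    stringBuffer.append(entry.getKey).append(" => ").append(entry.getValue.mkString(", ")).append("\n")
  }
  stringBuffer.toString
}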

Contributor Author

added comment in 2 places

@eed3si9n
Member

eed3si9n commented Mar 6, 2017

I've given my approval above, but in light of the benchmark data, I'm leaning towards @dwijnand's opinion. In general, Java style (while loops etc.) is more error-prone to maintain, so unless there's an observable performance improvement we shouldn't be replacing well-tested bridge code.

@jvican
Member

jvican commented Mar 6, 2017

I'm no mathematician or statistician (/cc @fommil), but from my understanding, because the delta falls within the error margin this is NOT a win: it is impossible to tell which number is larger, and this change is a statistical tie in terms of performance. Please correct me if I'm wrong, and where.

Hi Dale, you're not looking at the data correctly.

Benchmarks are compared in percentiles, not averages. Averages don't matter. In particular, for this scenario, I've used the 99th percentiles to compare, hence my conclusions. The point is that percentiles describe the accuracy of a comparison while taking into account the variation in the measurements. The higher the percentile, the more accurate a conclusion is. This is how any data measurement in the JVM world is done, particularly with JMH.

With this in mind, we can safely extract that Java collections are 2.5-3x faster than Scala's. Therefore, I consider this to be a very good change that should be merged.
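For concreteness, a tiny sketch of reading off and comparing a high percentile from sample times; the estimator is a simple nearest-rank one, and the sample values below are just the percentile points reported above used as stand-in data:

// Nearest-rank percentile over raw sample times in ms (illustration only).
def percentile(samples: Seq[Double], p: Double): Double = {
  val sorted = samples.sorted
  val rank = math.max(1, math.ceil(p * sorted.size).toInt)
  sorted(rank - 1)
}

val before = Seq(19964.887, 21407.728, 22541.867, 24595.399) // Scala-collections run
val after  = Seq(20199.768, 21189.624, 22132.503, 22313.697) // Java-collections run
println(percentile(before, 0.99) - percentile(after, 0.99))  // ≈ 2281.7 ms in favour of the Java run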

@jvican
Member

jvican commented Mar 6, 2017

@dwijnand @eed3si9n If you want to read more on why percentiles are the only right way of measuring performance, I suggest you have a look at these articles:

@eed3si9n
Member

eed3si9n commented Mar 6, 2017

The higher the percentile, the more accurate a conclusion is. This is how any data measurement in the JVM world is done, particularly with JMH.

The percentile maximum can give insight into how spread out the sample measurements are, not how accurate they are.

Before (p0.90):

[info] HotScalacBenchmark.compile:compile·p0.90                      /tmp/sbt_51938d03  sample            22541.867                    ms/op

After (p0.90):

[info] HotScalacBenchmark.compile:compile·p0.90                      /tmp/sbt_55e2f965  sample            22132.503                    ms/op

In the above, 90% of the 18 sample measurements fit inside a 22541.867 ms/op envelope vs a 22132.503 ms/op one. So there's approximately a 409 ms speedup.

Another useful point to observe is the median point (p0.50).

Before (p0.50):

[info] HotScalacBenchmark.compile:compile·p0.50                      /tmp/sbt_51938d03  sample            21407.728                    ms/op

After (p0.50):

[info] HotScalacBenchmark.compile:compile·p0.50                      /tmp/sbt_55e2f965  sample            21189.624                    ms/op

Here also, 50% of the 18 sample measurements fit inside a 21407.728 ms/op envelope vs a 21189.624 ms/op one; the speedup observed is approximately 218 ms. So there is a consistent trend towards a speedup. However, if we look at the difference between the median and p0.90, it is around 1134 ms for Scala and 943 ms for Java. So in that sense, the speedup observed here might not be significant.

With this in mind, we can safely extract that Java collections are 2.5-3x faster than Scala's.

How did you derive "2.5-3x faster"?

@jvican
Member

jvican commented Mar 6, 2017

Another useful point to observe is the median point (p0.50).

This does not tell you anything about the benchmark data. Please read the articles I've linked to you and related literature.

Here also 50% of the 18 sample measurements fit inside 21407.728 ms/op envelope vs 21189.624 ms/op. Here the speedup observed is approximately 218 ms. So there is a consistent trend for speedup.

You're comparing averages. This is not playing by the book and it's not giving you any interesting data. The distribution of the results is not represented in its values.

In the above, 90% of the 18 sample measurements fit inside 22541.867 ms/op envelope vs 22132.503 ms/op. So there's approximately 409 ms speedup.

A percentile of 90 is not enough. Common studies in the benchmarking world show that normal users experience the 99th percentile, not the 90th. For more information, I defer you to Gil Tene's talk (https://www.youtube.com/watch?v=lJ8ydIuPFeU) and his slides https://www.azul.com/files/HowNotToMeasureLatency_LLSummit_NYC_12Nov2013. Gil Tene has been benchmarking and improving the JVM and its GC for the last 20 years. I think he knows more than we all do.

How did you derive "2.5-3x faster"?

Have a look at the 99 percentiles:

Before

[info] HotScalacBenchmark.compile:compile·p0.99                      /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_51938d03      ss   10       48069.209 ±       3267.219   ms/op

After

[info] HotScalacBenchmark.compile:compile·p0.99                      /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_55e2f965      ss   10       45129.825 ±       1432.394   ms/op

If your intuition fails you, have a look at the raw data for cold compilation. There are no percentiles (there's only one measurement) and you get the 3-second decrease.

@eed3si9n
Member

eed3si9n commented Mar 6, 2017

You're comparing averages. This is not playing by the book and it's not giving you any interesting data.

p0.50 is not an average, it's the median.

In determining whether a particular change to the code resulted in a significant speedup or not, you need to understand the median and the spread. This is why benchmarks are written as 21213.858 ± 627.059 ms/op. For our purpose, I'm fine with using p1.00 (the maximum of all samples) to compare the worst-case scenario. Whether we use p0.99, p0.999, or p0.9999 is insignificant in this case, especially because we're not plotting some graph to hide the outliers.

@jvican
Member

jvican commented Mar 6, 2017

p0.50 is not an average, it's median.

Again, this is totally useless.

In determining whether a particular change to the code resulted in a significant speedup or not, you need to understand the median and the spread. This is why benchmarks are written as 21213.858 ± 627.059 ms/op.

Again, false. Please, take the time to read the benchmarking literature. You're not reading the results correctly.

Whether we use p0.99, p0.999, or p0.9999 is insignificant in this case, especially because we're not plotting some graph to hide the outliers.

No, outliers are everything, and whether they are plotted or not doesn't make a difference. Outliers are what give us the accuracy that we want, and they happen more often than we think (hence the error margin and the variations experienced). Computers are really subject to variation, and the only predictable way in which we can study their behaviour is by studying the percentiles, not the median.

@jvican
Member

jvican commented Mar 6, 2017

By the way, you said something correct in the previous paragraph: "you need to understand the spread" (I guess you mean the distribution of the results).

Yes, exactly. As you need to understand the distribution, you always use percentiles of 99 and above. They will give you the correct behaviour of your program 99% of the time by being conservative about the distribution ("spread") of the results.

I raise again my point about cold compilation (which uses 10 iterations and shows a decrease of 3 seconds). Please, reading benchmarks is not something that can be done by intuition. I've linked to several experts in this field who use correct and predictable methodologies, developed over the years, to compare changes. I would suggest that we don't ignore their expertise and experience.

@eed3si9n
Member

eed3si9n commented Mar 6, 2017

I raise again my point about cold compilation (which uses 10 iterations and shows a decrease of 3 seconds).

Ironically, the evidence presented here, i.e. 48069.209 ± 3267.219 ms/op vs 45129.825 ± 1432.394 ms/op, is a median, which you condemned as useless :)

@romanowski
Contributor

We are measuring full compilation times here IIRC (@jvican correct me if I am wrong). Taking into account that zinc-related phases take ~5% of compilation time, even a 200 ms speed-up on cold compilation gives us a ~10% speed-up of those phases (but I think we will see bigger gains).

I think a few while loops are not a big price for a 0.5% compiler speedup.

@eed3si9n
Member

eed3si9n commented Mar 6, 2017

Getting back to Wojciech's work, if we're seeing a 3s decrease in cold compilation of Shapeless, I think that's pretty cool.

@jvican
Member

jvican commented Mar 6, 2017

Getting back to Wojciech's work, if we're seeing a 3s decrease in cold compilation of Shapeless, I think that's pretty cool.

These benchmarks are from the standard library, but yes, I also think this is pretty cool.

Ironically, the evidence presented here, i.e. 48069.209 ± 3267.219 ms/op vs 45129.825 ± 1432.394 ms/op, is a median, which you condemned as useless :)

They are useless for scientific, predictable benchmarking. I mentioned them because they give, to some extent, some intuition of how the program behaves, but they cannot be used to compare fairly the running times of several iterations because they don't represent the full reality. This is why I said "If your intuition fails you" before suggesting to have a look at them.

zinc-related phases take ~5% of compilation time

Where did you get this data? My results tell me that they represent more than just 5% of overall compilation.

even a 200 ms speed-up on cold compilation gives us a ~10% speed-up of those phases (but I think we will see bigger gains)

Totally agreed. The results we have show some variation, of course, because of cache and memory layout, the unpredictability of the GC, JVM behaviour and the scheduler. But the fact is that this patch considerably improves the behaviour of the Zinc phases in 99% of the cases, which is what we should care about.

The compilation of the standard library lasts only ~20 seconds. For a bigger project (projects that compile for minutes or even hours), these changes are huge. We should aim to satisfy the needs of the companies with such big codebases, some of which I represent as a Scala Center employee.

@romanowski
Contributor

zinc-related phases take ~5% of compilation time

Where did you get this data? My results tell me that they represent more than just 5% of overall compilation.

TBH that was quite a generous 'floor' to make my numbers round (and even using smaller numbers the speed-up is quite significant).

@fommil

fommil commented Mar 6, 2017

I'm of the opinion that all uses of the Scala mutable collections should be removed. They are inefficient and at the point you decide to use mutable collections, you may as well go Premium. They are critical to ensime indexer performance.

@wlk wlk force-pushed the issue-245 branch 2 times, most recently from 85d191e to da52498 on March 7, 2017 at 08:52
@fommil

fommil commented Mar 7, 2017

Am I the only one who finds the error column totally confusing? Absolute percentiles are far better... is there a tool to visualise the output?

…scala and related classes

Benchmark results: same "With Scala collections" / "With Java collections" tables as posted above.
[info] HotScalacBenchmark.compile:compile·p0.999                     /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999                    /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:compile·p1.00                      /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate                     /tmp/sbt_55e2f965  sample   18         283.808 ±          7.726  MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm                /tmp/sbt_55e2f965  sample   18  6463578940.444 ±   48501608.706    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space            /tmp/sbt_55e2f965  sample   18         283.368 ±         18.931  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm       /tmp/sbt_55e2f965  sample   18  6458616490.667 ±  464534032.338    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen               /tmp/sbt_55e2f965  sample   18          14.097 ±         12.464  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm          /tmp/sbt_55e2f965  sample   18   327768881.778 ±  289039158.120    B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space        /tmp/sbt_55e2f965  sample   18           7.283 ±          3.030  MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm   /tmp/sbt_55e2f965  sample   18   167634607.111 ±   73095498.785    B/op
[info] HotScalacBenchmark.compile:·gc.count                          /tmp/sbt_55e2f965  sample   18         106.000                   counts
[info] HotScalacBenchmark.compile:·gc.time                           /tmp/sbt_55e2f965  sample   18       22282.000                       ms
[info] WarmScalacBenchmark.compile                                   /tmp/sbt_55e2f965  sample    3       53329.177 ±       5520.744   ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00                     /tmp/sbt_55e2f965  sample            53016.003                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50                     /tmp/sbt_55e2f965  sample            53351.547                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999                    /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999                   /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate                    /tmp/sbt_55e2f965  sample    3         122.504 ±         12.714  MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm               /tmp/sbt_55e2f965  sample    3  6970875709.333 ±   68053507.295    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space           /tmp/sbt_55e2f965  sample    3         116.331 ±         69.758  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm      /tmp/sbt_55e2f965  sample    3  6619001184.000 ± 3344291079.994    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen              /tmp/sbt_55e2f965  sample    3           0.001 ±          0.027  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm         /tmp/sbt_55e2f965  sample    3       48530.667 ±    1533523.185    B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space       /tmp/sbt_55e2f965  sample    3           2.630 ±          0.121  MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm  /tmp/sbt_55e2f965  sample    3   149638445.333 ±   17459243.351    B/op
[info] WarmScalacBenchmark.compile:·gc.count                         /tmp/sbt_55e2f965  sample    3          79.000                   counts
[info] WarmScalacBenchmark.compile:·gc.time                          /tmp/sbt_55e2f965  sample    3        9138.000                       ms
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_55e2f965      ss   10       45129.825 ±       1432.394   ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate                    /tmp/sbt_55e2f965      ss   10         149.365 ±          3.975  MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm               /tmp/sbt_55e2f965      ss   10  7202488660.000 ±   86462765.565    B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space           /tmp/sbt_55e2f965      ss   10         139.420 ±          4.400  MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm      /tmp/sbt_55e2f965      ss   10  6724772576.000 ±  286711519.947    B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space       /tmp/sbt_55e2f965      ss   10           3.052 ±          0.295  MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm  /tmp/sbt_55e2f965      ss   10   147178454.400 ±   13834656.130    B/op
[info] ColdScalacBenchmark.compile:·gc.count                         /tmp/sbt_55e2f965      ss   10         252.000                   counts
[info] ColdScalacBenchmark.compile:·gc.time                          /tmp/sbt_55e2f965      ss   10       29945.000                       ms
[success] Total time: 1575 s, completed Mar 6, 2017 3:47:55 PM
[success] Total time: 0 s, completed Mar 6, 2017 3:47:55 PM
@jvican
Member

jvican commented Mar 7, 2017

No @fommil, to the best of my knowledge there is no tool to visualize JMH output from the results we got.

@dwijnand
Member

dwijnand commented Mar 7, 2017

Hi all,

First, @wlk, I wanted to reassure you that this conversation is not about your change in particular, but about the topic in general. Please accept my apologies for the hostage takeover of your PR. 😄

To the rest of the Zinc team, thank you for engaging with me on this. I'm embarrassingly ignorant about benchmarking, and I'm happy we're fleshing the topic out, as it's an important one. I believe that on the whole we all want the same thing: a better, more correct, and faster Zinc. We agree on more than we disagree on.

In particular, thank you @jvican for the numerous references you've provided. Sadly I don't have time to fully understand benchmarking ("I don't want a PhD in benchmarking"); I just need to understand enough to make probably-correct decisions on changes and PRs.

With that out of the way, on to the technical details:

@jvican

With this in mind, we can safely extract that Java collections are 2.5-3x faster than Scala's.

How did you derive "2.5-3x faster"?

Have a look at the 99 percentiles:

Before

[info] HotScalacBenchmark.compile:compile·p0.99                      /tmp/sbt_51938d03  sample            24595.399                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_51938d03  sample            56304.337                    ms/op
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_51938d03      ss   10       48069.209 ±       3267.219   ms/op

After

[info] HotScalacBenchmark.compile:compile·p0.99                      /tmp/sbt_55e2f965  sample            22313.697                    ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99                     /tmp/sbt_55e2f965  sample            53619.982                    ms/op
[info] ColdScalacBenchmark.compile                                   /tmp/sbt_55e2f965      ss   10       45129.825 ±       1432.394   ms/op

If your intuition fails you, have a look at the raw data of cold compilation. There are no percentiles (there's only one measurement) and you get the 3 seconds decrease.

  1. Why are there no percentiles but only one measurement for cold compilation?
  2. Did you mean to say "2.5-3s faster" instead of "2.5-3x faster"? There's a humongous difference.

Here's my reformatted and compressed summary of the change: the same raw numbers as above, with my calculations added. Again, please correct me where I'm wrong. Units are ms/op.

 HotScalacBenchmark.compile:compile·p0.99  24595.399  ->  22313.697             =>  -2281.702 (-9.3%)
WarmScalacBenchmark.compile:compile·p0.99  56304.337  ->  53619.982             =>  -2684.355 (-4.8%)
ColdScalacBenchmark.compile     48069.209 ± 3267.219  ->  45129.825 ± 1432.394  =>  -2939.384 (-6.1%)

Executive summary: 2-3s faster, 5-9% faster.
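
For anyone who wants to replay the arithmetic, here is a small sketch (values copied from the table above; the object name and layout are made up purely for illustration) that reproduces the deltas and percentages, i.e. (after - before) and its share of the "before" value:

object SpeedupSummary extends App {
  // (benchmark, before in ms/op, after in ms/op), copied from the summary above
  val rows = Seq(
    ("HotScalacBenchmark.compile:compile p0.99",  24595.399, 22313.697),
    ("WarmScalacBenchmark.compile:compile p0.99", 56304.337, 53619.982),
    ("ColdScalacBenchmark.compile",               48069.209, 45129.825)
  )
  rows.foreach { case (name, before, after) =>
    val delta = after - before        // negative means the new code is faster
    val pct   = delta / before * 100  // relative change against the old numbers
    println(f"$name%-42s $delta%10.3f ms/op ($pct%5.1f%%)")
  }
}

Running it prints roughly -2281.702 ms/op (-9.3%), -2684.355 ms/op (-4.8%) and -2939.384 ms/op (-6.1%), which is where the percentages in the summary come from.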

That's money. Thank you @jvican for creating the ticket. Thank you @wlk for implementing it. And, again, thanks all for the conversation. I for one am much better informed.

@dwijnand dwijnand merged commit 335a098 into sbt:1.0 Mar 7, 2017
@jvican
Member

jvican commented Mar 7, 2017

Why are there no percentiles but only one measurement for cold compilation?

Because the JMH benchmarks for cold compilation are one-shot. This means that the JVM is not warmed up beforehand by doing extra work. As you only have one sample per iteration, you cannot get percentiles; you just get the average of the 10 one-shot benchmark runs.

The official JMH docs say this about the single shot option:

Single Shot Time: Measures how long a single benchmark method execution takes to run. This is good to test how it performs under a cold start (no JVM warm-up).
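
For reference, here is a minimal sketch (not Zinc's actual benchmark code; the class and method names are made up) of what a single-shot JMH benchmark looks like in Scala: no warmup iterations, one measurement per fork, and 10 forks, which is where the single sample per run and the Cnt of 10 in the Cold rows come from:

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@BenchmarkMode(Array(Mode.SingleShotTime)) // one-shot: measure a single cold execution
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 0)                    // no JVM warmup before measuring
@Measurement(iterations = 1)               // a single sample per fork
@Fork(10)                                  // 10 forked JVMs -> 10 one-shot samples to average
@State(Scope.Benchmark)
class ColdCompileSketch {
  @Benchmark
  def compile(): Unit = {
    // invoke the compilation under test here; JMH records one sample per fork
  }
}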

Did you mean to say "2.5-3s faster" instead of "2.5-3x faster"? There's a humongous difference.

I did mean seconds, not "x". Sorry for that typo.

And, again, thanks all for the conversation. I for one am much better informed.

Thanks to you. I'm glad that from now on we know how to read benchmarks and reason about the changes. Hopefully, as we increase the performance of Zinc, we will all learn about this black magic that is benchmarking the JVM.

@wlk wlk deleted the issue-245 branch March 7, 2017 10:21
@wlk
Contributor Author

wlk commented Mar 7, 2017

Thanks for the merge and the discussion; this was a good learning experience, as I hadn't used JMH in practice before.
