Replace use of Scala collections with Java's #253
Conversation
I don't know if my comments will provide any speedup in the end, so they can all be ignored (apart from them, the code LGTM).
cache += owner -> (handler -> newVisited)
visited = newVisited
visitedOwner = owner
if (cache.containsKey(owner)) {
Could we always call get and check whether the result is null? I know this is ugly, but it saves us a second map lookup.
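The pattern suggested here could look roughly like the sketch below (with hypothetical helper names; not the actual PR code). `containsKey` followed by `get` hashes the key twice, while a single `get` plus null check hashes it once, which is safe as long as null is never stored as a value:

```scala
import java.util.{HashMap => JHashMap}

val cache = new JHashMap[String, String]()
cache.put("owner", "handler")

// Two hash lookups: containsKey hashes the key, then get hashes it again.
def lookupTwice(key: String): Option[String] =
  if (cache.containsKey(key)) Some(cache.get(key)) else None

// One hash lookup: get returns null for an absent key
// (only safe because we never store null values).
def lookupOnce(key: String): Option[String] =
  Option(cache.get(key))
```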
usedNamesFromClasses.put(className, emptySet)
emptySet
def usedNamesFromClass(className: Name): util.HashSet[Name] = {
if (usedNamesFromClasses.containsKey(className)) {
same as above :)
val newVisited = mutable.HashSet.empty[Type]
visited = newVisited
ownersCache += owner -> newVisited
if (ownersCache.containsKey(owner)) {
and here
This work is really good, thank you @wlk.
This PR shaves ~700 ms off compiling Shapeless, which I consider an excellent improvement for such a small micro-optimization. I propose that we merge as soon as CI is green.
case (className: String, names: Iterable[String]) =>
  names foreach { (name: String) => callback.usedName(className, name) }
debuglog("The " + sourceFile + " contains the following used names:\n " + debugOutput(allUsedNames))
Since we're changing this code, a minor suggestion: could we turn this into interpolated strings? As in s"The $sourceFile ...".
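For reference, the two forms produce the same string; a minimal sketch of the suggested change (the values below are made up for illustration):

```scala
val sourceFile = "Foo.scala"
val names = "A, B"

// before: manual concatenation
val msg1 = "The " + sourceFile + " contains the following used names:\n " + names

// after: interpolated string, as suggested in the review
val msg2 = s"The $sourceFile contains the following used names:\n $names"
```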
fixed
@@ -49,29 +52,39 @@ import Compat._
class ExtractUsedNames[GlobalType <: CallbackGlobal](val global: GlobalType) extends Compat with ClassName with GlobalHelpers {
  import global._

-  def extract(unit: CompilationUnit): Iterable[(String, Iterable[String])] = {
+  def extract(unit: CompilationUnit): HashMap[String, ArrayList[String]] = {
What do you think about keeping these as Java iterables? It could help us hide how they are actually implemented.
Fixed, partially: I can replace the HashMap with Map, but replacing ArrayList with Iterable gives me this:
Error:(87, 5) type mismatch;
found : java.util.HashMap[String,java.util.ArrayList[String]]
required: java.util.Map[String,Iterable[String]]
Note: java.util.ArrayList[String] <: Iterable[String], but Java-defined trait Map is invariant in type V.
You may wish to investigate a wildcard type such as `_ <: Iterable[String]`. (SLS 3.2.10)
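The error above comes from Java generics being invariant: a `HashMap[String, ArrayList[String]]` is not a `Map[String, Iterable[String]]` even though `ArrayList[String]` is an `Iterable[String]`. A sketch of the wildcard workaround the compiler hints at (hypothetical values, not the PR's code):

```scala
import java.util.{ArrayList, HashMap => JHashMap, Map => JMap}

val result = new JHashMap[String, ArrayList[String]]()
val names = new ArrayList[String]()
names.add("A")
result.put("Foo", names)

// Does not compile: Java's Map is invariant in its value type V.
// val bad: JMap[String, java.lang.Iterable[String]] = result

// Compiles: the wildcard upper-bounds V without fixing it.
// (On Scala 2.x this is the existential type the error message suggests;
// `import scala.language.existentials` may be needed in some forms.)
val ok: JMap[String, _ <: java.lang.Iterable[String]] = result
```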
result
Update: fixed this one, but it requires using scala.language.existentials
I had to fall back to using concrete types because I was unable to make this work with Java's forEach
Yeah, the issue here is that in Scala 2.11 you don't have SAM types, so you cannot pass Scala functions where `forEach` expects its Java functional interface. Don't worry about this!
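Concretely, `forEach` takes a `java.util.function.Consumer`, and before Scala 2.12's SAM conversion you have to spell out the anonymous class. A minimal sketch (hypothetical data, not the PR's code):

```scala
import java.util.ArrayList
import java.util.function.Consumer

val names = new ArrayList[String]()
names.add("a")
names.add("b")

val seen = new ArrayList[String]()

// On Scala 2.11 a Scala lambda is not accepted where a Java SAM type
// (here java.util.function.Consumer) is expected, so we write it out:
names.forEach(new Consumer[String] {
  def accept(name: String): Unit = { seen.add(name); () }
})
```

On Scala 2.12+ the same call compiles as `names.forEach(n => seen.add(n))`, since lambdas convert to Java functional interfaces.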
val usedName = it.next()
val usedNameKey = usedName.getKey.toString.trim
val usedNameValues = usedName.getValue.iterator()
val uses = new ArrayList[String]()
Interestingly, it seems that we're initializing an array list here every time we iterate through the class name entries. It would be nice if we could remove it.
It happens that HashSet has a constant-time size method, so do you think we could use that size to create an array of that concrete length and then change the return type of extract? This would buy us the fact that we are not paying the cost of expanding the array list when its capacity is full, because we know that size a priori. It can also help us with some spatial cache locality.
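The pre-sizing idea could look roughly like this (a sketch with invented data, not the PR's code): since `HashSet.size()` is O(1), the list can be allocated at its final capacity up front, avoiding the grow-and-copy steps of the default constructor:

```scala
import java.util.{ArrayList, HashSet => JHashSet}

val usedNames = new JHashSet[String]()
usedNames.add("map")
usedNames.add("flatMap")

// HashSet.size() is constant-time, so allocate the list at its final
// capacity instead of starting at the default capacity and resizing.
val uses = new ArrayList[String](usedNames.size())
val it = usedNames.iterator()
while (it.hasNext) uses.add(it.next())
```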
I'm not sure I see what you mean by removing the creation of the ArrayList per iteration; do you have something specific in mind?
The other issue is fixed.
@@ -13,6 +13,8 @@ class ExtractUsedNamesSpecification extends UnitSpec {
  val usedNames = compilerForTesting.extractUsedNamesFromSrc(src)
  val expectedNames = standardNames ++ Set("a", "A", "A2", "b")
  // names used at top level are attributed to the first class defined in a compilation unit
+ println(usedNames)
We can probably remove this 😄.
fixed
I need to take a look at the compilation problem; not sure why it runs OK locally.
"53955.527 ± 89475.604 ms/op": this measurement is unusable. For hot compilation, performance goes from "18956.390 ± 470.882 ms/op" to "18440.025 ± 359.687 ms/op". The difference is about the size of the error margin. Is that good enough? Also, given there have been more changes/commits, we should regenerate the benchmark results before jumping to conclusions about whether this compromise is worth merging. At least that's what I think, but clearly I'm in the minority here.
Yeah, I think this is fine.
I don't think we have to ask for new benchmarks every time a new change is committed, specifically if it does not affect the core logic... However, I'll be running this in a controlled environment and will report more results. But I think these changes are perfect to go.
There was one complication that I didn't see before: Java collections have …
please take a look @jvican
maybe you can advise if this is OK, or maybe there's a better way to do this?
I've just pushed a new commit that solves minor issues, including this one 😉. So you don't have to worry. I think the PR is perfect.
I have a busy agenda today, but I'll make sure that I run the benchmarks before tomorrow so that this can get in as soon as possible.
Again, great work.
@wlk Can you squash all the failing commit messages?
I have very good news! I've run this on the Scala standard library, these are the results. With Scala collections:
With Java collections:
Basically, we have a win of ~2.5 seconds in hot compilation, ~2.8 seconds in warm compilation and ~3 seconds in cold compilation. This confirms my initial guess in #245 that this change would be significant. I would expect these speedups to increase for very big codebases (/cc @romanowski). This PR is good to go. @wlk Could you attach these benchmark results in your squashed commit message and prepend it with "Fix #245: ... bla bla"? This is a GitHub trick that will close the issue whenever the PR is merged.
@jvican I don't mean to be a bore, but could you help me understand your numbers?
Here's a summary of what I see:
I'm no mathematician or statistician (/cc @fommil), but from my understanding, because the delta falls within the error margin this is NOT a win: it is impossible to tell which number is larger, and this change is a statistical tie in terms of performance. Please correct me if I'm wrong, and where. I know that, in general, Java collections are faster than Scala collections. But it's equally important not to take things as fact, and instead to measure and benchmark them. And if we are to benchmark them, we should act according to the benchmark results. In the face of these results (and my analysis of them above) I vote to reject this change, favouring the Scala collections over the Java ones (for many reasons). But I strongly welcome more attempts to make Zinc faster.
@dwijnand I can see your point, but there is another win: the error margin in the version using Java collections is smaller, which means that performance is more stable, and that is itself a win. The price that we pay for this is quite small (a few while loops). I am OK with that. @jvican I won't be able to test this soon (since there have been quite a lot of changes to Zinc recently and my 'big project integration' is a little bit behind Zinc). @rorygraves this may be interesting for your scalac optimization work, especially since switching a small part of the build (~5-6% of Scala compilation) can reduce the difference in build times by ~25% (but I am not a statistician, so please correct me if I am wrong).
private def debugOutput(map: JavaMap[String, Array[String]]): String = {
  val stringBuffer = new StringBuffer()
  val it = map.entrySet().iterator()
added comment in 2 places
I've given my approval above, but in light of the benchmark data I'm leaning towards @dwijnand's opinion. In general, Java style (while loops etc.) is more error-prone to maintain, so unless there's an observable performance improvement we shouldn't be replacing well-tested bridge code.
Hi Dale, you're not looking at the data correctly. Benchmarks are compared in percentiles, not averages. Averages don't matter. In particular, for this scenario, I've used the 99th percentile to compare, hence my conclusions. The thing is that percentiles are the ones that describe the accuracy of a comparison, taking into account the variations of the measurements. The higher the percentile, the more accurate a conclusion is. This is how any data measurement in the JVM world is done, particularly with JMH. With this in mind, we can safely extract that Java collections are 2.5-3x faster than Scala's. Therefore, I consider this to be a very good change that should be merged.
Percentile maximum can give insight into how spread out the sample measurements are, not how accurate. Before (p0.90):
After (p0.90):
In the above, 90% of the 18 sample measurements fit inside 22541.867 ms/op envelope vs 22132.503 ms/op. So there's approximately 409 ms speedup. Another useful point to observe is the median point (p0.50). Before (p0.50):
After (p0.50):
Here also 50% of the 18 sample measurements fit inside 21407.728 ms/op envelope vs 21189.624 ms/op. Here the speedup observed is approximately 218 ms. So there is a consistent trend for speedup. However, if we look at the difference between the median and p0.90, the difference is around 1134ms for Scala, and 943ms for Java. So in that sense, the speedup observed here might not be significant.
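The percentile reading described above can be sketched in a few lines. This is a toy illustration of the nearest-rank method (the sample arrays below are invented; they are NOT the benchmark's raw data):

```scala
// Toy sketch of reading a pN percentile from JMH-style samples
// using the nearest-rank method. Sample values are invented.
def percentile(samples: Seq[Double], p: Double): Double = {
  val sorted = samples.sorted
  // nearest rank: the smallest value at or below which a fraction p
  // of the samples fall
  val idx = math.ceil(p * sorted.length).toInt - 1
  sorted(math.max(idx, 0))
}

val beforeSamples = Seq(19964.0, 21407.0, 21500.0, 22541.0, 24595.0)
val afterSamples  = Seq(20199.0, 21189.0, 21300.0, 22132.0, 22313.0)

// delta between the two runs at the 90th percentile
val deltaP90 = percentile(beforeSamples, 0.90) - percentile(afterSamples, 0.90)
```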
How did you derive "2.5-3x faster"?
This does not tell you anything about the benchmark data. Please read the articles I've linked to you and related literature.
You're comparing averages. This is not playing by the book and it's not giving you any interesting data. The distribution of the results is not represented in its values.
A percentile of 90 is not enough. Common studies in the benchmarking world show that normal users experience the 99th percentile, not the 90th. For more information, I defer you to Gil Tene's talk (https://www.youtube.com/watch?v=lJ8ydIuPFeU) and his slides https://www.azul.com/files/HowNotToMeasureLatency_LLSummit_NYC_12Nov2013. Gil Tene has been benchmarking and improving the JVM and its GC for the last 20 years. I think he knows more than we all do.
Have a look at the 99 percentiles: Before
After
If your intuition fails you, have a look at the raw data of cold compilation. There are no percentiles (there's only one measurement) and you get the 3-second decrease.
p0.50 is not an average, it's the median. In determining whether a particular change to the code resulted in a significant speedup or not, you need to understand the median and the spread. This is why benchmarks are written as 21213.858 ± 627.059 ms/op. For our purpose, I'm fine with using p1.00 (the maximum of all samples) to compare the worst-case scenario. Whether we use p0.99, p0.999, or p0.9999 is insignificant in this case, especially because we're not plotting some graph to hide the outliers.
Again, this is totally useless.
Again, false. Please, take the time to read the benchmarking literature. You're not reading the results correctly.
No, outliers are everything, and whether they are plotted or not doesn't make a difference. Outliers are the ones that give us the accuracy that we want, and they happen more often than we think (hence the error margin and the variations experienced). Computers are really subject to variations, and the only predictable way in which we can study their behaviour is by studying the percentiles, not the median.
By the way, you said something correct in the previous paragraph: "you need to understand the spread" (I guess you mean the distribution of the results). Yes, exactly. As you need to understand the distribution, you always use percentiles of 99 and above. They will give you the correct behaviour of your program 99% of the time by being conservative about the distribution ("spread") of the results. I raise again my point about cold compilation (which uses 10 iterations and brings a decrease of 3 seconds). Reading benchmarks is not something that can be done on intuition alone. I've linked to several experts in this field who use correct and predictable methodologies, developed over the years, to compare the changes they make. I would suggest that we don't ignore their expertise and experience.
Ironically, the evidence presented here, i.e. 48069.209 ± 3267.219 ms/op vs 45129.825 ± 1432.394 ms/op, is a median, which you condemned as useless :)
We are measuring full compilation times here, IIRC (@jvican correct me if I am wrong). Taking into account that Zinc-related phases take ~5% of compilation time, even a 200 ms speed-up on cold compilation gives us a ~10% speed-up (but I think we will see bigger gains). I think a few while loops is not a big price for a 0.5% compiler speed-up.
Getting back to Wojciech's work: if we're seeing a 3 s decrease in cold compilation of Shapeless, I think that's pretty cool.
These benchmarks are from the standard library, but yes, I also think this is pretty cool.
They are useless for scientific, predictable benchmarking. I mentioned them because they give, to some extent, some intuition of how the program behaves, but they cannot be used to compare fairly the running times of several iterations because they don't represent the full reality. This is why I said "If your intuition fails you" before suggesting to have a look at them.
Where did you get this data? My results tell me that they represent more than just 5% of overall compilation.
Totally agreed. The results we have show some variation, of course, because of cache and memory layout, the unpredictability of GC, JVM behaviour and the scheduler. But the fact is that this patch considerably improves the behaviour of Zinc phases in 99% of the cases, which is what we should care about. The compilation of the standard library lasts only ~20 seconds. For a bigger project (talking about projects that compile for minutes or even hours), these changes are huge. We should aim to satisfy the needs of the companies having such big codebases, some of which I represent as a Scala Center employee.
TBH it was quite a big 'floor' to make my numbers round (and even using smaller numbers, the speed-up is quite significant).
I'm of the opinion that all uses of the Scala mutable collections should be removed. They are inefficient, and at the point you decide to use mutable collections you may as well go Premium. They are critical to ensime indexer performance.
Force-pushed from 85d191e to da52498
am I the only one who finds the error column totally confusing? Absolute percentiles are far better... is there a tool to visualise the output?
…scala and related classes

Benchmark results:

With Scala collections:
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_51938d03 sample 18 21437.554 ± 949.595 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_51938d03 sample 19964.887 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_51938d03 sample 21407.728 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_51938d03 sample 22541.867 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_51938d03 sample 24595.399 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_51938d03 sample 18 280.535 ± 11.459 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_51938d03 sample 18 6446299745.333 ± 14785544.668 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_51938d03 sample 18 276.572 ± 22.333 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_51938d03 sample 18 6355643405.333 ± 460709368.527 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_51938d03 sample 18 11.677 ± 12.619 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_51938d03 sample 18 275401288.000 ± 296971980.543 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_51938d03 sample 18 6.754 ± 1.900 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_51938d03 sample 18 155936858.667 ± 45709221.663 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_51938d03 sample 18 98.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_51938d03 sample 18 19073.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_51938d03 sample 3 55118.747 ± 18980.181 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_51938d03 sample 54358.180 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_51938d03 sample 54693.724 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_51938d03 sample 56304.337 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_51938d03 sample 3 119.464 ± 35.856 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_51938d03 sample 3 7018134616.000 ± 329861736.207 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_51938d03 sample 3 109.472 ± 43.096 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_51938d03 sample 3 6431816405.333 ± 2656847732.046 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_51938d03 sample 3 2.692 ± 2.508 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_51938d03 sample 3 158121976.000 ± 114580201.514 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_51938d03 sample 3 76.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_51938d03 sample 3 9389.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_51938d03 ss 10 48069.209 ± 3267.219 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_51938d03 ss 10 140.486 ± 9.407 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_51938d03 ss 10 7201372165.600 ± 51736683.622 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_51938d03 ss 10 133.059 ± 9.610 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_51938d03 ss 10 6824023763.200 ± 383438006.440 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_51938d03 ss 10 3.092 ± 0.517 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_51938d03 ss 10 159124314.400 ± 34649991.106 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_51938d03 ss 10 255.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_51938d03 ss 10 30967.000 ms

With Java collections:
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_55e2f965 sample 18 21213.858 ± 627.059 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_55e2f965 sample 20199.768 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_55e2f965 sample 21189.624 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_55e2f965 sample 22132.503 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_55e2f965 sample 22313.697 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_55e2f965 sample 18 283.808 ± 7.726 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_55e2f965 sample 18 6463578940.444 ± 48501608.706 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_55e2f965 sample 18 283.368 ± 18.931 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_55e2f965 sample 18 6458616490.667 ± 464534032.338 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_55e2f965 sample 18 14.097 ± 12.464 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_55e2f965 sample 18 327768881.778 ± 289039158.120 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_55e2f965 sample 18 7.283 ± 3.030 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_55e2f965 sample 18 167634607.111 ± 73095498.785 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_55e2f965 sample 18 106.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_55e2f965 sample 18 22282.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_55e2f965 sample 3 53329.177 ± 5520.744 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_55e2f965 sample 53016.003 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_55e2f965 sample 53351.547 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_55e2f965 sample 53619.982 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_55e2f965 sample 3 122.504 ± 12.714 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_55e2f965 sample 3 6970875709.333 ± 68053507.295 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_55e2f965 sample 3 116.331 ± 69.758 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_55e2f965 sample 3 6619001184.000 ± 3344291079.994 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_55e2f965 sample 3 0.001 ± 0.027 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_55e2f965 sample 3 48530.667 ± 1533523.185 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_55e2f965 sample 3 2.630 ± 0.121 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_55e2f965 sample 3 149638445.333 ± 17459243.351 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_55e2f965 sample 3 79.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_55e2f965 sample 3 9138.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_55e2f965 ss 10 45129.825 ± 1432.394 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_55e2f965 ss 10 149.365 ± 3.975 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_55e2f965 ss 10 7202488660.000 ± 86462765.565 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_55e2f965 ss 10 139.420 ± 4.400 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_55e2f965 ss 10 6724772576.000 ± 286711519.947 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_55e2f965 ss 10 3.052 ± 0.295 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_55e2f965 ss 10 147178454.400 ± 13834656.130 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_55e2f965 ss 10 252.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_55e2f965 ss 10 29945.000 ms
[success] Total time: 1575 s, completed Mar 6, 2017 3:47:55 PM
[success] Total time: 0 s, completed Mar 6, 2017 3:47:55 PM
No @fommil, to the best of my knowledge there is no tool to visualize JMH output from the results we got.
Hi all, First, @wlk, I wanted to reassure you that this conversation has nothing to do with your change in particular, but with the topic in general. Please accept my apologies for the hostage takeover of your PR. 😄 To the rest of the Zinc team, thank you for engaging with me on this topic. I'm embarrassingly ignorant on it. I'm happy we're fleshing this topic out, as it's an important one. I believe that on the whole we all want the same thing: a better, more correct and faster Zinc. We agree on more than we disagree on. In particular, thank you @jvican for the numerous references you've provided. Sadly I don't have time to fully understand benchmarking ("I don't want a PhD in benchmarking"); I just need to understand enough to be able to make probably-correct decisions on changes and PRs. With that out of the way, on to the technical:
Here's my reformatted and compressed summary of the change. Same raw numbers as above, with my calculations added. Again, please correct me where I'm wrong. Units are ms/op.
Executive summary: 2-3s faster, 5-9% faster. That's money. Thank you @jvican for creating the ticket. Thank you @wlk for implementing it. And, again, thanks all for conversation. I for one am much better informed. |
Because the JMH benchmarks for cold compilation are one-shot. This means that you don't warm up the JVM beforehand by doing more work. As you only have one sample per iteration, you cannot get the percentiles; you just get the average of the 10 one-shot benchmark runs. The official JMH docs say this about the single-shot option:
I did mean seconds, not "x". Sorry for that typo.
Thanks to you. I'm glad that from now on we know how to read benchmarks and reason about the changes. Hopefully, as we increase the performance of Zinc, we will all learn about this black magic that is benchmarking the JVM.
Thanks for the merge and the discussion; this was a good learning experience because I hadn't used JMH in practice before.
Issue #245
Tested on: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz
Shapeless benchmark on 1.0:
Shapeless benchmark on issue-245: