This repository has been archived by the owner. It is now read-only.

Add micro-benchmarks #19

Merged
merged 4 commits into master from benchmark on Jan 28, 2017

Conversation

julienrf (Contributor) commented Jan 18, 2017

This PR adds a few micro-benchmarks that may help us get some numbers when tuning our implementations.

I was not able to produce nice charts; the output is text-only (/cc @axel22).

@julienrf julienrf requested review from szeiger, odersky, and Ichoran Jan 18, 2017

@julienrf julienrf referenced this pull request Jan 18, 2017

Closed

Immutable array #6

szeiger (Collaborator) commented Jan 18, 2017

You should talk to @lrytz and @retronym who are currently working on the benchmarking infrastructure for scalac.

build.sbt Outdated
@@ -16,5 +16,6 @@ parallelExecution in Test := false

 libraryDependencies ++= Seq(
   "org.scala-lang.modules" %% "scala-java8-compat" % "0.8.0",
-  "com.novocode" % "junit-interface" % "0.11" % "test"
+  "com.novocode" % "junit-interface" % "0.11" % "test",
+  "com.storm-enroute" %% "scalameter" % "0.8.2"

szeiger (Collaborator), Jan 18, 2017

Have you considered JMH (via sbt-jmh)? That's what we have used for other Scala library benchmarking so far.
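For reference, the sbt side of that is small; a sketch (the plugin version shown is an assumption, not taken from this PR):

// project/plugins.sbt (version shown is an assumption)
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.21")

// build.sbt: enable the plugin on a benchmarks project, then run with `sbt jmh:run`
lazy val benchmarks = project.in(file("benchmarks")).enablePlugins(JmhPlugin)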

julienrf (Author, Contributor), Jan 18, 2017

Yes, I actually started with JMH but didn’t find a way to make it run the tests correctly. The problem is that JMH has to figure out how to build your test suite, and the only way you can tell it how to do so is via annotations, which compose very poorly. As a result, I was unable to define a configuration that generates test data based on the test @Params (maybe that’s possible, but I couldn’t figure out how…).
With ScalaMeter, in contrast, the control is not inverted: you build your test suite however you want and then ask ScalaMeter to benchmark it.
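For illustration, a minimal sketch of that non-inverted style (the suite name, generator shape, and sizes are made up for the example):

import org.scalameter.api._

// Data generation is ordinary composable code (Gen), not annotations.
object ListBench extends Bench.LocalTime {
  val sizes: Gen[Int] = Gen.exponential("size")(8, 1024, 2)
  val lists: Gen[List[Int]] = sizes.map(n => List.range(0, n))

  performance of "List" in {
    measure method "map" in {
      using(lists) in { xs => xs.map(_ + 1) }
    }
  }
}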

Are there known issues with ScalaMeter?

julienrf (Author, Contributor), Jan 18, 2017

BTW, there is another team responsible for writing a more polished test suite. In the meantime, I think this one can give us a fair estimate of the differences between the various collection implementations.

axel22 (Member), Jan 18, 2017

There are no known issues with ScalaMeter that I am aware of, and I use it directly in my projects. I am not actively maintaining it, since I am satisfied with the current feature set, which roughly covers: measuring running time (optionally with outlier elimination or GC elimination), memory footprint, method invocation counts, boxing counts, and GC counts.

There is an HTML reporter that outputs a webpage with curves, and there is also a MongoDB reporter that flushes the results into a MongoDB instance.

I personally switched from the Bench frontend to the JBench frontend, which is based on annotations, but both work fine.

szeiger (Collaborator), Jan 19, 2017

I'm not aware of any issues either. I don't know why we use sbt-jmh instead; maybe for historical reasons.

DarkDimius (Member), Jan 19, 2017

In my experience, ScalaMeter is really good if you want to store the history of runs and maybe do regression testing. It's also fine as long as the value you're measuring isn't less than a millisecond.
Also, I've found JMH to be the better tool for answering the "why?" question, as it can produce both a heatmap and the actual generated assembly.
I think that for the kind of benchmarking being done here, both would do fine.

Ichoran (Contributor), Jan 19, 2017

I remember that when I started using JMH, I found some workloads on which JMH gave useful results and ScalaMeter didn't. I no longer remember what they were, just that I went looking for them specifically based on the JMH documentation, and found them. Thyme also got them wrong, but it usually got head-to-head comparisons qualitatively right, since it explicitly tests things head-to-head. (But Thyme is vulnerable to peculiarities of JVM history, since it runs entirely within the VM you've already got.)

Anyway, for now, I think either ScalaMeter or JMH is fine. ScalaMeter is prettier.

@@ -0,0 +1,239 @@
package bench

szeiger (Collaborator), Jan 18, 2017

These benchmarks shouldn't be in the main project. In fact, the more sbt projects I create, the more convinced I am that a single-project build in . (like the one I created here) is a bad idea. Eventually you'll have to refactor it, which means moving all existing sources (or using an ugly, non-standard sbt setup, or complicating things with configurations instead of projects).

julienrf (Author, Contributor), Jan 18, 2017

> These benchmarks shouldn't be in the main project

Can you elaborate why?

axel22 (Member), Jan 18, 2017

If you want to use configurations, here is an example of how you can create a separate sbt configuration for benchmarks:

https://github.com/scalameter/scalameter-examples/blob/master/basic-with-separate-config/build.sbt

If you do this, your dependency should have an extra % "bench" at the end.

Otherwise, you could use a separate project.
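Following the linked example, the configuration could look roughly like this sketch (sbt 0.13 style; the ScalaMeter version matches the one in this PR's diff):

lazy val Benchmark = config("bench") extend Test

lazy val root = project.in(file("."))
  .configs(Benchmark)
  .settings(
    libraryDependencies += "com.storm-enroute" %% "scalameter" % "0.8.2" % "bench",
    testFrameworks in Benchmark += new TestFramework("org.scalameter.ScalaMeterFramework"),
    parallelExecution in Benchmark := false  // benchmarks should not run concurrently
  )
  .settings(inConfig(Benchmark)(Defaults.testSettings): _*)

Benchmarks then live under src/bench/scala and run with sbt bench:test.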

szeiger (Collaborator), Jan 19, 2017

I suppose it's not really important here because we don't expect to publish any artifacts. In general you don't want test or benchmarking code to be part of the main project/artifact, to avoid cluttering the source file structure and introducing accidental dependencies.


benchmark("scala.List", scalaLists) { xs =>
var x: Object = null
xs.foreach(x = _)

Ichoran (Contributor), Jan 18, 2017

This can potentially be optimized away by a sufficiently aggressive JIT. You need something that leaves some sort of impact that depends on every element. For instance,

var n = 0
xs.foreach{ x => if (x eq null) n += 1 }
n

axel22 (Member), Jan 18, 2017

Agreed.

If you really want to be sure that this won't be completely eliminated, you would need to assign the value n to a volatile variable at the end of the benchmark. The problem is that aggressive JIT optimization could also compile the benchmark function (or its caller) and aggressively inline everything, including the foreach call here.
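A sketch of that volatile trick, reusing this PR's benchmark helper (the Sink object is hypothetical):

object Sink {
  @volatile var value: Int = 0  // a volatile write cannot be optimized away
}

benchmark("scala.List", scalaLists) { xs =>
  var n = 0
  xs.foreach { x => if (x eq null) n += 1 }
  Sink.value = n  // publish the result so the loop above must actually run
}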

Ichoran (Contributor), Jan 18, 2017

Doesn't ScalaMeter do a BlackHole-like thing to consume the results of a benchmark? It doesn't matter if the JIT compiler inlines the foreach, because it will do that in production code too, if it can. But it does matter if it can throw away the answer and thus throw away all the work.
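(For comparison, this is what JMH's Blackhole looks like in use; a minimal sketch with made-up class and data:)

import org.openjdk.jmh.annotations.{Benchmark, Scope, State}
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class ForeachBenchmark {
  val xs: List[AnyRef] = List.fill(1000)(new Object)

  @Benchmark
  def foreach(bh: Blackhole): Unit =
    xs.foreach(x => bh.consume(x))  // consuming each element defeats dead-code elimination
}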

axel22 (Member), Jan 18, 2017

It does so for memory footprint measurement, but so far it did not do this for time measurements. As of the following commit, it does so for running time too:

scalameter/scalameter@f82d561

About the foreach comment: I meant inlining both the caller and the foreach in this test, which would be a prerequisite for throwing the result away completely.

}

benchmark("scala.List", scalaLists)(_.tail)
benchmark("List", lists)(_.tail)

Ichoran (Contributor), Jan 18, 2017

Does a single tail take enough time to be measurable? I rarely get good results on single ns-length operations, even with JMH, which arguably tries hardest to make this work. You also have to be careful with ArrayBuffer, where tail is O(n), but I think this may need a little more thought.

axel22 (Member), Jan 18, 2017

+1 - tail looks like a super-fast operation here, and you will not get any good numbers.

It would be better to run the tail operation in a while loop.
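Something along these lines, reusing this PR's benchmark helper (just a sketch):

benchmark("scala.List", scalaLists) { xs =>
  var rest = xs
  while (rest.nonEmpty) rest = rest.tail  // repeated tails give measurable work
  rest
}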

julienrf (Author, Contributor), Jan 18, 2017

Indeed, I think I first did it in a loop, but that made the runs way too long with arrays. I had the same issue with apply.

@@ -39,7 +39,7 @@ class ArrayBuffer[A] private (initElems: Array[AnyRef], initLength: Int)
       start = 0
     }
     else {
-      val newelems = new Array[AnyRef](end * 2)
+      val newelems = new Array[AnyRef](if (end == 0) 16 else (end * 2))

Ichoran (Contributor), Jan 18, 2017

I think this algorithm could be improved in three ways. First, allocating size 1, then 2, 4, 8, 16 is pretty wasteful; I'd go for at least 4, maybe 8, on the first realloc. Second, there's no point doing an Array.copy if end == 0. Third, you max out at (1 << 30) elements this way, when you could have ((1 << 31) - 1) if you handled the last buffer increase separately. I know you didn't write the entire algorithm, but if we're improving this, let's do it all the way.
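A sketch of a growth policy covering all three points (the helper name is made up, and the true VM array limit is slightly below Int.MaxValue):

private def grownSize(end: Int): Int =
  if (end == 0) 16                               // skip the 1/2/4/8 steps
  else if (end > Int.MaxValue / 2) Int.MaxValue  // handle the last increase separately
  else end * 2

// and only copy when there is something to copy:
// if (end > 0) Array.copy(elems, 0, newelems, 0, end)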

julienrf (Author, Contributor), Jan 19, 2017

I just created #20 to keep track of this.

szeiger (Collaborator), Jan 19, 2017

Another problem with this implementation is integer overflow. We should always look at the current collections library for implementation details, where corner cases like these have already been discovered and fixed.

Ichoran (Contributor), Jan 19, 2017

The integer overflow isn't a terrible problem, because you get an exception when you try to allocate -(1 << 31) elements. It's not the ideal exception to have, but you were going to get an exception anyway, and practically nobody checks whether something is a NegativeArraySizeException or whatever we'd throw on a too-full array. But in general I fully agree that we need to keep a careful watch on out-of-bounds errors and other such corner cases.

szeiger (Collaborator), Jan 20, 2017

The overflow cuts the usable size for arrays in half: once the size exceeds MaxInt/2, any attempt to grow the buffer fails, even though you should be able to grow up to MaxInt.

Ichoran (Contributor), Jan 21, 2017

@szeiger - Yes, I already pointed out the "max out at (1 << 30)" problem. I thought you meant that there was an overflow problem beyond the lack of capacity that I mentioned the first time around.

Ichoran (Contributor) commented Jan 18, 2017

Aside from using ScalaMeter instead of JMH, this looks reasonable to me. I haven't used ScalaMeter very much, so I can't vouch for its accuracy, except to say that when I tested ScalaMeter head-to-head against Thyme (which I wrote), I didn't find any cases where Thyme seemed to get things right but ScalaMeter got them wrong.

Picking up the benchmarking framework from the main library might be better, though. I'm not sure. I don't think it's terribly important at this point, as long as the tests are telling us something reasonably meaningful and we're not duplicating huge amounts of work.

SethTisue (Member) commented Jan 18, 2017

> Picking up the benchmarking framework from the main library might be better, though

not sure if everybody even knows about the existence of the (JMH-based) benchmarks in https://github.com/scala/scala/blob/2.12.x/test/benchmarks/README.md

that's what we normally use for benchmarking stdlib changes, so if the same tooling is workable here, we wouldn't have to wonder whether we're comparing apples and oranges when looking at results from two different tools

axel22 approved these changes Jan 18, 2017



benchmark("scala.List", scalaLists)(_.tail)
benchmark("List", lists)(_.tail)
benchmark("LazyList", lazyLists)(_.tail)

axel22 (Member), Jan 18, 2017

Same here.


benchmark("scala.List", scalaLists) { xs =>
var x: Object = null
xs.foreach(x = _)

This comment has been minimized.

@axel22

axel22 Jan 18, 2017

Member

Agreed.

If you really want to be sure that this want be completely eliminated, you would need to assign the value n to a volatile variable at the end of the benchmark. The problem is that aggressive JIT optimizations could also compile the benchmark function (or its caller) and aggressively inline everything including the foreach call here.


}

trait Cons extends Bench.ForkedTime with Generators {

axel22 (Member), Jan 18, 2017

Note: the ForkedTime predefined template ignores GC:

https://github.com/scalameter/scalameter/blob/dd7f057e70c4653ba4a38afe9699c5c4c1f097a3/src/main/scala/org/scalameter/Bench.scala#L108

This is more stable (hence useful if you care about regression testing), but depending on what you want to measure, it might make more sense to include GC pauses in the measurement. You can do this by overriding the measurer method of the test to return a Measurer.Default instance.
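For example, a sketch of that override (assuming this PR's Generators trait is in scope, and that Measurer.Default is compatible with the measurer type of ForkedTime):

import org.scalameter.{Bench, Measurer}

trait ConsWithGc extends Bench.ForkedTime with Generators {
  // include GC pauses in the timings instead of ignoring them
  override def measurer = new Measurer.Default
}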

julienrf (Author, Contributor), Jan 19, 2017

I would say that for micro-benchmarks like these it is better to ignore GC (though for a more complex benchmark, like a complete sorting algorithm, maybe it would be better not to ignore it). What do you think?

julienrf (Author, Contributor) commented Jan 19, 2017

@Ichoran and @axel22, I tuned the benchmarks according to your comments: I now perform tail and apply within a while loop so that the measured work is longer. I didn’t apply the @volatile trick, though.

axel22 (Member) commented Jan 19, 2017 (comment minimized; content not shown)

@DarkDimius DarkDimius referenced this pull request Jan 20, 2017

Closed

Comparison with JMH? #173

szeiger (Collaborator) commented Jan 20, 2017

We also discussed this internally in the Scala team, and the consensus is that JMH is the preferred tool. The current benchmarking infrastructure is for the compiler only, but it is based on JMH and could eventually be extended to cover the library. The biggest argument against ScalaMeter, though, is binary compatibility. If there are impediments that prevent us from getting useful numbers with JMH now, we can start with ScalaMeter, but any benchmarks that we want to keep around and move into the main Scala project together with the library will eventually have to be ported to JMH.

julienrf (Author, Contributor) commented Jan 20, 2017

@szeiger Alright, I will migrate to JMH.

SethTisue (Member) commented Jan 20, 2017

> The current benchmarking infrastructure is for the compiler only

what about the JMH-based stuff in https://github.com/scala/scala/tree/2.12.x/test/benchmarks ?

odersky (Contributor) commented Jan 22, 2017 (comment minimized; content not shown)

@julienrf julienrf force-pushed the benchmark branch from a56019d to 2b5fb0b Jan 23, 2017

julienrf (Author, Contributor) commented Jan 23, 2017

I just migrated to JMH.

I found no equivalent of ScalaMeter's “memory footprint” benchmark, so I added a small sub-project with some ad-hoc code doing that. This is not something I’m used to doing, so please let me know if you think we should use ScalaMeter (or anything else) for that.

retronym (comment minimized; content not shown)

julienrf (Author, Contributor) commented Jan 24, 2017

Thanks @retronym. I just gave it a try and got the following exception:

java.lang.IllegalAccessError: tried to access class org.openjdk.jol.vm.InstrumentationSupport$Installer from class org.openjdk.jol.vm.InstrumentationSupport
        at org.openjdk.jol.vm.InstrumentationSupport.saveAgentJar(InstrumentationSupport.java:109)
        at org.openjdk.jol.vm.InstrumentationSupport.tryDynamicAttach(InstrumentationSupport.java:93)
        at org.openjdk.jol.vm.InstrumentationSupport.instance(InstrumentationSupport.java:63)
        at org.openjdk.jol.vm.VM.current(VM.java:73)
        at org.openjdk.jol.info.GraphPathRecord.<init>(GraphPathRecord.java:60)
        at org.openjdk.jol.info.GraphPathRecord.<init>(GraphPathRecord.java:45)
        at org.openjdk.jol.info.GraphWalker.walk(GraphWalker.java:67)
        at org.openjdk.jol.info.GraphLayout.parseInstance(GraphLayout.java:56)
        at strawman.collection.MemoryFootprint$.$anonfun$benchmark$1(MemoryFootprint.scala:23)
        at scala.runtime.java8.JFunction1$mcJI$sp.apply(JFunction1$mcJI$sp.java:12)
        at scala.collection.immutable.List.map(List.scala:272)
        at strawman.collection.MemoryFootprint$.benchmark(MemoryFootprint.scala:22)
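(For context, the call at MemoryFootprint.scala:22-23 in this trace presumably looks something like the following; this is a hypothetical reconstruction, not the exact source:)

import org.openjdk.jol.info.GraphLayout

// for each size, build a collection and ask JOL for its deep footprint
def benchmark(sizes: List[Int])(build: Int => AnyRef): List[Long] =
  sizes.map(size => GraphLayout.parseInstance(build(size)).totalSize())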

I also get the following error from time to time:

java.lang.IncompatibleClassChangeError: strawman.collection.MemoryFootprint and strawman.collection.MemoryFootprint$delayedInit$body disagree on InnerClasses attribute

This might be related to the nature of the bytecode generated by the Scala compiler.

Or maybe there is some installation step missing… I pushed the code to the benchmark-jol branch, in case you want to have a look.

I would be really happy to use the tools you recommend, but I must confess that I am super tired of working with Java-based tools that break your programming model.

retronym (Member) commented Jan 24, 2017

Yeah, JOL and JMH are somewhat magic, but part of that is necessary complexity for the jobs they do. There are real downsides to using Scala frameworks for testing the Scala distribution itself, though; for instance, I've spent close to a week working through issues with ScalaCheck/sbt in our build.

Hopefully #22 helps some.

julienrf (Author, Contributor) commented Jan 25, 2017

It is now possible to produce the following kind of chart for each benchmark (with the timeBenchmark/charts sbt task):

[chart: cons]

ktoso commented Jan 25, 2017

These are very cool, @julienrf! Would you consider contributing the charts code back to sbt-jmh?
I'd love for the Scala benchmarking ecosystem as a whole to get easier access to such things; it would improve the way people post results online a lot.

refs ktoso/sbt-jmh#103

DarkDimius (Member) commented Jan 25, 2017

@julienrf, would you consider making the y (time) scale linear instead of logarithmic? The log scale makes it hard to see the factors.

julienrf (Author, Contributor) commented Jan 25, 2017

@ktoso The charts are specific to our benchmarks because they expect them to be parameterized by a size. Also, note that they add dependencies on JFreeChart and play-json to the build.

ktoso commented Jan 25, 2017

The dependencies are not a problem for the plugin; it could be an extra plugin. If you'd be willing to help out, let's chat in the other ticket; it would be very cool IMO.

julienrf (Author, Contributor) commented Jan 25, 2017

@DarkDimius Here is what we would get, for instance:

Linear scale:

[chart: cons (linear scale)]

Log scale:

[chart: cons (log scale)]

IMHO the problem with the linear scale is that the high values on the right make it difficult to read the small values on the left.

I think log scales are good for visualizing the behavior of one collection as the number of its elements grows.

If we want to compare the factors between different collection implementations, I would normalize the values so that for size n we compare values between 0 and 1, for size n + 1 we also compare values between 0 and 1, and so on. I will try to implement that, to see if it is worth it.
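In code, that per-size normalization could look like this sketch (dividing each implementation's time by the slowest one for that size; the data shapes are made up):

def normalize(bySize: Map[Int, Map[String, Double]]): Map[Int, Map[String, Double]] =
  bySize.map { case (size, times) =>
    val slowest = times.values.max
    size -> times.map { case (impl, t) => impl -> t / slowest }  // values now in (0, 1]
  }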

Ichoran (Contributor) commented Jan 26, 2017

I strongly support log scales for comprehensibility.

If you are just doing head-to-head comparisons, you can set one collection as the baseline and plot everything else as a log ratio against that collection. This spreads the Y-axis out enough that everything is visible. It takes a few seconds longer to grok, but it's often worth it if you otherwise can't see the differences.
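A sketch of that transform (names are hypothetical):

def logRatios(times: Map[String, Double], baseline: String): Map[String, Double] = {
  val base = times(baseline)
  times.map { case (impl, t) => impl -> math.log(t / base) }  // 0 means same speed as the baseline
}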

julienrf (Author, Contributor) commented Jan 26, 2017

@DarkDimius We could have something like the following to figure out factors between implementations:

[chart: cons-factors]

For instance, we see that to build a collection of 8 elements, ArrayBuffer is ~30% faster than ListBuffer. With 512 elements, on the other hand, ListBuffer is ~20% faster than ArrayBuffer.

What do you think?

szeiger (Collaborator) commented Jan 26, 2017

Looks good! We should get this merged so we can use the infrastructure for benchmarking other changes.

odersky (Contributor) commented Jan 28, 2017

Looks good to me. @julienrf please merge at your convenience.

@julienrf julienrf merged commit 960e273 into master Jan 28, 2017

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed

@julienrf julienrf deleted the benchmark branch Jan 28, 2017

@biboudis biboudis referenced this pull request Apr 26, 2017

Closed

Dotty benchmarks #2307

1 of 7 tasks complete