Performance drop off of ~18% in newer JDK vs. JDK 8 #5789
Mentioned this in passing to @headius thinking it was known but he urged me to share so here I am :)
I was doing my "semi-regular" benchmark of rubykon (an MCTS Go-bot implemented in Ruby, not very good mind you ;) ). The benchmark I run here does a full Monte Carlo Tree Search with 1000 playouts, results are reported in iterations per minute (the higher the better).
In my reports JRuby was slower than in the last editions, and the only thing that really changed was the JVM (well, and Spectre and Meltdown).
As I'm currently looking for jobs, the blog post has been put on the back burner, but here's a preliminary share of the results of JVM x JRuby:
(remember iterations per minute, higher is better)
ID denotes running JRuby with
The notable thing here, focusing on JRuby 9.2 ID, is that newer JDKs run at ~12 i/min while the old ones ran at ~14.7, which is a drop of about 18% in performance. It's much worse for JRuby 9.1.
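For reference, the ~18% figure can be reproduced from the two rounded throughput numbers quoted above. This is only a small arithmetic sketch; `percent_drop` is a hypothetical helper name, not part of rubykon or the benchmark tooling.

```ruby
# Relative drop between two throughput figures (iterations per minute),
# expressed as a percentage of the older, faster result.
# `percent_drop` is a hypothetical helper used only for this illustration.
def percent_drop(old_rate, new_rate)
  (old_rate - new_rate) / old_rate.to_f * 100
end

# JRuby 9.2 ID: ~14.7 i/min on the old JDKs vs. ~12 i/min on the newer ones.
puts percent_drop(14.7, 12.0).round(1)
```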
You can see the full run here (it includes standard deviation which was usually small,
Running on the newer JDKs I get these warnings once at the start:
(I think they're the same but didn't double check)
As always, thank you all for your fantastic work
If there's any additional information you'd need please ping me, although I hope I covered everything.
Thanks for the report @PragTob!
Reproduced on a build of JDK-12 from a few months ago. With Java 8, I got a final result of 10.56ips, and with JDK-12 I got 9.24ips. Similar drop-off to your results.
Interestingly I got the same results with JDK-11, even though I've not seen any other benchmarks degrade going from 8 to 11. There may be something specific about rubykon that's optimizing poorly.
Good news! After analyzing a run using async-profiler I noticed a large chunk of the resulting flamegraph was spent inside the G1 garbage collector, which became the default starting with Java 9. G1 is better with large heaps and somewhat more compact in memory, but it does not have the throughput of the parallel collector, so on that hunch I added a flag to switch GCs:
Boom! The new number for Java 12 is 12.25ips, which appears to be the best JRuby result out of all configurations so far, a solid 16% faster than the Java 8 result I posted above.
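The exact flag used is not reproduced in the text above. As a rough sketch of how such a switch is commonly done: HotSpot selects the parallel collector with `-XX:+UseParallelGC` and G1 with `-XX:+UseG1GC`, and JRuby passes JVM options through when they are prefixed with `-J`. The helper name `jruby_command` and the script name `benchmark.rb` are placeholders for illustration, not the actual rubykon invocation.

```ruby
# Standard HotSpot flags for picking a collector. The flag actually used in
# this thread is not shown, so treat this mapping as an assumption.
GC_FLAGS = {
  parallel: "-XX:+UseParallelGC",
  g1:       "-XX:+UseG1GC"
}.freeze

# Hypothetical helper: builds a JRuby command line that forwards the chosen
# collector flag to the JVM via JRuby's -J prefix.
def jruby_command(gc, script)
  flag = GC_FLAGS.fetch(gc)
  ["jruby", "-J#{flag}", script]
end

# "benchmark.rb" is a placeholder script name.
puts jruby_command(:parallel, "benchmark.rb").join(" ")
```

Running the same script once per entry in `GC_FLAGS` gives a like-for-like comparison between collectors on the same JDK.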
I would still like to know why G1 has such a visible degradation. We are obviously very allocation-heavy.
Tested with a recent JDK 13 EA build and multiple collectors. Judging from the GC logs, this is a heavily allocating but almost entirely young-GC workload. Both Parallel and G1 run very short Young GCs during the run, taking about 1% of total time, which means allocation pressure itself is not the issue here.
Per collector experiments:
I do believe this is caused by GC barriers: any GC that does something non-trivial barrier-wise runs slower. Shenandoah in "passive" mode (has all barriers disabled) runs the same as Parallel, and gets progressively worse as we enable two major barriers: SATB (like in G1), and LRB. Alas, this is the price we pay for better GC latency.
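To make the barrier cost a bit more concrete, here is a toy model of an SATB-style pre-write barrier. It is only a conceptual sketch in Ruby, not how HotSpot implements it: `ToyHeap`, `store`, and `satb_queue` are made-up names, and a real JVM emits this logic as a few machine instructions around every reference store in compiled code.

```ruby
# Toy model of a snapshot-at-the-beginning (SATB) pre-write barrier.
# All names here are hypothetical and exist only for illustration.
class ToyHeap
  attr_reader :satb_queue

  def initialize(marking_active:)
    @marking_active = marking_active
    @satb_queue = []
  end

  # Every reference store goes through the barrier. Even when marking is
  # inactive, the store still pays for the check itself.
  def store(object, field, new_ref)
    if @marking_active
      old_ref = object[field]
      # Log the reference that is about to be overwritten so that concurrent
      # marking still sees the object graph as it was when marking began.
      @satb_queue << old_ref unless old_ref.nil?
    end
    object[field] = new_ref
  end
end

node = { next: :a }
heap = ToyHeap.new(marking_active: true)
heap.store(node, :next, :b)
p heap.satb_queue
```

A collector with no such barrier performs only the final assignment, which is in line with the observation that barrier-free configurations run at the same speed as Parallel.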
Hey, fascinating stuff! A bit OT, but GCs are super interesting. Is this a good source to learn about barriers and why/how they affect performance: https://shipilev.net/jvm/anatomy-quarks/13-intergenerational-barriers/ ?
Other reading suggestions welcome, thanks!