Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use invokedynamic for lambdas (behind -Ydelambdafy:method) #8359

Closed
scabug opened this issue Mar 4, 2014 · 26 comments
Closed

Use invokedynamic for lambdas (behind -Ydelambdafy:method) #8359

scabug opened this issue Mar 4, 2014 · 26 comments

Comments

@scabug
Copy link

@scabug scabug commented Mar 4, 2014

The implementation of lambdas in Java 8 is more performant[1] than the current implementation of lambdas in Scala. Moreover it is more lightweight in that no synthetic class definitions are emitted; this means fewer classes to classload [2], translating into compilation speed and perm-gen benefits. The generated bytecode is also much shorter [3], allowing for more inlining opportunities. Lastly, as no new anonymous class instance is created per execution (rather an invokedynamic+ execution local class call takes place), this helps with garbage collection (fewer heap allocations). This may also help with cache locality.

A presentation [4] by Vlastimik Mencik claims that Scala 2.12.x will target Java 8 but it's not clear if indy (invokedynamic) will be used or not. So I'm starting this to track progress and potentially get confirmation about whether this is a formal decision, given that Martin Odersky seems to be saying [5] this is not possible without breaking binary compatibility.

In my humble view, the benefits are too great to ignore and this impacts myself and a lot of us who want to continue using functional concepts in high-performance applications.

[1] p32, http://www.slideshare.net/jaxlondon2012/lambda-a-peek-under-the-hood-brian-goetz
[2] p6-7, http://www.slideshare.net/czechscala/java-8-under-the-hood
[3] http://www.javacodegeeks.com/2014/01/compiling-lambda-expressions-scala-vs-java-8.html
[4] p13, http://www.slideshare.net/czechscala/java-8-under-the-hood
[5]

@scabug
Copy link
Author

@scabug scabug commented Mar 4, 2014

Imported From: https://issues.scala-lang.org/browse/SI-8359?orig=1
Reporter: Nik (cosmicvisitors)
Affected Versions: 2.11.0, 2.12.0-RC1

@scabug
Copy link
Author

@scabug scabug commented Mar 4, 2014

@adriaanm said:
Thanks for writing this down! We're working on this and related issues. Here's one piece of the puzzle: https://github.com/retronym/java-8-function1

@scabug
Copy link
Author

@scabug scabug commented Mar 5, 2014

@gkossakowski said:
Hi Nik,

Java 8 doesn't have more performant lambdas implementation compared to anonymous classes. With Java 8, you need to load a class per every lambda but the class is generated at runtime. The size of the resulting bytecode is the same as with anonymous classes. I explained those issues in detail, here:

http://www.takipiblog.com/2014/01/16/compiling-lambda-expressions-scala-vs-java-8/#comment-1206323508
http://www.takipiblog.com/2014/01/16/compiling-lambda-expressions-scala-vs-java-8/#comment-1209699556

So the benefits are not that big as they might appear from your description. However, there are still benefits like smaller jars (but the same bytecode at runtime!) that are worth pursuing Java 8-style translation of lambdas.

@scabug
Copy link
Author

@scabug scabug commented Mar 5, 2014

Nik (cosmicvisitors) said:
My description linked a benchmark where anonymous classes vs invokedynamic are compared which shows there are runtime performance benefits (above and beyond extra classes to load and compilation speed and GC benefits). The link you posted is just a comment you've made in another site. You should produce some proof to back your claims. I plan on running my own benchmark and will post the findings here.

@scabug
Copy link
Author

@scabug scabug commented Mar 5, 2014

@gkossakowski said:
Nik,

In the comments I linked to I described the reasons why there can't be any performance difference: Java 8 expands invokedynamc calls to regular anonymous classes generated at runtime. My claim is supported with the link to documentation by Brian Goetz (from Java compiler team) which explains lambda translation in great detail. That's the proof. :)

In the second comment I linked I admit that Java 8 is indeed faster when it comes to lambdas that do not close over variables but this optimization is not related to use of invokedynamic.

Still, I would be interested in results of your benchmarks.

@scabug
Copy link
Author

@scabug scabug commented Mar 7, 2014

Nik (cosmicvisitors) said (edited on Mar 8, 2014 3:21:46 PM UTC):
I have now completed a preliminary set of benchmarks for simple "foreach" lambdas (but not closures) using JMH; the results are extremely interesting!

// Here's the Java version
@State(Scope.Thread)
public class Main
{
private final List testValues = Arrays.asList("a", "bb", "c", "dd", "e", "ff");

@GenerateMicroBenchmark
public void myTest()
throws InterruptedException
{
testValues.forEach( p -> p.length() );
}
}

Iteration 1: 129700.223 ops/ms
Iteration 2: 129887.216 ops/ms
Iteration 3: 129932.265 ops/ms
Iteration 4: 129925.015 ops/ms
Iteration 5: 129940.037 ops/ms
Iteration 6: 130211.447 ops/ms
Iteration 7: 130111.218 ops/ms
Iteration 8: 130020.330 ops/ms
Iteration 9: 129785.982 ops/ms
Iteration 10: 130034.805 ops/ms
Iteration 11: 130027.400 ops/ms
Iteration 12: 129799.267 ops/ms
Iteration 13: 130067.029 ops/ms
Iteration 14: 129933.144 ops/ms
Iteration 15: 129782.205 ops/ms
Iteration 16: 130047.119 ops/ms
Iteration 17: 129964.672 ops/ms
Iteration 18: 129829.663 ops/ms
Iteration 19: 130142.463 ops/ms
Iteration 20: 129720.404 ops/ms

Result : 129943.095 ±(99.9%) 123.534 ops/ms
Statistics: (min, avg, max) = (129700.223, 129943.095, 130211.447), stdev = 142.262
Confidence interval (99.9%): [129819.561, 130066.629]

// Here's the Scala version
@State(Scope.Thread)
class MyBenchmark {

final val testValues = Array("a", "bb", "c", "dd", "e", "ff").toBuffer

@GenerateMicroBenchmark
def test() {
testValues.foreach(x => x.length())
}
}

Iteration 1: 121176.798 ops/ms
Iteration 2: 121206.352 ops/ms
Iteration 3: 122627.047 ops/ms
Iteration 4: 122361.254 ops/ms
Iteration 5: 122044.951 ops/ms
Iteration 6: 121160.859 ops/ms
Iteration 7: 121245.793 ops/ms
Iteration 8: 122784.802 ops/ms
Iteration 9: 121072.250 ops/ms
Iteration 10: 121485.324 ops/ms
Iteration 11: 121488.497 ops/ms
Iteration 12: 121888.145 ops/ms
Iteration 13: 120965.945 ops/ms
Iteration 14: 121130.525 ops/ms
Iteration 15: 121521.937 ops/ms
Iteration 16: 122170.860 ops/ms
Iteration 17: 122369.025 ops/ms
Iteration 18: 121495.005 ops/ms
Iteration 19: 122511.899 ops/ms
Iteration 20: 121582.566 ops/ms

Result : 121714.492 ±(99.9%) 504.060 ops/ms
Statistics: (min, avg, max) = (120965.945, 121714.492, 122784.802), stdev = 580.476
Confidence interval (99.9%): [121210.432, 122218.552]

So basically it seems the Java version of lambdas executing a simple foreach lambda is a bit faster than Scala's version, when using the defacto mutable list data structure in each language. However this could be attributed to the differences in implementation of mutable lists of each language; it is not conclusive (although most people will realistically see this kind of performance for lists).

The best results we could get would be from some data structure which is implemented in the same way in both languages (or is exactly the same) such as an array.
However, the reason I benchmarked against lists is that (and this is baffling... can it really be the case?), I could not find a foreach() method on Java 8 arrays.
This is very frustrating...

On the other hand, Scala does allow me to run a foreach lambda on an array, so I tried this as well, as I tend to use a lot of arrays in my Scala code.
Actually I try to avoid mutable collections due to their drawbacks and prefer Arrays/Seqs.

// Scala array version
@State(Scope.Thread)
class MyBenchmark {

final val testValues = Array("a", "bb", "c", "dd", "e", "ff")

@GenerateMicroBenchmark
def test() {
testValues.foreach(x => x.length())
}
}

Iteration 1: 158810.042 ops/ms
Iteration 2: 158133.661 ops/ms
Iteration 3: 158151.405 ops/ms
Iteration 4: 157634.051 ops/ms
Iteration 5: 158225.598 ops/ms
Iteration 6: 158189.632 ops/ms
Iteration 7: 158677.971 ops/ms
Iteration 8: 158221.994 ops/ms
Iteration 9: 158284.515 ops/ms
Iteration 10: 157807.267 ops/ms
Iteration 11: 158571.406 ops/ms
Iteration 12: 158202.246 ops/ms
Iteration 13: 158525.803 ops/ms
Iteration 14: 158286.323 ops/ms
Iteration 15: 157755.862 ops/ms
Iteration 16: 158437.783 ops/ms
Iteration 17: 158230.546 ops/ms
Iteration 18: 158245.174 ops/ms
Iteration 19: 158617.236 ops/ms
Iteration 20: 158382.108 ops/ms

Result : 158269.531 ±(99.9%) 260.444 ops/ms
Statistics: (min, avg, max) = (157634.051, 158269.531, 158810.042), stdev = 299.928
Confidence interval (99.9%): [158009.087, 158529.975]

So there, Scala lambdas may execute faster than Java's, if you use arrays; not necessarily because of their implementation, but because of their flexibility!

@scabug
Copy link
Author

@scabug scabug commented Mar 7, 2014

@adriaanm said:
Cool! Thanks for the detailed analysis!

@scabug
Copy link
Author

@scabug scabug commented Mar 10, 2014

Nik (cosmicvisitors) said:
Hi

I received a comment from Alexey Shipilev (from Oracle), an expert in JMH micro-benchmarking, and he suggested that as the length() integer is not used, there may be some dead code elimination optimization taking place that may skew results. I've now re-run the micro-benchmarks with using the value returned from the lambda, and now it seems the Scala version wins when using lambdas on lists as well.

// Java version
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.logic.BlackHole;

@State(Scope.Thread)
public class Main
{
private final List testValues = Arrays.asList("a", "bb", "c", "dd", "e", "ff");

@GenerateMicroBenchmark
public void myTest()
throws InterruptedException
{
testValues.forEach( p -> new BlackHole().consume(p.length()));
}
}

Iteration 1: 1783.691 ops/ms
Iteration 2: 1780.233 ops/ms
Iteration 3: 1754.277 ops/ms
Iteration 4: 1797.102 ops/ms
Iteration 5: 1781.456 ops/ms
Iteration 6: 1764.169 ops/ms
Iteration 7: 1824.704 ops/ms
Iteration 8: 1787.806 ops/ms
Iteration 9: 1813.450 ops/ms
Iteration 10: 1780.823 ops/ms
Iteration 11: 1806.183 ops/ms
Iteration 12: 1782.916 ops/ms
Iteration 13: 1781.730 ops/ms
Iteration 14: 1788.937 ops/ms
Iteration 15: 1792.292 ops/ms
Iteration 16: 1782.705 ops/ms
Iteration 17: 1786.505 ops/ms
Iteration 18: 1781.541 ops/ms
Iteration 19: 1787.609 ops/ms
Iteration 20: 1783.801 ops/ms

Result : 1787.097 ±(99.9%) 13.281 ops/ms
Statistics: (min, avg, max) = (1754.277, 1787.097, 1824.704), stdev = 15.295
Confidence interval (99.9%): [1773.815, 1800.378]

// Scala version
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.logic.BlackHole

@State(Scope.Thread)
class MyBenchmark {

final val testValues = Array("a", "bb", "c", "dd", "e", "ff").toBuffer

@GenerateMicroBenchmark
def test() {
testValues.foreach(x => new BlackHole().consume(x.length()))
}
}

Iteration 1: 1957.770 ops/ms
Iteration 2: 1916.698 ops/ms
Iteration 3: 1975.874 ops/ms
Iteration 4: 1949.468 ops/ms
Iteration 5: 1951.326 ops/ms
Iteration 6: 1933.750 ops/ms
Iteration 7: 1911.839 ops/ms
Iteration 8: 1894.889 ops/ms
Iteration 9: 1918.366 ops/ms
Iteration 10: 1923.638 ops/ms
Iteration 11: 1918.477 ops/ms
Iteration 12: 1994.459 ops/ms
Iteration 13: 1958.502 ops/ms
Iteration 14: 1928.451 ops/ms
Iteration 15: 1984.478 ops/ms
Iteration 16: 1944.786 ops/ms
Iteration 17: 1976.232 ops/ms
Iteration 18: 1905.595 ops/ms
Iteration 19: 1930.285 ops/ms
Iteration 20: 1964.354 ops/ms

Result : 1941.962 ±(99.9%) 24.382 ops/ms
Statistics: (min, avg, max) = (1894.889, 1941.962, 1994.459), stdev = 28.078
Confidence interval (99.9%): [1917.580, 1966.344]

All the results in this page were ran with JDK 1.8.0 build129, both Java & Scala versions of programs ran on the same JRE version. Hardware: Intel Core i5 3570K (Ivy Bridge) @ 3.4GHz

Given these results, I think perhaps the JDK guys will need to spend some time improving the performance of the current invokedynamic implementation, in their next releases. Hopefully if there is some development on the Scala side to use it, as Adriaan has indicated, switching to it (given some potential speed advantage in the future) will be easily possible.

Regards,
Nik

@scabug
Copy link
Author

@scabug scabug commented Mar 10, 2014

@gkossakowski said:
Thanks Nik for sharing your results. Which version of Scala did you use for benchmarking?

@scabug
Copy link
Author

@scabug scabug commented Mar 13, 2014

Nik (cosmicvisitors) said:
Version 2.10.1

@scabug
Copy link
Author

@scabug scabug commented Mar 14, 2014

@retronym said (edited on Mar 14, 2014 2:25:52 PM UTC):
If we remove the ArrayBuffer vs ArrayList noise and just profile lambda vs anon functions against ArrayList

/*
 * Copyright (c) 2005, 2013, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package org.sample;

import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.logic.BlackHole;

import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

@State(Scope.Benchmark)
public class MyBenchmark {
    private final List<String> testValues = Arrays.asList("a", "bb", "c", "dd", "e", "ff");

    @GenerateMicroBenchmark
    public void lambda() throws InterruptedException { testValues.forEach(p -> new BlackHole().consume(p.length())); }

    @GenerateMicroBenchmark
    public void anonClass() throws InterruptedException {
        Consumer<String> action = new Consumer<String>() {
            public void accept(String x) {
                new BlackHole().consume(x.length());
            }
        };
        testValues.forEach(action);
    }
}
% (export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/; $JAVA_HOME/bin/java -version )

java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b124)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b66, mixed mode)

% (export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/; $JAVA_HOME/bin/java -jar target/microbenchmarks.jar )
...
Benchmark                     Mode   Samples         Mean   Mean error    Units
o.s.MyBenchmark.anonClass    thrpt       200     1656.678        7.368   ops/ms
o.s.MyBenchmark.lambda       thrpt       200     1627.326        9.527   ops/ms

I suspect the slight win for the lambda is the fact it is statically hoisted, as it doesn't close over any free variables.

@scabug
Copy link
Author

@scabug scabug commented Mar 14, 2014

@adriaanm said (edited on Mar 14, 2014 7:05:03 PM UTC):
Isn't higher thr(ough)p(u)t better? (So, anonClass is better than lambda?)

@scabug
Copy link
Author

@scabug scabug commented Mar 14, 2014

@gkossakowski said:
I think the answer is yes. However, that result would be rather surprising. The only thing that comes to my mind is that ConstantCallSite returned by invokedynamic trips JIT.

@scabug
Copy link
Author

@scabug scabug commented Mar 14, 2014

@retronym said (edited on Mar 14, 2014 9:14:05 PM UTC):
Right you are. It is an odd result. JMH spent twelve minutes calculating it while I played sudoku on my phone. But maybe something kicked in on my machine.

@scabug
Copy link
Author

@scabug scabug commented Mar 14, 2014

@retronym said:
I think it is fair to assume, however, that even if lambdas were marginally slower today, they are unlikely to stay that way after the HotSpot team tunes JIT further.

@scabug
Copy link
Author

@scabug scabug commented Mar 14, 2014

@gkossakowski said:
I agree that Java-style bytecode shape is the future. Just reflecting on claims/assumptions we started from in this ticket.

I just recalled this talk which goes pretty deep into lambda performance vs anon classes: http://medianetwork.oracle.com/video/player/2623576348001. Unsurprisingly, there are no simple answers.

@scabug
Copy link
Author

@scabug scabug commented Mar 15, 2014

Nik (cosmicvisitors) said:
@Grzegorz- Well the 'claims' were substantiated with citations. I mean Brian Goetz said they are much more performant, I just reference his talk. I wonder if they overfitted some performance use case...

@jason- I don't really think something kicking in would've had a large effect, this benchmark uses a single core.
Also, the reason I used lists is because I want to use them idiomatically, the way they will be used.
However I was thinking of doing another benchmark in isolation (using Scala however), but given the bytecode would look the same I think what you've done would be very similar, so thanks for doing it!

@scabug
Copy link
Author

@scabug scabug commented Jul 14, 2014

Thomas Santana (nebulorum) said:
To add to this discussion: http://www.infoq.com/presentations/lambda-invokedynamic
One point that is interesting is that the Java 8 method uses less lock on loaders and dictionaries. Also interesting the interaction of lambdas and serialization.

Another comment is that there is likely to be performance improvements in the future on this strategy, specially if Java Lambdas start to be used all over the place and become more relevant.

@scabug
Copy link
Author

@scabug scabug commented Aug 5, 2014

@gkossakowski said:
The 2.11.2 is out so I'm rescheduling the issue for 2.11.3.

@scabug
Copy link
Author

@scabug scabug commented Oct 26, 2014

hepin (hepin1989) said:
http://www.infoq.com/articles/Java-8-Lambdas-A-Peek-Under-the-Hood

just a note,you guys could take a look

@scabug
Copy link
Author

@scabug scabug commented Oct 26, 2014

@gkossakowski said:
Hepin,

Thanks for the link. However, that article is wrong. This comment I've made some time ago: http://www.takipiblog.com/compiling-lambda-expressions-scala-vs-java-8/#comment-1206323508 applies to the Infoq article too. If you are interested in understanding the details check out the talk I've given at SoundCloud some time ago: http://vimeo.com/soundcloud/review/105332268/201bc5a9f7
My presentation starts around 28:00.

@scabug
Copy link
Author

@scabug scabug commented Oct 26, 2014

hepin (hepin1989) said:
thanks for that,In fact I am going to watch a talk too,tonight,I will download yours and watch that,why I link that cause after I search via keywords scala invokedynamic,and I found really two different voice,I haven't check that myself,but I will first watch the presentation,and then check it myself ,thanks for that,anyway ,scala will track java platform evolution right?!

@scabug
Copy link
Author

@scabug scabug commented Oct 27, 2014

@gkossakowski said:
Yes, it's part of our roadmap: http://www.scala-lang.org/news/2.12-roadmap

@scabug
Copy link
Author

@scabug scabug commented Nov 4, 2014

@retronym said:
Updating fix-by version to 2.11.5.

@scabug
Copy link
Author

@scabug scabug commented Apr 22, 2015

@retronym said:
I've submitted a major part of this work for 2.11.7: scala/scala#4463

@scabug
Copy link
Author

@scabug scabug commented May 1, 2015

Nik (cosmicvisitors) said:
Nice one Jason!

@scabug scabug added this to the 2.11.7 milestone Apr 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.