Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

static MethodHandle performance worse on native-image by factors in comparison to an OpenJDK baseline. #6487

Open
SergejIsbrecht opened this issue Apr 25, 2023 · 5 comments

Comments

@SergejIsbrecht
Copy link

SergejIsbrecht commented Apr 25, 2023

Describe the issue
During a benchmark I noticed high numbers for a slow path within our codebase. It uses static MethodHandles.

OpenJDK 17 amd64

Benchmark                                                      Mode  Cnt   Score   Error   Units
MethodHandleBench.baseline                                     avgt    2   0.491           ns/op
MethodHandleBench.baseline:·gc.alloc.rate                      avgt    2  ≈ 10⁻⁴          MB/sec
MethodHandleBench.baseline:·gc.alloc.rate.norm                 avgt    2  ≈ 10⁻⁸            B/op
MethodHandleBench.baseline:·gc.count                           avgt    2     ≈ 0          counts
-
MethodHandleBench.methodHandleStatic                           avgt    2   4.536           ns/op
MethodHandleBench.methodHandleStatic:·gc.alloc.rate            avgt    2  ≈ 10⁻⁴          MB/sec
MethodHandleBench.methodHandleStatic:·gc.alloc.rate.norm       avgt    2  ≈ 10⁻⁷            B/op
MethodHandleBench.methodHandleStatic:·gc.count                 avgt    2     ≈ 0          counts
-
MethodHandleBench.methodHandleStaticFinal                      avgt    2   6.053           ns/op
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate       avgt    2  ≈ 10⁻⁴          MB/sec
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate.norm  avgt    2  ≈ 10⁻⁷            B/op
MethodHandleBench.methodHandleStaticFinal:·gc.count            avgt    2     ≈ 0          counts

(lower values are better)

native-image JDK17 amd64

Benchmark                                                 Mode  Cnt    Score    Error   Units
MethodHandleBench.baseline                                avgt    5    1.132 ±  0.051   ns/op
MethodHandleBench.baseline:·gc.alloc.rate                 avgt    5      ≈ 0           MB/sec
MethodHandleBench.baseline:·gc.count                      avgt    5      ≈ 0           counts
-
MethodHandleBench.methodHandleStatic                      avgt    5  439.598 ± 12.781   ns/op
MethodHandleBench.methodHandleStatic:·gc.alloc.rate       avgt    5      ≈ 0           MB/sec
MethodHandleBench.methodHandleStatic:·gc.count            avgt    5  126.000           counts
MethodHandleBench.methodHandleStatic:·gc.time             avgt    5  235.000               ms
-
MethodHandleBench.methodHandleStaticFinal                 avgt    5  440.528 ± 19.104   ns/op
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate  avgt    5      ≈ 0           MB/sec
MethodHandleBench.methodHandleStaticFinal:·gc.count       avgt    5  125.000           counts
MethodHandleBench.methodHandleStaticFinal:·gc.time        avgt    5  238.000               ms

(lower values are better)

It seems MethodHandles are quite slower (440ns/op vs 5ns/op) in the native-image implementation compared to a OpenJDK baseline. The difference is significant, if invoked a lot.

Further flamegraphs are included in the repository.

Steps to reproduce the issue
Please include both build steps as well as run steps

  1. git clone https://github.com/SergejIsbrecht/Bench_MethodHandle.git
  2. see README.md to build and run on OpenJDK and native-image

Describe GraalVM and your environment:

  • GraalVM version GraalVM CE 23.1.0-dev-20230414_0206
  • JDK major version: JDK17
  • OS: Ubuntu 20.04 - 5.x
  • Architecture: AMD64

More details

See build.gradke.kts -> graalvmNative for used native-image command line options:

            buildArgs.add("--verbose")
            buildArgs.add("--no-fallback")
            // buildArgs.add("-g")
            // buildArgs.add("-H:-DeleteLocalSymbols")
            buildArgs.add("-H:IncludeResources=.*/BenchmarkList")
            buildArgs.add("-H:Log=registerResource:verbose")
            buildArgs.add("--initialize-at-build-time=org.openjdk.jmh.infra,org.openjdk.jmh.util.Utils,org.openjdk.jmh.runner.InfraControl,org.openjdk.jmh.runner.InfraControlL0,org.openjdk.jmh.runner.InfraControlL1,org.openjdk.jmh.runner.InfraControlL2,org.openjdk.jmh.runner.InfraControlL3,org.openjdk.jmh.runner.InfraControlL4")
            buildArgs.add("-H:-SpawnIsolates")
            buildArgs.add("-H:+UseSerialGC")
            buildArgs.add("-H:InitialCollectionPolicy=BySpaceAndTime")
            buildArgs.add("-H:AlignedHeapChunkSize=524288")
            buildArgs.add("-H:+ReportExceptionStackTraces")
            buildArgs.add("--enable-monitoring=all")
            buildArgs.add("-J-Xmx20g")
            buildArgs.add("-march=native")
``
@SergejIsbrecht
Copy link
Author

SergejIsbrecht commented Apr 25, 2023

It seems MethodHandleUtils::cast is used two times from different stack-frames. It results in 36.3% of the overall cost.

method_handle_utils_cast

sun.invoke.util.Wrapper has an aggregated cost of 37.6%

method_handles_wrapper

@SergejIsbrecht
Copy link
Author

SergejIsbrecht commented Apr 25, 2023

Interestingly this seems to be an problem with primitive return types. In contrast returning an Object seems to be invoked directly and therefore quite fast.

MethodHandleBench.mh_plainObject_staticFinal                     avgt    5    3.750 ±  0.106   ns/op
MethodHandleBench.mh_plainObject_staticFinal:·gc.alloc.rate      avgt    5      ≈ 0           MB/sec
MethodHandleBench.mh_plainObject_staticFinal:·gc.count           avgt    5      ≈ 0           counts
final class Calls {
  static final MethodHandle mh_plainLong_staticFinal;
  static final MethodHandle mh_staticPlainLong_staticFinal;
  static final MethodHandle mh_plainObject_staticFinal;

  static {
    try {
      mh_plainLong_staticFinal = MethodHandles.lookup().findVirtual(Calls.class, "plainLong", MethodType.methodType(long.class));
      mh_plainObject_staticFinal = MethodHandles.lookup().findVirtual(Calls.class, "plainObject", MethodType.methodType(Object.class));
      mh_staticPlainLong_staticFinal = MethodHandles.lookup().findStatic(Calls.class, "staticPlainLong", MethodType.methodType(long.class));
    } catch (NoSuchMethodException | IllegalAccessException e) {
      throw new RuntimeException(e);
    }
  }

  private final Object object1 = new Object();

  public long plainLong() {
    return 42L;
  }

  public Object plainObject() {
    return object1;
  }
}
  @Benchmark
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  @BenchmarkMode(Mode.AverageTime)
  public void mh_plainObject_staticFinal(Blackhole blackhole) throws Throwable {
    blackhole.consume(Calls.mh_plainObject_staticFinal.invoke(calls));
  }

mh_plainObject

@SergejIsbrecht
Copy link
Author

GraalVM CE 23.1.0-dev-20230427_0251-java20-linux-amd64-dev

Benchmark                                                 Mode  Cnt    Score   Error   Units
MethodHandleBench.methodHandleStaticFinal                 avgt    5  318.454 ± 5.598   ns/op
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate  avgt    5      ≈ 0          MB/sec
MethodHandleBench.methodHandleStaticFinal:·gc.count       avgt    5  174.000          counts
MethodHandleBench.methodHandleStaticFinal:·gc.time        avgt    5  372.000              ms

GraalVM CE 23.1.0-dev-20230427_0251-java17-linux-amd64-dev

Benchmark                                                 Mode  Cnt    Score    Error   Units
MethodHandleBench.methodHandleStaticFinal                 avgt    5  443.662 ± 30.415   ns/op
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate  avgt    5      ≈ 0           MB/sec
MethodHandleBench.methodHandleStaticFinal:·gc.count       avgt    5  125.000           counts
MethodHandleBench.methodHandleStaticFinal:·gc.time        avgt    5  241.000               ms

There seems to be an improvement, when using JDK20 in comparision to JDK17.

@fernando-valdez fernando-valdez self-assigned this May 11, 2023
@fernando-valdez
Copy link
Member

Thanks for reporting those findings. I will try to reproduce those results. If I have any questions I will let you know.

@SergejIsbrecht
Copy link
Author

native-image 21.0.2 2024-01-16
GraalVM Runtime Environment Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30)
Substrate VM Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS, serial gc, compressed references)
Benchmark                                                        Mode  Cnt      Score    Error   Units
MethodHandleBench.baseline                                       avgt    5      1.163 ±  0.006   ns/op
MethodHandleBench.baseline:·gc.alloc.rate                        avgt    5        ≈ 0           MB/sec
MethodHandleBench.baseline:·gc.count                             avgt    5        ≈ 0           counts
MethodHandleBench.methodHandleStaticFinal                        avgt    5    214.005 ± 22.452   ns/op
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate         avgt    5        ≈ 0           MB/sec
MethodHandleBench.methodHandleStaticFinal:·gc.count              avgt    5  11180.000           counts
MethodHandleBench.methodHandleStaticFinal:·gc.time               avgt    5   1401.000               ms
MethodHandleBench.mh_plainObject_staticFinal                     avgt    5      1.835 ±  0.010   ns/op
MethodHandleBench.mh_plainObject_staticFinal:·gc.alloc.rate      avgt    5        ≈ 0           MB/sec
MethodHandleBench.mh_plainObject_staticFinal:·gc.count           avgt    5        ≈ 0           counts
MethodHandleBench.mh_staticPlainLong_staticFinal                 avgt    5    197.652 ±  0.404   ns/op
MethodHandleBench.mh_staticPlainLong_staticFinal:·gc.alloc.rate  avgt    5        ≈ 0           MB/sec
MethodHandleBench.mh_staticPlainLong_staticFinal:·gc.count       avgt    5  11213.000           counts
MethodHandleBench.mh_staticPlainLong_staticFinal:·gc.time        avgt    5   2015.000               ms

native-image 21.0.2 2024-01-16
GraalVM Runtime Environment GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30)
Substrate VM GraalVM CE 21.0.2+13.1 (build 21.0.2+13, serial gc)
Benchmark                                                        Mode  Cnt      Score   Error   Units
MethodHandleBench.baseline                                       avgt    5      1.173 ± 0.055   ns/op
MethodHandleBench.baseline:·gc.alloc.rate                        avgt    5        ≈ 0          MB/sec
MethodHandleBench.baseline:·gc.count                             avgt    5        ≈ 0          counts
MethodHandleBench.methodHandleStaticFinal                        avgt    5    329.430 ± 0.613   ns/op
MethodHandleBench.methodHandleStaticFinal:·gc.alloc.rate         avgt    5        ≈ 0          MB/sec
MethodHandleBench.methodHandleStaticFinal:·gc.count              avgt    5  13464.000          counts
MethodHandleBench.methodHandleStaticFinal:·gc.time               avgt    5   1917.000              ms
MethodHandleBench.mh_plainObject_staticFinal                     avgt    5      3.681 ± 0.031   ns/op
MethodHandleBench.mh_plainObject_staticFinal:·gc.alloc.rate      avgt    5        ≈ 0          MB/sec
MethodHandleBench.mh_plainObject_staticFinal:·gc.count           avgt    5        ≈ 0          counts
MethodHandleBench.mh_staticPlainLong_staticFinal                 avgt    5    309.755 ± 3.340   ns/op
MethodHandleBench.mh_staticPlainLong_staticFinal:·gc.alloc.rate  avgt    5        ≈ 0          MB/sec
MethodHandleBench.mh_staticPlainLong_staticFinal:·gc.count       avgt    5  11697.000          counts
MethodHandleBench.mh_staticPlainLong_staticFinal:·gc.time        avgt    5   1904.000              ms

MethodHandles with primitive are still kind of slow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants