Utilize critical jffi invokes to bind compatible functions and pass-through primitives #86

headius · 2016-09-28T20:17:50Z

While discussing ways to implement #68, user @Spasi opened our eyes to the magic of HotSpot's JavaCritical "Critical Natives" feature described here:

http://stackoverflow.com/questions/36298111/is-it-possible-to-use-sun-misc-unsafe-to-call-c-functions-without-jni/36309652#36309652

The potential for jnr-* here is tremendous:

We could make critical-compatible function calls much cheaper...possibly no overhead at all.
We could provide better performance for calls that pass around arrays of primitives, like IO operations.

In thinking through the original feature request at #68 and combining it with the whirl of ideas going through my head right now, here's some rough direction...

Basically we'd add new invoke endpoints to jffi that are JavaCritical. When supported and requested by a user (jnr-ffi on up, probably an annotation), we'd use these endpoints to do invocation. Ignoring the function called, we already meet most of the requirements for JavaCritical since most (all?) forms of Foreign.invoke just take primitive arguments.

This would feed into supporting primitive arrays, since a JavaCritical-assisted FFI-bound function could get at that array directly. This would probably be done via a parameter annotation indicating that the array should be passed through following JavaCritical's requirements, and on the other side our new endpoints would pass it on to the function.
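
As a rough illustration of the requirements mentioned above (an editorial sketch, not jffi's actual endpoints): per the linked Stack Overflow answer, a Critical Native is a static, non-synchronized native method that takes only primitives or primitive arrays. Its native entry point receives neither JNIEnv nor jclass, and each array arrives as a length plus a raw pointer. A standard JNI implementation is assumed to exist alongside it as the fallback. The class name, method names, and the C symbols in the comments below are hypothetical.

```java
// Editorial sketch: hypothetical names, not part of jffi or jnr-ffi.
final class CriticalSketch {
    // Java-side declaration. To qualify as a Critical Native the method must be
    // static, not synchronized, and take only primitives or primitive arrays.
    //
    // Assumed native counterparts (C, shown for orientation only):
    //   standard: jint Java_CriticalSketch_sum0(JNIEnv* env, jclass cls, jintArray data);
    //   critical: jint JavaCritical_CriticalSketch_sum0(jint length, jint* data);
    //
    // The method is declared but never called here, so no native library is needed.
    static native int sum0(int[] data);

    // Pure-Java stand-in so the sketch runs without a native library.
    static int sumFallback(int[] data) {
        int total = 0;
        for (int v : data) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumFallback(new int[] {1, 2, 3}));
    }
}
```

The point of the critical form is visible in the assumed C signatures: the array is handed over as `(length, pointer)` with no JNIEnv round trip to fetch its elements.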

It seems like the initial work to add new JavaCritical endpoints and support for them in jnr-* wouldn't be too bad. It's the first change in a long time that requires rebuilding all our native bits, but there are compelling reasons to go forward.

headius · 2016-09-28T20:26:32Z

Reference jnr/jffi#34.

Spasi · 2016-09-28T22:54:31Z

Hey @headius,

I'm afraid my testing has shown that Critical Natives do not improve performance in primitive-only functions. They are really only a solution for efficiently passing array parameters.

I too was incredibly excited at first when I heard about it. In LWJGL we have hundreds of JNI functions and they all are primitive-only. Direct NIO buffers are used as pointers to data, but we only pass and return their addresses to functions, never instances (via Unsafe, we do not call JNI's GetDirectBufferAddress). Based on the post on stackoverflow I was expecting much lower overhead calling such functions, but unfortunately that is not the case:

Using Critical Natives for functions that accept arrays is indeed much faster, meaning that a function with Java arrays is as fast as the same function with NIO buffers (passed as addresses). This has the benefit that, overall, compute on array + critical native is slightly faster than compute on buffer + standard native (on Java 8 at least, buffers may catch up in 9). Obviously, arrays are also more convenient to use.
Using Critical Natives for primitive-only functions is not any faster than standard JNI. The biggest difference I could measure with JMH was sub-nanosecond (maybe 1-2 CPU cycles). Afaict, critical natives skip work that is already skipped in standard JNI when no jobject/synchronized is involved.

(please confirm this, I would love to be proven wrong)

FWIW, I think there's some room for improvement. One experiment I did was to create a custom JDK 9 build that had a hacked version of Critical Natives. Basically, I (naively/dangerously) removed everything that didn't seem absolutely necessary for calling a primitive-only function. For example, it wasn't changing the thread state from Java to native and back. The build worked and I could measure a significant reduction in overhead, almost 40% (from ~9ns to ~5ns for a no-arg function).

Anyway, Critical Natives is a nice trick for arrays. It would be great to magically get better performance for primitive-only calls (and native-to-Java upcalls, they're horrible) in Java 8u/9, but it would be hard to justify the engineering cost with Project Panama on the way.

headius · 2016-09-29T07:07:53Z

@Spasi Wow, ok...lots here. I'll address what I can at 2AM :-)

My interesting cases for using JavaCritical are probably different from LWJGL's: I want trivial functions like getpid to be closer to their raw C cost; I want to bounce back and forth across that boundary manipulating native structs/pointers with minimal cost; I want to efficiently implement library wrappers that are entirely non-blocking but which depend on rich native structures. Most of the operations I expect to see benefit from this are nearly trivial...JNI overhead is by far the lion's share.

Arrays will be a great unexpected bonus. I did not realize that object pinning was a reality in current HotSpot at all, and the ability to actually directly access arrays of primitives will serve us extremely well.

Another point of difference is that on your C side, you're calling normal functions in a normal C way. My interest is JNR...using the same endpoint to call an arbitrary number of C functions. Anything I can do to allow users to reduce overhead on what is essentially reflective calls will have an impact.

I also have no idea how much chatter LWJGL has across that JNI boundary, but JRuby (and JRuby+Truffle) is moving rapidly toward having many key, core operations implemented entirely atop native functions: IO, filesystem access, potentially crypto and more.

I'm definitely aware of what Panama could provide us, and my other I-have-no-time-for-it pet project is to do a Panama backend for jnr-ffi. But Panama may be difficult or impossible to access in Java 9, and there's a whole EG+JSR process needed to even consider it as a public API in 10. We need better options now.

FWIW, I'd really love to find some ways to share efforts between LWJGL and JNR. Any incarnation of Panama will require thoughtful consideration of API structure, and us collaborating more would be a great way to figure out what that API should look like for both a real-world project and a low-level tool other projects are built upon.

I hope I will have time to hack some critical calls into jffi+jnr-ffi in the near term, but time is a hard stallion to break. I will say that I'm very excited about the possibilities.

headius · 2016-09-29T07:17:16Z

Oh, I forgot an interesting use case we still dream about: implementing the Ruby C extension API without so much overhead from the JNI interface. Those would be more "normal" JNI calls, but then we could at least have a fighting chance of running those extensions at a similar speed to the fast-and-loose C Ruby.

Spasi · 2016-09-29T11:45:34Z

> My interesting cases for using JavaCritical are probably different from LWJGL's: I want trivial functions like getpid to be closer to their raw C cost; I want to bounce back and forth across that boundary manipulating native structs/pointers with minimal cost; I want to efficiently implement library wrappers that are entirely non-blocking but which depend on rich native structures. Most of the operations I expect to see benefit from this are nearly trivial...JNI overhead is by far the lion's share.

What we have seen is that Critical Natives do not lower the overhead of simple functions like getpid. In fact, we tested functions that do absolutely nothing and there was no real difference between critical and standard JNI.

The above perfectly describes what LWJGL does and JNI overhead is a pain for us too. Not in all bindings, but some APIs require frequent, low-complexity calls and any overhead hurts. For example, Vulkan is a much more verbose API than OpenGL.

> Another point of difference is that on your C side, you're calling normal functions in a normal C way. My interest is JNR...using the same endpoint to call an arbitrary number of C functions. Anything I can do to allow users to reduce overhead on what is essentially reflective calls will have an impact.

There are two cases in LWJGL:

1. Libraries that are bundled with LWJGL as static binaries (e.g. lmdb) are called using normal JNI code.
2. Libraries loaded dynamically are called using deduplicated JNI methods. Otherwise our native binaries would be massive.

The major difference is that JNR does 2 dynamically and in LWJGL it's generated statically, based on a fixed set of supported APIs.

> I also have no idea how much chatter LWJGL has across that JNI boundary, but JRuby (and JRuby+Truffle) is moving rapidly toward having many key, core operations implemented entirely atop native functions: IO, filesystem access, potentially crypto and more.

This is the list of bindings we currently support and this is the plan for future bindings. We avoid C++ APIs and C APIs that are heavy on callbacks (too much overhead, Cliff Click mentioned that they're always interpreted?).

> We need better options now.

Agreed.

> FWIW, I'd really love to find some ways to share efforts between LWJGL and JNR. Any incarnation of Panama will require thoughtful consideration of API structure, and us collaborating more would be a great way to figure out what that API should look like for both a real-world project and a low-level tool other projects are built upon.

That'd be great. The LWJGL design has been driven by what JVMs can do right now. Everything's going to change with Panama (implementation-wise) and Valhalla (API-wise, major type-safety wins with value types and some simplifications with generic specialization). But yes, I'd be glad to share our experience with various native APIs and how to best approach usability and safety issues.

DemiMarie · 2016-10-07T18:15:09Z

@headius Struct and pointer operations can be done using Unsafe, without entering native code at all. The methods of Unsafe are marked as native, but are actually intrinsics that compile to the same code you would get from a C compiler.
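
A minimal sketch of what such Unsafe-based struct access could look like, assuming `sun.misc.Unsafe` is reachable through its `theUnsafe` field (newer JDKs may print deprecation warnings for these methods). The struct layout, offsets, and class name are hypothetical and chosen only for illustration.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Editorial sketch: hypothetical names and layout, not jnr-ffi code.
final class UnsafeStructSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Hypothetical C layout: struct point { int32_t x; int32_t y; };
    static final long X_OFFSET = 0;
    static final long Y_OFFSET = 4;
    static final long STRUCT_SIZE = 8;

    private static Unsafe loadUnsafe() {
        try {
            // Unsafe has no public constructor; read its singleton field reflectively.
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Writes both fields into off-heap memory, reads them back, and returns x + y.
    static long writeThenSum(int x, int y) {
        long address = UNSAFE.allocateMemory(STRUCT_SIZE);
        try {
            UNSAFE.putInt(address + X_OFFSET, x);
            UNSAFE.putInt(address + Y_OFFSET, y);
            return (long) UNSAFE.getInt(address + X_OFFSET)
                    + UNSAFE.getInt(address + Y_OFFSET);
        } finally {
            // Off-heap memory is not garbage collected; release it explicitly.
            UNSAFE.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenSum(3, 4));
    }
}
```

No JNI transition is involved in the reads and writes themselves, which is the property the comment above is pointing at.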

This is a case where the GPLv2 (with no linking exception) licensing of Java 9's compilation interface is a problem. If it could be changed by Oracle that would be awesome (they did that for Truffle), but that seems unlikely.

Spasi · 2016-10-10T01:45:56Z

I've been doing a lot of testing lately and have a few things to report.

First, we encountered two bugs related to critical natives and have reported them (with corresponding fixes): JDK-8167408 and JDK-8167409.

Second, I took the opportunity to weigh some of the overhead in the JNI wrappers. The parts that, by removing them, make a measurable difference:

GC check before the call, ~0.7ns
DTrace method probes before and after the call, ~1.5ns
Thread state transitions around the call and safepoint check after the call, ~1.85ns

Removing a few more things (ic check on entry and restoring of CPU control state after the call) brings the total overhead reduction to ~4.66ns. That means a function like getpid could go from ~8.1ns to ~3.5ns. All tests were performed on a Sandy Bridge 3.1GHz (so YMMV) with a fresh build of JDK 9.
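
The arithmetic behind those figures can be checked with a small sketch. It uses only the approximate numbers quoted in this comment; the class and constant names are made up for illustration.

```java
// Editorial sketch: reproduces the arithmetic from the figures quoted above.
final class OverheadSketch {
    // Approximate measurements from the comment, in nanoseconds.
    static final double GC_CHECK = 0.7;
    static final double DTRACE_PROBES = 1.5;
    static final double THREAD_STATE_AND_SAFEPOINT = 1.85;
    static final double TOTAL_REDUCTION = 4.66;
    static final double BASELINE_CALL = 8.1;

    // Sum of the three itemized savings.
    static double itemizedReduction() {
        return GC_CHECK + DTRACE_PROBES + THREAD_STATE_AND_SAFEPOINT;
    }

    // What is left for the ic check and the CPU control state restore.
    static double remainingReduction() {
        return TOTAL_REDUCTION - itemizedReduction();
    }

    // Projected cost of a getpid-like call with all of the above removed.
    static double projectedCall() {
        return BASELINE_CALL - TOTAL_REDUCTION;
    }

    public static void main(String[] args) {
        System.out.printf("itemized:  %.2f ns%n", itemizedReduction());
        System.out.printf("remaining: %.2f ns%n", remainingReduction());
        System.out.printf("projected: %.2f ns%n", projectedCall());
    }
}
```

The three itemized parts add up to about 4.05ns, which leaves roughly 0.61ns for the ic check and CPU control state restore. Subtracting the full 4.66ns from 8.1ns gives about 3.44ns, consistent with the ~3.5ns quoted.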

Some of the above are scary, others are just annoying (sigh... the DTrace probes). FWIW, I tested a build that removed the above only for JNI functions that were primitive-only and was able to complete the entire LWJGL test and demo suite without any issue.

ghost · 2016-12-04T09:59:15Z

@Spasi doesn't look like the 2 bugs you filed will be fixed any time soon. Will this be a concern in using this "feature"? The 2 bug reports indicate it's only used in Solaris (for the JDK) so no issue and deferred.

Spasi · 2016-12-04T10:14:12Z

We have implemented workarounds in LWJGL for both:

For JDK-8167408: LWJGL/lwjgl3@234f169 (exports functions without __stdcall decorations on Windows x86)
For JDK-8167409: LWJGL/lwjgl3@ee39d2a (disables Critical Natives on problematic function signatures only, on Linux & macOS only)

chrisvest · 2018-05-03T15:09:17Z

Just checked. JDK-8167408 and JDK-8167409 are now marked as resolved/fixed in Java 10.

headius changed the title ~~Implement critical JNI endpoints and pass-through primitives~~ Utilize critical jffi invokes to bind compatible functions and pass-through primitives Sep 28, 2016

This was referenced Sep 28, 2016

Implement critical JNI endpoints and primitive array pass-through jnr/jffi#34

Open

Passing heap-allocated byte[]s #68

Closed

headius added this to the 2.2.0 milestone Sep 28, 2016

vlsi mentioned this issue Nov 29, 2017

Bindings for Java via "Critical Natives" erthink/t1ha#14

Closed
