Utilize critical jffi invokes to bind compatible functions and pass-through primitives #86
Reference jnr/jffi#34.
Hey @headius, I'm afraid my testing has shown that Critical Natives do not improve performance in primitive-only functions. They are really only a solution for efficiently passing array parameters. I too was incredibly excited at first when I heard about it. In LWJGL we have hundreds of JNI functions and they all are primitive-only. Direct NIO buffers are used as pointers to data, but we only pass and return their addresses to functions, never instances (via
(please confirm this, I would love to be proven wrong)

FWIW, I think there's some room for improvement. One experiment I did was to create a custom JDK 9 build that had a hacked version of Critical Natives. Basically, I (naively/dangerously) removed everything that didn't seem absolutely necessary for calling a primitive-only function. For example, it wasn't changing the thread state from Java to native and back. The build worked and I could measure a significant reduction in overhead, almost 40% (from ~9ns to ~5ns for a no-arg function).

Anyway, Critical Natives is a nice trick for arrays. It would be great to magically get better performance for primitive-only calls (and native-to-Java upcalls, they're horrible) in Java 8u/9, but it would be hard to justify the engineering cost with Project Panama on the way.
@Spasi Wow, ok...lots here. I'll address what I can at 2AM :-)

My interesting cases for using JavaCritical are probably different from LWJGL's: I want trivial functions like getpid to be closer to their raw C cost; I want to bounce back and forth across that boundary manipulating native structs/pointers with minimal cost; I want to efficiently implement library wrappers that are entirely non-blocking but which depend on rich native structures. Most of the operations I expect to see benefit from this are nearly trivial...JNI overhead is by far the lion's share.

Arrays will be a great unexpected bonus. I did not realize that object pinning was a reality in current HotSpot at all, and the ability to actually directly access arrays of primitives will serve us extremely well.

Another point of difference is that on your C side, you're calling normal functions in a normal C way. My interest is JNR...using the same endpoint to call an arbitrary number of C functions. Anything I can do to allow users to reduce overhead on what are essentially reflective calls will have an impact.

I also have no idea how much chatter LWJGL has across that JNI boundary, but JRuby (and JRuby+Truffle) is moving rapidly toward having many key, core operations implemented entirely atop native functions: IO, filesystem access, potentially crypto and more.

I'm definitely aware of what Panama could provide us, and my other I-have-no-time-for-it pet project is to do a Panama backend for jnr-ffi. But Panama may be difficult or impossible to access in Java 9, and there's a whole EG+JSR process needed to even consider it as a public API in 10. We need better options now.

FWIW, I'd really love to find some ways to share efforts between LWJGL and JNR. Any incarnation of Panama will require thoughtful consideration of API structure, and us collaborating more would be a great way to figure out what that API should look like for both a real-world project and a low-level tool other projects are built upon.
I hope I will have time to hack some critical calls into jffi+jnr-ffi in the near term, but time is a hard stallion to break. I will say that I'm very excited about the possibilities.
Oh, I forgot an interesting use case we still dream about: implementing the Ruby C extension API without so much overhead from the JNI interface. Those would be more "normal" JNI calls, but then we could at least have a fighting chance of running those extensions at a similar speed to the fast-and-loose C Ruby.
What we have seen is that Critical Natives do not lower the overhead of simple functions like The above perfectly describes what LWJGL does and JNI overhead is a pain for us too. Not in all bindings, but some APIs require frequent, low-complexity calls and any overhead hurts. For example, Vulkan is a much more verbose API than OpenGL.
There are two cases in LWJGL:
The major difference is that JNR does 2 dynamically and in LWJGL it's generated statically, based on a fixed set of supported APIs.
This is the list of bindings we currently support and this is the plan for future bindings. We avoid C++ APIs and C APIs that are heavy on callbacks (too much overhead, Cliff Click mentioned that they're always interpreted?).
Agreed.
That'd be great. The LWJGL design has been driven by what JVMs can do right now. Everything's going to change with Panama (implementation-wise) and Valhalla (API-wise, major type-safety wins with value types and some simplifications with generic specialization). But yes, I'd be glad to share our experience with various native APIs and how to best approach usability and safety issues.
@headius Struct and pointer operations can be done using This is a case where the GPLv2 (with no linking exception) licensing of Java 9's compilation interface is a problem. If it could be changed by Oracle that would be awesome (they did that for Truffle), but that seems unlikely.
I've been doing a lot of testing lately and have a few things to report. First, we encountered two bugs related to critical natives and have reported them (with corresponding fixes):
Second, I took the opportunity to weigh some of the overhead in the JNI wrappers. The parts that, by removing them, make a measurable difference:
Removing a few more things (ic check on entry and restoring of CPU control state after the call), brings the total overhead reduction to ~4.66ns. That means a function like Some of the above are scary, others are just annoying (sigh... the DTrace probes). FWIW, I tested a build that removed the above only for JNI functions that were primitive-only and was able to complete the entire LWJGL test and demo suite without any issue.
@Spasi, it doesn't look like the 2 bugs you filed will be fixed any time soon. Will this be a concern in using this "feature"? The 2 bug reports indicate it's only used on Solaris (for the JDK), so they were treated as a non-issue and deferred.
We have implemented workarounds in LWJGL for both:
Just checked. JDK-8167408 and JDK-8167409 are now marked as resolved/fixed in Java 10.
While discussing ways to implement #68, user @Spasi opened our eyes to the magic of HotSpot's JavaCritical "Critical Natives" feature described here:
http://stackoverflow.com/questions/36298111/is-it-possible-to-use-sun-misc-unsafe-to-call-c-functions-without-jni/36309652#36309652
The potential for jnr-* here is tremendous:
In thinking through the original feature request at #68 and combining it with the whirl of ideas going through my head right now, here's some rough direction...
Basically we'd add new invoke endpoints to jffi that are JavaCritical. When supported and requested by a user (jnr-ffi on up, probably an annotation), we'd use these endpoints to do invocation. Ignoring the function called, we already meet most of the requirements for JavaCritical since most (all?) forms of Foreign.invoke just take primitive arguments.
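As a rough illustration of those requirements, here is a minimal Java sketch. It checks whether a native method has the shape the linked Stack Overflow answer describes for Critical Natives (static, not synchronized, only primitive or primitive-array arguments) and derives the `JavaCritical_` symbol name with standard JNI name mangling. The class `CriticalEligibility`, the nested `Example` bindings, and every helper name are hypothetical; none of this is jffi or jnr-ffi API.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class CriticalEligibility {
    // Hypothetical bindings for illustration only; they are never invoked,
    // so no native library has to be loaded.
    static class Example {
        static native int add(int a, int b);           // primitives only
        static native long sum(long[] values);         // primitive array
        static native int length(String s);            // object argument
        native int instanceCall(int a);                // not static
        static synchronized native int locked(int a);  // synchronized
    }

    // Look up one of the Example methods by name without checked exceptions.
    public static Method find(String name) {
        for (Method m : Example.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) return m;
        }
        throw new IllegalArgumentException("no such method: " + name);
    }

    // Shape check only: static, native, not synchronized, and every
    // parameter is a primitive or an array of primitives.
    public static boolean isCriticalEligible(Method m) {
        int mod = m.getModifiers();
        if (!Modifier.isNative(mod) || !Modifier.isStatic(mod) || Modifier.isSynchronized(mod)) {
            return false;
        }
        for (Class<?> p : m.getParameterTypes()) {
            boolean primitiveArray = p.isArray() && p.getComponentType().isPrimitive();
            if (!p.isPrimitive() && !primitiveArray) return false;
        }
        return true;
    }

    // JNI name mangling: '.' or '/' -> "_", '_' -> "_1", ';' -> "_2",
    // '[' -> "_3", any other non-alphanumeric char -> "_0xxxx" (hex).
    public static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') sb.append('_');
            else if (c == '_') sb.append("_1");
            else if (c == ';') sb.append("_2");
            else if (c == '[') sb.append("_3");
            else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) sb.append(c);
            else sb.append(String.format("_0%04x", (int) c));
        }
        return sb.toString();
    }

    // Build the exported symbol name for a given prefix ("Java_" or "JavaCritical_").
    public static String symbol(String prefix, String className, String methodName) {
        return prefix + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        for (String n : new String[] {"add", "sum", "length", "instanceCall", "locked"}) {
            System.out.println(n + " eligible: " + isCriticalEligible(find(n)));
        }
        // com.example.Binding is a made-up class name.
        System.out.println(symbol("JavaCritical_", "com.example.Binding", "get_pid"));
    }
}
```

The check covers only the method's shape. The other constraints from the linked answer (the native body must not call JNI functions and should return quickly) live on the C side and cannot be verified from Java.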
This would feed into supporting primitive arrays, since a JavaCritical-assisted FFI-bound function could get at that array directly. This would probably be done via a parameter annotation indicating that the array should be passed through following JavaCritical's requirements, and on the other side our new endpoints would pass it on to the function.
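To make the array pass-through concrete: per the linked Stack Overflow answer, a `JavaCritical_` function drops the `JNIEnv*` and `jclass` parameters, and each primitive array argument arrives as a pair, a `jint` length followed by a raw pointer to the elements. The Java sketch below renders that C parameter list from Java parameter types. `CriticalSignature` and the `argN`/`lenN` names are made up for illustration; this is not how jffi generates or registers endpoints.

```java
import java.util.ArrayList;
import java.util.List;

public class CriticalSignature {
    // Map a Java primitive type to its JNI typedef name, e.g. int -> jint.
    static String jniType(Class<?> primitive) {
        return "j" + primitive.getName();
    }

    // Render the C parameter list a JavaCritical_ function would declare
    // for the given Java parameter types (no JNIEnv*, no jclass).
    public static String criticalParams(Class<?>... params) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < params.length; i++) {
            Class<?> p = params[i];
            if (p.isPrimitive() && p != void.class) {
                out.add(jniType(p) + " arg" + i);
            } else if (p.isArray() && p.getComponentType().isPrimitive()) {
                // A primitive array expands to (length, pointer to elements).
                out.add("jint len" + i);
                out.add(jniType(p.getComponentType()) + "* arg" + i);
            } else {
                throw new IllegalArgumentException("not allowed in a critical native: " + p.getName());
            }
        }
        return out.isEmpty() ? "void" : String.join(", ", out);
    }

    public static void main(String[] args) {
        // Java side (hypothetical): static native int write(int fd, byte[] data);
        System.out.println("jint JavaCritical_Example_write(" + criticalParams(int.class, byte[].class) + ")");
    }
}
```

A parameter annotation like the one proposed above would then only need to tell the binding layer which array arguments to leave as arrays, so that the critical endpoint receives the length and pointer pair instead of a copied buffer.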
It seems like the initial work to add new JavaCritical endpoints and support for them in jnr-* wouldn't be too bad. It's the first change in a long time that requires rebuilding all our native bits, but there are compelling reasons to go forward.