Passing heap-allocated byte[]s #68

benalexau · 2016-07-16T12:27:24Z

We are using JNR-FFI to wrap the C LMDB library in LmdbJava. LMDB requires an MDB_val with a size and a pointer to the data. This works fine with direct ByteBuffers (where we can fetch the memory address and capacity from the buffer and put them directly into the memory we allocated for the struct), but we've had a request to support heap-allocated byte[]s. We can copy the byte[] to a direct buffer, but this has considerable cost and performance is a major consideration for LMDB users.

Is there some way we can fetch the on-heap byte[] address and protect it from being moved for the duration of a JNR-FFI native call? GetByteArrayElements in JNINativeInterface might be suitable, but I cannot see any information on how to use it. Any suggestions appreciated.
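For readers unfamiliar with the copy-based workaround mentioned above, here is a minimal sketch using only `java.nio`. The class and method names (`DirectCopy`, `toDirect`) are hypothetical and are not part of LmdbJava or JNR-FFI; the sketch only illustrates the extra allocation and copy that the question is trying to avoid.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
final class DirectCopy {
    // Copies a heap array into freshly allocated off-heap memory.
    // The result has a stable native address, at the cost of one
    // allocation and one full copy per call.
    static ByteBuffer toDirect(byte[] heap) {
        ByteBuffer direct = ByteBuffer.allocateDirect(heap.length);
        direct.put(heap);   // the copy
        direct.flip();      // make the copied bytes readable from position 0
        return direct;
    }

    public static void main(String[] args) {
        byte[] key = {1, 2, 3, 4};
        ByteBuffer buf = toDirect(key);
        System.out.println(buf.isDirect() + " " + buf.remaining()); // prints "true 4"
    }
}
```

Reusing a single direct buffer across calls would avoid the repeated allocation, but the copy itself remains.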

headius · 2016-09-26T18:08:18Z

There's no way through standard JNI to get a direct reference to memory on the heap. Some of the APIs that allow you to access such memory (such as the GetByteArrayElements you mentioned) say that the VM may give you a pinned reference directly into the heap, but that this is not guaranteed. I believe HotSpot, the most-deployed JVM, always chooses to copy instead.

There are JDK improvements coming up that will make it easier to blur the lines between on-heap and off-heap memory, but for the foreseeable future there's no safe way to expose heap memory directly to native code.

phraktle · 2016-09-26T20:08:33Z

Hi @headius,

I believe GetPrimitiveArrayCritical should provide direct access to the array without copying, on HotSpot as well (with the caveat that the operation shouldn't block for long, since that could delay VM housekeeping).

headius · 2016-09-26T20:31:14Z

@phraktle I can't find information as to whether HotSpot will actually pin these days; most discussions of that function are old and refer to the now-defunct, non-moving CMS GC. From what I can gather, it did do this at some point in the past, you pay a locking/unlocking cost in addition to normal JNI overhead, and you still might not actually get the real array anyway. So in the best case, you have to acquire a lock and block the GC and other critical JVM subsystems. In the worst case, you're no better (or worse) than copying the data out.

It's probably worth looking into. If someone would like to do that, I'd be happy to reopen this and look forward to a PR :-)

phraktle · 2016-09-26T20:44:54Z

The difference in GCs re GetCritical is mostly about whether it completely suspends GC operations or only partially (as is the case with CMS, which is still very much alive, and still works best for many use cases :). Several popular native bindings use GetCritical where performance was a concern (e.g. Netty networking layer or LZ4). Of course a benchmark would be best to demonstrate it.

Not familiar w/ JNR-FFI internals, but if you can outline a sketch of where one would need to apply the hammer, I can take a look.

headius · 2016-09-26T21:20:21Z

@benalexau Thoughts on using GetCritical? I haven't dug into the logic for passing heap byte[] out but perhaps the right direction would be tagging the param as "direct" or "critical" and modifying the value-marshaling logic to use GetCritical for those parameters?

headius · 2016-09-26T21:21:05Z

@phraktle Thanks for the info. I'm not sure the right location myself, at the moment, but you may be on to something. I'll reopen this.

phraktle · 2016-09-27T09:36:41Z

Some more details on how HotSpot implements Get*Critical without actually locking on the fast path: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/58d961f47dd4/src/share/vm/memory/gcLocker.hpp#l127

Spasi · 2016-09-27T11:34:28Z

You may want to have a look at HotSpot Critical Natives. LWJGL's lmdb bindings contain a random example: compare Java_org_lwjgl_util_lmdb_LMDB_nmdb_1version___3I_3I_3I to JavaCritical_org_lwjgl_util_lmdb_LMDB_nmdb_1version___3I_3I_3I.

Pros:

Works and is as fast as passing ByteBuffer addresses.

Cons:

Undocumented and supported unofficially on HotSpot only.
The critical natives are ignored in C0 (that's why having both standard and critical versions of the same function is required). This is important for functions that are called infrequently with big arrays: you risk always paying the array copy cost.
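The two symbol names compared above differ only in their prefix; the rest follows the standard JNI name-mangling rules (`_1` for `_`, `_3` for `[`, and `__` before the argument signature). As a rough sketch, the names can be rebuilt as shown below. The `JniNames` class and its methods are hypothetical helpers, and the Java-side class `org.lwjgl.util.lmdb.LMDB`, method `nmdb_version`, and three `int[]` parameters are inferred from the symbols themselves rather than taken from LWJGL's source.

```java
// Hypothetical helper, for illustration only.
final class JniNames {
    // Escapes one name component according to the JNI name-mangling rules.
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' || c == '/') {
                sb.append('_');                 // package separator
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c)); // other characters
            }
        }
        return sb.toString();
    }

    // Builds the long (overload-qualified) symbol name for a native method.
    static String symbol(String prefix, String className, String method, String argDescriptor) {
        return prefix + mangle(className) + "_" + mangle(method) + "__" + mangle(argDescriptor);
    }

    public static void main(String[] args) {
        String cls = "org.lwjgl.util.lmdb.LMDB";
        // "[I[I[I" is the descriptor for three int[] parameters.
        System.out.println(symbol("Java_", cls, "nmdb_version", "[I[I[I"));
        System.out.println(symbol("JavaCritical_", cls, "nmdb_version", "[I[I[I"));
    }
}
```

Running `main` prints the two symbol names quoted in the comment above, which is why a library that wants both paths has to export two C functions per native method.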

headius · 2016-09-28T19:55:46Z

👍 from me...this is potentially huge news!

@Spasi That is incredibly interesting! I did not know about JavaCritical, but this could allow us to improve the perf of JNR significantly by allowing users to opt in to critical function binding. Libraries like jnr-posix could start using it immediately for known non-blocking calls.

So yeah I think this needs to happen, and soon.

headius · 2016-09-28T20:13:26Z

@phraktle Thank you also for the information on locking and critical array references from JNI.

I think we will focus in this issue on the possibility of marking primitive arrays as "critical" or "pass-through" and I will open a separate issue for the more ambitious use of JavaCritical across jnr-*.

headius · 2016-09-28T20:24:01Z

I believe supporting byte[] pass-through will still require new jffi native binaries, so this bug is likely to stall while we work on #86 and jnr/jffi#34 (since we don't want to have to re-build the native bits again later).

bkgood · 2021-02-21 · from #219 (Pass array pinning flag down to jffi)

Pinning primitive arrays enables access to their contents in native code without copying. The contents of the array are accessed via the GetPrimitiveArrayCritical JNI method. Different JVMs and different JVM configurations implement this method differently, with different consequences. In G1 (on HotSpot in OpenJDK 11) it currently acquires a lock that prevents garbage collection ("GCLocker") and thus can negatively impact forward progress of mutators if a collection is required. Native methods are thus expected not to hold these arrays for long, as documented in the JNI docs [1] and in the `@Pinned` javadocs.

If a VM is for some reason unable to avoid a copy, GetPrimitiveArrayCritical will return a copy. This will cause a test failure: I've included a test to verify that pinning "works", which is the best strategy I've come up with to verify that GetPrimitiveArrayCritical is being called as a result of the param annotation + flag. Access to some sort of counter, if available, might be preferable, but I'm unaware of any such portable and accessible counters. Fortunately, OpenJDK 11 and OpenJ9's JDK 11-compatible distribution both seem to support pinned access, as does Dalvik/ART. Given that all the popular implementations seem to support pinning, I've left the test enabled.

The performance benefits of pinning are significant: the attached test displays this, as does the example at bkgood/jnr-ffi-array-pinning-tests. I embarked on this due to surprisingly limited performance I got in a WIP libsnappy binding: with copying (including judicious use of `@In` and `@Out`) I get about 90 MB/s, but enabling array pinning gets it closer to 250 MB/s.

This was largely implemented in jnr/jffi@a61b1fc42aa7; the requisite flag just needs to make it to the native stubs.

Fixes jnr#68.

[1] https://docs.oracle.com/en/java/javase/11/docs/specs/jni/functions.html#getprimitivearraycritical-releaseprimitivearraycritical

headius · 2021-02-23T18:46:25Z

Fixed by @bkgood in #219, for at least the asm-generated stub logic. We will look at getting releases out based on this change soon.

benalexau mentioned this issue Jul 16, 2016

add Dbi<byte[]> support lmdbjava/lmdbjava#3

Closed

headius closed this as completed Sep 26, 2016

headius reopened this Sep 26, 2016

This was referenced Sep 28, 2016

Utilize critical jffi invokes to bind compatible functions and pass-through primitives #86

Open

Implement critical JNI endpoints and primitive array pass-through jnr/jffi#34

Open

bkgood mentioned this issue Feb 21, 2021

Pass array pinning flag down to jffi #219

Merged

headius closed this as completed in #219 Feb 23, 2021

headius added this to the 2.2.2 milestone Feb 23, 2021
