Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8254693: Add Panama feature to pass heap segments to native code #16201

Closed
wants to merge 52 commits into from

Conversation

JornVernee
Copy link
Member

@JornVernee JornVernee commented Oct 16, 2023

Add the ability to pass heap segments to native code. This requires using Linker.Option.critical(true) as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call.

The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive.

Components of this patch:

  • New binding operator SegmentBase, which gets the base object of a MemorySegment.
  • Rename UnboxAddress to SegmentOffset. Add flag to specify whether processing heap segments should be allowed.
  • CallArranger impls use new binding operators when Linker.Option.critical(/* allowHeap= */ true) is specified.
  • NativeMethodHandle/NativeEntryPoint allow Object in their signatures.
  • The object/oop + offset is exposed as temporary address to native code.
  • Since we stay in the _thread_in_Java state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code).
  • Only x64 and AArch64 for now.
  • I've refactored ArgumentShuffle in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using CallingConventionClosure), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to ArgumentShuffle, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: d2b40f1
  • I've factored out DowncallStubGenerator in the x64 and AArch64 code to use a common DowncallLinker::StubGenerator.
  • Fallback linker is also supported using JNI's GetPrimitiveArrayCritical/ReleasePrimitiveArrayCritical

Aside: fixed existing issue with DowncallLinker not properly acquiring segments in interpreted mode.

Numbers for the included benchmark on my machine are:

Benchmark                     (size)  Mode  Cnt       Score       Error  Units
CriticalCalls.callNotPinned      100  avgt   30      84.818 �     0.729  ns/op
CriticalCalls.callNotPinned    10000  avgt   30    2966.918 �    39.898  ns/op
CriticalCalls.callNotPinned  1000000  avgt   30  952864.052 � 34996.156  ns/op
CriticalCalls.callPinned         100  avgt   30      30.640 �     0.173  ns/op
CriticalCalls.callPinned       10000  avgt   30    2241.403 �    35.473  ns/op
CriticalCalls.callPinned     1000000  avgt   30  221152.247 �  1697.968  ns/op
CriticalCalls.callRecycled       100  avgt   30      40.205 �     0.458  ns/op
CriticalCalls.callRecycled     10000  avgt   30    2845.316 �    13.331  ns/op
CriticalCalls.callRecycled   1000000  avgt   30  287752.178 �  2322.382  ns/op

In particular the difference between the callNotPinned, which allocates a native segment and copies the heap segment into it, and the callPinned which is zero allocation and zero copy, is important. While the allocation can sometimes be avoided (callRecycled), sometimes the API's structure prevents allocations from being amortized.

Testing: jdk_foreign, testing using a fuzzer.


Progress

  • Change requires CSR request JDK-8318645 to be approved
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issues

  • JDK-8254693: Add Panama feature to pass heap segments to native code (Enhancement - P2)
  • JDK-8318645: Add Panama feature to pass heap segments to native code (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201
$ git checkout pull/16201

Update a local copy of the PR:
$ git checkout pull/16201
$ git pull https://git.openjdk.org/jdk.git pull/16201/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16201

View PR using the GUI difftool:
$ git pr show -t 16201

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16201.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 16, 2023

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 16, 2023

@JornVernee The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Oct 16, 2023
String fName, FunctionDescriptor fDesc) {}

@Test(dataProvider = "allowHeapCases")
public void testAllowHeap(AllowHeapCase testCase) throws Throwable {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is the only new test. The diff looks bigger because I've renamed the enclosing directory.

Copy link
Contributor

@mcimadamore mcimadamore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New javadoc note looks good

Comment on lines +181 to +182
mh_j, entry, out_sig_bt, total_out_args, ret_type,
abi, conv, needs_return_buffer, checked_cast<int>(ret_buf_size));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we no longer pass the input signature here. This was required by the old ArgumentShuffle code, but not by the new one.

@@ -52,10 +52,12 @@ JNI_ENTRY(jlong, NEP_makeDowncallStub(JNIEnv* env, jclass _unused, jobject metho
GrowableArray<VMStorage> input_regs(pcount);
for (int i = 0, bt_idx = 0; i < pcount; i++) {
oop type_oop = java_lang_invoke_MethodType::ptype(type, i);
assert(java_lang_Class::is_primitive(type_oop), "Only primitives expected");
BasicType bt = java_lang_Class::primitive_type(type_oop);
BasicType bt = java_lang_Class::as_BasicType(type_oop);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now see T_OBJECT here as well, so we just look at the basic type.

Comment on lines +57 to +60
oop reg_oop = arg_moves_oop->obj_at(i);
if (reg_oop != nullptr) {
input_regs.push(ForeignGlobals::parse_vmstorage(reg_oop));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some registers are null, which indicates that the corresponding value passed to the downcall stub should not be forwarded to the native call, but is instead used directly in the downcall stub (e.g. the offset of an oop).

Comment on lines +148 to +150
if (signature[i] != T_VOID) {
out_regs.push(as_VMStorage(pair.first(), signature[i]));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only care about registers-sized values. It is safe to ignore upper halves of T_LONG and T_DOUBLE (i.e. T_VOID).

Comment on lines +102 to +110
static void runtime_call(MacroAssembler* _masm, address target) {
__ vzeroupper();
__ mov(r12, rsp); // remember sp
__ subptr(rsp, frame::arg_reg_save_area_bytes); // windows
__ andptr(rsp, -16); // align stack as required by ABI
__ call(RuntimeAddress(target));
__ mov(rsp, r12); // restore sp
__ reinit_heapbase();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor cleanup to share this code for the three use sites below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: r12 does not need to remember?

According to CallingSequences in OpenJDK Wiki, r12 may be reserved for HeapBase if COOP is enabled.
(r12 is also used in another places in downcallLinker_x86_64.cpp without restoring...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean reinit_heapbase can be removed? I'm not sure whether the caller expects it to be preserved.

Note that r12 is used in this case to save and restore rsp. This is needed since we access data in the frame relative to rsp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YaSuenag r12 is restored in reinit_heapbase() if needed and no, r12 does not need remembering because it is a constant and can be restored from somewhere else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your code is fine. Restoring r12_heapbase at this point is not bad because runtime_call is only for slow paths. I don't think it should be moved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone! I understood r12 is restored in reinit_heapbase(). It is necessary. I have no comments.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed csr Pull request needs approved CSR before integration labels Oct 30, 2023
Copy link

@iwanowww iwanowww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hotspot changes look fine.

src/java.base/share/classes/java/lang/foreign/Linker.java Outdated Show resolved Hide resolved
GrowableArray<VMStorage> out_regs = ForeignGlobals::replace_place_holders(_input_registers, locs);
ArgumentShuffle arg_shuffle(filtered_java_regs, out_regs, shuffle_reg);

#ifndef PRODUCT
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to exclude the logging in product builds? ArgumentShuffle::print_on() is unconditionally available there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is partly historical. The log output is only intended for debugging, not for end-user eyes. So, I think I originally excluded it as a way of trimming fat from the product build.

Either way, ArgumentShuffle::print_on should probably be excluded/included on the same basis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, this seems like something that should be addressed in a separate patch

*/
static Option critical() {
return LinkerOptions.Critical.INSTANCE;
static Option critical(boolean allowHeapAccess) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of public API, I'm surprised to see critical function property conflated with ability to perform on-heap accesses. These aspects look orthogonal to me. Any particular reason not to represent them as 2 separate Options?

Even though it's straightforward to support on-heap accesses during critical function calls, object pinning would support that for non-critical function calls as well, but proposed API doesn't cover it and new API will be required. What's the plan here?

Copy link
Member Author

@JornVernee JornVernee Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's straightforward to support on-heap accesses during critical function calls, object pinning would support that for non-critical function calls as well, but proposed API doesn't cover it and new API will be required. What's the plan here?

The issue is that most GCs don't support object pinning. I don't think we want an API that only works for some GCs. But, if we do, there's a better API that we can have for just pinning support, which is a MemorySegment::pin(Arena) method that returns a MemorySegment wrapping the pinned array. That would allow doing multiple native calls with just a single pin operation, and also allows embedding pointers to pinned segments in other data structures.

For the current approach where we make the array accessible for the duration of the native call: without pinning support, other GCs would have to use GCLocker. That means that the native call also has to be relatively short-lived, at which point I figured we might as well drop the thread state transition, since that has the same requirement. I.e. we detect that the call is short-lived, and do the optimization ourselves without the hint from the user (critical). This coincidentally also greatly simplifies the implementation. In a prior iteration I did have a separate allowHeap Option that implied critical. But it was suggested to just merge the two together in that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand by the current design: a GCLocker-based mechanism (as the current implementation is) needs to have similar restrictions both on-heap access and also removal of state transitions. It's true that a more general notion of pinning is possible, which doesn't necessarily require special support from the linker (because we can turn an heap segment into a native segment by pinning it and then pass that to the linker). But at this point in this support for region-based pinning is not mature enough to justify such an API (and, if we'll ever get to that point, that would not invalidate the critical linker options).

@JornVernee
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Nov 14, 2023

Going to push as commit 9c98270.
Since your change was applied there have been 17 commits pushed to the master branch:

  • c80e691: 8319970: AArch64: enable tests compiler/intrinsics/Test(Long|Integer)UnsignedDivMod.java on aarch64
  • 856c280: 8319960: RISC-V: compiler/intrinsics/TestInteger/LongUnsignedDivMod.java failed with "counts: Graph contains wrong number of nodes"
  • cb7875d: 8318218: RISC-V: C2 CompressBits
  • 1535528: 8318479: [jmh] the test security.CacheBench failed for multiple threads run
  • 95bd92a: 8210807: Printing a JTable with a JScrollPane prints table without rows populated
  • 21cda19: 8309203: C2: remove copy-by-value of GrowableArray for InterfaceSet
  • b120a05: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates
  • 7df73a2: 8318817: Could not reserve enough space in CodeHeap 'profiled nmethods' (0K)
  • 07eaea8: 8303920: Avoid calling out to python in DataDescriptorSignatureMissing test
  • fe0ccdf: 8319640: ClassicFormat::parseObject (from DateTimeFormatter) does not conform to the javadoc and may leak DateTimeException
  • ... and 7 more: https://git.openjdk.org/jdk/compare/03db82818b905f21cb5ad1d56a687e238b4a6e33...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 14, 2023
@openjdk openjdk bot closed this Nov 14, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 14, 2023
@openjdk
Copy link

openjdk bot commented Nov 14, 2023

@JornVernee Pushed as commit 9c98270.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@JornVernee JornVernee deleted the AllowHeapNoLock branch November 14, 2023 11:19
@TheRealMDoerr
Copy link
Contributor

One additional comment on the pinning topic: We may even want to pin objects across several downcalls. One downcall could be used to initiate async I/O and other downcalls check the result. The buffer must be stable in the time between them:
"The buffer area being written out must not be accessed during the operation or undefined results may occur. The memory areas involved must remain valid." https://man7.org/linux/man-pages/man3/aio_write.3.html
(Not sure if we would use on heap memory for that.)

@mrserb
Copy link
Member

mrserb commented Nov 14, 2023

Does the usage of this new API can be checked by something like -Xcheck:jni? Especially a part when the app still doing some upcalls to the JVM from native when the pin is "active".

@JornVernee
Copy link
Member Author

@mrserb Upcalls are blocked unless a thread is in the native thread state 1. So, if an upcall happens from a critical function (which doesn't transition to the native state), the VM will terminate with a fatal error.

@mrserb
Copy link
Member

mrserb commented Nov 15, 2023

Can I assume it will always cause the fatal error, or it is not specified/UB?

@JornVernee
Copy link
Member Author

JornVernee commented Nov 15, 2023

It is not specified, but the current implementation will always cause a fatal error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated