Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8262896: [macos_aarch64] Crash in jni_fast_GetLongField #3422

Closed

Conversation

@AntonKozlov
Copy link
Member

@AntonKozlov AntonKozlov commented Apr 9, 2021

Hi, please review a fix for a random crash on macos/aarch64.

By default, GetXXXField JNI Interface implementation is a generated function (-XX:+UseFastJNIAccessors). Usually the function is called by JNI code running in WXExec mode and everything is fine. But sometime we attempt to call it in WXWrite context, like in the stack trace attached to the bug:

v  ~BufferBlob::jni_fast_GetLongField
V  [libjvm.dylib+0x7a6538]  Perf_Detach+0x168
j  jdk.internal.perf.Perf.detach(Ljava/nio/ByteBuffer;)V+0 java.base@17-internal
j  jdk.internal.perf.Perf$CleanerAction.run()V+8 java.base@17-internal
j  jdk.internal.ref.CleanerImpl$PhantomCleanableRef.performCleanup()V+4 java.base@17-internal
j  jdk.internal.ref.PhantomCleanable.clean()V+12 java.base@17-internal
j  jdk.internal.ref.CleanerImpl.run()V+57 java.base@17-internal
j  java.lang.Thread.run()V+11 java.base@17-internal
j  jdk.internal.misc.InnocuousThread.run()V+20 java.base@17-internal
v  ~StubRoutines::call_stub

One way to fix the bug is to ensure WXExec mode before calling GetXXXField, but it depends on finding and fixing all such cases.

This patch instead adds additional actions to GetXXXField implementation to ensure correct W^X mode regardless if it is called from WXWrite or WXExec mode.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8262896: [macos_aarch64] Crash in jni_fast_GetLongField

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3422/head:pull/3422
$ git checkout pull/3422

Update a local copy of the PR:
$ git checkout pull/3422
$ git pull https://git.openjdk.java.net/jdk pull/3422/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3422

View PR using the GUI difftool:
$ git pr show -t 3422

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3422.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Apr 9, 2021

👋 Welcome back akozlov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk openjdk bot added the rfr label Apr 9, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 9, 2021

@AntonKozlov The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Loading

@openjdk openjdk bot added the hotspot label Apr 9, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 9, 2021

Webrevs

Loading

@theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Apr 10, 2021

So I'm curious because I don't know the overhead of W^X switching. How much does this gain over turning off UseFastJNIAccessors? I guess it doesn't much matter because you keep track of the current mode and only flip it if you really need to, which is rare?

Loading

@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 13, 2021

Mailing list message from David Holmes on hotspot-dev:

Hi Anton,

On 10/04/2021 4:32 am, Anton Kozlov wrote:

Hi, please review a fix for a random crash on macos/aarch64.

By default, GetXXXField JNI Interface implementation is a generated function (-XX:+UseFastJNIAccessors). Usually the function is called by JNI code running in WXExec mode and everything is fine. But sometime we attempt to call it in WXWrite context, like in the stack trace attached to the bug:

v ~BufferBlob::jni_fast_GetLongField
V [libjvm.dylib+0x7a6538] Perf_Detach+0x168

I'm struggling to see how we get to the jni_fast_GetLongField from
Perf_Detach ??

j jdk.internal.perf.Perf.detach(Ljava/nio/ByteBuffer;)V+0 java.base at 17-internal
j jdk.internal.perf.Perf$CleanerAction.run()V+8 java.base at 17-internal
j jdk.internal.ref.CleanerImpl$PhantomCleanableRef.performCleanup()V+4 java.base at 17-internal
j jdk.internal.ref.PhantomCleanable.clean()V+12 java.base at 17-internal
j jdk.internal.ref.CleanerImpl.run()V+57 java.base at 17-internal
j java.lang.Thread.run()V+11 java.base at 17-internal
j jdk.internal.misc.InnocuousThread.run()V+20 java.base at 17-internal
v ~StubRoutines::call_stub

One way to fix the bug is to ensure WXExec mode before calling GetXXXField, but it depends on finding and fixing all such cases.

This patch instead adds additional actions to GetXXXField implementation to ensure correct W^X mode regardless if it is called from WXWrite or WXExec mode.

Once again I'm finding it very hard to understand what the actual rules
are for these W^X mode changes and what code requires what. IIUC the
normal(?) mode is WXExec and we use that when executing JITed or other
native code. But if we enter the VM we need to switch to WXWrite mode.
So that is what the PERF_ENTRY (via JVM_ENTRY) does for Perf_Detach. But
for some reason these generated fast-accessors need to be run in WXExec
mode. So IIUC the transitions that are already in place for executing
"VM code" are going to be wrong any time that VM code ends up executing
something like a fast-accessor - is that the case? And what other code
is in the same category as these generated fast-accessors?

Thanks,
David

Loading

@theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Apr 13, 2021

Looking at this further, the use of W^X for functions like this one, callable from JNI code, makes no sense.
These functions are never written to once they have been generated. Could we generate them during init into a non-executable page, then remap that page as read/execute only? Then we wouldn't have to care about the state of W^X.

Loading

@AntonKozlov
Copy link
Member Author

@AntonKozlov AntonKozlov commented Apr 13, 2021

How much does this gain over turning off UseFastJNIAccessors?

The benchmark is AntonKozlov/macos-aarch64-transition-bench@6de6c41.

For loopCnt=1:

Before, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  201397788.701 ± 907059.494  ops/s

Before, -UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  20435101.708 ± 233303.518  ops/s

After, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  151830846.914 ± 654947.292  ops/s

After, -UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  20278690.117 ± 287142.720  ops/s

After with ThreadWXEnable commented out, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  165277289.798 ± 939646.095  ops/s

After with ThreadWXEnable and  JavaThread::thread_from_jni_environment commented out, +UseFastJNIAccessors:
MyBenchmark.testGetField          1  thrpt   25  187846151.217 ± 1014919.707  ops/s

In summary, the fast accessor is ~10x faster now, before the patch.
The code of this patch is not optimal, it introduces ~25% performance penalty, and only ~7% is W^X overhead. Indirection costs ~7% and JavaThread::thread_from_jni_environment is another ~11%. I was going to look at performance side after fixing the annoying failure.

I guess it doesn't much matter because you keep track of the current mode and only flip it if you really need to, which is rare?

In most cases GetXXXField is called from WXExec. So the overhead should be only checking current thread mode.

It's also sad that the cost is constant while we rarely call these accessors from WXWrite.

Loading

@AntonKozlov
Copy link
Member Author

@AntonKozlov AntonKozlov commented Apr 13, 2021

Hi David,

v ~BufferBlob::jni_fast_GetLongField
V [libjvm.dylib+0x7a6538] Perf_Detach+0x168

I'm struggling to see how we get to the jni_fast_GetLongField from
Perf_Detach ??

Perf_Detach calls e.g GetDirectBufferAddress

address = env->GetDirectBufferAddress(buffer);
, which calls GetLongField
ret = (void*)(intptr_t)env->GetLongField(buf, directBufferAddressField);

Once again I'm finding it very hard to understand what the actual rules
are for these W^X mode changes and what code requires what.

Sorry for this. Probably there should be a document or comment in the code, but I didn't find a right place for that. But your understanding below is totally correct.

IIUC the normal(?) mode is WXExec and we use that when executing JITed or other native code.

Right, since most of the time we execute generated code.

But if we enter the VM we need to switch to WXWrite mode.

Not a strong necessity, but a choice. It is done to avoid going back and forth between W^X modes while executing VM code, also to avoid many subtle issues similar to this one but in much bigger counts.

for some reason these generated fast-accessors need to be run in WXExec
mode.

It just impossible to execute generated code without switching to WXExec.

So IIUC the transitions that are already in place for executing
"VM code" are going to be wrong any time that VM code ends up executing
something like a fast-accessor - is that the case?

Partially, yes. It's not totally wrong, since VM code still may need to write code cache, i.e. during deoptimization happening randomly.

And what other code
is in the same category as these generated fast-accessors?

There are rather small number of such cases (I found these by grepping WXExec)

  • SafeFetch32/N
  • StubRoutines::UnsafeArrayCopy_stub
  • ProgrammableInvoker::invoke_native

Hope it clarifies minor details.

Loading

@AntonKozlov
Copy link
Member Author

@AntonKozlov AntonKozlov commented Apr 13, 2021

Looking at this further, the use of W^X for functions like this one, callable from JNI code, makes no sense.
These functions are never written to once they have been generated. Could we generate them during init into a non-executable page, then remap that page as read/execute only? Then we wouldn't have to care about the state of W^X.

Indeed, this code does not need to be changed after it was generated. But I don't think it will be easy to implement simple remapping (due security restrictions). It would be helpful if we could unlock only a part of the code cache for writing, leaving the other executable, but I don't know immediate way to do this.

Probably I'll convert this to draft. Meanwhile it is still worth to fix the random crash. What do you think about this patch compared to turning off fast JNI accessors?

Loading

@theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Apr 13, 2021

Indeed, this code does not need to be changed after it was generated. But I don't think it will be easy to implement simple remapping (due security restrictions).

Why not? As far as I can see, the "Allow Unsigned Executable Memory" Hardened Runtime capability allows exactly this. Maybe it'd stop OpenJDK being notarized, but Apple doen't say so.

It would be helpful if we could unlock only a part of the code cache for writing, leaving the other executable, but I don't know immediate way to do this.

Probably I'll convert this to draft. Meanwhile it is still worth to fix the random crash. What do you think about this patch compared to turning off fast JNI accessors?

It depends on performance. I've had a quick look at how pthread_jit_write_protect_np works, and it looks like some serious overhead. Can you measure, say, a few million JNI accessors to see which works best?

Loading

@theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Apr 13, 2021

It depends on performance. I've had a quick look at how pthread_jit_write_protect_np works, and it looks like some serious overhead. Can you measure, say, a few million JNI accessors to see which works best?

I just had a look, and according to https://dougallj.github.io/applecpu/firestorm.html MSR (APRR) only takes a single cycle, but there is still some overhead because of an ISB. Please try it and see, though.

Loading

@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 13, 2021

Mailing list message from Andrew Haley on hotspot-dev:

On 4/13/21 11:22 AM, Anton Kozlov wrote:

In summary, the fast accessor is ~10x faster now, before the patch.

OK, so still a clear win. Looks like Apple have made pthread_jit_write_protect_np
extremely fast. I think that what you have here is close to the best we're going
to do.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Loading

@VladimirKempik
Copy link

@VladimirKempik VladimirKempik commented Apr 13, 2021

Why not? As far as I can see, the "Allow Unsigned Executable Memory" Hardened Runtime capability allows exactly this. Maybe it'd stop OpenJDK being notarized, but Apple doen't say so.

AFAIR, this entitlement https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-unsigned-executable-memory is not available on macos_arm64

this entitlement allows you to allocate rwx memory on macos_intel, but on macos_arm64 kernel forbids to allocate rwx memory(withotu MAP_JIT) no matter what.

Loading

@theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Apr 13, 2021

Why not? As far as I can see, the "Allow Unsigned Executable Memory" Hardened Runtime capability allows exactly this. Maybe it'd stop OpenJDK being notarized, but Apple doen't say so.

AFAIR, this entitlement https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-unsigned-executable-memory is not available on macos_arm64

this entitlement allows you to allocate rwx memory on macos_intel, but on macos_arm64 kernel forbids to allocate rwx memory(withotu MAP_JIT) no matter what.

Fair enough.

Loading

Copy link
Contributor

@theRealAph theRealAph left a comment

Approving here, but if any more of these come up I think we'll need a more general-purpose solution.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Apr 13, 2021

@AntonKozlov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8262896: [macos_aarch64] Crash in jni_fast_GetLongField

Reviewed-by: aph, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 52 new commits pushed to the master branch:

  • 55d5649: 8263157: [macos]: java.library.path is being set incorrectly
  • e80012e: 8264768: JFR: Allow events to be printed to the log
  • 3b576ed: 8265100: (fs) WindowsFileStore.hashCode() should read cached hash code once
  • 8df8512: 8265125: IGV: cannot edit forms with NetBeans GUI builder
  • 9cd5400: 8265138: Simplify DerUtils::checkAlg
  • c797511: 8264940: java/lang/invoke/6998541/Test6998541.java failed "guarantee(ik->is_initialized()) failed: java/lang/Byte$ByteCache must be initialized"
  • 943503e: 8265035: Remove unneeded exception check from refill_ic_stubs()
  • fced0f0: 8265113: ProblemList gtest/GTestWrapper.java:os.release_multi_mappings on macosx-aarch64
  • 2aae29c: 8265111: ProblemList java/util/concurrent/locks/Lock/TimedAcquireLeak.java on macosx-aarch64
  • a84d886: 8265112: ProblemList some java/foreign tests on macosx-aarch64
  • ... and 42 more: https://git.openjdk.java.net/jdk/compare/33fa855d5aaa5c5c9bdf2efec1af26ffec5dad51...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @dholmes-ora) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

Loading

@openjdk openjdk bot added the ready label Apr 13, 2021
@AntonKozlov
Copy link
Member Author

@AntonKozlov AntonKozlov commented Apr 13, 2021

Thanks! Yes, I also not happy with this from performance side. I'll look in this.

/integrate

Loading

@openjdk openjdk bot added the sponsor label Apr 13, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 13, 2021

@AntonKozlov
Your change (at version 4eac115) is now ready to be sponsored by a Committer.

Loading

Copy link
Member

@dholmes-ora dholmes-ora left a comment

I'm also concerned about the overall approach to W^X, but like Andrew I can accept the current proposal.

I will also sponsor this for you.

Thanks,
David

Loading

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Apr 13, 2021

/sponsor

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Apr 13, 2021

@dholmes-ora @AntonKozlov Since your change was applied there have been 52 commits pushed to the master branch:

  • 55d5649: 8263157: [macos]: java.library.path is being set incorrectly
  • e80012e: 8264768: JFR: Allow events to be printed to the log
  • 3b576ed: 8265100: (fs) WindowsFileStore.hashCode() should read cached hash code once
  • 8df8512: 8265125: IGV: cannot edit forms with NetBeans GUI builder
  • 9cd5400: 8265138: Simplify DerUtils::checkAlg
  • c797511: 8264940: java/lang/invoke/6998541/Test6998541.java failed "guarantee(ik->is_initialized()) failed: java/lang/Byte$ByteCache must be initialized"
  • 943503e: 8265035: Remove unneeded exception check from refill_ic_stubs()
  • fced0f0: 8265113: ProblemList gtest/GTestWrapper.java:os.release_multi_mappings on macosx-aarch64
  • 2aae29c: 8265111: ProblemList java/util/concurrent/locks/Lock/TimedAcquireLeak.java on macosx-aarch64
  • a84d886: 8265112: ProblemList some java/foreign tests on macosx-aarch64
  • ... and 42 more: https://git.openjdk.java.net/jdk/compare/33fa855d5aaa5c5c9bdf2efec1af26ffec5dad51...master

Your commit was automatically rebased without conflicts.

Pushed as commit 283d64f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants