Skip to content

8265292: [macos_aarch64] java/foreign/TestDowncall.java crashes with SIGBUS #3921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

AntonKozlov
Copy link
Member

@AntonKozlov AntonKozlov commented May 7, 2021

Please review a fix for the intermittent crash. It is caused by a mistake in the ProgrammableInvoker::invoke_native, the wrong order of W^X and JavaThread state transition. We need WXExec since we are about to call a generated stub. But we need to switch to WXExec only after JavaThread state change. The thread state change may trigger a safepoint, that would need to do bookkeeping in the codecache (MarkActivationClosure::do_code_blob from the bug). So the fix is to change JavaThread state first, then change WX.

The fix was verified with the help of https://bugs.openjdk.java.net/browse/JDK-8266742. The new check catches all test failures reported by 8265292, 8265183, 8265182. I've verified tests pass after the fix with that new check enabled.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issues

  • JDK-8265292: [macos_aarch64] java/foreign/TestDowncall.java crashes with SIGBUS
  • JDK-8265183: [macos_aarch64] java/foreign/TestIntrinsics.java crashes with SIGBUS
  • JDK-8265182: [macos_aarch64] java/foreign/TestUpcall.java crashes with SIGBUS

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3921/head:pull/3921
$ git checkout pull/3921

Update a local copy of the PR:
$ git checkout pull/3921
$ git pull https://git.openjdk.java.net/jdk pull/3921/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3921

View PR using the GUI difftool:
$ git pr show -t 3921

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3921.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 7, 2021

👋 Welcome back akozlov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 7, 2021
@AntonKozlov
Copy link
Member Author

/solves 8265183
/solves 8265182

@openjdk
Copy link

openjdk bot commented May 7, 2021

@AntonKozlov The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 7, 2021
@openjdk
Copy link

openjdk bot commented May 7, 2021

@AntonKozlov
Adding additional issue to solves list: 8265183: [macos_aarch64] java/foreign/TestIntrinsics.java crashes with SIGBUS.

@openjdk
Copy link

openjdk bot commented May 7, 2021

@AntonKozlov
Adding additional issue to solves list: 8265182: [macos_aarch64] java/foreign/TestUpcall.java crashes with SIGBUS.

@mlbridge
Copy link

mlbridge bot commented May 7, 2021

Webrevs

ThreadToNativeFromVM ttnfvm(thread);
MACOS_AARCH64_ONLY(ThreadWXEnable wx(WXExec, thread));
Copy link
Contributor

@theRealAph theRealAph May 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a comment here, not just in the bug report. if this confused you, it'll surely be enough to confuse a maintainer. Some think like this, perhaps?

"We need WXExec because we are about to call a generated stub. But we need to switch to WXExec only after JavaThread state change. The thread state change may trigger a safepoint, that would need to do bookkeeping in the codecache. See JDK-8265292."

I'm also wondering if it would be better to enable writes in the methods that actually write to the nmethod, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And incidentally, this seems to be rather error prone. Wouldn't it be simpler to check W^X status in all of the nmethod accessors, and change it when actually required, rather than hoping that it's already in the correct state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've added the comment here, and also one near VM entries. I would like to avoid enabling write near writing itself. There are too many such places, and either we'll cover a rather large part of hotspot with them (with trial and fix like here, also having overhead from going back and forth), or we'll start clustering them in bigger write-contexts. The current approach is to have the largest write-context possible: the whole JVM should be able to write. Therefore, the bug is not that we didn't get to WXWrite when we are trying to write, the bug is that we were not in the right state.

Also to @dholmes-ora. Sorry for failing to provide a clear principle in the first place. I really assumed that we were on the same page until reviews of JDK-8262896. Since then I'm preparing a kind of doc, although it's going hard. Thank you for still asking questions :) I'm going to include discussed topics into that doc.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fixes the problem - great, approved.

But this just highlights the limited understanding around W^X and thread states IMO.

Thanks,
David

@openjdk
Copy link

openjdk bot commented May 10, 2021

@AntonKozlov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265292: [macos_aarch64] java/foreign/TestDowncall.java crashes with SIGBUS
8265183: [macos_aarch64] java/foreign/TestIntrinsics.java crashes with SIGBUS
8265182: [macos_aarch64] java/foreign/TestUpcall.java crashes with SIGBUS

Reviewed-by: dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 61 new commits pushed to the master branch:

  • 2d2cd78: 8266761: AssertionError in sun.net.httpserver.ServerImpl.responseCompleted
  • 9c9c47e: 8266813: Shenandoah: Use shorter instruction sequence for checking if marking in progress
  • 0344e75: 8266794: Remove dead code notify_allocation_jvmti_allocation_event
  • 9e6e222: 8266892: avoid maybe-uninitialized gcc warnings on linux s390x
  • 6575566: 8266787: Potential overflow of pointer arithmetic in G1ArchiveAllocator
  • 8468001: 8263452: Javac slow compilation due to algorithmic complexity
  • 67cb22a: 8266601: Fix bugs in AddLNode::Ideal transformations
  • 18e9d28: 8266676: G1: Remove dead code init_node_id_to_index_map()
  • 0e7bdae: 8265062: Remove duplication constant MaxTextureSize
  • 10a049e: 8265956: JVM crashes when matching LShiftVB Node
  • ... and 51 more: https://git.openjdk.java.net/jdk/compare/c665dba591ae5c15c9ca49e14d1aaa4eea38e7ae...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 10, 2021
@mlbridge
Copy link

mlbridge bot commented May 14, 2021

Mailing list message from Andrew Haley on hotspot-dev:

On 5/10/21 6:19 AM, David Holmes wrote:

On Fri, 7 May 2021 15:38:24 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

Please review a fix for the intermittent crash. It is caused by a mistake in the ProgrammableInvoker::invoke_native, the wrong order of W^X and JavaThread state transition. We need WXExec since we are about to call a generated stub. But we need to switch to WXExec only after JavaThread state change. The thread state change may trigger a safepoint, that would need to do bookkeeping in the codecache (MarkActivationClosure::do_code_blob from the bug). So the fix is to change JavaThread state first, then change WX.

The fix was verified with the help of https://bugs.openjdk.java.net/browse/JDK-8266742. The new check catches all test failures reported by 8265292, 8265183, 8265182. I've verified tests pass after the fix with that new check enabled.

If this fixes the problem - great, approved.

But this just highlights the limited understanding around W^X and thread states IMO.

Yes, that's what I was getting at. This looks to me like an instance of
the Temporal Coupling design smell.

@AntonKozlov
Copy link
Member Author

This looks to me like an instance of the Temporal Coupling design smell.

I would agree if we'd change the W^X as necessary before an appropriate action. But we don't, we maintain a property "JVM's runtime (C++) code runs in WXWrite" (now it is specified in one of the comments). And the change restores this property which was just broken inadvertently.

@theRealAph
Copy link
Contributor

This looks to me like an instance of the Temporal Coupling design smell.

I would agree if we'd change the W^X as necessary before an appropriate action. But we don't, we maintain a property "JVM's runtime (C++) code runs in WXWrite" (now it is specified in one of the comments). And the change restores this property which was just broken inadvertently.

Sure, but this code has not been in mainline for very long. I have no objection to this patch being committed, which is obviously necessary, but I agree with David Holmes. If we're quick we can get this fixed before it ever reaches a release.

@AntonKozlov
Copy link
Member Author

OK, thanks! I still don't think there is something needs fixing, but a consensus is necessary, I agree.

/integrate

@openjdk openjdk bot closed this May 18, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 18, 2021
@openjdk
Copy link

openjdk bot commented May 18, 2021

@AntonKozlov Since your change was applied there have been 159 commits pushed to the master branch:

  • fadf580: 8262952: [macos_aarch64] os::commit_memory failure
  • f8f40ab: 8230486: G1BarrierSetAssembler::g1_write_barrier_post unnecessarily pushes/pops new_val
  • 9d168e2: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses
  • 02507bc: 8267166: Remove test file vmTestbase/vm/mlvm/tools/LoadClass.java
  • ce88b33: 8266615: C2 incorrectly folds subtype checks involving an interface array
  • 894547d: 8266897: com/sun/net/httpserver/FilterTest.java fails intermittently with AssertionError
  • da7c846: 8264408: test_oopStorage no longer needs to disable some tests on WIN32
  • f6c2891: 8267229: Split runtime/Metaspace/elastic test configurations for better scalability
  • b60975d: 8267237: ARM32: bad AD file in matcher.cpp after 8266810
  • 905b41a: 8265711: C1: Intrinsify Class.getModifier method
  • ... and 149 more: https://git.openjdk.java.net/jdk/compare/c665dba591ae5c15c9ca49e14d1aaa4eea38e7ae...master

Your commit was automatically rebased without conflicts.

Pushed as commit b92c5a4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants