-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync #18238
8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync #18238
Conversation
👋 Welcome back rrich! A progress list of the required criteria for merging this PR into |
@reinrich The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
/label add hotspot |
@reinrich |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I wrote in JBS, shouldn't this be handled by ThreadInVMfromNative
?
(I wanted to publish the PR before answering your comment) This would be reasonable in my opinion. While I agree I'd prefer to do it as a separate enhancement. |
// WXWrite is needed before entering the vm below and in callee methods. | ||
MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, THREAD)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you placed this here to cover the transition inside create_classes_array
and the immediate one at line 170, but doesn't this risk having the wrong WX state for code in between?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've asked this myself (after making the change).
Being in WXWrite
mode would be wrong if the thread would execute dynamically generated code. There's not too much happening outside the scope of the ThreadInVMfromNative
instances. I see jni calls (GetObjectArrayElement
, ExceptionOccurred
) and a jvmti call (RetransformClasses
) but these are safe because the callees enter the vm right away. We even avoid switching to WXWrite
and back there.
So I thought it would be ok to coarsen the WXMode switching.
But maybe it's still better to avoid any risk especially since there's likely no performance effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or could the ThreadInVMfromNative tvmfn(THREAD);
in check_exception_and_log
be move out to JfrJvmtiAgent::retransform_classes
? And then only use one ThreadInVMfromNative
in JfrJvmtiAgent::retransform_classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require hoisting the ThreadInVMfromNative
out of the loop with the check_exception_and_log
call but then the thread would be in _thread_in_vm
when doing the GetObjectArrayElement
jni call which would be wrong.
@MBaesken found 2 more locations in jvmti that need switching to JvmtiExport::get_jvmti_interface Both use |
@reinrich This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 5 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
Should we address those 2 more findings in this PR ? Or open a separate JBS issue ? btw those were the jtreg tests triggering the 2 additional findings / asserts runtime/Thread/AsyncExceptionOnMonitorEnter.java |
I noticed a few more asserts (assert(_wx_state == expected) failed: wrong state) in the jfr area (jdk tier3 jfr tests).
and
|
I'm leaning towards fixing all locations in this PR even though this will prevent clean backports. Would that be ok? |
I think this is ok. |
|
I agree. This is something I am investigating at the moment. Ideally, AssertWXAtThreadSync would also be true by default. |
I've added a bunch more locations we've seen when testing with AssertWXAtThreadSync. @tobiasholenstein would you think that this PR is actually not needed because you are going to push a better way of handling the WXMode in the near future? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel that this should be fixed inside the ThreadInVMfromNative
transition - the number of callsites that need it just reinforces that for me. Granted we then need to look at where we would now have redundant calls.
That said we have had a lot of people looking at the overall WX state management logic in the past week or so due to https://bugs.openjdk.org/browse/JDK-8327860. The workaround there requires us to be in EXEC mode and there are a number of folk who are questioning why "we" chose WRITE mode as the default with a switch to EXEC, instead of EXEC as the default with a switch to WRITE. But whichever way that goes I think the VM state transitions are the places to enforce that choice.
That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?
Hmm. Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades. It's causing problems for us now. Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to? That'd (by definition) reduce the number of transitions to the minimum if we were through. |
@theRealAph that is the mystery we hope will be resolved once we know the nature of the underlying OS bug. Somehow switching to exec mode fixes/works-around the issue. I can imagine a missing conditional to check if the region is MAP_JIT.
The original introducers of WXEnable made the decision that the VM should be in WRITE mode unless it needs EXEC. That is the state we are presently trying to achieve with this change. If that original design choice is wrong then ...
And I've asked about this every time a missing WXEnable has had to be added. We seem to be generically able to describe what kind of code needs which mode, but we seem to struggle to pin it down. Though that is what https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing.
Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop. We need to optimise the actual call paths as well as identify specific methods. But all this discussion suggests to me that this PR is not really worth pursuing at this time - IIUC no actual failures are observed other than those pertaining to |
We saw sporadic crashes in our jtreg (maybe also jck?) runs; only later we enabled AssertWXAtThreadSync for more checking. |
The first part we already do. I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation. (Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow).
Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway. |
I see it differently. This PR is just a simple attempt to get clean test runs with AssertWXAtThreadSync (after fixing an actual crash https://bugs.openjdk.org/browse/JDK-8327036). While the violating locations in this PR might be unlikely to produce actual crashes I think it is worthwhile to have clean testing with AssertWXAtThreadSync because this will help prevent regressions that are more likely. Beyond the trivial fixes of this PR I'm very much in favor of further enhancements as the aforementioned https://bugs.openjdk.org/browse/JDK-8307817. |
I agree. Fixing the current state with this PR makes sense to me. Changing the logic of W^X will take more time and discussion. So from my point of view this PR is ready and should be integrated. If no-one disagrees I will approve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this patch makes sense, and does not compete with a long-term solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Mailing list message from dean.long at oracle.com on hotspot-dev: On 3/19/24 8:20 AM, Thomas Stuefe wrote:
That makes sense if the WX state is part of the signal context saved and dl |
Exactly. You do the transition when it's needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not ideal but it fixes a real problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Tests with AssertWXAtThreadSync are clean now. Thanks! |
Going to push as commit e41bc42.
Your commit was automatically rebased without conflicts. |
Updated (2024-03-20):
This PR adds switching to
WXWrite
mode before entering the vm where it is missing.With the changes the following jtreg tests succeed with AssertWXAtThreadSync enabled.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18238/head:pull/18238
$ git checkout pull/18238
Update a local copy of the PR:
$ git checkout pull/18238
$ git pull https://git.openjdk.org/jdk.git pull/18238/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18238
View PR using the GUI difftool:
$ git pr show -t 18238
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18238.diff
Webrev
Link to Webrev Comment