@neethu-prasad neethu-prasad commented May 17, 2024

Notes
We are spending significant time acquiring the per-nmethod lock, as all the
threads are in the same nmethod.
This change adds a double-checked lock by calling is_armed before lock acquisition.
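
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of double-checked locking as described in the notes. It is an illustration only, not the code from this PR: `FakeNMethod`, `entry_barrier`, and `heal_count` are hypothetical names, and `std::mutex` stands in for the per-nmethod lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an nmethod with an "armed" flag and its own lock.
struct FakeNMethod {
  std::atomic<bool> armed{true};
  std::mutex lock;
  int heal_count = 0;  // number of times the slow path ran (guarded by lock)
};

// Returns true when the method may be entered.
bool entry_barrier(FakeNMethod& nm) {
  // First check, without the lock: threads that arrive after the method
  // was disarmed return here and never contend on the lock.
  if (!nm.armed.load(std::memory_order_acquire)) {
    return true;
  }

  std::lock_guard<std::mutex> guard(nm.lock);

  // Second check, under the lock: another thread may have completed the
  // work while this thread was waiting for the lock.
  if (!nm.armed.load(std::memory_order_acquire)) {
    return true;
  }

  nm.heal_count++;  // stand-in for the real work (e.g. healing oops)
  nm.armed.store(false, std::memory_order_release);
  return true;
}
```

With many threads entering the same nmethod, only the threads that race during the short armed window take the lock; everyone else leaves at the first check.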

Verification

Shenandoah

% /home/neethp/Development/opensource/jdk/build/linux-x86_64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
[0.706s][info][gc] GC(0) Concurrent marking roots 11.519ms
[0.752s][info][gc] GC(1) Concurrent marking roots 9.833ms
[0.814s][info][gc] GC(2) Concurrent marking roots 10.000ms
[0.855s][info][gc] GC(3) Concurrent marking roots 9.314ms
[0.895s][info][gc] GC(4) Concurrent marking roots 8.937ms
[1.213s][info][gc] GC(5) Concurrent marking roots 12.582ms
[1.340s][info][gc] GC(6) Concurrent marking roots 9.574ms
[1.465s][info][gc] GC(7) Concurrent marking roots 12.791ms

ZGC

% /home/neethp/Development/opensource/jdk/build/linux-x86_64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
[0.732s][info][gc] GC(0) Concurrent marking roots 10.694ms
[0.782s][info][gc] GC(1) Concurrent marking roots 14.614ms
[0.825s][info][gc] GC(2) Concurrent marking roots 12.700ms
[0.863s][info][gc] GC(3) Concurrent marking roots 9.622ms
[0.904s][info][gc] GC(4) Concurrent marking roots 12.892ms
[1.244s][info][gc] GC(5) Concurrent marking roots 12.422ms
[1.375s][info][gc] GC(6) Concurrent marking roots 12.756ms
[1.503s][info][gc] GC(7) Concurrent marking roots 12.265ms
[1.628s][info][gc] GC(8) Concurrent marking roots 12.309ms
[1.754s][info][gc] GC(9) Concurrent marking roots 12.996ms
[1.879s][info][gc] GC(10) Concurrent marking roots 9.416ms

Issue
https://bugs.openjdk.org/browse/JDK-8331911


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8331911: Reconsider locking for recently disarmed nmethods (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19285/head:pull/19285
$ git checkout pull/19285

Update a local copy of the PR:
$ git checkout pull/19285
$ git pull https://git.openjdk.org/jdk.git pull/19285/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19285

View PR using the GUI difftool:
$ git pr show -t 19285

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19285.diff

Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label May 17, 2024
bridgekeeper bot commented May 17, 2024

Hi @neethu-prasad, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user neethu-prasad" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

openjdk bot commented May 17, 2024

@neethu-prasad This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8331911: Reconsider locking for recently disarmed nmethods

Reviewed-by: shade, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 82 new commits pushed to the master branch:

  • 4b153e5: 8306580: Propagate CDS dumping errors instead of directly exiting the VM
  • 71a692a: 8321033: Avoid casting Array to GrowableArray
  • 55c7969: 8334765: JFR: Log chunk waste
  • b2930c5: 8334040: jdk/classfile/CorpusTest.java timed out
  • e825ccf: 8332362: Implement os::committed_in_range for MacOS and AIX
  • 5ac2149: 8334299: Deprecate LockingMode option, along with LM_LEGACY and LM_MONITOR
  • 2e64d15: 8334564: VM startup: fatal error: FLAG_SET_ERGO cannot be used to set an invalid value for NonNMethodCodeHeapSize
  • 9d4a4bd: 8324841: PKCS11 tests still skip execution
  • ca5a438: 8334571: Extract control dependency rewiring out of PhaseIdealLoop::dominated_by() into separate method
  • 05ff318: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572
  • ... and 72 more: https://git.openjdk.org/jdk/compare/c94af6f943c179553d1827550847b93491d47506...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @fisk) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk bot commented May 17, 2024

@neethu-prasad The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels May 17, 2024
@neethu-prasad
Contributor Author

/covered

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label May 17, 2024
bridgekeeper bot commented May 17, 2024

Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@shipilev shipilev left a comment

I like this. Some stylistic comments.

#include "runtime/threadWXSetters.inline.hpp"

bool ShenandoahBarrierSetNMethod::nmethod_entry_barrier(nmethod* nm) {

Here and later: no need for new line at the beginning of the method.

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019, 2022, Red Hat, Inc. All rights reserved.
+ * Copyright (c) 2019, 2024, Red Hat, Inc. All rights reserved.

This update is unnecessary.

bool ZBarrierSetNMethod::nmethod_entry_barrier(nmethod* nm) {

  if (!is_armed(nm)) {
    log_develop_trace(gc, nmethod)("nmethod: " PTR_FORMAT " visited by entry (disarmed)", p2i(nm));

Should it be "(disarmed before lock)" to disambiguate against "(disarmed)" later?

Comment on lines 41 to 42
// Some other thread got here first and healed the oops
// and disarmed the nmethod.

Suggestion for the comment (here and later):

    // Some other thread got here first and healed the oops
    // and disarmed the nmethod. No need to continue.

...and then later, under the lock:

    // Some other thread managed to complete while we were
    // waiting for lock. No need to continue.

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels May 23, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label May 23, 2024
mlbridge bot commented May 23, 2024

Webrevs

@shipilev shipilev left a comment

I think this is good.

Yes, we could have restructured the code so that nmethod_entry_barrier was not called when nmethod is already disarmed. There are already some places where we check it externally, but the reproducer in the bug shows that it is easy to miss. So checking right here in the method looks appropriate.

@fisk might want to take a look as well.

openjdk bot commented May 23, 2024

⚠️ @neethu-prasad the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout JDK-8331911
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 23, 2024
@shipilev
Member

@fisk @stefank -- are you good with this for ZGC?

@fisk fisk left a comment

It seems fine to me that the GC backends are responsible for checking if the nmethod is disarmed outside the lock. However, we have some callers that now check it redundantly. I think those callers should stop doing that now. Otherwise, this looks good to me.

@neethu-prasad
Contributor Author

> It seems fine to me that the GC backends are responsible for checking if the nmethod is disarmed outside the lock. However, we have some callers that now check it redundantly. I think those callers should stop doing that now. Otherwise, this looks good to me.

Thanks for the feedback! Looking around the code, I think there are a few places where we can make more changes.

First, remove the check here:

if (bs_nm->is_armed(nm)) {
  bool alive = bs_nm->nmethod_entry_barrier(nm);
  assert(alive, "should be alive");
}

This would force us to add the check in the super-class implementation here:

bool BarrierSetNMethod::nmethod_entry_barrier(nmethod* nm) {

Second, we can remove the check here:

if (!bs_nm->is_armed(nm)) {
  return 0;
}
assert(!nm->is_osr_method(), "Should not reach here");
// Called upon first entry after being armed
bool may_enter = bs_nm->nmethod_entry_barrier(nm);

But it does not seem straightforward, because we currently skip the cross-modification fence based on the is_armed(...) check. Unfortunately, we cannot easily know whether nmethod_entry_barrier acted or not; we only know whether the method is safe or not. Can we / should we do this refactoring separately?

fisk commented May 30, 2024

> > It seems fine to me that the GC backends are responsible for checking if the nmethod is disarmed outside the lock. However, we have some callers that now check it redundantly. I think those callers should stop doing that now. Otherwise, this looks good to me.
>
> Thanks for the feedback! Looking around the code, I think there are a few places where we can make more changes.
>
> First, remove the check here:
>
> if (bs_nm->is_armed(nm)) {
>   bool alive = bs_nm->nmethod_entry_barrier(nm);
>   assert(alive, "should be alive");
> }
>
> This would force us to add the check in the super-class implementation here:
>
> bool BarrierSetNMethod::nmethod_entry_barrier(nmethod* nm) {
>
> Second, we can remove the check here:
>
> if (!bs_nm->is_armed(nm)) {
>   return 0;
> }
> assert(!nm->is_osr_method(), "Should not reach here");
> // Called upon first entry after being armed
> bool may_enter = bs_nm->nmethod_entry_barrier(nm);
>
> But it does not seem straightforward, because we currently skip the cross-modification fence based on the is_armed(...) check. Unfortunately, we cannot easily know whether nmethod_entry_barrier acted or not; we only know whether the method is safe or not. Can we / should we do this refactoring separately?

I see your point. However, this PR is refactoring the code to iron out who is responsible for checking is_armed, so I would prefer if we got that right in this PR. We say it should be the backend code doing that, so the callers shouldn't. I agree with all the changes you just listed and if you make them I would be happy.

Regarding the cross modifying fence, I strongly prefer to not try and be clever. Just run the cross modifying fence unconditionally after calling the backend code. We get there because the barrier was armed anyway.
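
The division of responsibility proposed above can be sketched roughly as follows. This is a simplified model under stated assumptions, not HotSpot code: `FakeNMethod`, `backend_entry_barrier`, `stub_entry`, and `StubStats` are hypothetical names, and a counter stands in for the real cross-modifying fence.

```cpp
#include <cassert>

// Hypothetical stand-in for an nmethod guarded by an entry barrier.
struct FakeNMethod {
  bool armed = true;
};

// Counters used only to make the control flow observable in this sketch.
struct StubStats {
  int backend_calls = 0;
  int fences = 0;
};

// Backend: owns the is_armed check, so callers do not need one.
bool backend_entry_barrier(FakeNMethod* nm) {
  if (!nm->armed) {
    return true;  // already disarmed, nothing to do
  }
  nm->armed = false;  // stand-in for healing oops and disarming
  return true;
}

// Caller: no is_armed filtering; the fence runs after every backend call.
bool stub_entry(FakeNMethod* nm, StubStats* stats) {
  bool may_enter = backend_entry_barrier(nm);
  stats->backend_calls++;
  stats->fences++;  // stand-in for the cross-modifying fence
  return may_enter;
}
```

In this shape the caller never inspects the armed state, so the fence cannot be skipped by a racy early return.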

@neethu-prasad
Contributor Author

@fisk
I've addressed the feedback. Can you take a look?
I did not remove the check here. Removing this check resulted in timeouts when the -XX:+DeoptimizeNMethodBarriersALot flag is set, as it then executes the deoptimization code path.

// Check for disarmed method here to avoid going into DeoptimizeNMethodBarriersALot code
// too often. nmethod_entry_barrier checks for disarmed status itself,
// but we have no visibility into whether the barrier acted or not.
if (!bs_nm->is_armed(nm)) {

I would still like this check gone, together with the comment above.

@shipilev shipilev Jun 6, 2024

I agree with Neethu, though: I think we should proceed to the DeoptimizeNMethodBarriersALot block and the cross-modify fence only when the barrier acted. We know from testing that it hurts otherwise. I would prefer this change not to introduce new performance potholes, even for verification/test code.

If the argument is about cleanliness of who checks "armed", and we decide that it should be solely the backend, then the middle ground might be adding an out-parameter, like nmethod_entry_barrier(nmethod* nm, bool has_acted), and checking that before proceeding here? That feels uglier than just leaving the check here.
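
A rough sketch of the out-parameter idea, for illustration only. It assumes the flag is passed by pointer so the callee can write to it; `FakeNMethod`, `entry_barrier`, `stub_entry`, and `post_barrier_work` are hypothetical names, and the return convention (0 means "may enter") is an assumption borrowed from the `return 0;` in the snippet quoted earlier in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for an nmethod guarded by an entry barrier.
struct FakeNMethod {
  bool armed = true;
};

// Backend barrier that also reports whether it actually did any work.
bool entry_barrier(FakeNMethod* nm, bool* has_acted) {
  *has_acted = false;
  if (!nm->armed) {
    return true;  // already disarmed: nothing was done
  }
  nm->armed = false;  // stand-in for healing oops and disarming
  *has_acted = true;
  return true;
}

// Caller: runs the follow-up work only when the barrier acted.
// post_barrier_work counts how often that follow-up path was taken.
int stub_entry(FakeNMethod* nm, int* post_barrier_work) {
  bool has_acted = false;
  bool may_enter = entry_barrier(nm, &has_acted);
  if (has_acted) {
    (*post_barrier_work)++;  // stand-in for the deopt-stress block and fence
  }
  return may_enter ? 0 : 1;
}
```

The caller no longer reads the armed state itself, but it now depends on an extra piece of state threaded through every backend implementation.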

fisk commented Jun 6, 2024

> @fisk
>
> I've addressed the feedback. Can you take a look?
>
> I did not remove the check here. Removing this check resulted in timeouts when the -XX:+DeoptimizeNMethodBarriersALot flag is set, as it then executes the deoptimization code path.

I am quite nervous about having that silly optimization there. So I'm going to have to insist on removing it. Perhaps though, it should be fixed as a separate issue. Allow me to explain myself why it makes me nervous.
The nmethod entry barriers guard modifications to instructions and data through a mixed bag of synchronous and asynchronous cross modifying code.
Synchronous cross modifying code is the sane thing to do; we modify instructions, guarded from concurrent execution by a data flag. After modifying the instructions, the data flag is flipped, and observers are allowed to execute the instructions after executing an instruction cross modification fence.
On AArch64, for example, we only perform synchronous cross modifying code, and the stub has the fencing machinery, so it should be okay.
However, on x86_64, we perform a mix of asynchronous and synchronous cross modifying code. The guard word is the immediate part of a compare instruction. If the new disarmed immediate is observed by concurrent execution, instruction cache coherency guarantees that we will correctly observe the cross modified instructions when they are subsequently executed.
However, when we go into the stub slow path, and check is_armed etc, these are data reads. That makes the dance entirely different as it suddenly performs synchronous cross modifying code. If a data read observes that the instructions have been modified, we don't have the same level of guarantees any longer, unless we perform an instruction cross modification fence.
So my concern here is that the silly optimization to fix some verification code timeout or whatever is in fact causing a real correctness problem for real release builds on x86_64. By skipping the cross modification fence, we perform an incomplete synchronous instruction cross modification dance that isn't sound.
Having said that, perhaps we should file a separate issue to remove that check, since it seems to fix an actual bug, while I guess this was meant more as an optimization. What do you think?

shipilev commented Jun 6, 2024

> Having said that, perhaps we should file a separate issue to remove that check, since it seems to fix an actual bug, while I guess this was meant more as an optimization. What do you think?

FTR, I don't mind executing cross-modify-fence unconditionally. I do mind going into deopts too often. I do also think that we want to stay on performance-positive side for at least an easy variant of fix, and do potentially regressing things separately. The initial motivation for this work was to resolve an issue in a service workload that runs many threads with similar stacks, and get something that we are sure about for a prompt backport.

To that end, we can continue working out the final shape of the patch here, while we mitigate our current service problems with picking up a limited version of this patch with JDK-8333716 -- it resolves only Shenandoah parts of it, though. Or, we can integrate this patch in its current form, resolving the issue on both Shenandoah and ZGC paths, and work out the check removal as the follow up of JDK-8310239.

I think the latter alternative is more pragmatic.

fisk commented Jun 7, 2024

> > Having said that, perhaps we should file a separate issue to remove that check, since it seems to fix an actual bug, while I guess this was meant more as an optimization. What do you think?
>
> FTR, I don't mind executing cross-modify-fence unconditionally. I do mind going into deopts too often. I do also think that we want to stay on performance-positive side for at least an easy variant of fix, and do potentially regressing things separately. The initial motivation for this work was to resolve an issue in a service workload that runs many threads with similar stacks, and get something that we are sure about for a prompt backport.

Fair enough. For what it's worth, aside from the deopt stressing option, whose arbitrary frequency we can update, we will not deopt more. We just perform an extra cross modifying fence when racingly entering an nmethod that is concurrently being disarmed. Not performing it might be slightly faster, but is a bug. But I see your point.

> To that end, we can continue working out the final shape of the patch here, while we mitigate our current service problems with picking up a limited version of this patch with JDK-8333716 -- it resolves only Shenandoah parts of it, though. Or, we can integrate this patch in its current form, resolving the issue on both Shenandoah and ZGC paths, and work out the check removal as the follow up of JDK-8310239.
>
> I think the latter alternative is more pragmatic.

I'm okay with approving this patch, and we fix the actual bug separately. Sounds good? Then this is a refactoring and optimization, without the bug fix.

@neethu-prasad
Contributor Author

@fisk I just merged the latest changes. Do I need approval on the merge commit or can I integrate?

@fisk fisk left a comment

Looks good. Will you file a follow-up regarding the dangerous early filtering that we discussed?

@neethu-prasad
Contributor Author

Thanks for the review & approval.
I've created a follow-up bug:
https://bugs.openjdk.org/browse/JDK-8334890

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 24, 2024
openjdk bot commented Jun 24, 2024

@neethu-prasad
Your change (at version 86be5b5) is now ready to be sponsored by a Committer.

@shipilev
Member

/sponsor

openjdk bot commented Jun 25, 2024

Going to push as commit c30e040.
Since your change was applied there have been 85 commits pushed to the master branch:

  • 974dca8: 8334223: Make Arena MEMFLAGs immutable
  • e527e1c: 8334580: Deprecate no-arg constructor BasicSliderUI() for removal
  • 3a26bbc: 8185429: [macos] After a modal dialog is closed, no window becomes active
  • 4b153e5: 8306580: Propagate CDS dumping errors instead of directly exiting the VM
  • 71a692a: 8321033: Avoid casting Array to GrowableArray
  • 55c7969: 8334765: JFR: Log chunk waste
  • b2930c5: 8334040: jdk/classfile/CorpusTest.java timed out
  • e825ccf: 8332362: Implement os::committed_in_range for MacOS and AIX
  • 5ac2149: 8334299: Deprecate LockingMode option, along with LM_LEGACY and LM_MONITOR
  • 2e64d15: 8334564: VM startup: fatal error: FLAG_SET_ERGO cannot be used to set an invalid value for NonNMethodCodeHeapSize
  • ... and 75 more: https://git.openjdk.org/jdk/compare/c94af6f943c179553d1827550847b93491d47506...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 25, 2024
@openjdk openjdk bot closed this Jun 25, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jun 25, 2024
openjdk bot commented Jun 25, 2024

@shipilev @neethu-prasad Pushed as commit c30e040.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org

3 participants