8261448: Preserve GC stack watermark across safepoints in StackWalk #2500

Closed
wants to merge 3 commits into from


Contributor

@rkennke rkennke commented Feb 10, 2021

I am observing the following assert:

# Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
# assert(is_frame_safe(f)) failed: Frame must be safe

(see issue for full hs_err)

In StackWalk::fetchNextBatch() we prepare the entire stack for processing by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc). During the subsequent frame scan, however, we perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe). Those allocations can reach a safepoint for GC, which can reset the stack watermark.

This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.

The solution is to preserve the stack watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() does not appear to be affected, since it does not use the stack watermark.
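To make the failure mode concrete, here is a minimal, self-contained sketch of the idea, not the HotSpot implementation. All names (ToyThread, ToyKeepProcessedMark, scan_frames) are hypothetical, and the assumption that a safepoint simply flips a "processed" flag is a deliberate simplification of what a real stack watermark does.

```cpp
#include <cassert>

// Hypothetical stand-in for a thread's GC stack state. Not HotSpot code.
struct ToyThread {
  bool stack_processed = false;  // true while every frame is safe to inspect
  bool keep_processed = false;   // set while a ToyKeepProcessedMark is alive

  void finish_processing() { stack_processed = true; }

  // Models a safepoint that starts a new GC phase: frames become unprocessed
  // again, unless a live mark asks for the stack to be processed right away.
  void safepoint() {
    stack_processed = false;
    if (keep_processed) {
      finish_processing();
    }
  }
};

// RAII guard: processes the stack on entry and keeps it processed until exit.
class ToyKeepProcessedMark {
  ToyThread& _thread;
 public:
  explicit ToyKeepProcessedMark(ToyThread& thread) : _thread(thread) {
    _thread.finish_processing();
    _thread.keep_processed = true;
  }
  ~ToyKeepProcessedMark() { _thread.keep_processed = false; }
  ToyKeepProcessedMark(const ToyKeepProcessedMark&) = delete;
  ToyKeepProcessedMark& operator=(const ToyKeepProcessedMark&) = delete;
};

// Models a frame scan that allocates, and so may safepoint, before each frame.
// Returns false where the real code would trip the "Frame must be safe" assert.
bool scan_frames(ToyThread& thread, int frame_count) {
  for (int i = 0; i < frame_count; ++i) {
    thread.safepoint();
    if (!thread.stack_processed) {
      return false;
    }
  }
  return true;
}

// One-off processing before the scan, as in the failing code path.
bool scan_without_mark() {
  ToyThread thread;
  thread.finish_processing();
  return scan_frames(thread, 3);
}

// Same scan, but with the guard alive for its whole duration.
bool scan_with_mark() {
  ToyThread thread;
  ToyKeepProcessedMark mark(thread);
  return scan_frames(thread, 3);
}
```

In this toy model, scan_without_mark() fails at the first simulated safepoint, while scan_with_mark() completes because the guard re-establishes the processed state after every safepoint.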

Testing:

  • StackWalk tests with Shenandoah/aggressive
  • StackWalk tests with ZGC/aggressive
  • tier1 (+Shenandoah/ZGC)
  • tier2 (+Shenandoah/ZGC)

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8261448: Preserve GC stack watermark across safepoints in StackWalk

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500
$ git checkout pull/2500


@bridgekeeper bridgekeeper bot commented Feb 10, 2021

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Feb 10, 2021

@openjdk openjdk bot commented Feb 10, 2021

@rkennke The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Feb 10, 2021
Contributor Author

@rkennke rkennke commented Feb 10, 2021

/label hotspot-gc

@openjdk openjdk bot added the hotspot-gc label Feb 10, 2021

@openjdk openjdk bot commented Feb 10, 2021

@rkennke
The hotspot-gc label was successfully added.


@mlbridge mlbridge bot commented Feb 10, 2021

Webrevs

@rkennke rkennke marked this pull request as draft Feb 10, 2021
@openjdk openjdk bot removed the rfr label Feb 10, 2021
Contributor Author

@rkennke rkennke commented Feb 10, 2021

I'm converting this back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing, and it looks like fetchFirstBatch() does require treatment after all. This is complicated because fetchFirstBatch() may end up calling fetchNextBatch(), and the KeepStackGCProcessedMark is not reentrant.
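As an illustration of why reentrancy matters for a scope guard like this, the sketch below models one plausible failure mode: a guard that stores its state in a single slot. The names (ToyThread, SingleSlotMark, inner_batch) are hypothetical, and the thread does not say whether the real KeepStackGCProcessedMark fails in exactly this way or rejects nesting outright; this is only a simplified model.

```cpp
#include <cassert>

// Hypothetical thread state with a single slot for the "keep processed" request.
struct ToyThread {
  bool keep_processed = false;
};

// A guard that is NOT reentrant: it sets the slot on entry and clears it
// unconditionally on exit, without remembering any previous value.
class SingleSlotMark {
  ToyThread& _thread;
 public:
  explicit SingleSlotMark(ToyThread& thread) : _thread(thread) {
    _thread.keep_processed = true;
  }
  ~SingleSlotMark() { _thread.keep_processed = false; }
  SingleSlotMark(const SingleSlotMark&) = delete;
  SingleSlotMark& operator=(const SingleSlotMark&) = delete;
};

// Models a nested batch fetch that takes its own guard.
void inner_batch(ToyThread& thread) {
  SingleSlotMark inner(thread);
  // ... frame filling would happen here ...
}

// Reports whether the outer guard's request is still in force after the
// nested call has returned and destroyed its own guard.
bool outer_still_protected_after_nested_call() {
  ToyThread thread;
  SingleSlotMark outer(thread);
  inner_batch(thread);
  return thread.keep_processed;
}
```

In this model the inner guard's destructor clears the slot while the outer guard is still alive, so the outer scope silently loses its protection.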

@rkennke rkennke marked this pull request as ready for review Feb 12, 2021
@openjdk openjdk bot added the rfr label Feb 12, 2021
Contributor Author

@rkennke rkennke commented Feb 12, 2021

I tested the original patch in Loom with tests that use stack walking, and it failed because we need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but fetchNextBatch() can also be called without fetchFirstBatch() in an outer frame, so KeepStackGCProcessedMark needs to be reentrant. I achieved this by linking the nested watermarks together. I am not sure this is the right way to achieve it, but it fixes all tests in Loom and mainline JDK.

Contributor

@fisk fisk commented Feb 12, 2021

I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread.
Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting?

Contributor

@fisk fisk left a comment

Nesting code looks wrong.

Member

@stefank stefank commented Feb 15, 2021

I incorrectly read Erik's comment as "Nesting code looks good", so I created a unit test to show the problem with the patch:
stefank@8760f1b

Maybe you could build a few more tests based on this?

Contributor Author

@rkennke rkennke commented Feb 15, 2021

> I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread.
> Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting?

I just realized that the reentrancy comes from the Java call lower down in fetchFirstBatch(). The problem can easily be avoided by putting the KeepStackGCProcessedMark in a sensible scope that excludes the call.
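The scoping idea can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: ToyThread, ToyMark, and the toy_* functions are hypothetical, and toy_call_java() merely stands in for a Java call that may re-enter the stack walker.

```cpp
#include <cassert>

// Hypothetical thread state that records how deeply guards were ever nested.
struct ToyThread {
  int live_marks = 0;
  int max_live_marks = 0;
};

// Guard that only counts itself, so nesting depth can be observed.
class ToyMark {
  ToyThread& _thread;
 public:
  explicit ToyMark(ToyThread& thread) : _thread(thread) {
    ++_thread.live_marks;
    if (_thread.live_marks > _thread.max_live_marks) {
      _thread.max_live_marks = _thread.live_marks;
    }
  }
  ~ToyMark() { --_thread.live_marks; }
  ToyMark(const ToyMark&) = delete;
  ToyMark& operator=(const ToyMark&) = delete;
};

// Models the next-batch path, which takes its own guard while filling frames.
void toy_fetch_next_batch(ToyThread& thread) {
  ToyMark mark(thread);
  // ... frame filling would happen here ...
}

// Stands in for the Java call that may re-enter the stack walker.
void toy_call_java(ToyThread& thread) {
  toy_fetch_next_batch(thread);
}

// Guard covers the whole function, including the re-entrant call.
void toy_fetch_first_batch_wide(ToyThread& thread) {
  ToyMark mark(thread);
  // ... frame filling would happen here ...
  toy_call_java(thread);
}

// Guard covers only the frame-filling work and is released before the call.
void toy_fetch_first_batch_narrow(ToyThread& thread) {
  {
    ToyMark mark(thread);
    // ... frame filling would happen here ...
  }
  toy_call_java(thread);
}

int max_nesting_wide() {
  ToyThread thread;
  toy_fetch_first_batch_wide(thread);
  return thread.max_live_marks;
}

int max_nesting_narrow() {
  ToyThread thread;
  toy_fetch_first_batch_narrow(thread);
  return thread.max_live_marks;
}
```

With the wide scope two guards are alive at once, so the guard would have to be reentrant. With the narrow scope at most one guard is ever alive, so no nesting support is needed.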

Member

@stefank stefank left a comment

Looks good.


@openjdk openjdk bot commented Feb 22, 2021

@rkennke This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261448: Preserve GC stack watermark across safepoints in StackWalk

Reviewed-by: eosterlund, stefank

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 148 new commits pushed to the master branch:

  • 26c1db9: 8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522.
  • 0c21dd0: 6206189: Graphics2D.clip specifies incorrectly that a 'null' is a valid value for this method
  • 2b55501: 8261949: fileStream::readln returns incorrect line string
  • 539c80b: 8261702: ClhsdbFindPC can fail due to PointerFinder incorrectly thinking an address is in a .so
  • 564011c: 8261290: Improve error message for NumberFormatException on null input
  • 18188c2: 8261692: Bugs in clhsdb history support
  • 0825bc5: 8261929: ClhsdbFindPC fails with java.lang.RuntimeException: 'In java stack' missing from stdout/stderr
  • c2509ea: 8261857: serviceability/sa/ClhsdbPrintAll.java failed with "Test ERROR java.lang.RuntimeException: 'cannot be cast to' found in stdout"
  • 2b00367: 8261350: Create implementation for NSAccessibilityCheckBox protocol peer
  • 5a25cea: 8261998: Remove unused shared entry support from utilities/hashtable
  • ... and 138 more: https://git.openjdk.java.net/jdk/compare/ef7ee3f44e4dbdde28406ac813e2e3ad20aec849...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Feb 22, 2021
Contributor Author

@rkennke rkennke commented Feb 22, 2021

> Looks good.

Thanks, Stefan!

@fisk also good?

fisk approved these changes Feb 22, 2021
Contributor

@fisk fisk left a comment

Also good!

Contributor Author

@rkennke rkennke commented Feb 22, 2021

/integrate

@openjdk openjdk bot closed this Feb 22, 2021
@openjdk openjdk bot added integrated and removed ready labels Feb 22, 2021
@openjdk openjdk bot removed the rfr label Feb 22, 2021

@openjdk openjdk bot commented Feb 22, 2021

@rkennke Since your change was applied there have been 148 commits pushed to the master branch: https://git.openjdk.java.net/jdk/compare/ef7ee3f44e4dbdde28406ac813e2e3ad20aec849...master

Your commit was automatically rebased without conflicts.

Pushed as commit c20fb5d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot hotspot-gc integrated
3 participants