Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8321565: [REDO] Heap dump does not contain virtual Thread stack references #17040

Closed
wants to merge 2 commits into from

Conversation

alexmenkov
Copy link

@alexmenkov alexmenkov commented Dec 8, 2023

Original fix for JDK-8299426 (Heap dump does not contain virtual Thread stack references, #16665) caused failures of new test (added while #16665 was under review):
test/hotspot/jtreg/compiler/c2/TestReduceAllocationAndHeapDump.java in many tears and was reverted.

Segmented heap dump assumes "merge" stage is executed outside of safepoint (to not block the VM), but heap dump may happen during safepoint (and TestReduceAllocationAndHeapDump.java test provoke the case).
The change contains original fix for JDK-8299426 ("original fix") and removes asserts from HeapMerger (allow heapdump in safepoints).

Run tier1-3 and heapdump-related tests.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8321565: [REDO] Heap dump does not contain virtual Thread stack references (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17040/head:pull/17040
$ git checkout pull/17040

Update a local copy of the PR:
$ git checkout pull/17040
$ git pull https://git.openjdk.org/jdk.git pull/17040/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17040

View PR using the GUI difftool:
$ git pr show -t 17040

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17040.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 8, 2023

👋 Welcome back amenkov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 8, 2023

@alexmenkov The following labels will be automatically applied to this pull request:

  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org labels Dec 8, 2023
@alexmenkov alexmenkov changed the title DRAFT JDK-8321565: [REDO] Heap dump does not contain virtual Thread stack references JDK-8321565: [REDO] Heap dump does not contain virtual Thread stack references Dec 8, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 8, 2023
@mlbridge
Copy link

mlbridge bot commented Dec 8, 2023

Webrevs

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the same fix that we REDO, is it right?
Also, I'd like to double check on what testing was done for this REDO?

@alexmenkov
Copy link
Author

This seems to be the same fix that we REDO, is it right? Also, I'd like to double check on what testing was done for this REDO?

It's almost the same (the only differences are deleted asserts in HeapMerger methods).
The PR contains 2 commits to simplify re-review.
The 1st one ("original fix") is exactly the same commit which was integrated for JDK-8299426 and which was reverted;
The 2nd one ("allow heapdump in safepoints") contains the fix for the issue discovered after integration.

The fix was tested by running heapdump-related tests: test/hotspot/jtreg/serviceability, test/hotspot/jtreg/runtime/ErrorHandling, test/hotspot/jtreg/gc/epsilon, test/jdk/sun/tools/jhsdb
and tier1..3 on all Oracle supported platforms

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The REDO fix looks good.
Good luck with this one!

@openjdk
Copy link

openjdk bot commented Dec 9, 2023

@alexmenkov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8321565: [REDO] Heap dump does not contain virtual Thread stack references

Reviewed-by: sspitsyn, yyang, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 28 new commits pushed to the master branch:

  • 3d9d353: 8321825: Remove runtime/CompressedOops/CompressedClassPointers.java from the ProblemList
  • 1b621f5: 8321474: TestAutoCreateSharedArchiveUpgrade.java should be updated with JDK 21
  • 5463c9c: 8321557: Move SOURCE line verification for OracleJDK out of OpenJDK
  • ac07355: 8320200: Use Elements predicates for record constructors to improve print output
  • 4fb5c12: 8321180: Condition for non-latin1 string size too large exception is off by one
  • d5a96e3: 8321621: Update JCov version to 3.0.16
  • aadf368: 6230751: [Fmt-Ch] Recursive MessageFormats in ChoiceFormats ignore indicated subformats
  • a3447ec: 8321827: Remove unnecessary suppress warnings annotations from the printing processor
  • b25ed57: 8270269: Desktop.browse method fails if earlier CoInitialize call as COINIT_MULTITHREADED
  • df4ed7e: 8321739: Source launcher fails with "Not a directory" error
  • ... and 18 more: https://git.openjdk.org/jdk/compare/5c12a182e3f9aed8d075bb066cb8a093abab92de...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 9, 2023
@@ -2063,7 +2063,6 @@ void DumpMerger::set_error(const char* msg) {
// read+write combination, which would require transferring data to and from
// user space.
void DumpMerger::merge_file(const char* path) {
assert(!SafepointSynchronize::is_at_safepoint(), "merging happens outside safepoint");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might fix the failure but why is this restriction in place to begin with? This seems like a very expensive operation to be performing by the VMThread during a safepoint!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main idea of segmented heap dump feature (JDK-8306441, #13667) is to divide heap dump process to 2 phases - dump to segment files (this phase is executed at safepoint) and merge of the segment files (this one is supposed to be executed outside of safepoint, so VM is not blocked in this phase). And this asserts were added to ensure it works as desired.
I think merge stage can be performed on the current thread without VMOperation, but I don't know all possible consequences of the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to the virtual thread changes what thread is performing the merge? After these changes what thread would do it if not included in the VM operation?

The assert is correct - we should not do the merge at the safepoint (or even in the VMThread).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to the virtual thread changes what thread is performing the merge?

Prior the change merge is performed on either AttachListener thread (when it's the current thread - special case for tools like jcmd) or on VMThread (if the current thread is not AttachListener thread).
Note also that prior the change segmented dumping was conditional - if GC supports multiple workers and if the caller requests parallel dump. HeapDumpAfterFullGC/HeapDumpBeforeFullGC request dumping in 1 thread, so there was no merge in the scenarios.

After these changes what thread would do it if not included in the VM operation?

On the current thread. As I wrote before I 'm not sure if it can cause some unexpected consequences in some cases.
I tried to do the change - tier1 passed (TestReduceAllocationAndHeapDump.java test calls Runtime.gc() with HeapDumpAfterFullGC, merge is performed on main java thread)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was wrong. TestReduceAllocationAndHeapDump.java failed on the assert.
I.e. merge is performed on main java thread, but VM is at safepoint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. merge is performed on main java thread, but VM is at safepoint

So is the main thread operating in_native whilst doing the merge? I suspect the admonition of not doing the merge at a safepoint actually meant "not a safepoint by the VMThread" as that would cause the whole VM to pause. Even doing it in the VMThread at all can delay the next safepoint, which does not seem good.

I'm not familiar with this code in general (and only looked at this because of the previous issue in the CI) but I'm unclear what including virtual thread stack referencves has to do with the merging logic? I would not expect merging to be affected by the current change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the main thread operating in_native whilst doing the merge? I suspect the admonition of not doing the merge at a safepoint actually meant "not a safepoint by the VMThread" as that would cause the whole VM to pause. Even doing it in the VMThread at all can delay the next safepoint, which does not seem good.

I think the thread is in_vm.
Main java thread call Runtime.gc(), GC itself calls heap dumper before/after full gc.
I don't know if there is a performance difference in the case between executing merge on the current thread and VMThread.
HeapDumpBeforeFullGC/HeapDumpAfterFullGC/HeapDumpOnOutOfMemoryError are diagnostic flags, so I think performance is not so important here.
There are plans to improve their performance by turning on parallel dump (using GC worker) - often parallel heap dump + merge takes less time than single-threaded dump (we have now).

I'm not familiar with this code in general (and only looked at this because of the previous issue in the CI) but I'm unclear what including virtual thread stack referencves has to do with the merging logic? I would not expect merging to be affected by the current change.

This is caused by a nature of unmounted virtual threads.
We need to dump stack traces before we dump heap objects, but we can find unmounted virtual threads only by iterating over heap. So we need a way to add data to the beginning of the dump file while dumping heap objects. Segmented dump helps here - we have a designated segment to write stack traces.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested different scenarios (all I know) of heap dump:

  • via attach (jcmd);
  • JMX (HotSpotDiagnosticMXBean);
  • HeapDumpBeforeFullGC/HeapDumpAfterFullGC;
  • HeapDumpOnOutOfMemoryError.
    I see no problem with merge on the current thread.
    Only HeapDumpBeforeFullGC/HeapDumpAfterFullGC causes heap dump (and merge) at safepoint.
    But HeapDumpBeforeFullGC/HeapDumpAfterFullGC/HeapDumpOnOutOfMemoryError depend on GC and require much more testing, I think it's better to integrate this fix as it is and update the way merger is executed by a separate follow-up change (with cleanup of related stuff)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'm concerned that the merge, whether by the current thread whilst in_vm, or by the VMThread, will either prevent the VM from going to a safepoint in a timely manner, or cause the safepoint to be excessively long. I'm not sure how problems in this area will become visible.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the removal of the problematic assert from the first version.

@alexmenkov
Copy link
Author

/integrate

@openjdk
Copy link

openjdk bot commented Dec 13, 2023

Going to push as commit cf94854.
Since your change was applied there have been 34 commits pushed to the master branch:

  • 7ece9e9: 8321400: java/foreign/TestStubAllocFailure.java fails with code cache exhaustion
  • 9320ef9: 8321973: Parallel: Remove unused methods in AdaptiveSizePolicy
  • 2a565ff: 8321808: G1: Use unsigned type for non-negative G1 flags
  • 493b5bd: 8293622: Cleanup use of G1ConcRefinementThreads
  • f573f6d: 8321515: ARM32: Move method resolution information out of the cpCache properly
  • 8a0a6f8: 8321279: Implement hashCode() in Heap-X-Buffer.java.template
  • 3d9d353: 8321825: Remove runtime/CompressedOops/CompressedClassPointers.java from the ProblemList
  • 1b621f5: 8321474: TestAutoCreateSharedArchiveUpgrade.java should be updated with JDK 21
  • 5463c9c: 8321557: Move SOURCE line verification for OracleJDK out of OpenJDK
  • ac07355: 8320200: Use Elements predicates for record constructors to improve print output
  • ... and 24 more: https://git.openjdk.org/jdk/compare/5c12a182e3f9aed8d075bb066cb8a093abab92de...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 13, 2023
@openjdk openjdk bot closed this Dec 13, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 13, 2023
@openjdk
Copy link

openjdk bot commented Dec 13, 2023

@alexmenkov Pushed as commit cf94854.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@alexmenkov alexmenkov deleted the vthread_heapdump_2 branch December 13, 2023 19:03
@alexmenkov alexmenkov restored the vthread_heapdump_2 branch December 19, 2023 22:34
@alexmenkov
Copy link
Author

/backport jdk22

@openjdk
Copy link

openjdk bot commented Dec 19, 2023

@alexmenkov the backport was successfully created on the branch backport-alexmenkov-cf948548 in my personal fork of openjdk/jdk22. To create a pull request with this backport targeting openjdk/jdk22:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit cf948548 from the openjdk/jdk repository.

The commit being backported was authored by Alex Menkov on 13 Dec 2023 and was reviewed by Serguei Spitsyn, Yi Yang and David Holmes.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk22:

$ git fetch https://github.com/openjdk-bots/jdk22.git backport-alexmenkov-cf948548:backport-alexmenkov-cf948548
$ git checkout backport-alexmenkov-cf948548
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk22.git backport-alexmenkov-cf948548

@alexmenkov alexmenkov deleted the vthread_heapdump_2 branch January 11, 2024 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
4 participants