
8322043: HeapDumper should use parallel dump by default #18748

Closed
wants to merge 6 commits into master from heapdump_mt

Conversation


@alexmenkov alexmenkov commented Apr 12, 2024

The fix makes VM heap dumping parallel by default.
jcmd GC.heap_dump and jmap -dump already used parallel dumping by default; this fix extends it to HotSpotDiagnosticMXBean.dumpHeap(), -XX:+HeapDumpBeforeFullGC, -XX:+HeapDumpAfterFullGC, and -XX:+HeapDumpOnOutOfMemoryError.
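For context, a minimal sketch of triggering a dump through the now-parallel-by-default HotSpotDiagnosticMXBean path. The class name and output path below are illustrative; the bean and its dumpHeap(String, boolean) method are the standard com.sun.management API (note the target file must end in .hprof and must not already exist):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class DumpExample {
    // Dump the current JVM's heap to `file`. With this fix the dump is
    // written by multiple dump threads by default, as jcmd GC.heap_dump
    // and jmap -dump already were. Returns the resulting file size.
    static long dumpTo(Path file) throws Exception {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), /* live objects only = */ true);
        return Files.size(file);
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempDirectory("hd").resolve("dump.hprof");
        System.out.println("dumped " + dumpTo(out) + " bytes");
    }
}
```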

Testing:

  • manually tested different heap dump scenarios with -Xlog:heapdump;
  • tier1,tier2,hs-tier5-svc;
  • all reg.tests that use heap dump.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8322043: HeapDumper should use parallel dump by default (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18748/head:pull/18748
$ git checkout pull/18748

Update a local copy of the PR:
$ git checkout pull/18748
$ git pull https://git.openjdk.org/jdk.git pull/18748/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18748

View PR using the GUI difftool:
$ git pr show -t 18748

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18748.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 12, 2024

👋 Welcome back amenkov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 12, 2024

@alexmenkov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8322043: HeapDumper should use parallel dump by default

Reviewed-by: yyang, sspitsyn, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 351 new commits pushed to the master branch:

  • b96b38c: 8318682: SA decoding of scalar replaced objects is broken
  • a863ef5: 8331207: Misleading example in DateFormat#parse docs
  • aca1e83: 8329223: Parallel: Parallel GC resizes heap even if -Xms = -Xmx
  • 3d11692: 8331252: C2: MergeStores: handle negative shift values
  • 9ce21d1: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow
  • 130f71c: 8326742: Change compiler tests without additional VM flags from @run driver to @run main
  • f4caac8: 8329138: Convert JFR FileForceEvent to static mirror event
  • 2cc8ecc: 8331346: Update PreviewFeature of STREAM_GATHERERS to JEP-473
  • 33e8122: 8331410: Remove unused MemAllocator::mem_allocate_inside_tlab
  • 22a1c61: 8330817: jdk/internal/vm/Continuation/OSRTest.java times out on libgraal
  • ... and 341 more: https://git.openjdk.org/jdk/compare/e1183ac044f803bf0d4ccfebc2b1cd5b33294c7a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title JDK-8322043: HeapDumper should use parallel dump by default 8322043: HeapDumper should use parallel dump by default Apr 12, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 12, 2024
@openjdk

openjdk bot commented Apr 12, 2024

@alexmenkov The following labels will be automatically applied to this pull request:

  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org labels Apr 12, 2024
@mlbridge

mlbridge bot commented Apr 12, 2024

Webrevs

@tstuefe
Member

tstuefe commented Apr 12, 2024

I am curious: what is the memory overhead for parallel mode, and (I am not familiar with the logic) how many threads are involved? Is the number of threads bounded?

I ask because, especially for the OnOOM handling, we may already be at a limit memory-wise. Starting to swap will probably be worse than running single-threaded.

@alexmenkov
Author

I am curious: what is the memory overhead for parallel mode, and (I am not familiar with the logic) how many threads are involved? Is the number of threads bounded?

I ask because, especially for the OnOOM handling, we may already be at a limit memory-wise. Starting to swap will probably be worse than running single-threaded.

Good question.
I think it's several MB per additional thread: a 1MB output buffer; the DumperClassCacheTable (1031 elements max, with element size depending on the number of class fields); and, if HeapDumpGzipLevel is set, some buffers for the gzip compressors.
The number of threads by default is the minimum of os::initial_active_processor_count() * 3 / 8 and the number of GC workers.
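As a rough illustration of that default sizing, here is a sketch of the stated formula (not the actual HotSpot code; the floor of 1 is an assumption so the result is never zero):

```java
public class DumpThreads {
    // Default number of parallel dump threads, per the formula above:
    // min(initial_active_processor_count() * 3 / 8, number of GC workers).
    // The floor of 1 is an assumption, not taken from the patch.
    static int defaultDumpThreads(int activeProcessors, int gcWorkers) {
        return Math.max(1, Math.min(activeProcessors * 3 / 8, gcWorkers));
    }

    public static void main(String[] args) {
        // A 10-core machine with 9 active GC workers would get
        // 10 * 3 / 8 = 3 dump threads.
        System.out.println(defaultDumpThreads(10, 9)); // prints 3
    }
}
```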

@tstuefe
Member

tstuefe commented Apr 13, 2024

I am curious: what is the memory overhead for parallel mode, and (I am not familiar with the logic) how many threads are involved? Is the number of threads bounded?
I ask because, especially for the OnOOM handling, we may already be at a limit memory-wise. Starting to swap will probably be worse than running single-threaded.

Good question. I think it's several MB per additional thread (1MB output buffer; DumperClassCacheTable, 1031 elements max, element size depends on the number of class fields; if HeapDumpGzipLevel is set, some buffers for gzip compressors). The number of threads by default is the minimum of os::initial_active_processor_count() * 3 / 8 and the number of GC workers.

For the OOM case, I would probably make it somehow dependent on os::free_memory() then.

Member

@y1yang0 y1yang0 left a comment


Looks good. Thanks for doing this!

Contributor

@sspitsyn sspitsyn left a comment


This looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 30, 2024
Member

@dholmes-ora dholmes-ora left a comment


One nit, otherwise looks good. Thanks

src/hotspot/share/services/diagnosticCommand.cpp (review comment, outdated; resolved)
@alexmenkov
Author

/integrate

@openjdk

openjdk bot commented May 1, 2024

Going to push as commit 0a24dae.
Since your change was applied there have been 359 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 1, 2024
@openjdk openjdk bot closed this May 1, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 1, 2024
@openjdk

openjdk bot commented May 1, 2024

@alexmenkov Pushed as commit 0a24dae.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@alexmenkov alexmenkov deleted the heapdump_mt branch May 1, 2024 19:53
@shipilev
Member

shipilev commented May 2, 2024

Late to the party here, sorry.

Are there motivational performance improvements that we get from enabling parallel heap dumps by default? Asking because JDK-8319650 and JDK-8320924 improved single-threaded heap dump performance drastically, and the I/O looked to be a bottleneck going forward.

Would a parallel heap dump take more I/O, writing chunks first and then combining them into the large file? Do we know that the parallel heap dump still wins a lot? I don't think it does all that much. Here is a simple run using the workload from JDK-8319650 on a 10-core machine:

$ for I in `seq 1 5`; do java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done

# jdk mainline with this fix (parallel)
Heap dump file created [1897668110 bytes in 1.401 secs]
Heap dump file created [1897667841 bytes in 1.354 secs]
Heap dump file created [1897668050 bytes in 1.440 secs]
Heap dump file created [1897668101 bytes in 1.366 secs]
Heap dump file created [1897668101 bytes in 1.345 secs]

# jdk mainline without this fix (sequential)
Heap dump file created [1897668092 bytes in 2.314 secs]
Heap dump file created [1897668092 bytes in 2.384 secs]
Heap dump file created [1897668092 bytes in 2.269 secs]
Heap dump file created [1897668092 bytes in 2.274 secs]
Heap dump file created [1897667816 bytes in 2.282 secs]

This is less than 2x improvement, even though we took 3 threads to heap dump:

[1.645s][info][heapdump] Requested dump threads 3, active dump threads 9, actual dump threads 3, parallelism true
[1.649s][info][heapdump] Dump non-objects, 0.0046221 secs
[1.651s][info][heapdump] Dump non-objects (part 2), 0.0019995 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5850964 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5852571 secs
[2.230s][info][heapdump] Dump heap objects in parallel, 0.5790543 secs
[2.558s][info][heapdump] Merge segmented heap file, 0.3268282 secs
[2.863s][info][heapdump] Merge segmented heap file, 0.3047630 secs
[3.307s][info][heapdump] Merge segmented heap file, 0.4436261 secs
[3.308s][info][heapdump] Merge heap files complete, 1.0766620 secs
Heap dump file created [1897667959 bytes in 1.664 secs]

And this is on a fast SSD, where I/O is abundant, and there is plenty of space.
The sequential heap dump also seems to be regressing against jdk21u-dev, which does:

Heap dump file created [1897840374 bytes in 1.071 secs]
Heap dump file created [1897840481 bytes in 1.070 secs]
Heap dump file created [1897840490 bytes in 1.069 secs]
Heap dump file created [1897840481 bytes in 1.073 secs]
Heap dump file created [1897840481 bytes in 1.134 secs]

I believe that is because the 2-phase heap dump makes excess work for a single-threaded heap dump. Note that the parallel heap dump in current mainline is not even able to catch up with what we already had with the sequential heap dump. So all this together looks like a performance regression.

I propose we revert this switch to parallel, fix the sequential heap dump performance, and then reconsider -- with benchmarks -- if we want to switch to parallel.

@alexmenkov
Author

Currently a heap dump is always performed in 2 phases (even if a single-threaded heap dump is requested or SerialGC is used). This is required to correctly handle unmounted virtual threads.
This was implemented in JDK 22, so your testing shows a regression compared with jdk21u (which does not dump unmounted virtual threads and the references from them).
Note also that you use -XX:+HeapDumpAfterFullGC in your testing and look at total heap dump time.
The main advantage of 2-phase dumping is decreased STW time (the merge phase is performed on the current thread outside of the safepoint). I.e. the idea is not to decrease total heap dump time, but to minimize the JVM freeze during dumping.
But this does not work in the case of -XX:+HeapDumpBeforeFullGC and -XX:+HeapDumpAfterFullGC, because the heap dump is requested inside a safepoint, so the merge stage is performed inside the safepoint too (I think it's possible to fix this so the merge is performed on some other thread, but I'm not sure it's worth it).
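For illustration only, the segment-then-merge scheme discussed here can be sketched outside HotSpot like this: each dump thread writes its own segment file, and a single merge pass later concatenates them into the final dump. The class and method names below are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MergeSegments {
    // Concatenate per-thread segment files, in order, into one output file,
    // analogous to the "Merge segmented heap file" phase seen in the
    // -Xlog:heapdump output earlier in this thread. In HotSpot this merge
    // runs on the requesting thread outside the safepoint, which is why it
    // shortens the STW pause rather than the total dump time.
    static long merge(List<Path> segments, Path out) throws IOException {
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path seg : segments) {
                Files.copy(seg, os);   // append each segment in order
            }
        }
        return Files.size(out);
    }
}
```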
