Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8319650: Improve heap dump performance with class metadata caching #16545

Closed

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Nov 7, 2023

See the rationale and some profiles in the bug.

This PR implements simple caching for object instance dumping, which is overwhelmingly hottest path in heap dumping code. This allows to put the cache straight in HeapObjectDumper, which is effectively thread local. A simple hash table would then work to cache the class metadata.

Given a simple workload from the bug, these are the motivational improvements and historical perspective:

$ for I in `seq 1 3`; do java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done

# Baseline JDK 8u382
Heap dump file created [1440865228 bytes in 5.492 secs]
Heap dump file created [1440865228 bytes in 5.460 secs]
Heap dump file created [1440865228 bytes in 5.471 secs]

# Baseline JDK 11.0.21
Heap dump file created [1446043070 bytes in 5.088 secs]
Heap dump file created [1446042039 bytes in 5.096 secs]
Heap dump file created [1446042835 bytes in 5.081 secs]

# Baseline JDK 17.0.10
Heap dump file created [1446841531 bytes in 3.276 secs]
Heap dump file created [1446841673 bytes in 3.342 secs]
Heap dump file created [1446841590 bytes in 3.323 secs]

# Baseline JDK 21.0.2 EA
Heap dump file created [1447828697 bytes in 5.516 secs]
Heap dump file created [1447829756 bytes in 5.461 secs]
Heap dump file created [1447829539 bytes in 5.435 secs]

# Baseline JDK 22
Heap dump file created [1447285149 bytes in 5.232 secs]
Heap dump file created [1447285149 bytes in 5.211 secs]
Heap dump file created [1447284813 bytes in 5.222 secs]

# Patched JDK 22
Heap dump file created [1447285087 bytes in 1.071 secs]
Heap dump file created [1447285149 bytes in 1.073 secs]
Heap dump file created [1447285073 bytes in 1.058 secs]

This amounts to 5x improvement, 1.2 GB/sec dump rate (somewhat I/O limited), and the best heap dump performance across all JDK releases. I am planning to backport this patch where possible: at least to JDK 21u, but hopefully to JDK 17u as well.

There are other simple improvements, but this one is the top bottleneck.

Additional testing:

  • Linux x86_64 server fastdebug, serviceability/ (contains heap dump tests)
  • Linux x86_64 server fastdebug, runtime/ErrorHandling (contains heap dump on failure tests)
  • Linux x86_64 server fastdebug, gc/epsilon (contains heap dump on failure tests)
  • Linux x86_64 server fastdebug, sun/tools/jhsdb (contains heap dump tests)

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8319650: Improve heap dump performance with class metadata caching (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16545/head:pull/16545
$ git checkout pull/16545

Update a local copy of the PR:
$ git checkout pull/16545
$ git pull https://git.openjdk.org/jdk.git pull/16545/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16545

View PR using the GUI difftool:
$ git pr show -t 16545

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16545.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 7, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 7, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Nov 7, 2023
@shipilev shipilev marked this pull request as ready for review November 7, 2023 19:27
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 7, 2023
@shipilev shipilev force-pushed the JDK-8319650-heapdump-cache-meta branch from 8f0b8c9 to a19501f Compare November 7, 2023 19:35
@openjdk
Copy link

openjdk bot commented Nov 7, 2023

@shipilev Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@mlbridge
Copy link

mlbridge bot commented Nov 7, 2023

Webrevs

Copy link
Member

@simonis simonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good. I only have a questions related to resource management (see inline).

src/hotspot/share/services/heapDumper.cpp Show resolved Hide resolved
Copy link
Member

@simonis simonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@openjdk
Copy link

openjdk bot commented Nov 8, 2023

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8319650: Improve heap dump performance with class metadata caching

Reviewed-by: simonis, coleenp, yyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 14 new commits pushed to the master branch:

  • a95062b: 8319670: Improve comments describing system properties for TLS server and client for max chain length
  • 38745ec: 8319649: inline_boxing_calls unused gvn variable
  • 636a351: 8319429: Resetting MXCSR flags degrades ecore
  • d7b0ba9: 8319554: Select LogOutput* directly for stdout and stderr
  • 68110b7: 8319574: Exec/process tests should be marked as flagless
  • 7b971c1: 8319705: RISC-V: signumF/D intrinsics fails compiler/intrinsics/math/TestSignumIntrinsic.java
  • f939542: 8319324: FFM: Reformat javadocs
  • a3f1b33: 8319664: IGV always output on PhaseRemoveUseless
  • f57b78c: 8319726: Parallel GC: Re-use object in object-iterator
  • 4451a92: 8319748: [JVMCI] TestUseCompressedOopsFlagsWithUlimit.java crashes on libgraal
  • ... and 4 more: https://git.openjdk.org/jdk/compare/b0fc8082f02ee5ed5e17474221aae4dd9285e78f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 8, 2023
@y1yang0
Copy link
Member

y1yang0 commented Nov 9, 2023

What is the proportion of RSS growth caused by class caching during heapdump? As we create class cache entry(num_of_fields*8+ len_of_sig + u4_instance_size + u4_entries) per instance klass

@shipilev
Copy link
Member Author

shipilev commented Nov 9, 2023

What is the proportion of RSS growth caused by class caching during heapdump? As we create class cache entry(num_of_fields*8+ len_of_sig + u4_instance_size + u4_entries) per instance klass

Good question!

On simple Hello World like scenarios, NMT shows about 80 bytes of additional mtServiceability per class. This should include cache entry, table overhead itself, etc. This means means for 1M classes (this is the highest I seen in real world), it is 80M additional memory. But then for parallel heap dumps, each thread would have its own table, so memory costs grow proportionally. For 1K core machine (thinking ahead here), it would take 80G (!) of additional memory.

But then nothing really requires us to cache all the classes all the time, or track the class popularity. We can just clear the entire table every so often, and let it regenerate with the most frequent classes again and again. If we cache only the 256 most recently popular classes, then the final number from the extreme example above gets to the modest 20M of additional memory. In practice, with normal thread counts, I would expect the overhead to be in double-digit KBs.

Implemented in new commit. I checked with CACHE_TOP = 0 to stress the scenario we always get for the table lookup/purge (not the worst case, since we are purging an empty table), and the performance is not worse than the baseline.

@shipilev
Copy link
Member Author

shipilev commented Nov 9, 2023

A bit more realistic test, spring from Dacapo 23.11:

$  java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms256m -Xmx256m -jar dacapo-23.11-chopin.jar spring

# Before
Dumping heap to java_pid61605.hprof ...
Heap dump file created [21075711 bytes in 0.060 secs]
Dumping heap to java_pid61605.hprof.1 ...
Heap dump file created [35894740 bytes in 0.116 secs]
Dumping heap to java_pid61605.hprof.2 ...
Heap dump file created [65576913 bytes in 0.518 secs]
Dumping heap to java_pid61605.hprof.3 ...
Heap dump file created [74486004 bytes in 0.547 secs]
===== DaCapo 23.11-chopin spring starting =====
Starting 8192 requests...
Dumping heap to java_pid61605.hprof.4 ...
Heap dump file created [85493155 bytes in 0.597 secs]
Completing query batches: 100%
Completed requests
===== DaCapo 23.11-chopin spring PASSED in 5009 msec =====

# After
Dumping heap to java_pid61545.hprof ...
Heap dump file created [21009027 bytes in 0.023 secs]
Dumping heap to java_pid61545.hprof.1 ...
Heap dump file created [35898640 bytes in 0.041 secs]
Dumping heap to java_pid61545.hprof.2 ...
Heap dump file created [65457100 bytes in 0.072 secs]
Dumping heap to java_pid61545.hprof.3 ...
Heap dump file created [74483982 bytes in 0.089 secs]
===== DaCapo 23.11-chopin spring starting =====
Starting 8192 requests...
Dumping heap to java_pid61545.hprof.4 ...
Heap dump file created [85815887 bytes in 0.115 secs]
Completing query batches: 100%
Completed requests
===== DaCapo 23.11-chopin spring PASSED in 4539 msec =====

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I have a couple of minor comments.

src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
Copy link
Member

@y1yang0 y1yang0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for taking extra memory consumption into consideration.

src/hotspot/share/services/heapDumper.cpp Show resolved Hide resolved
@shipilev
Copy link
Member Author

If there are no other comments, I would like to integrate it soon. Thank you all!

@shipilev
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Nov 13, 2023

Going to push as commit 03db828.
Since your change was applied there have been 27 commits pushed to the master branch:

  • b41b00a: 8319820: Use unnamed variables in the FFM implementation
  • 4d650fe: 8319704: LogTagSet::set_output_level() should not accept NULL as LogOutput
  • 6f863b2: 8318636: Add jcmd to print annotated process memory map
  • e035637: 8319375: test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineLeakThrowable.java runs into OutOfMemoryError: Metaspace on AIX
  • 50f41d6: 8309893: Integrate ReplicateB/S/I/L/F/D nodes to Replicate node
  • caf7181: 8318189: ChoiceFormat::format throws undocumented AIOOBE
  • 9938b3f: 8319314: NMT detail report slow or hangs for large number of mappings
  • c9077b8: 8319339: Internal error on spurious markup in a hybrid snippet
  • ea1ffa3: 8318895: Deoptimization results in incorrect lightweight locking stack
  • c9657ca: 8319882: SequenceLayout::toString throws ArithmeticException
  • ... and 17 more: https://git.openjdk.org/jdk/compare/b0fc8082f02ee5ed5e17474221aae4dd9285e78f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 13, 2023
@openjdk openjdk bot closed this Nov 13, 2023
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 13, 2023
@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 13, 2023
@openjdk
Copy link

openjdk bot commented Nov 13, 2023

@shipilev Pushed as commit 03db828.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8319650-heapdump-cache-meta branch November 28, 2023 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants