@fisk fisk commented Feb 11, 2025

When ZGC performs marking, it uses a lock-free data structure to keep track of objects that still need to be traced during the object traversal. This lock-free data structure uses versioned pointers to avoid ABA problems, which are prevalent when writing lock-free data structures. This required partitioning the pointers in the structure so that each embeds both a version and a location.
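
For readers unfamiliar with the versioned pointer technique, here is a minimal sketch of the general idea. It is not the ZGC code: the 32/32 bit split and the names `VersionedRef` and `replace_location` are assumptions made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: low 32 bits hold the location, high 32 bits hold the version.
struct VersionedRef {
  static constexpr int kLocationBits = 32;
  static constexpr uint64_t kLocationMask = (uint64_t{1} << kLocationBits) - 1;

  static uint64_t pack(uint32_t version, uint64_t location) {
    assert(location <= kLocationMask);  // the location must fit in its share of the bits
    return (uint64_t{version} << kLocationBits) | location;
  }
  static uint32_t version(uint64_t v) { return static_cast<uint32_t>(v >> kLocationBits); }
  static uint64_t location(uint64_t v) { return v & kLocationMask; }
};

// Installs new_location only if 'expected' is still the current value,
// and bumps the version on success.
inline bool replace_location(std::atomic<uint64_t>& head, uint64_t expected,
                             uint64_t new_location) {
  uint64_t desired = VersionedRef::pack(VersionedRef::version(expected) + 1, new_location);
  return head.compare_exchange_strong(expected, desired);
}

// Walks through an A -> B -> A sequence and checks that a stale snapshot is rejected.
inline bool versioned_ref_demo() {
  std::atomic<uint64_t> head{VersionedRef::pack(0, 0x10)};  // location A
  uint64_t stale = head.load();                             // a slow thread's snapshot
  bool to_b = replace_location(head, head.load(), 0x20);    // A -> B
  bool to_a = replace_location(head, head.load(), 0x10);    // B -> A, same location again
  bool stale_cas = replace_location(head, stale, 0x30);     // fails: the version differs
  return to_b && to_a && !stale_cas &&
         VersionedRef::location(head.load()) == 0x10 &&
         VersionedRef::version(head.load()) == 2;
}
```

Without the version, the last compare-and-swap would succeed because the location is A again. That is exactly the ABA hazard.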

Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure such that offsets could be encoded, instead of addresses.
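
As a rough illustration of why offsets imply a bounded contiguous space, consider the sketch below. The base address, granule size and bit count are all made-up values, and `OffsetSpace` is a hypothetical name, not a type from the ZGC sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical contiguous space: a location is stored as (address - base) / granule.
struct OffsetSpace {
  uintptr_t base;   // start of the reserved contiguous range
  size_t granule;   // allocation unit in bytes
  int offset_bits;  // pointer bits left over for the location

  // Largest space in which every granule can still be named by an offset.
  uint64_t max_bytes() const { return (uint64_t{1} << offset_bits) * granule; }

  uint64_t encode(uintptr_t addr) const {
    assert(addr >= base && (addr - base) % granule == 0);
    uint64_t offset = (addr - base) / granule;
    assert(offset < (uint64_t{1} << offset_bits));  // beyond this, the offset cannot be encoded
    return offset;
  }
  uintptr_t decode(uint64_t offset) const { return base + offset * granule; }
};

inline bool offset_space_demo() {
  OffsetSpace space{0x40000000, 4096, 20};  // all three values are made up
  uintptr_t addr = space.base + 3 * space.granule;
  return space.encode(addr) == 3 &&
         space.decode(3) == addr &&
         space.max_bytes() == (uint64_t{1} << 32);  // 2^20 granules * 4 KiB = 4 GiB
}
```

With fewer bits available for the offset, the largest encodable space shrinks accordingly, which is where the scalability limit comes from.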

Since the memory area needs to be contiguous, the JVM needs to know what the expected maximum size of this space will ever be, within some limiting bounds. That is what -XX:ZMarkStackSpaceLimit controls.

While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits in how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what value to give this JVM option.

The -XX:ZMarkStackSpaceLimit JVM option is needed due to using a contiguous allocator to solve an ABA problem in a lock-free data structure. By selecting another solution for the ABA problem, the need for the special contiguous memory allocator and hence the JVM option can be removed.

This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the -XX:ZMarkStackSpaceLimit JVM option redundant. The proposed technique is to use hazard pointers instead.

Hazard pointers are a well-established safe memory reclamation (SMR) technique for writing lock-free data structures, which we also use for the Threads list. The main idea is to publish which pointer has been read in a hazard pointer, so that concurrent threads know not to free memory that is being concurrently used. Freeing of such racingly accessed memory is deferred until it is safe, which solves the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures.
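
The publish-then-revalidate step and the deferred free can be sketched as follows. This is a generic illustration of hazard pointers under simplifying assumptions (a single hazard slot, default sequentially consistent atomics, and hypothetical names such as `protect`, `retire` and `scan`). It is not the code in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Node { Node* next; int value; };

// One published "I am reading this" slot. A single slot keeps the sketch short.
static std::atomic<Node*> g_hazard{nullptr};

// Reader side: publish the pointer, then re-read the source to confirm that the
// published value was still current once the publication was visible.
inline Node* protect(const std::atomic<Node*>& src) {
  Node* p = src.load();
  for (;;) {
    g_hazard.store(p);
    Node* again = src.load();
    if (again == p) return p;
    p = again;
  }
}

inline void unprotect() { g_hazard.store(nullptr); }

// Reclaimer side: nodes that may still be in use are parked, not freed.
static thread_local std::vector<Node*> g_deferred;

inline void retire(Node* n) {
  assert(n != nullptr);
  g_deferred.push_back(n);
}

// Frees every parked node that the hazard slot does not name; returns the number freed.
inline size_t scan() {
  size_t freed = 0;
  std::vector<Node*> keep;
  Node* hazard = g_hazard.load();
  for (Node* n : g_deferred) {
    if (n == hazard) { keep.push_back(n); } else { std::free(n); ++freed; }
  }
  g_deferred.swap(keep);
  return freed;
}

inline bool hazard_demo() {
  Node* n = static_cast<Node*>(std::malloc(sizeof(Node)));
  n->next = nullptr;
  n->value = 42;
  std::atomic<Node*> head{n};
  Node* seen = protect(head);  // the reader publishes n
  head.store(nullptr);         // someone unlinks n ...
  retire(n);                   // ... and asks for it to be freed
  size_t first = scan();       // frees nothing: n is still protected
  int value = seen->value;     // safe: n has not been freed
  unprotect();
  size_t second = scan();      // frees n: nobody names it any more
  return first == 0 && value == 42 && second == 1;
}
```

Because a protected node is never handed back to `free`, its address cannot be recycled while a reader still holds it, so a later compare-and-swap against that address cannot be fooled.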

Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable.
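
To show why only pop needs protection, here is a hedged sketch of a lock-free stack with one hazard slot per worker. The worker count, the node layout and the names `MarkStackSketch` and `reclaim` are assumptions for illustration. The actual mark stack code in the PR is organized differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <vector>

constexpr int kWorkers = 4;  // hypothetical worker count

struct StackNode { StackNode* next; int value; };

struct MarkStackSketch {
  std::atomic<StackNode*> head{nullptr};
  std::atomic<StackNode*> hazard[kWorkers] = {};  // one slot per GC worker
  std::vector<StackNode*> retired[kWorkers];      // touched only by the owning worker

  // Any thread may push. Push only writes to a node it still owns,
  // so it needs no hazard pointer.
  void push(int value) {
    StackNode* n = static_cast<StackNode*>(std::malloc(sizeof(StackNode)));
    n->value = value;
    n->next = head.load();
    while (!head.compare_exchange_weak(n->next, n)) {}
  }

  // Only workers pop. Reading top->next is the racy access that the slot protects.
  bool pop(int worker, int* out) {
    assert(worker >= 0 && worker < kWorkers);
    for (;;) {
      StackNode* top = head.load();
      if (top == nullptr) {
        hazard[worker].store(nullptr);
        return false;
      }
      hazard[worker].store(top);
      if (head.load() != top) continue;  // head moved before publication; retry
      StackNode* next = top->next;
      if (head.compare_exchange_strong(top, next)) {
        *out = top->value;
        hazard[worker].store(nullptr);
        retired[worker].push_back(top);
        reclaim(worker);
        return true;
      }
    }
  }

  // Frees this worker's retired nodes unless some worker's slot still names them.
  void reclaim(int worker) {
    std::vector<StackNode*> keep;
    for (StackNode* n : retired[worker]) {
      bool in_use = false;
      for (auto& h : hazard) in_use = in_use || (h.load() == n);
      if (in_use) keep.push_back(n); else std::free(n);
    }
    retired[worker].swap(keep);
  }
};

inline bool mark_stack_demo() {
  MarkStackSketch stack;
  stack.push(1);
  stack.push(2);
  int a = 0, b = 0, c = 0;
  bool ok = stack.pop(0, &a) && stack.pop(1, &b) && !stack.pop(0, &c);
  return ok && a == 2 && b == 1;
}
```

Keeping the slot in a per-worker array means a reclaiming worker only has to scan a small, fixed set of slots before freeing a node.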

I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and have not seen any interesting changes in behavior other than more predictable and consistent native memory usage. Today's behavior is slightly more temperamental, because the mark stack memory is eagerly handed back to the OS between GC cycles and then required again in the next cycle.

With this change, another JVM option bites the dust. I have already gotten the CSR to obsolete the -XX:ZMarkStackSpaceLimit JVM option approved (cf. https://bugs.openjdk.org/browse/JDK-8349204).


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change requires CSR request JDK-8349204 to be approved
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issues

  • JDK-8347335: ZGC: Use limitless mark stack memory (Enhancement - P4)
  • JDK-8349204: ZGC: Use limitless mark stack memory (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23571/head:pull/23571
$ git checkout pull/23571

Update a local copy of the PR:
$ git checkout pull/23571
$ git pull https://git.openjdk.org/jdk.git pull/23571/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23571

View PR using the GUI difftool:
$ git pr show -t 23571

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23571.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Feb 11, 2025

👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Feb 11, 2025

@fisk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8347335: ZGC: Use limitless mark stack memory

Reviewed-by: aboldtch, iwalulya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 4 new commits pushed to the master branch:

  • 1e87ff0: 8348936: [Accessibility,macOS,VoiceOver] VoiceOver doesn't announce untick on toggling the checkbox with "space" key on macOS
  • 86d0616: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767
  • 0662e39: 8350267: Set mtune and mcpu settings in JDK native lib compilation on Linux ppc64(le)
  • c5c91a8: 8345285: [s390x] test failures: foreign/normalize/TestNormalize.java with C2

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 11, 2025

openjdk bot commented Feb 11, 2025

@fisk The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Feb 11, 2025

mlbridge bot commented Feb 11, 2025

Webrevs

@xmas92 xmas92 left a comment

As the old saying goes malloc's the limit. Or maybe it was the sky.

lgtm. Good work!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 20, 2025

@walulyai walulyai left a comment

Looks good!

Driveby review

fisk commented Feb 20, 2025

Looks good!

Driveby review

Thanks Ivan!

fisk commented Feb 24, 2025

/integrate

openjdk bot commented Feb 24, 2025

Going to push as commit 65f79c1.
Since your change was applied there have been 29 commits pushed to the master branch:

  • e410af0: 8342393: Promote commutative vector IR node sharing
  • f755fad: 8349653: Clarify the docs for MemorySegment::reinterpret
  • a5c9a4d: 8349032: C2: Parse Predicate refactoring in Loop Unswitching broke fix for JDK-8290850
  • 302bed0: 8350499: Minimal build fails with slowdebug builds
  • 0795d11: 8350464: The flags to set the native priority for the VMThread and Java threads need a broader range
  • 05b4812: 8350041: Skip test/jdk/java/lang/String/nativeEncoding/StringPlatformChars.java on static JDK
  • a891630: 8350480: RISC-V: Relax assertion about registers in C2_MacroAssembler::minmax_fp
  • 5cbd9d1: 8349959: Test CR6740048.java passes unexpectedly missing CR6740048.xsd
  • 25322aa: 8350258: AArch64: Client build fails after JDK-8347917
  • 825ab20: 8350456: Test javax/crypto/CryptoPermissions/InconsistentEntries.java crashed: EXCEPTION_ACCESS_VIOLATION
  • ... and 19 more: https://git.openjdk.org/jdk/compare/26bf445f4726f1936a0a4cbaf1424c5235424bfb...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 24, 2025
@openjdk openjdk bot closed this Feb 24, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 24, 2025

openjdk bot commented Feb 24, 2025

@fisk Pushed as commit 65f79c1.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
