
8017163: G1: Refactor remembered sets #4116

Closed


tschatzl
Contributor

@tschatzl tschatzl commented May 19, 2021

Hi all,

can I have reviews for this change that significantly refactors the remembered set to improve scalability.

The current G1 remembered set implementation was designed for the use cases, Java heap sizes, and applications of 20 years ago.

Over time, many problems with performance, and in particular with memory usage, have been observed:

  • adding elements to the lowest-tier data structure takes a per-remembered-set global lock. Measurements have shown that applications can spend thousands of seconds waiting to acquire these locks. While the affected threads are in most cases refinement threads, so the application is not directly affected, the contention can still affect G1's ability to meet goals needed for keeping pause times (i.e. the number of cards from the refinement buffers that must be merged into the card table and then scanned during GC).

  • there is a substantial memory overhead for managing the data structures; examples are:

    • using separate (hash) tables for the three different types of card containers
    • there is significant unnecessary preallocation of memory for some of the card set containers
    • containers store redundant information
  • inflexibility when reusing memory: in the current implementation, the different containers use different approaches to manage memory. Most use the C heap directly, some use the C heap with an internal global memory pool. In practice this makes it very difficult to implement anything other than giving back memory in the collection pause, which is why the corresponding "Free Collection Set" phase of the pause can take a significant amount of time.
    Memory reuse is also limited, and so is preallocating arenas (which would have to be reimplemented multiple times), which stresses the C heap allocator.

  • inability to support additional use cases: over time, interesting ideas (e.g. JDK-8058803) came up for improving the performance of remembered set management. Mostly because of redundant information everywhere and completely different handling of various aspects in the containers, it is in practice impossible to implement these.

  • (partial) inability to give back memory to the OS. While some of the containers use the C heap allocator, and so in some way give back memory, the implementation and handling differ for every container.

  • the granularity of the existing containers is unbalanced: there are currently three tiers, "sparse", "fine" and "full". "Sparse" is an array of cards holding perhaps a few hundred entries, "fine" is a bitmap covering a whole region, and "full" is a bit indicating that the region should be scanned completely during GC.

The problem is that there is nothing between "no card at all" and "sparse", and in particular there is a large gap between the capacities of "sparse" and "fine": the jump in memory usage when exceeding a "sparse" array (holding 128 entries with 32M regions, taking ~256 bytes) and switching to a "fine" bitmap that can hold 65k entries using 8kB is significant.
For this reason there is even a dedicated option to stop allocating more "fine" containers and give up and use "full" instead to avoid excessive memory usage, with extremely bad consequences for pause times.

Over time, some of these issues have been fixed or, in many cases, band-aided, and some of these fixes and ideas were the result of working on this change (e.g. JDK-8262185, JDK-8233919, JDK-8213108).

This change is effectively a rewrite of the Java heap card based part of a region's remembered set.

This initial fully working change can be roughly described with the following properties:

  • use a single ConcurrentHashTable for the card containers of a given region. The container in use is replaced (coarsened) on the fly within the CHT node, completely lock-free. This implements JDK-6949259.

  • memory for a given region's remembered set for all containers (and the CHT nodes) is backed by arena-style bump-pointer allocation buffers, one per container type and per remembered set. In this change, memory is only given back to free lists during the pause; the implementation gives memory back to the OS concurrently with the application. Memory is still obtained from the C heap memory manager, but this is abstracted away and could be replaced by manual page memory management.

  • there are now four different container types and one meta-container type. The four actual containers, followed by the meta-container, are:

    • inline pointer: stores a few (3-5) cards directly in the CHT node and uses no extra memory.
    • array of cards: similar to the "sparse" container, an array of cards with a configurable number of entries. However, bulk allocation of memory is now managed at a lower level, so there is much less waste.
    • bitmap: similar to "fine", a bitmap spanning a (sub-)range of memory
    • full: same as the old "full", indicating for a (sub-)range of memory that all cards are to be looked at during scan. Like inline pointers, this uses no extra memory.
    • howl: the Howl container subdivides a given memory range into sub-ranges, each of which may be described by any of the other containers. This is somewhat similar to the idea suggested in JDK-8048504.
  • care has been taken to minimize container memory usage, e.g. by not storing redundant information in them and by specifying them carefully in general. They have been designed with future enhancements in mind.

In some benchmarks (those with significant remembered set memory usage) we are seeing memory usage reduced to 25% of JDK 16 levels with this change. Garbage collection times are at most as long as before, or shorter; most changes affecting pause times have been extracted earlier. Individual affected phases are generally shorter now.

Testing: tier1-8 many times, manual and automated perf testing


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issues

  • JDK-8017163: G1: Refactor remembered sets
  • JDK-8048504: G1: Investigate replacing the coarse and fine grained data structures in the remembered sets
  • JDK-6949259: G1: Merge sparse and fine remembered set hash tables

Reviewers

Contributors

  • Ivan Walulya <iwalulya@openjdk.org>
  • Thomas Schatzl <tschatzl@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4116/head:pull/4116
$ git checkout pull/4116

Update a local copy of the PR:
$ git checkout pull/4116
$ git pull https://git.openjdk.java.net/jdk pull/4116/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4116

View PR using the GUI difftool:
$ git pr show -t 4116

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4116.diff

@tschatzl
Contributor Author

@tschatzl tschatzl commented May 19, 2021

/contributor add iwalulya

@bridgekeeper

@bridgekeeper bridgekeeper bot commented May 19, 2021

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

@openjdk openjdk bot commented May 19, 2021

@tschatzl
Contributor Ivan Walulya <iwalulya@openjdk.org> successfully added.

@openjdk

@openjdk openjdk bot commented May 19, 2021

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label May 19, 2021
@tschatzl
Contributor Author

@tschatzl tschatzl commented May 19, 2021

/issue add JDK-8048504
/issue add JDK-6949259

@openjdk

@openjdk openjdk bot commented May 19, 2021

@tschatzl
Adding additional issue to issue list: 8048504: G1: Investigate replacing the coarse and fine grained data structures in the remembered sets.

@tschatzl tschatzl changed the title 8017163: Refactor remembered set 8017163: G1: Refactor remembered set May 19, 2021
@openjdk

@openjdk openjdk bot commented May 19, 2021

@tschatzl
Adding additional issue to issue list: 6949259: G1: Merge sparse and fine remembered set hash tables.

@openjdk openjdk bot changed the title 8017163: G1: Refactor remembered set 8017163: G1: Refactor remembered sets May 19, 2021
@tschatzl tschatzl marked this pull request as ready for review May 20, 2021
@openjdk openjdk bot added the rfr label May 20, 2021
@mlbridge

@mlbridge mlbridge bot commented May 20, 2021

src/hotspot/share/gc/g1/g1CardSet.cpp
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp
Contributor

@kstefanj kstefanj left a comment

Just a few comments to get this going.

src/hotspot/share/gc/g1/g1CardSet.hpp Outdated
src/hotspot/share/gc/g1/g1CardSet.hpp Outdated
src/hotspot/share/gc/g1/g1CardSetContainers.hpp Outdated
src/hotspot/share/gc/g1/g1CardSetContainers.hpp Outdated
src/hotspot/share/gc/g1/g1CardSetContainers.hpp Outdated
src/hotspot/share/gc/g1/g1CardSetFreeMemoryTask.cpp Outdated
src/hotspot/share/gc/g1/g1CollectedHeap.cpp Outdated
src/hotspot/share/gc/g1/g1GCPhaseTimes.hpp Outdated
src/hotspot/share/gc/g1/g1RemSet.cpp
@openjdk

@openjdk openjdk bot commented May 26, 2021

@tschatzl this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout submit/8017163-refactor-remembered-set
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict label May 26, 2021
@tschatzl
Contributor Author

@tschatzl tschatzl commented May 27, 2021

FYI, regarding the "Remove prefetching of log buffers" commit: testing on one particular machine showed that, for some reason, performance decreased to baseline (jdk17 levels).
That is, when I factored out the changes in JDK-8266821, I extended the prefetching from the merge remset phase to also include the log buffers. This showed (minimal absolute) pause time improvements for that phase, but after merging that change back, it seems to slow down mixed GCs overall on that machine.
Removing the prefetching for the log buffers brings pause times back to quite a bit below jdk17/mainline levels.

I can extract this particular change again if desired.

@tschatzl
Contributor Author

@tschatzl tschatzl commented May 27, 2021

Another note: during development of this feature there were, naturally, optimizations and improvements that did not make the cut. I labelled them with a gc-g1-remset label in the bug tracker. Feel free to add or suggest other things.

@openjdk openjdk bot removed the merge-conflict label May 31, 2021
Contributor

@kstefanj kstefanj left a comment

A few more comments. Been looking more at the code in my IDE now as well and I think it looks good. I haven't looked closely at the tests yet, but very nice that you added those. I will look at this today or tomorrow.

A general question about the testing: have you done any testing with VerifyRememberedSets turned on?

src/hotspot/share/gc/g1/g1CardSet.hpp Outdated
src/hotspot/share/gc/g1/g1CardSetContainers.hpp Outdated
src/hotspot/share/gc/g1/g1RemSet.cpp Outdated
src/hotspot/share/gc/g1/g1RemSet.cpp Outdated
src/hotspot/share/gc/g1/g1ServiceThread.cpp Outdated
src/hotspot/share/gc/g1/heapRegionRemSet.hpp Outdated
src/hotspot/share/gc/g1/heapRegion.cpp Outdated
@tschatzl
Contributor Author

@tschatzl tschatzl commented Jun 2, 2021

/contributor add tschatzl

@openjdk

@openjdk openjdk bot commented Jun 2, 2021

@tschatzl
Contributor Thomas Schatzl <tschatzl@openjdk.org> successfully added.

Contributor

@kstefanj kstefanj left a comment

Took a closer look at the new tests and overall they look good, just a couple of small comments.

test/hotspot/gtest/gc/g1/test_g1CardSet.cpp Outdated
const double FullCardSetThreshold = 1.0;
const uint BitmapCoarsenThreshold = 1.0;
Contributor

@kstefanj kstefanj Jun 14, 2021


Would it make sense to run this test with a few different config thresholds, to test the different levels of the card set? If I understand those thresholds correctly, this card set will never consider a region to be coarsened or full. I get that the accounting might turn into everything being "found" rather than "added", but it might be worth testing.

Contributor Author

@tschatzl tschatzl Jun 15, 2021


Since that test adds cards randomly, it is very hard to calculate the expected number of cards for verification without knowing exactly where coarsening happens. This is very complicated to test, and other tests already cover coarsening, although not in an MT context, so I would prefer not to spend time on a test that would be either brittle or useless.

@openjdk

@openjdk openjdk bot commented Jun 15, 2021

⚠️ @tschatzl This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

Contributor

@kstefanj kstefanj left a comment

Looked through the changes again and I think they are good. As we have all of JDK 18 to test, polish, and fix any potential problems, I see no reason not to approve this now.

I found a few unused functions, please remove them unless you have some future plans for any of them.

src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp
src/hotspot/share/gc/g1/g1CardSet.hpp Outdated
src/hotspot/share/gc/g1/g1CardSet.hpp Outdated
src/hotspot/share/gc/g1/g1CardSet.hpp
src/hotspot/share/gc/g1/heapRegionRemSet.hpp Outdated
src/hotspot/share/gc/g1/heapRegionRemSet.hpp Outdated
@openjdk

@openjdk openjdk bot commented Jun 17, 2021

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8017163: G1: Refactor remembered sets
8048504: G1: Investigate replacing the coarse and fine grained data structures in the remembered sets
6949259: G1: Merge sparse and fine remembered set hash tables

Co-authored-by: Ivan Walulya <iwalulya@openjdk.org>
Co-authored-by: Thomas Schatzl <tschatzl@openjdk.org>
Reviewed-by: sjohanss, iwalulya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • f4d20b2: 8268900: com/sun/net/httpserver/Headers.java: Fix indentation and whitespace

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jun 17, 2021
Member

@walulyai walulyai left a comment

Looks good!

A few minor nits.

src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
src/hotspot/share/gc/g1/g1CardSet.cpp Outdated
@tschatzl
Contributor Author

@tschatzl tschatzl commented Jun 21, 2021

Thanks @kstefanj @walulyai for your reviews.

@tschatzl
Contributor Author

@tschatzl tschatzl commented Jun 21, 2021

/integrate

@openjdk

@openjdk openjdk bot commented Jun 21, 2021

Going to push as commit 1692fd2.
Since your change was applied there have been 27 commits pushed to the master branch:

  • 0b8a0e2: 8266082: AssertionError in Annotate.fromAnnotations with -Xdoclint
  • b7d78a5: Merge
  • b8f073b: 8268316: Typo in JFR jdk.Deserialization event
  • b9d7337: 8268638: semaphores of AsyncLogWriter may be broken when JVM is exiting.
  • 8caeca0: 8264775: ClhsdbFindPC still fails with java.lang.RuntimeException: 'In java stack' missing from stdout/stderr
  • 7e03cf2: 8265073: XML transformation and indentation when using xml:space
  • 60389ee: 8269025: jsig/Testjsig.java doesn't check exit code
  • dab00ee: 8266518: Refactor and expand scatter/gather tests
  • f9c8c1c: 8268903: JFR: RecordingStream::dump is missing @SInCE
  • d8a0582: 8265369: [macos-aarch64] java/net/MulticastSocket/Promiscuous.java failed with "SocketException: Cannot allocate memory"
  • ... and 17 more: https://git.openjdk.java.net/jdk/compare/a051e735cda0d5ee5cb6ce0738aa549a7319a28c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jun 21, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Jun 21, 2021
@openjdk

@openjdk openjdk bot commented Jun 21, 2021

@tschatzl Pushed as commit 1692fd2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/8017163-refactor-remembered-set branch Jun 21, 2021
Labels
hotspot integrated
3 participants