-
Notifications
You must be signed in to change notification settings - Fork 6k
8017163: G1: Refactor remembered sets #4116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8017163: G1: Refactor remembered sets #4116
Conversation
/contributor add iwalulya |
👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into |
@tschatzl |
/issue add JDK-8048504 |
@tschatzl |
@tschatzl |
…tomic::sub() via cmpxchg-loop
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments to get this going.
@tschatzl this pull request can not be integrated into git checkout submit/8017163-refactor-remembered-set
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
Fyi on the "Remove prefetching of log buffers" commit: testing on one particular machine showed that for some reason performance decreased to baseline (jdk17 levels). I can extract this particular change again if desired. |
Another note, during development of this feature there obviously were optimizations and improvements that did not make the cut. I labelled them with a gc-g1-remset label in the bug tracker. Feel free to add or suggest other things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments. Been looking more at the code in my IDE now as well and I think it looks good. I haven't looked closely at the tests yet, but very nice that you added those. I will look at this today or tomorrow.
A general question about the testing? Have you done any testing with VerifyRememberedSets
turned on?
…ies have never been used before, just taking a small amount of memory)
/contributor add tschatzl |
@tschatzl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a closer look at the new tests and overall they look good, just a couple of small comments.
const double FullCardSetThreshold = 1.0; | ||
const uint BitmapCoarsenThreshold = 1.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to run this test with a few different config thresholds? To test the different levels of the card-set. If I understand those thresholds correct this card-set will never consider a region to be coarsend or full. I get that the accounting might turn into everything being "found" rather than added, but might be worth testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since that test randomly adds cards, it is very hard to calculate the expected number of cards for verification if we do not know where the coarsening exactly happens. This is very complicated to test, and other tests already test the coarsening, although not in an MT context, so I would like to not spend the time for either a brittle or useless test.
…jdk into 8017163-refactor-remembered-set
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked through the changes again and I think they are good. As we have all of JDK 18 to test, polish it and fix any potential problems I see no reason to not approve this now.
I found a few unused functions, please remove them unless you have some future plans for any of them.
@tschatzl This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
A few minor nits.
/integrate |
Going to push as commit 1692fd2.
Your commit was automatically rebased without conflicts. |
Hi all,
can I have reviews for this change that significantly refactors the remembered set for more scalability.
The current G1 remembered set implementation has been designed for use cases and Java heaps and applications from 20 years ago.
Over time many problems with performance and in particular memory usage have been observed:
adding elements to the lowest tier data structure takes a per-remembered set global lock. Measurements have shown that the applications can wait thousands of seconds acquiring these locks. While the affected threads are in most cases refinement threads so does not directly affect the application, it can still affect the ability of G1 to meet some goals needed for keeping pause times (i.e. amount of cards from the refinement buffers to be merged into the card table and then scanned during gc).
there is a substantial memory overhead for managing the data structures: examples are
inflexibility when reusing memory: in the current implementation the different containers use different approaches to manage memory. Most use the C heap directly, some the C heap with some internal global memory pool. This in practice makes it very difficult to implement anything other than giving back memory in the collection pause. The corresponding "Free Collection Set" pause can take a significant amount of time because of that.
Also memory reuse is limited and preallocating arenas is limited (or would have to be reimplemented multiple times), stressing the C heap allocator.
inability to support additional use cases: over time interesting ideas (e.g. JDK-8058803) came up for improving performance of remembered set management. Mostly due to redundant information everywhere and completely different handling of various aspects in the containers it is in practice impossible to implement these.
(partial) inability to give back memory to the OS. While some of the containers use the C heap allocator, and so in some way give back memory, these implementations and handling is different for every container.
the existing granularity of containers are unbalanced: currently there exist three tiers: "sparse", "fine" and "full". Sparse is an array of cards ranging in the hundreds maybe, "fine" is a bitmap covering a whole region and full is a bit indicating that that region should be scanned completely during GC.
The problem is that there is nothing between "no card at all" and "sparse" and in particular the difference between the capability to hold entries of "sparse" and "fine". I.e. memory usage difference when exceeding a "sparse" array (holding 128 entries at 32M regions, taking ~256 bytes) to fine that is able to hold 65k entries using 8kB is significant.
For these reason there is even a dedicated option to stop allocating more "fine" containers and just give up and use "full" instead to avoid excessive memory usage. With extremely bad consequences in pause times.
Over time some of these issues have been fixed or in many cases band-aided, and some of these fixes and ideas were the result of working on this change (e.g. JDK-8262185, JDK-8233919, JDK-8213108).
This change is effectively a rewrite of the Java heap card based part of a region's remembered set.
This initial fully working change can be roughly described with the following properties:
use a single
ConcurrentHashTable
for the card containers of a given region. The container in use replaced (coarsened) on the fly within the CHT node, completely lock-free. This implements JDK-6949259.memory for a given region's remembered set for all containers (and the CHT nodes) is backed by per container type and per remembered set arena style bump-pointer allocation buffers. In this change, in the pause, memory is given back to free lists only. The implementation gives back memory to the OS concurrently to the application. Memory is still managed using the C heap memory manager though, but abstracted away and could be replaced by manual page memory management.
there are now four different container types and one meta-container type. These four actual containers are:
care has been taken to minimize container memory usage, e.g. by not adding redundant information there and in general carefully specify them. They have been designed with future enhancements in mind.
In some benchmarks (where there is significant remembered set memory usage) we are seeing memory reduction to 25% of JDK 16 levels with this change. Garbage collection times are at most as long or shorter than before; most changes affecting pause times have been extracted earlier. Individiual affected phases are generally shorter now.
Testing: tier1-8 many times, manual and automated perf testing
Progress
Issues
Reviewers
Contributors
<iwalulya@openjdk.org>
<tschatzl@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4116/head:pull/4116
$ git checkout pull/4116
Update a local copy of the PR:
$ git checkout pull/4116
$ git pull https://git.openjdk.java.net/jdk pull/4116/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4116
View PR using the GUI difftool:
$ git pr show -t 4116
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4116.diff