New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8272773: Configurable card table card size #5838
Conversation
|
@vish-chan The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
Hi Vish-chan,
thanks for your contribution.
Could you configure Github actions (see the green box where it says "Testing is not configured") ? This will uncover the issues found when trying to compile it for performance testing.
This performance testing will take bit.
I am also curious why you enabled minimum card size of 128 bytes? Did you see any advantages in your testing on small heaps?
Also, the other comments are based on a very passing look on the code.
Thanks,
Thomas
@@ -91,11 +91,17 @@ void HeapRegion::setup_heap_region_size(size_t max_heap_size) { | |||
guarantee(GrainWords == 0, "we should only set it once"); | |||
GrainWords = GrainBytes >> LogHeapWordSize; | |||
|
|||
// Initialize card size based on the region size. | |||
// Maximum no. of cards per region is 2^16. | |||
CardTable::initialize_card_size(1 << (region_size_log - 16)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make a constant out of the 16
; the reason is the size of the G1CardSetArray::EntryDataType
being 16 bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question we need to answer is whether smaller card size than 512 gains anything - if not, I recommend keeping to use 512 at minimum, as otherwise the card table will obviously use much more memory for no gain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to identify the usecases where we can see better performance with small card sizes. Right now, I have none. So, I'll set the minimum card size to 512 right now. Intention is to have a design such that if someone finds a usecase which gives better performance with smaller cards, only the minimum card size needs to be changed in the code, nothing else. Shall I keep this current behavior of overriding card_size based on region size till there's some feedback from others? or Do I need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the minimum card size to 512 for now, this can be investigated further separately. As you said, this can be changed easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please do not scale card table entry size with region size - there are some applications (many references, large remembered sets; e.g. the one in JDK-8152438) that with a 1024 card table size show 30% longer pause times. (Actually, they benefit a lot from smaller card table entry size, i.e. they get around the same pause time improvement with e.g. 256 byte sized card table entry size if heap region size permits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I'm not scaling card size with region size, as it is not required. With current supported region sizes (1M to 32M) and card sizes (512b, 1024B), all the possible combinations of region size and card size are valid combinations. If we include 256b cards or 64M regions, then invalid combinations will be there which would need to be handled.
Do you think that code/logic is needed now, or we can add if and when needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's leave it at that for now.
Note that all that kept me thinking over the weekend, and I found an easy way (the remembered set refactoring in 18 made that possible) to completely decouple heap region size from remembered sets.
So soon we'll be able to have heap region sizes > 32M without any restrictions. Same for card table sizes, they can be set independently of region size now (although 1024 is max as long as we fix BOT and card table size to the same).
Then we can re-introduce the option to allow smaller (probably 256, maybe 128) card table sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, sounds good.
@@ -428,7 +464,8 @@ MemRegion CardTable::dirty_card_range_after_reset(MemRegion mr, | |||
} | |||
|
|||
uintx CardTable::ct_max_alignment_constraint() { | |||
return card_size * os::vm_page_size(); | |||
// CardTable max alignment is computed with card_size_max | |||
return card_size_max * os::vm_page_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change causes some tests to fail because their heap will get too large. Maybe it is possible to move code so that the actual card size value is used here to avoid this, or fix the errors (will be visible with enabled github actions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any errors after enabling the tests via github actions. Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll re-check.
static const uintx card_size_min = 128; | ||
static const uintx card_size_max = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static consts should be CamelCased.
product(uintx, GCCardSizeInBytes, 512, \ | ||
"Card table entry size (in bytes) for card based collectors") \ | ||
range(128, 1024) \ | ||
// end of GC_FLAGS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See e.g. MaxTenuringThreshold
for how adding a constraint function works.
static uint block_shift; | ||
static uint block_size; | ||
static uint block_size_in_words; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the naming guideline for static members is the same as for regular class members, i.e. use an underscore in front. Maybe wait on changing this for another person to comment on (I did not look up the style guide).
We do not use public static members too often :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I've kept the names of all the pre-existing constants (which are now variables) same as before. Otherwise we have to change all the references also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what the second reviewer thinks. Otherwise this can be done separately as it might hide the interesting part of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a review (yet), just a response to these questions.
Public data members are very rare in HotSpot code. There are a few POD/standard layout classes (examples include G1BufferNodeList and OopStorage::BasicParState::IterationData), but otherwise accessor functions are preferred. The style-guide says
"Use public accessor functions for member variables accessed outside the class."
The style-guide says this:
"Class data member names have a leading underscore, and use lowercase with words separated by a single underscore (_foo_bar)."
It makes no distinction for public or not, or for static or not. That is intentional, based on some discussion about that question when I was updating the style-guide a while ago.
Of course, there are probably counter-examples to both of these lurking in the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if this introduction of getters were moved to an extra CR, but I'm good to do that now as well; not so much a problem with the block_*
ones, more with the ones in CardTable
that add lots of noise.
|
The change looks good now (keeping in mind that for some questions we are waiting for a second reviewer to chime in), however there are some things we need to do first:
-
complete testing; while I have pushed the change through our perf test suite with no statistically significant regressions when using compiled-in 512 bytes card size vs. configurable 512 bytes card size, and ran with different (smaller) card sizes I would like to confirm the improvements you reported for the 1024 bytes case on specjbb2015, and analyze a few of the particular results. This will take a bit.
-
for the new product flag (if we make it product, I am going to ask about this internally) we need a CSR completed before pushing. I started one at https://bugs.openjdk.java.net/browse/JDK-8275142, please have a look.
If we decide to make this non-product, then we can just close the CSR again. -
it would be nice to not be required to amend the CSR later to fix the bounds of the
GCCardSizeInBytes
flag (when it stays a product flag) when G1 supports different heap region sizes > 32M.
There is already a draft PR out for this (#5909), and a branch that contains a quickly merged repo (https://github.com/openjdk/jdk/compare/master...tschatzl:submit/virtualize-g1card-config-card-set-size?expand=1).
Are there any problems with waiting a bit for this change on PR#5909?
Maybe you are also interested to try out the combination of PR#5909 and this one, allowing card table entry sizes from 128 to 1024 and any combination of region size iirc.
In PR#5909 there is also the question of how large regions G1 should allow with that - the patch currently allows up to 512M regions, but theoretically there is no limit - do you have any opinion on that?
Thanks,
Thomas
// This maximum is derived from that we need an extra bit for possible | ||
// offsets in the byte for backskip values (this is a hard limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This maximum is derived from that we need an extra bit for possible | |
// offsets in the byte for backskip values (this is a hard limit) | |
// Maximum size an offset table entry can cover. This maximum is derived from that we need an extra bit for possible | |
// offsets in the byte for backskip values, leaving 2^7 possible offsets. Mininum object alignment is 8 bytes (2^3), so we can at most represent 2^10 offsets within a BOT value. |
(We could make this maximum dependent on ObjectAlignmentInBytes
, but I do not know if increasing ObjectAlignmentInBytes gives any advantage. So at most defer this investigation to some other time as I'm already keeping busy some machines)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is interesting to make this size depend on current ObjectAlignmentInBytes
after all. I've seen some recent testing where it proved useful. However that may be done as a separate change.
It also seems that reducing the size of the BOT/card table (increasing this maximum) has diminishing returns e.g. wrt to Card Table Clear times.
Maybe it is interesting for you to try out even larger card sizes though (e.g. an ObjectAlignmentInBytes
value of 16
should allow a card size of 2048 and so on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll explore this and share the results.
Fwiw, we in the gc team agreed that this flag should be a product flag so that hopefully people will use it. So we need that CSR before integrating it :) Hth, |
@@ -75,6 +76,7 @@ class G1CardTable : public CardTable { | |||
|
|||
static const size_t WordAllClean = SIZE_MAX; | |||
static const size_t WordAllDirty = 0; | |||
static const size_t LogOfMaxCardsPerRegion = BitsPerByte * sizeof(G1CardSetArray::EntryDataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tschatzl Do we need this code once JDK-8275056 (Virtualize G1CardSet containers over heap region) gets integrated? From my understanding, this won't be a restriction from heap region point of view. Do I need to remove this and the changes in g1CardTable.cpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The change you will need will be very similar to 70dcca6 (slightly modified because of review comments); but please wait until it is in. No hurry though, we'll need to complete the CSR before we can integrate anyway.
Thomas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, got it.
Are there any problems with waiting a bit for this change on PR#5909? Maybe you are also interested to try out the combination of PR#5909 and this one, allowing card table entry sizes from 128 to 1024 and any combination of region size iirc. In PR#5909 there is also the question of how large regions G1 should allow with that - the patch currently allows up to 512M regions, but theoretically there is no limit - do you have any opinion on that? Thanks, |
@vish-chan This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@vish-chan : the change for PR#5909 has finally been integrated. Sorry for the delay, things like a related bug delayed its integration more than expected. |
…ize-configurable
…/jdk into card-size-configurable
@tschatzl I tested PR#5909 and this change: The combination of 1024b card/512M region is giving 2-3% improvement over 1024b card/32M region in SPECjbb2015 on my setup. I haven't analyzed this is detail though. Shall I update the pull request with new patch by:
|
Good that the changed did not optimize away the improvements.
Please do. It is unnecessary now as there are no limitations of the combination of card size/region size for G1 any more.
I'd say keep 512 bytes default. I looked a bit through literature, and the suggestion is that the minimum size used seems 128 bytes. That might be useful for smallish heaps. Thanks, Edit: I messed up with the default value, @vish-chan, please keep 512 default. |
@tschatzl I've updated the PR with a new patch: Removed the dependency of G1 heap region and card size. Set card size in [128. 1024] with 512 as default. |
|
||
// Maximum size an offset table entry can cover. This maximum is derived from that | ||
// we need an extra bit for possible offsets in the byte for backskip values, leaving 2^7 possible offsets. | ||
// Mininum object alignment is 8 bytes (2^3), so we can at most represent 2^10 offsets within a BOT value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Mininum/Minimum/
assert(UseG1GC || UseParallelGC || UseSerialGC, | ||
"Initialize card size should only be called by card based collectors."); | ||
|
||
// Card size is the max. of minimum permissible value and GCCardSizeInBytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated comment. Remove.
static uint card_size; | ||
static uint card_size_in_words; | ||
|
||
// max permissible card size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/max/Maximum
return JVMFlag::SUCCESS; | ||
if (!is_power_of_2(value)) { | ||
JVMFlag::printError(verbose, | ||
"GCCardSizeInBytes (" UINT32_FORMAT ") must be " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use %u
instead of UINT32_FORMAT
.
if(!(UseG1GC || UseParallelGC || UseSerialGC)) | ||
return JVMFlag::SUCCESS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this check - if we are doing the constraint check at parse time, the Use***GC
flags have not been set. It does not hurt to do the unnecessary check either even a GC does not support that option.
// Maximum size an offset table entry can cover. This maximum is derived from that | ||
// we need an extra bit for possible offsets in the byte for backskip values, leaving 2^7 possible offsets. | ||
// Mininum object alignment is 8 bytes (2^3), so we can at most represent 2^10 offsets within a BOT value. | ||
static const uint _max_block_size = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static consts should be CamelCased.
static const uint _max_block_size = 1024; | |
static const uint MaxBlockSize = 1024; |
static uint block_shift; | ||
static uint block_size; | ||
static uint block_size_in_words; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if this introduction of getters were moved to an extra CR, but I'm good to do that now as well; not so much a problem with the block_*
ones, more with the ones in CardTable
that add lots of noise.
@@ -428,7 +453,8 @@ MemRegion CardTable::dirty_card_range_after_reset(MemRegion mr, | |||
} | |||
|
|||
uintx CardTable::ct_max_alignment_constraint() { | |||
return card_size * os::vm_page_size(); | |||
// CardTable max alignment is computed with _card_size_max | |||
return _card_size_max * os::vm_page_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the constraint were AtParse
, the code could directly use GCCardSizeInBytes
. Without needing to increase the minimum heap size.
#include "logging/log.hpp" | ||
#include "memory/virtualspace.hpp" | ||
#include "runtime/java.hpp" | ||
#include "runtime/os.hpp" | ||
#include "runtime/globals_extension.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we typically include the globals include that contains the flag - this should probably be #include "gc/shared/gc_globals.hpp
(and sorted correctly in this list).
@@ -26,13 +26,40 @@ | |||
#include "gc/shared/cardTable.hpp" | |||
#include "gc/shared/collectedHeap.hpp" | |||
#include "gc/shared/space.inline.hpp" | |||
#include "gc/shared/gcLogPrecious.hpp" | |||
#include "gc/parallel/objectStartArray.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This include is in the wrong order and should probably be guarded by #if INCLUDE_PARALLELGC
.
Hi @vish-chan - I changed the name of the CR to "Configurable card table card size" because we are done with the investigation :) No, honestly, "investigate xyz" is a bad title for a specification change (CSR) and a release note. |
@tschatzl I've updated the title of the PR and also made code changes as per the review comments. Thanks. |
I will push it through internal testing. Will take a bit. |
@tschatzl Thank you. Regarding JDK-8277372, do we wait for this PR to integrate and then work on this? |
/csr needed |
@tschatzl this pull request will not be integrated until the CSR request JDK-8275142 for issue JDK-8272773 has been approved. |
You can certainly start working on this, based on the current version, I'm assuming not too much will change. Integrating this is conditional on CSR approval (which is certain, but takes some time until reviewed). I do not recommend making a PR already for the JDK-8277372 change yet - these dependent PRs never worked for me :) |
@@ -1660,7 +1660,7 @@ jint G1CollectedHeap::initialize() { | |||
|
|||
// The G1FromCardCache reserves card with value 0 as "invalid", so the heap must not | |||
// start within the first card. | |||
guarantee(heap_rs.base() >= (char*)G1CardTable::card_size, "Java heap must not start within the first card."); | |||
guarantee(heap_rs.base() >= (char*)(uintptr_t)(G1CardTable::card_size), "Java heap must not start within the first card."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about (uintptr_t)heap_rs.base() >= G1CardTable::card_size
? Converting an arbitrary integer to a pointer is implementation-defined.
@@ -428,7 +454,7 @@ MemRegion CardTable::dirty_card_range_after_reset(MemRegion mr, | |||
} | |||
|
|||
uintx CardTable::ct_max_alignment_constraint() { | |||
return card_size * os::vm_page_size(); | |||
return GCCardSizeInBytes * os::vm_page_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why card_size
can't be used here is that card_size
haven't been set when this method is called, correct? If so, please put a comment here.
@@ -90,6 +90,9 @@ void HeapRegion::setup_heap_region_size(size_t max_heap_size) { | |||
guarantee(GrainWords == 0, "we should only set it once"); | |||
GrainWords = GrainBytes >> LogHeapWordSize; | |||
|
|||
// Initialize card size | |||
CardTable::initialize_card_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that CardTable::initialize_card_size()
is called at the beginning of GCArguments::initialize_alignments()
for Serial and Parallel. I wonder if it makes sense to do the same for G1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for having CardTable::initialize_card_size()
here in the initial patch was the dependency between card size and G1 heap region size. Since the dependency is no longer there, we can safely move this initialization to G1Arguments::initialize_alignments
@vish-chan This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 87 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@tschatzl, @albertnetymk) but any other Committer may sponsor as well.
|
Testing tier1-5 in our CI looks good. |
CSR (https://bugs.openjdk.java.net/browse/JDK-8275142) and release note (https://bugs.openjdk.java.net/browse/JDK-8277448) are done. After addressing @albertnetymk 's comments this is ready to go. |
@tschatzl I've added a new patch to address @albertnetymk 's comments |
@tschatzl is there anything else left from my end? |
As the bot suggested:
Then I'll do the sponsoring (/sponsor) if necesasry. |
/integrate |
@vish-chan |
/sponsor |
Going to push as commit 1c215f3.
Your commit was automatically rebased without conflicts. |
@tschatzl @vish-chan Pushed as commit 1c215f3. |
Hi,
Please review the changes to make CardTable entry size configurable. The changes primarily consists of:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5838/head:pull/5838
$ git checkout pull/5838
Update a local copy of the PR:
$ git checkout pull/5838
$ git pull https://git.openjdk.java.net/jdk pull/5838/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5838
View PR using the GUI difftool:
$ git pr show -t 5838
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5838.diff