Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8272773: Configurable card table card size #5838

Closed
wants to merge 12 commits into from
Closed
@@ -91,11 +91,17 @@ void HeapRegion::setup_heap_region_size(size_t max_heap_size) {
guarantee(GrainWords == 0, "we should only set it once");
GrainWords = GrainBytes >> LogHeapWordSize;

// Initialize card size based on the region size.
// Maximum no. of cards per region is 2^16.
CardTable::initialize_card_size(1 << (region_size_log - 16));
Copy link
Contributor

@tschatzl tschatzl Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a constant out of the 16; the reason is the size of the G1CardSetArray::EntryDataType being 16 bit.

Copy link
Contributor

@tschatzl tschatzl Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question we need to answer is whether smaller card size than 512 gains anything - if not, I recommend keeping to use 512 at minimum, as otherwise the card table will obviously use much more memory for no gain.

Copy link
Contributor Author

@vish-chan vish-chan Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to identify the usecases where we can see better performance with small card sizes. Right now, I have none. So, I'll set the minimum card size to 512 right now. Intention is to have a design such that if someone finds a usecase which gives better performance with smaller cards, only the minimum card size needs to be changed in the code, nothing else. Shall I keep this current behavior of overriding card_size based on region size till there's some feedback from others? or Do I need to change this?

Copy link
Contributor

@tschatzl tschatzl Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the minimum card size to 512 for now, this can be investigated further separately. As you said, this can be changed easily.

Copy link
Contributor

@tschatzl tschatzl Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please do not scale card table entry size with region size - there are some applications (many references, large remembered sets; e.g. the one in JDK-8152438) that with a 1024 card table size show 30% longer pause times. (Actually, they benefit a lot from smaller card table entry size, i.e. they get around the same pause time improvement with e.g. 256 byte sized card table entry size if heap region size permits).

Copy link
Contributor Author

@vish-chan vish-chan Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I'm not scaling card size with region size, as it is not required. With current supported region sizes (1M to 32M) and card sizes (512b, 1024B), all the possible combinations of region size and card size are valid combinations. If we include 256b cards or 64M regions, then invalid combinations will be there which would need to be handled.
Do you think that code/logic is needed now, or we can add if and when needed?

Copy link
Contributor

@tschatzl tschatzl Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's leave it at that for now.

Note that all that kept me thinking over the weekend, and I found an easy way (the remembered set refactoring in 18 made that possible) to completely decouple heap region size from remembered sets.

So soon we'll be able to have heap region sizes > 32M without any restrictions. Same for card table sizes, they can be set independently of region size now (although 1024 is max as long as we fix BOT and card table size to the same).

Then we can re-introduce the option to allow smaller (probably 256, maybe 128) card table sizes.

Copy link
Contributor Author

@vish-chan vish-chan Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, sounds good.


guarantee(CardsPerRegion == 0, "we should only set it once");
CardsPerRegion = GrainBytes >> G1CardTable::card_shift;

LogCardsPerRegion = log2i(CardsPerRegion);

assert(LogCardsPerRegion <= 16, "Total cards per region should be less than or equal to 2^16");
vish-chan marked this conversation as resolved.
Show resolved Hide resolved

if (G1HeapRegionSize != GrainBytes) {
FLAG_SET_ERGO(G1HeapRegionSize, GrainBytes);
}
@@ -31,11 +31,21 @@
#include "services/memTracker.hpp"
#include "utilities/align.hpp"

uint ObjectStartArray::block_shift = 0;
uint ObjectStartArray::block_size = 0;
uint ObjectStartArray::block_size_in_words = 0;

void ObjectStartArray::initialize_block_size(uint card_shift) {
block_shift = card_shift;
block_size = 1 << block_shift;
block_size_in_words = block_size / sizeof(HeapWord);
}

void ObjectStartArray::initialize(MemRegion reserved_region) {
// We're based on the assumption that we use the same
// size blocks as the card table.
assert((int)block_size == (int)CardTable::card_size, "Sanity");
assert((int)block_size <= 512, "block_size must be less than or equal to 512");
assert((int)block_size <= 1024, "block_size must be less than or equal to 1024");
vish-chan marked this conversation as resolved.
Show resolved Hide resolved

// Calculate how much space must be reserved
_reserved_region = reserved_region;
@@ -52,11 +52,12 @@ class ObjectStartArray : public CHeapObj<mtGC> {
clean_block = -1
};

enum BlockSizeConstants {
block_shift = 9,
block_size = 1 << block_shift,
block_size_in_words = block_size / sizeof(HeapWord)
};
static uint block_shift;
static uint block_size;
static uint block_size_in_words;
Copy link
Contributor

@tschatzl tschatzl Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the naming guideline for static members is the same as for regular class members, i.e. use an underscore in front. Maybe wait on changing this for another person to comment on (I did not look up the style guide).
We do not use public static members too often :)

Copy link
Contributor Author

@vish-chan vish-chan Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I've kept the names of all the pre-existing constants (which are now variables) same as before. Otherwise we have to change all the references also.

Copy link
Contributor

@tschatzl tschatzl Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see what the second reviewer thinks. Otherwise this can be done separately as it might hide the interesting part of this change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a review (yet), just a response to these questions.

Public data members are very rare in HotSpot code. There are a few POD/standard layout classes (examples include G1BufferNodeList and OopStorage::BasicParState::IterationData), but otherwise accessor functions are preferred. The style-guide says
"Use public accessor functions for member variables accessed outside the class."

The style-guide says this:
"Class data member names have a leading underscore, and use lowercase with words separated by a single underscore (_foo_bar)."
It makes no distinction for public or not, or for static or not. That is intentional, based on some discussion about that question when I was updating the style-guide a while ago.

Of course, there are probably counter-examples to both of these lurking in the code base.

Copy link
Contributor

@tschatzl tschatzl Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if this introduction of getters were moved to an extra CR, but I'm good to do that now as well; not so much a problem with the block_* ones, more with the ones in CardTable that add lots of noise.


// Initialize block size based on card size
static void initialize_block_size(uint card_shift);

protected:

@@ -97,6 +97,8 @@ static size_t default_gen_alignment() {
}

void ParallelArguments::initialize_alignments() {
// Initialize card size before initializing alignments
CardTable::initialize_card_size();
SpaceAlignment = GenAlignment = default_gen_alignment();
HeapAlignment = compute_heap_alignment();
}
@@ -33,6 +33,18 @@
#include "runtime/java.hpp"
#include "services/memTracker.hpp"

uint BOTConstants::LogN = 0;
uint BOTConstants::LogN_words = 0;
uint BOTConstants::N_bytes = 0;
uint BOTConstants::N_words = 0;

void BOTConstants::initialize_bot_size(uint card_shift) {
LogN = card_shift;
LogN_words = LogN - LogHeapWordSize;
N_bytes = 1 << LogN;
N_words = 1 << LogN_words;
}

//////////////////////////////////////////////////////////////////////
// BlockOffsetSharedArray
//////////////////////////////////////////////////////////////////////
@@ -49,16 +49,20 @@ class ContiguousSpace;

class BOTConstants : public AllStatic {
public:
static const uint LogN = 9;
static const uint LogN_words = LogN - LogHeapWordSize;
static const uint N_bytes = 1 << LogN;
static const uint N_words = 1 << LogN_words;
static uint LogN;
static uint LogN_words;
static uint N_bytes;
static uint N_words;

// entries "e" of at least N_words mean "go back by Base^(e-N_words)."
// All entries are less than "N_words + N_powers".
static const uint LogBase = 4;
static const uint Base = (1 << LogBase);
static const uint N_powers = 14;

// Initialize bot size based on card size
static void initialize_bot_size(uint card_shift);

static size_t power_to_cards_back(uint i) {
return (size_t)1 << (LogBase * i);
}
@@ -24,15 +24,51 @@

#include "precompiled.hpp"
#include "gc/shared/cardTable.hpp"
#include "gc/parallel/objectStartArray.hpp"
#include "gc/shared/collectedHeap.hpp"
#include "gc/shared/space.inline.hpp"
#include "logging/log.hpp"
#include "memory/virtualspace.hpp"
#include "runtime/java.hpp"
#include "runtime/os.hpp"
#include "runtime/globals_extension.hpp"
Copy link
Contributor

@tschatzl tschatzl Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we typically include the globals include that contains the flag - this should probably be #include "gc/shared/gc_globals.hpp (and sorted correctly in this list).

#include "services/memTracker.hpp"
#include "utilities/align.hpp"

uintx CardTable::card_shift = 0;
uintx CardTable::card_size = 0;
uintx CardTable::card_size_in_words = 0;

void CardTable::initialize_card_size(uintx min_card_size) {
assert(UseG1GC || UseParallelGC || UseSerialGC,
"Initialize card size should only be called by card based collectors.");

// GCCardSizeInBytes is rounded off to the nearest power of 2, and clamped
// between min and max card sizes
card_size = GCCardSizeInBytes;
vish-chan marked this conversation as resolved.
Show resolved Hide resolved
card_size = round_up_power_of_2(card_size);
card_size = clamp(card_size, MAX2(min_card_size,card_size_min), card_size_max);
vish-chan marked this conversation as resolved.
Show resolved Hide resolved
card_shift = log2i_exact(card_size);
card_size = 1 << card_shift;
vish-chan marked this conversation as resolved.
Show resolved Hide resolved
card_size_in_words = card_size / sizeof(HeapWord);

// Set blockOffsetTable size based on card table entry size
BOTConstants::initialize_bot_size(card_shift);

// Set ObjectStartArray block size based on card table entry size
ObjectStartArray::initialize_block_size(card_shift);

if (GCCardSizeInBytes != card_size) {
FLAG_SET_ERGO(GCCardSizeInBytes, card_size);
}

Copy link
Contributor

@tschatzl tschatzl Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also be unnecessary with a constraint function that limits the values of the option.

Copy link
Contributor Author

@vish-chan vish-chan Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FLAG_SET_ERGO is needed for G1 GC. In case of G1, min card size is determined by the region size.
For the following cmdline arguments: -XX:G1HeapRegionSize=32M -XX:GCCardSizeInBytes=256
---- card_size is ergonomically set to 512 instead of 256, as min. card size for 32M region is 512.

Copy link
Contributor

@tschatzl tschatzl Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave this question out for now: I am not sure it makes sense to allow a card size of 256; however the common reaction when specifying incompatible command line options would be to bail out.
The question is what to do if the user selects GCCardSizeInBytes=256 and G1 ergonomically decides to use 32m regions. Silently use 16m regions, which are very likely worse than 32m regions in that case?
That is why we should first determine if card size of 256 makes sense somewhere and then decide (or somebody else has a particular opinion on this).

Copy link
Contributor

@tschatzl tschatzl Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That same question will come up if we allow 64m regions later, but the step from 64->32m may not be that big. Another option would be to go away from the 1:1 mapping of cards in the remembered set and cards in the card table. This could probably be fixed, but is a completely different issue of course.

Copy link
Contributor Author

@vish-chan vish-chan Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this current implementation, with G1 GC, card_size is initialized after determining the region_size inside setup_heap_region_size(). card_size value is determined as MAX(GCCardSizeInByte, minimum card size for the region size). So, if user selects GCCardSizeInBytes=256 and G1 ergonomically decides to use 32M regions, card_size is silently set to 512b. And, if user selects GCCardSizeInBytes=1024 and G1 ergonomically decides to use 32M regions, card_size stays at 1024b. In case user doesn't specify any card_size, default value of 512b is used. We can set it ergonomically, but thats a completely new problem.
With Parallel and Serial GC, we directly use GCCardSizeInBytes, as there's no dependency.

Copy link
Contributor

@tschatzl tschatzl Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the code, but the problems are elsewhere:

  • this restriction is G1 only, but the code handling this situation is in common code (in the common cardTable.cpp, not in e.g. g1CardTable*). I do not have an extremely strong opinion about this at the moment.
  • the code overrides a user decision silently, which is not a good idea imho. The typical solution is that other settings that the user did not explicitly specified try to adapt to these. However in this case, doing that would be in almost all cases disadvantageous (I guess). Let's see what other people think about this.

log_info(gc, barrier)("CardTable entry size: " UINTX_FORMAT, card_size);
vish-chan marked this conversation as resolved.
Show resolved Hide resolved
}

void CardTable::initialize_card_size() {
initialize_card_size(card_size_min);
}

size_t CardTable::compute_byte_map_size() {
assert(_guard_index == cards_required(_whole_heap.word_size()) - 1,
"uninitialized, check declaration order");
@@ -41,6 +77,7 @@ size_t CardTable::compute_byte_map_size() {
return align_up(_guard_index + 1, MAX2(_page_size, granularity));
}


CardTable::CardTable(MemRegion whole_heap) :
_whole_heap(whole_heap),
_guard_index(0),
@@ -56,8 +93,7 @@ CardTable::CardTable(MemRegion whole_heap) :
{
assert((uintptr_t(_whole_heap.start()) & (card_size - 1)) == 0, "heap must start at card boundary");
assert((uintptr_t(_whole_heap.end()) & (card_size - 1)) == 0, "heap must end at card boundary");

assert(card_size <= 512, "card_size must be less than 512"); // why?
assert(card_size >= card_size_min && card_size <= card_size_max, "card_size must be between min and max");
vish-chan marked this conversation as resolved.
Show resolved Hide resolved
}

CardTable::~CardTable() {
@@ -428,7 +464,8 @@ MemRegion CardTable::dirty_card_range_after_reset(MemRegion mr,
}

uintx CardTable::ct_max_alignment_constraint() {
return card_size * os::vm_page_size();
// CardTable max alignment is computed with card_size_max
return card_size_max * os::vm_page_size();
Copy link
Contributor

@tschatzl tschatzl Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change causes some tests to fail because their heap will get too large. Maybe it is possible to move code so that the actual card size value is used here to avoid this, or fix the errors (will be visible with enabled github actions).

Copy link
Contributor Author

@vish-chan vish-chan Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any errors after enabling the tests via github actions. Am I missing something here?

Copy link
Contributor

@tschatzl tschatzl Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll re-check.

}

void CardTable::verify_guard() {
@@ -228,17 +228,25 @@ class CardTable: public CHeapObj<mtGC> {
MemRegion dirty_card_range_after_reset(MemRegion mr, bool reset,
int reset_val);

// Constants
enum SomePublicConstants {
card_shift = 9,
card_size = 1 << card_shift,
card_size_in_words = card_size / sizeof(HeapWord)
};
// CardTable entry size
static uintx card_shift;
static uintx card_size;
static uintx card_size_in_words;
vish-chan marked this conversation as resolved.
Show resolved Hide resolved

// min and max permissible card sizes
static const uintx card_size_min = 128;
static const uintx card_size_max = 1024;
Copy link
Contributor

@tschatzl tschatzl Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static consts should be CamelCased.


static constexpr CardValue clean_card_val() { return clean_card; }
static constexpr CardValue dirty_card_val() { return dirty_card; }
static intptr_t clean_card_row_val() { return clean_card_row; }

// Initialize card size based on the min_card_size value
static void initialize_card_size(uintx min_card_size);

//Initialize card size based on CardTable::card_size_min
static void initialize_card_size();

// Card marking array base (adjusted for heap low boundary)
// This would be the 0th element of _byte_map, if the heap started at 0x0.
// But since the heap starts at some higher address, this points to somewhere
@@ -692,9 +692,12 @@
product(uintx, GCDrainStackTargetSize, 64, \
"Number of entries we will try to leave on the stack " \
"during parallel gc") \
range(0, max_juint)

// end of GC_FLAGS
range(0, max_juint) \
\
product(uintx, GCCardSizeInBytes, 512, \
"Card table entry size (in bytes) for card based collectors") \
range(128, 1024) \
// end of GC_FLAGS
Copy link
Contributor

@tschatzl tschatzl Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See e.g. MaxTenuringThreshold for how adding a constraint function works.


DECLARE_FLAGS(GC_FLAGS)

@@ -25,6 +25,7 @@
#include "precompiled.hpp"
#include "gc/shared/genArguments.hpp"
#include "gc/shared/generation.hpp"
#include "gc/shared/cardTable.hpp"
#include "logging/log.hpp"
#include "runtime/globals_extension.hpp"
#include "runtime/java.hpp"
@@ -61,6 +62,8 @@ static size_t bound_minus_alignment(size_t desired_size,
}

void GenArguments::initialize_alignments() {
// Initialize card size before initializing alignments
CardTable::initialize_card_size();
SpaceAlignment = GenAlignment = (size_t)Generation::GenGrain;
HeapAlignment = compute_heap_alignment();
}