-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8305895: Implement JEP 450: Compact Object Headers (Experimental) #20677
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@rkennke The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
96ec3c2
to
ad7e13b
Compare
/label remove core-libs |
/jep JEP-450 |
@rkennke |
@rkennke |
@rkennke |
ad7e13b
to
dfe36bd
Compare
dfe36bd
to
ed03217
Compare
Webrevs
|
|
In both aarch64.ad and x86_64.ad, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CDS changes look good! Have two style comments but otherwise this makes sense
@@ -344,7 +345,7 @@ void ReadClosure::do_tag(int tag) { | |||
int old_tag; | |||
old_tag = (int)(intptr_t)nextPtr(); | |||
// do_int(&old_tag); | |||
assert(tag == old_tag, "old tag doesn't match"); | |||
assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this assert message change a leftover from debugging or is it meant to be this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a leftover, but otoh it does not hurt. I found myself re-adding it several times to analyze CDS issues during development, so I decided to just leave it in.
@@ -670,8 +672,19 @@ void ArchiveBuilder::make_shallow_copy(DumpRegion *dump_region, SourceObjInfo* s | |||
SystemDictionaryShared::validate_before_archiving(InstanceKlass::cast(klass)); | |||
dump_region->allocate(sizeof(address)); | |||
} | |||
// Allocate space for the future InstanceKlass with proper alignment | |||
const size_t alignment = | |||
#ifdef _LP64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the text alignment here is a bit confusing. Should 678 and 682 be at the same indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly reviewed the metaspace changes and suggest upstreaming the MetaBlock refactoring ahead of the rest of this patch.
Only one comment about the interpreter code (affecting 4 locations).
__ sub(r3, r3, oopDesc::base_offset_in_bytes()); | ||
} else { | ||
__ sub(r3, r3, sizeof(oopDesc)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like something that could be buggy if we're not careful. We had a pass where we cleaned up sizeof(oopDesc) once. Can this be in oopDesc as (this is not header_size() anymore?) some function with the right name?
} else { | ||
__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 1*oopSize), rcx); | ||
NOT_LP64(__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 2*oopSize), rcx)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this and above, I'd rather oopDesc encapsulate the header_size for UseCompactObjectHeaders condition in C++ code, and never see sizeof(oopDesc).
if (is_class) { | ||
assert(word_size >= (sizeof(Klass)/BytesPerWord), "weird size for klass: %zu", word_size); | ||
result = class_space_arena()->allocate(word_size, wastage); | ||
} else { | ||
return non_class_space_arena()->allocate(word_size); | ||
result = non_class_space_arena()->allocate(word_size, wastage); | ||
} | ||
if (wastage.is_nonempty()) { | ||
non_class_space_arena()->deallocate(wastage); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this definitely needs a comment why since this is how we allocate small chunks of wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace.
The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here.
if (is_class) { | ||
assert(word_size >= (sizeof(Klass)/BytesPerWord), "weird size for klass: %zu", word_size); | ||
result = class_space_arena()->allocate(word_size, wastage); | ||
} else { | ||
return non_class_space_arena()->allocate(word_size); | ||
result = non_class_space_arena()->allocate(word_size, wastage); | ||
} | ||
if (wastage.is_nonempty()) { | ||
non_class_space_arena()->deallocate(wastage); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also assert or be condionalized on UseCompactObjectHeaders.
@@ -778,6 +796,7 @@ void Metaspace::global_initialize() { | |||
Metaspace::initialize_class_space(rs); | |||
|
|||
// Set up compressed class pointer encoding. | |||
// In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this comment is here. Seems out of place.
add_block(p + requested_word_size, waste); | ||
} | ||
} | ||
return p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answers my prior question. The waste is added back to the block list for non-class-arenas as well.
#define METABLOCKFORMAT "block (@" PTR_FORMAT " word size " SIZE_FORMAT ")" | ||
#define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() | ||
|
||
} // namespace metaspace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders. Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, I am fine with these metaspace changes going in with this PR if the timing for that is better.
bool MetaspaceArena::is_valid_area(MetaWord* p, size_t word_size) const { | ||
assert(p != nullptr && word_size > 0, "Sanity"); | ||
// Returns true if the given block is contained in this arena | ||
// Returns true if the given block is contained in this arena |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the same comment twice.
@@ -4004,7 +4004,7 @@ void StubGenerator::generate_compiler_stubs() { | |||
generate_chacha_stubs(); | |||
|
|||
#ifdef COMPILER2 | |||
if ((UseAVX == 2) && EnableX86ECoreOpts) { | |||
if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { | |||
generate_string_indexof(StubRoutines::_string_indexof_array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stub routine should be re-enabled if UseCompactObjectHeaders
is to become non-experimental and enabled by default in the future. Is there a RFE for this task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from an assert in LibraryCallKit::inline_string_indexOfI
and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in LibraryCallKit::inline_string_indexOfI
and generate_string_indexof_stubs()
and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: rkennke@7001783
If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the indexOf
implementation, but here is a relevant comment that motivates the assertion: #16753 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: rkennke@097c2af
Does this look correct to you? Or better to do it as a follow-up?
(It passes a couple of indexOf tests, will run tier1-4 on it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look correct to you? Or better to do it as a follow-up?
I do not feel confident enough to review this part. If you want to include rkennke@097c2af in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the code in the patch is good enough as-is, especially if UseCompactObjectHeaders
is slated to go away. The existing if
will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed.
I'm good with a comment tying UseCompactObjectHeaders
to the condition. The comment can be removed when the flag is removed. "Ship it" :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait a second, I've probably not been clear. UseCompactObjectHeaders
is slated to become on by default and then slated to go away. That means that array base offets <= 16 bytes will become the default. The generated code will be something like:
if (haystack_len <= 8) {
// Copy 8 bytes onto stack
} else if (haystack_len <= 16) {
// Copy 16 bytes onto stack
} else {
// Copy 32 bytes onto stack
}
So that is 2 branches in this prologue code instead of originally 1.
However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault.
I think I need to mull over it some more to come up with a correct fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the header<16 version to be a small loop: rkennke@bcba264
The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the end of the array (which would potentially crash).
I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers.?
Also, this new implementation could simply replace the old one, instead of being an alternative. I am not sure if if would make any difference performance-wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkennke The small loop looks to me that it will run over the end of the array.
Say the haystack_len is 7, the index below would be 0 after the shrq instruction, and the movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes i.e. one byte past the end of the array:
// num_words (zero-based) = (haystack_len - 1) / 8;
__ movq(index, haystack_len);
__ subq(index, 1);
__ shrq(index, LogBytesPerWord);
__ bind(L_loop);
__ movq(XMM_TMP1, Address(haystack, index, Address::times_8));
__ movq(Address(rsp, index, Address::times_8), XMM_TMP1);
__ subq(index, 1);
__ jcc(Assembler::positive, L_loop);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and that is intentional.
Say, haystack_len is 7, then the first block computes the adjustment of the haystack, which is 8 - (7 % 8) = 1. We adjust the haystack pointer one byte down, so that when we copy (multiple of) 8 bytes, we land on the last byte. We do copy a few bytes that are preceding the array, which is part of the object header and guaranteed to be >= 8 bytes.
Then we compute the number of words to copy, but make it 0-based. That is '0' is 1 word, '1' is 2 words, etc. It makes the loop nicer. In this example we get 0, which means we copy one word from the adjusted haystack, which is correct.
Then comes the actual loop.
Afterwards we adjust the haystack pointer so that it points to the first array element that we just copied onto the stack, ignoring the few garbage bytes that we also copied.
@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using
Here are the failure details:
|
I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE>3? |
I don't think so, due to a limitation in the IR framework precondition language: I suggest to disable the IR checks of the failing tests using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this change. I've reviewed runtime, oops and metaspace code. It looks good.
I posted a patch for JDK-8341044 for CDSPluginTest.java that was failing in our testing with the Lilliput patch. |
Added tag jdk-24+18 for changeset 19642bd
There's another test failure that we're seeing that's similar to this bug in mainline when running with -XX:+UseCompactObjectHeaders on aarch64: https://bugs.openjdk.org/browse/JDK-8340212. |
This is the main body of the JEP 450: Compact Object Headers (Experimental).
It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
Main changes:
Testing:
(+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.)
The below testing has been run many times, but not with this exact base version of the JDK. I want to hold off the full testing until we also have the Tiny Class-Pointers PR lined-up, and test with that.
Progress
Issues
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677
$ git checkout pull/20677
Update a local copy of the PR:
$ git checkout pull/20677
$ git pull https://git.openjdk.org/jdk.git pull/20677/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20677
View PR using the GUI difftool:
$ git pr show -t 20677
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20677.diff
Webrev
Link to Webrev Comment