-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8139457: Relax alignment of array elements #11044
Conversation
👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into |
ARM32 seems to build well, passes |
RISC-V needs more work:
|
Thanks for trying this. It may be enough to change the assert to check for BytesPerInt multiple instead. Something like the following: `
|
Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java
|
Thanks for checking and providing the fix, Fei! I pushed those changes and updated the test matrix accordingly. |
/contributor add @RealFYang |
@rkennke |
With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform. |
Ok, I reverted that part. Could you test that? Also, if you're running any of the tests menioned in the PR, can you let me know and I'll update the test matrix. Thanks, |
Hi, Thanks for the update. This has passed tier1-3 tests on my linux-riscv64 hifive unmatched boards. |
Thanks, Fei! This is very appreciated! |
This should make it work on ppc.
I did the zero-ing out up in I ran several tests manually with and without UseCCP. Your test case runs also through. Our hardware is a bottleneck though, and currently Richard is using our test queue with his PPC Loom port. Therefore it may take a while until I manage to run more tests. |
About that, I see that other platforms zero out the leading bytes in |
I don't think so. initialize_object() always passes an aligned offset to initialize_body() because that gap at offset 12 is handled by initialize_header() already. |
/contributor add @tstuefe |
My testing for tiers 1-7 had 100% passing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look correct. Tried to look around the code for indirect assumptions surrounding the klass_offset
and the length_offset
w.r.t. the base_offset
. It seem to work out. (But some of this implicit assumptions might want to be cleaned up in the future.)
I only looked at changes since my last comments, seems like my nits from then have been fixed.
There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.
I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)
I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes
. (Similarly to how it is done for the instanceOop
where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).
This seems preferable than adding these extra alignment shims in-between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)
Maybe something similar for copying and cloning. But there are already so much shims and patching code surrounding copying and cloning. E.g. in ZGC (depending on UseCompressClassPointers) we have to sub 1 from the length node in C2 to apply the proper barriers. This is required because when C2 takes out the base offset it does not know the element type, so it starts the clone from the word aligned (up) length offset (which may or may not start in the header). It would be a larger project to overhaul, but we have seen a couple of bugs related to this.
Some things that I think we should always try and strive for is when introducing new code (that talks about heap memory sizes/offsets):
- Any named property/variable ending in
_size_in_bytes
/_offset_in_bytes
is not required to be word aligned. - Any named property/variable ending in just
_size
/_offset
must be in words. And their name should not lie. i.e.aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize;
instead ofheader_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize;
unlessheader_size * HeapWordSize == header_size_in_bytes()
is invariant.
Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes forByteSize
andWordSize
. (We already have these in the code base but they use 32-bit int and onlyByteSize
sees any use). The idea is then that you can just use_size
/_offset
suffixes with the correct types.
Some thoughts on future naming
This is just one way that things could be named.Note: padding
and klass_gap
may be 0 here.
arrayOop: After This PR
Layout: |<---------------header-------------->|<---------------payload---------------->| |<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->| Names: |<-header_size_in_bytes-->| |<-------base_offset_in_bytes-------->| |<----------------------object_size/object_size_in_bytes---------------------->|
The ***
just means that the alignment will end up somewhere here.
base_offset_in_bytes + elements_size_in_bytes == aligned_base_offset + aligned_element_size
is invariant.
arrayOop: Potential Future
Layout: |<--------header--------->|<---------------------payload---------------------->| |<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->| Names: |<-header_size_in_bytes-->| |<-------base_offset_in_bytes-------->|<--elements_size_in_bytes-->| |<--aligned_header_size---****************>| // Word Aligned (UP) May include elements |<--aligned_base_offset---****************>| // Word Aligned (UP) May include elements |<****-aligned_element_size->| // Word Aligned (Down) |<----------------------object_size/object_size_in_bytes---------------------->| |<-Header Initialization--***************>| |<****-Body Initialization-->|
instanceOop: Current
Layout: |<--------header-------->|<-header/body->|<---------------body---------------->| |<------mark/klass------>|<--klass_gap-->| |<------------------fields/padding------------------->| Names: |<---header_size/header_size_in_bytes--->| |<-base_offset_in_bytes->| |<----------------------object_size/object_size_in_bytes---------------------->|
Not sure what a good ambiguous name for the field and/or padding should be.
The ***
has a similar role.
instanceOop: Potential Future
Layout: |<--------header-------->|<-----------------------body------------------------>| |<------mark/klass------>|<------------------fields/padding------------------->| |<--klass_gap-->| Names: |<-header_size_in_bytes->| |<-base_offset_in_bytes->|<------------------X_size_in_bytes------------------>| |<----------------------object_size/object_size_in_bytes---------------------->| |<--aligned_header_size---**************>| // Word Aligned (UP) May include elements |<--aligned_base_offset---**************>| // Word Aligned (UP) May include elements |<***************-----------aligned_X_size----------->| |<-Header Initialization--**************>| |<***************--------Body Initialization--------->|
Like what I did for x86 in latest commit? If you agree that should be the way to go, then I'll do the same for aarch64. |
Yes, there is still much to do. I just don't want to overload this PR with clutter.
This seems indeed the most pragmatic way to deal with it. In the (very) long run, I hope we can settle on a single fixed object layout (e.g. 4-byte header, 4-byte length, then payload for arrays and 4-byte headers, then payload for objects), until then we need to live with the shims. Aligning initialization on word-boundaries makes sense, though.
Ugh, yes. But not in this PR ;-)
Agree with all of that.
Agree on that, too. Also, another thing that should be cleaned-up/gotten right is arrayOopDesc::max_array_length() (use sane return type, not int32_t, fix maximum lengths to be less pessimistic, esp on 32bit platforms). But not here, and perhaps not until we're done with the various planned layout changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, Roman. Nice work.
Thanks, all! /integrate |
Going to push as commit 336bbbe.
Your commit was automatically rebased without conflicts. |
@@ -100,7 +100,7 @@ using MacroAssembler::null_check; | |||
// header_size: size of object header in words |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkennke Should this be updated to base_offset_in_bytes
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And description updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same of x86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Galder, unfortunately this PR has already been intergrated before your review comments. I've opened JDK-8327361 to track the additional fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Roman, I found a potential bug but didn't realized this PR was already integrated recently. Sorry for my negligence. It's a rare crash in aarch64 with G1 GC. The root cause is that default behavior of MacroAssembler::arrays_equals will blindly load whole word before comparison. When the array[0] is aligned to 32-bit, the last word load will exceed the array limit and may touch the next word beyong object layout in heap memory. If the next word which doesn't belong to object self happens to be the boundary of pages and G1 heap regions, the segmentation fault will be triggered. Loading the last word blindly is benign for 64-bit aligned array because it is always inside the object self. We proposed JDK-8328138 to optimize the aarch64 array equals implementation to both handle word aligned or unaligned array correctly and have better performance in ARM neoverse n1&n2 architectures. Apologize again for my delay. Please help to take a review. |
Thanks for the heads-up, this is a very good point. Wouldn't we get wrong results for array-equals if we blindly compare the last word, if it doesn't actually belong to the array contents? |
No. We just blindly load for performance but the comparison is still precise. |
See JDK-8139457 for details.
Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements.
Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
Testing:
Progress
Issues
Reviewers
Contributors
<fyang@openjdk.org>
<stuefe@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044
$ git checkout pull/11044
Update a local copy of the PR:
$ git checkout pull/11044
$ git pull https://git.openjdk.org/jdk.git pull/11044/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11044
View PR using the GUI difftool:
$ git pr show -t 11044
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11044.diff
Webrev
Link to Webrev Comment