8139457: Relax alignment of array elements #11044

rkennke · 2022-11-08T20:18:09Z

See JDK-8139457 for details.

Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements.

Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
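
The effect on offsets can be sketched in a few lines of C++. This is a simplified model, not the HotSpot code: `old_base_offset`, `relaxed_base_offset`, `kWordSize` and `kLengthSize` are hypothetical names, and the rule "only word-sized elements need word alignment" is an assumption taken from the description above. The length offsets used (16 with -XX:-UseCompressedClassPointers on 64-bit, 12 with compressed class pointers, 8 with Lilliput) are likewise assumptions about a 64-bit VM.

```cpp
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

constexpr std::size_t kWordSize = 8;    // assumed 64-bit heap word
constexpr std::size_t kLengthSize = 4;  // the array length is a 32-bit int

// Old rule: the first element always starts at a word-aligned offset.
constexpr std::size_t old_base_offset(std::size_t length_offset) {
  return align_up(length_offset + kLengthSize, kWordSize);
}

// Relaxed rule (model): elements smaller than a word may start directly
// after the length field; word-sized elements still get word alignment.
constexpr std::size_t relaxed_base_offset(std::size_t length_offset,
                                          std::size_t element_size) {
  return element_size >= kWordSize
             ? align_up(length_offset + kLengthSize, kWordSize)
             : length_offset + kLengthSize;
}

// -XX:-UseCompressedClassPointers: length at 16, so a byte[] saves 4 bytes.
static_assert(old_base_offset(16) == 24, "old layout leaves a gap at 20..24");
static_assert(relaxed_base_offset(16, 1) == 20, "byte[] starts at 20");
static_assert(relaxed_base_offset(16, 8) == 24, "long[] stays word aligned");
// Lilliput-style layout: length at 8, elements can start at 12.
static_assert(relaxed_base_offset(8, 4) == 12, "int[] starts at 12");
```

With compressed class pointers the length field ends at offset 16, so both rules give the same base offset and nothing changes in that mode.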

Testing:

runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390)
bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390)
tier1 (x86_64, x86_32, aarch64, riscv)
tier2 (x86_64, aarch64, riscv)
tier3 (x86_64, riscv)

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Change requires CSR request JDK-8314882 to be approved
Commit message must refer to an issue

Issues

JDK-8139457: Relax alignment of array elements (Enhancement - P4)
JDK-8314882: Relax alignment of array elements (CSR)

Reviewers

Thomas Stuefe (@tstuefe - Reviewer) ⚠️ Review applies to e8c1e408
Stefan Karlsson (@stefank - Reviewer)
Aleksey Shipilev (@shipilev - Reviewer) ⚠️ Review applies to 3e37e785
Coleen Phillimore (@coleenp - Reviewer) ⚠️ Review applies to fb9f18ea
Kelvin Nilsen (@kdnilsen - no project role) ⚠️ Review applies to eab5720a
Axel Boldt-Christmas (@xmas92 - Reviewer)

Contributors

Fei Yang <fyang@openjdk.org>
Thomas Stuefe <stuefe@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044
$ git checkout pull/11044

Update a local copy of the PR:
$ git checkout pull/11044
$ git pull https://git.openjdk.org/jdk.git pull/11044/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11044

View PR using the GUI difftool:
$ git pr show -t 11044

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11044.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2022-11-08T20:19:31Z

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-11-08T20:21:47Z

@rkennke The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

shipilev · 2022-11-10T17:11:33Z

ARM32 seems to build well, passes runtime/FieldLayout tests, and bootcycles.

shipilev · 2022-11-10T18:38:09Z

RISC-V needs more work:

$  make images test TEST=runtime/FieldLayout
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (0xe0000000), pid=454832, tid=454835
#  stop: len is not a multiple of BytesPerWord
#
# JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.shade.shipilev-jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.shade.shipilev-jdk, mixed mode, tiered, compressed oops, g1 gc, linux-riscv64)
# Problematic frame:
# J 78 c1 java.util.Arrays.copyOfRange([BII)[B java.base@20-internal (64 bytes) @ 0x0000003f84c8bf2c [0x0000003f84c8bdc0+0x000000000000016c]

rkennke · 2022-11-10T18:45:23Z

Thanks for trying this. It may be enough to change the assert to check for BytesPerInt multiple instead. Something like the following:

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 91833b662e2..107e4cfcedd 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -4215,9 +4215,9 @@ void MacroAssembler::zero_memory(Register addr, Register len, Register tmp) {
 #ifdef ASSERT
   {
     Label L;
-    andi(t0, len, BytesPerWord - 1);
+    andi(t0, len, BytesPerInt - 1);
     beqz(t0, L);
-    stop("len is not a multiple of BytesPerWord");
+    stop("len is not a multiple of BytesPerInt");
     bind(L);
   }
 #endif // ASSERT

RealFYang · 2022-11-11T08:17:19Z

Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java
But I haven't performed a full test of all these changes on riscv.

diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
index 5989d5ab809..9dced7c53e9 100644
--- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
@@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int
   sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes);
   beqz(len_in_bytes, done);

+  // Zero first 4 bytes, if start offset is not word aligned.
+  if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) {
+    sw(zr, Address(obj, hdr_size_in_bytes));
+    sub(len_in_bytes, len_in_bytes, BytesPerInt);
+    hdr_size_in_bytes += BytesPerInt;
+  }
+
   // Preserve obj
   if (hdr_size_in_bytes) {
     add(obj, obj, hdr_size_in_bytes);
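
The idea behind this shim can be modeled in portable C++. The sketch below is not the RISC-V assembler code: `zero_body` is a hypothetical helper, and it assumes that the total object size is a multiple of the word size and that the header size is at least int-aligned. It shows why the remaining length is again a multiple of BytesPerWord once the leading 4 bytes have been cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBytesPerWord = 8;
constexpr std::size_t kBytesPerInt = 4;

// Zero the bytes [hdr_size_in_bytes, size_in_bytes) of `obj`.
// Returns the number of full-word stores performed by the main loop.
inline std::size_t zero_body(std::uint8_t* obj, std::size_t size_in_bytes,
                             std::size_t hdr_size_in_bytes) {
  assert(size_in_bytes % kBytesPerWord == 0);    // assumed: objects are word multiples
  assert(hdr_size_in_bytes % kBytesPerInt == 0); // assumed: header is int aligned
  assert(hdr_size_in_bytes <= size_in_bytes);

  std::size_t len = size_in_bytes - hdr_size_in_bytes;
  if (len == 0) return 0;

  std::size_t offset = hdr_size_in_bytes;
  // Zero the first 4 bytes if the start offset is not word aligned.
  if (offset % kBytesPerWord != 0) {
    std::memset(obj + offset, 0, kBytesPerInt);
    offset += kBytesPerInt;
    len -= kBytesPerInt;
  }

  // From here on the word-granular invariant holds again.
  assert(len % kBytesPerWord == 0);
  std::size_t word_stores = 0;
  for (; len > 0; len -= kBytesPerWord, offset += kBytesPerWord) {
    std::memset(obj + offset, 0, kBytesPerWord);
    ++word_stores;
  }
  return word_stores;
}
```

For a 32-byte object with a 20-byte header, the model clears bytes 20..23 with the int store and bytes 24..31 with a single word store.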

rkennke · 2022-11-11T10:08:08Z

Thanks for checking and providing the fix, Fei! I pushed those changes and updated the test matrix accordingly.

rkennke · 2022-11-11T10:09:15Z

/contributor add @RealFYang

openjdk · 2022-11-11T10:09:33Z

@rkennke
Contributor Fei Yang <fyang@openjdk.org> successfully added.

RealFYang · 2022-11-12T03:24:22Z

With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

#ifdef ASSERT
  {
     Label L;
-    andi(t0, len, BytesPerWord - 1);
+    andi(t0, len, BytesPerInt - 1);
     beqz(t0, L);
-    stop("len is not a multiple of BytesPerWord");
+    stop("len is not a multiple of BytesPerInt");
     bind(L);
  }
#endif // ASSERT

Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform.

rkennke · 2022-11-17T08:30:45Z

Ok, I reverted that part. Could you test that? Also, if you're running any of the tests mentioned in the PR, can you let me know and I'll update the test matrix.

Thanks,
Roman

RealFYang · 2022-11-17T08:59:45Z

Hi, Thanks for the update. This has passed tier1-3 tests on my linux-riscv64 hifive unmatched boards.

rkennke · 2022-11-17T09:02:45Z

Thanks, Fei! This is very appreciated!

tstuefe · 2022-11-17T12:54:06Z

This should make it work on ppc.

thomas@starfish:/shared/projects/openjdk/jdk-jdk/source$ git diff
diff --git a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
index 87b87e83e1a..4420c2ac4ca 100644
--- a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
@@ -361,6 +361,16 @@ void C1_MacroAssembler::allocate_array(
   const Register index = t3;
   addi(base, obj, base_offset_in_bytes);               // compute address of first element
   addi(index, arr_size, -(base_offset_in_bytes));      // compute index = number of bytes to clear
+
+  // Elements are not dword aligned. Zero out leading word.
+  if (!is_aligned(base_offset_in_bytes, BytesPerWord)) {
+    assert(is_aligned(base_offset_in_bytes, BytesPerInt), "weird alignment");
+    li(t1, 0);
+    stw(t1, 0, base);
+    addi(base, base, BytesPerInt);
+    // Note: initialize_body will align index down, no need to correct it here.
+  }
+
   initialize_body(base, index);
 
   if (CURRENT_ENV->dtrace_alloc_probes()) {

I did the zero-ing out up in allocate_array since I did not want to affect the object allocation path.

I ran several tests manually with and without UseCCP. Your test case runs also through. Our hardware is a bottleneck though, and currently Richard is using our test queue with his PPC Loom port. Therefore it may take a while until I manage to run more tests.

tstuefe · 2022-11-17T12:56:04Z

About that, I see that other platforms zero out the leading bytes in initialize_body, but does that not mean that we now do a pointless store whenever we initialize a variable-sized object in +UseCCP mode with 12byte headers?

src/hotspot/share/oops/arrayOop.hpp

rkennke · 2022-11-17T15:17:15Z

I don't think so. initialize_object() always passes an aligned offset to initialize_body() because that gap at offset 12 is handled by initialize_header() already.

rkennke · 2022-11-17T15:27:49Z

/contributor add @tstuefe

coleenp · 2024-02-21T14:33:23Z

My testing for tiers 1-7 had 100% passing.

xmas92

These changes look correct. Tried to look around the code for indirect assumptions surrounding the klass_offset and the length_offset w.r.t. the base_offset. It seems to work out. (But some of these implicit assumptions might want to be cleaned up in the future.)

I only looked at changes since my last comments, seems like my nits from then have been fixed.

There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.

I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes. (Similarly to how it is done for the instanceOop where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

This seems preferable to adding these extra alignment shims in-between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)

Maybe something similar for copying and cloning. But there are already so many shims and so much patching code surrounding copying and cloning. E.g. in ZGC (depending on UseCompressedClassPointers) we have to sub 1 from the length node in C2 to apply the proper barriers. This is required because when C2 takes out the base offset it does not know the element type, so it starts the clone from the word aligned (up) length offset (which may or may not start in the header). It would be a larger project to overhaul, but we have seen a couple of bugs related to this.

Some things that I think we should always try and strive for is when introducing new code (that talks about heap memory sizes/offsets):

Any named property/variable ending in _size_in_bytes/_offset_in_bytes is not required to be word aligned.
Any named property/variable ending in just _size/_offset must be in words. And their name should not lie. i.e. aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; instead of header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; unless header_size * HeapWordSize == header_size_in_bytes() is invariant.
Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes for ByteSize and WordSize. (We already have these in the code base but they use 32-bit int and only ByteSize sees any use). The idea is then that you can just use _size/_offset suffixes with the correct types.
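
The naming rules and the typed-size idea can be illustrated with a small sketch. It is only a model of the proposal: the `ByteSize` and `WordSize` types, the helper functions and `kHeapWordSize` below are hypothetical stand-ins defined locally, not the existing HotSpot definitions, and the 20-byte header is just an example value.

```cpp
#include <cstddef>

constexpr std::size_t kHeapWordSize = 8;  // assumed 64-bit heap word

// Distinct size_t-backed types, so bytes and words cannot be mixed by accident.
enum class ByteSize : std::size_t {};
enum class WordSize : std::size_t {};

constexpr std::size_t in_bytes(ByteSize s) { return static_cast<std::size_t>(s); }
constexpr std::size_t in_words(WordSize s) { return static_cast<std::size_t>(s); }

// Bytes to words, rounding up. The result may cover more than the input.
constexpr WordSize align_up_to_words(ByteSize s) {
  return static_cast<WordSize>((in_bytes(s) + kHeapWordSize - 1) / kHeapWordSize);
}

// Words to bytes, always exact.
constexpr ByteSize to_bytes(WordSize s) {
  return static_cast<ByteSize>(in_words(s) * kHeapWordSize);
}

// A 20-byte array header: not word aligned, so the word-sized value is
// named "aligned_" rather than plain header_size.
constexpr ByteSize header_size_in_bytes = static_cast<ByteSize>(20);
constexpr WordSize aligned_header_size = align_up_to_words(header_size_in_bytes);

static_assert(in_words(aligned_header_size) == 3, "20 bytes round up to 3 words");
static_assert(in_bytes(to_bytes(aligned_header_size)) != in_bytes(header_size_in_bytes),
              "the round trip is lossy, hence the aligned_ prefix");
```

Because the two enum classes do not convert implicitly, passing a byte count where a word count is expected fails to compile, which is the enforcement the comment asks for.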

Some thoughts on future naming

This is just one way that things could be named.

Note: padding and klass_gap may be 0 here.

arrayOop: After This PR

  Layout:
|<---------------header-------------->|<---------------payload---------------->|
|<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->|
  Names:
|<-header_size_in_bytes-->|
|<-------base_offset_in_bytes-------->|
|<----------------------object_size/object_size_in_bytes---------------------->|

The *** just means that the alignment will end up somewhere here.
base_offset_in_bytes + elements_size_in_bytes == aligned_base_offset + aligned_element_size is invariant.

arrayOop: Potential Future

  Layout:
|<--------header--------->|<---------------------payload---------------------->|
|<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->|
  Names:
|<-header_size_in_bytes-->|
|<-------base_offset_in_bytes-------->|<--elements_size_in_bytes-->|
|<--aligned_header_size---****************>| // Word Aligned (UP) May include elements
|<--aligned_base_offset---****************>| // Word Aligned (UP) May include elements
                                      |<****-aligned_element_size->| // Word Aligned (Down)
|<----------------------object_size/object_size_in_bytes---------------------->|
|<-Header Initialization--***************>|
                                      |<****-Body Initialization-->|

instanceOop: Current

  Layout:
|<--------header-------->|<-header/body->|<---------------body---------------->|
|<------mark/klass------>|<--klass_gap-->|
                         |<------------------fields/padding------------------->|
  Names:
|<---header_size/header_size_in_bytes--->|
|<-base_offset_in_bytes->|
|<----------------------object_size/object_size_in_bytes---------------------->|

Not sure what a good ambiguous name for the field and/or padding should be.
The *** has a similar role.

instanceOop: Potential Future

  Layout:
|<--------header-------->|<-----------------------body------------------------>|
|<------mark/klass------>|<------------------fields/padding------------------->|
                         |<--klass_gap-->|
   Names:
|<-header_size_in_bytes->|
|<-base_offset_in_bytes->|<------------------X_size_in_bytes------------------>|
|<----------------------object_size/object_size_in_bytes---------------------->|
|<--aligned_header_size---**************>| // Word Aligned (UP) May include elements
|<--aligned_base_offset---**************>| // Word Aligned (UP) May include elements
                         |<***************-----------aligned_X_size----------->|
|<-Header Initialization--**************>|
                         |<***************--------Body Initialization--------->|

rkennke · 2024-02-22T11:46:36Z

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes. (Similarly to how it is done for the instanceOop where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

Like what I did for x86 in latest commit? If you agree that should be the way to go, then I'll do the same for aarch64.

src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp

rkennke · 2024-02-22T13:16:02Z

These changes look correct. Tried to look around the code for indirect assumptions surrounding the klass_offset and the length_offset w.r.t. the base_offset. It seems to work out. (But some of these implicit assumptions might want to be cleaned up in the future.)

I only looked at changes since my last comments, seems like my nits from then have been fixed.

There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.

I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)

Yes, there is still much to do. I just don't want to overload this PR with clutter.

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes. (Similarly to how it is done for the instanceOop where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

This seems preferable to adding these extra alignment shims in-between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)

This seems indeed the most pragmatic way to deal with it. In the (very) long run, I hope we can settle on a single fixed object layout (e.g. 4-byte header, 4-byte length, then payload for arrays and 4-byte headers, then payload for objects), until then we need to live with the shims. Aligning initialization on word-boundaries makes sense, though.

Maybe something similar for copying and cloning.

Ugh, yes. But not in this PR ;-)

Some things that I think we should always try and strive for is when introducing new code (that talks about heap memory sizes/offsets):

Any named property/variable ending in _size_in_bytes/_offset_in_bytes is not required to be word aligned.

Any named property/variable ending in just _size/_offset must be in words. And their name should not lie. i.e. aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; instead of header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; unless header_size * HeapWordSize == header_size_in_bytes() is invariant.
Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes for ByteSize and WordSize. (We already have these in the code base but they use 32-bit int and only ByteSize sees any use). The idea is then that you can just use _size/_offset suffixes with the correct types.

Agree with all of that.

Some thoughts on future naming

Agree on that, too.

Also, another thing that should be cleaned-up/gotten right is arrayOopDesc::max_array_length() (use sane return type, not int32_t, fix maximum lengths to be less pessimistic, esp on 32bit platforms). But not here, and perhaps not until we're done with the various planned layout changes.

tstuefe

This looks good to me, Roman. Nice work.

src/hotspot/share/oops/arrayOop.hpp

rkennke · 2024-02-23T10:05:01Z

Thanks, all!

/integrate

openjdk · 2024-02-23T10:05:28Z

Going to push as commit 336bbbe.
Since your change was applied there have been 98 commits pushed to the master branch:

cb809f8: 8325215: Incorrect not exhaustive switch error
c4409ea: 8325994: JFR: Examples in JFR.start help use incorrect separator
54f09d7: 8278527: java/util/concurrent/tck/JSR166TestCase.java fails nanoTime test
00ffc42: 8318761: MessageFormat pattern support for CompactNumberFormat, ListFormat, and DateTimeFormatter
d695af8: 8326376: java -version failed with CONF=fastdebug -XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k
9f9a732: 8325752: Remove badMetaWordVal
864cf22: 8325742: Remove MetaWord usage from MemRegion
8e5c0ee: 8324832: JFR: Improve sorting of 'jfr summary'
724a2a2: 8321192: j.a.PrintJob/ImageTest/ImageTest.java: Fail or skip the test if there's no printer
f365d80: 8325153: SEGV in stackChunkOopDesc::derelativize_address(int)
... and 88 more: https://git.openjdk.org/jdk/compare/a0e5e16afbd19f6396f0af2cba954225a357eca8...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-02-23T10:05:34Z

@rkennke Pushed as commit 336bbbe.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

galderz · 2024-03-04T09:04:34Z

src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.hpp

@@ -100,7 +100,7 @@ using MacroAssembler::null_check;
  // header_size: size of object header in words


@rkennke Should this be updated to base_offset_in_bytes too?

And description updated?

Same for x86.

rkennke · 2024-03-05

Hi Galder, unfortunately this PR had already been integrated before your review comments. I've opened JDK-8327361 to track the additional fixes.

@galderz please review #18120, thank you!

mmyxym · 2024-03-14T07:58:11Z

Hi Roman,

I found a potential bug but didn't realize this PR had already been integrated. Sorry for my negligence. It's a rare crash on aarch64 with G1 GC. The root cause is that the default behavior of MacroAssembler::arrays_equals is to blindly load a whole word before comparison. When array[0] is aligned to 32 bits, the last word load can exceed the array limit and touch the next word beyond the object's layout in heap memory. If that next word, which doesn't belong to the object itself, happens to lie at the boundary of a page and a G1 heap region, a segmentation fault is triggered. Blindly loading the last word is benign for a 64-bit aligned array because the load always stays inside the object itself. We proposed JDK-8328138 to optimize the aarch64 array equals implementation so that it handles both word-aligned and unaligned arrays correctly and performs better on Arm Neoverse N1 and N2 architectures. Apologies again for my delay. Please help review it.
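
The overread described here can be checked with simple offset arithmetic. The sketch below is a deliberately simplified model, not the actual arrays_equals stub: it assumes the elements are read in whole 8-byte loads starting at the base offset and that the object size is the element end rounded up to 8 bytes. All function names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kWord = 8;

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// End (exclusive) of the last 8-byte load when `length_in_bytes` bytes of
// element data are read in whole words starting at `base_offset`.
constexpr std::size_t last_load_end(std::size_t base_offset,
                                    std::size_t length_in_bytes) {
  return base_offset + align_up(length_in_bytes, kWord);
}

// Total object size, assuming 8-byte object alignment.
constexpr std::size_t object_size(std::size_t base_offset,
                                  std::size_t length_in_bytes) {
  return align_up(base_offset + length_in_bytes, kWord);
}

// True if the blind word loads touch memory past the end of the object.
constexpr bool overreads(std::size_t base_offset, std::size_t length_in_bytes) {
  return last_load_end(base_offset, length_in_bytes) >
         object_size(base_offset, length_in_bytes);
}

// Word-aligned base (16): the last load always stays inside the object.
static_assert(!overreads(16, 1), "aligned base is safe");
// Base at 20: a 1-byte array occupies 24 bytes, but the load covers 20..28.
static_assert(overreads(20, 1), "unaligned base can read past the object");
```

In this model the 4 bytes past the object are only loaded, which matches the later remark that the comparison itself stays precise; the problem arises only when those bytes are not mapped.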

rkennke · 2024-03-14T08:26:13Z

Thanks for the heads-up, this is a very good point. Wouldn't we get wrong results for array-equals if we blindly compare the last word, if it doesn't actually belong to the array contents?

mmyxym · 2024-03-14T08:28:54Z

No. We just blindly load for performance but the comparison is still precise.

8139457: Array bases are aligned at HeapWord granularity (dde3814)

openjdk bot added the hotspot (hotspot-dev@openjdk.org) label Nov 8, 2022

rkennke added 6 commits November 9, 2022 08:15

Aarch64 parts (991658b)
Arm parts (141512a)
s390 parts (fd2ecd0)
PPC parts (08aee2c)
RISCV parts (cf3448d)
Add test to verify array base offset (c7ff14b)

More RISCV fixes (70448c6)

rkennke added 2 commits November 17, 2022 09:28

Revert BytesPerWord/BytesPerInt change in RISCV (4b3967d)
Merge branch 'master' into JDK-8139457 (6111589)

tstuefe reviewed Nov 17, 2022 (src/hotspot/share/oops/arrayOop.hpp)

More PPC fixes (42caf4b)

rkennke requested review from albertnetymk, xmas92, tstuefe and shipilev February 21, 2024 13:02

xmas92 approved these changes Feb 21, 2024

Move shim into initialize_header() (f2f02bd)

stefank reviewed Feb 22, 2024 (src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp, outdated)

Reuse base_offset (8572f84)

openjdk bot added and removed the ready label Feb 22, 2024

Move shim into initialize_header() (aarch64) (e8c1e40)

tstuefe approved these changes Feb 22, 2024 (src/hotspot/share/oops/arrayOop.hpp)

Improve comment (0a4e2f7)

stefank approved these changes Feb 22, 2024

xmas92 approved these changes Feb 23, 2024

openjdk bot added the integrated label Feb 23, 2024

openjdk bot closed this Feb 23, 2024

openjdk bot removed the ready and rfr labels Feb 23, 2024

galderz reviewed Mar 4, 2024

zifeihan mentioned this pull request Mar 6, 2024: 8327426: RISC-V: Move alignment shim into initialize_header() in C1_MacroAssembler::allocate_array #18131 (closed)
