8139457: Relax alignment of array elements #11044

Closed
wants to merge 99 commits into from

Conversation

rkennke
Contributor

@rkennke rkennke commented Nov 8, 2022

See JDK-8139457 for details.

Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements.

Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
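The offsets described above can be checked with a little arithmetic. The following is a minimal sketch, not the HotSpot implementation: it assumes a 64-bit VM where the length field sits at offset 16 without compressed class pointers, at offset 12 with them, and at offset 8 under Lilliput (as stated above). The names `word_aligned_base_offset` and `relaxed_base_offset` are hypothetical, and the rule "align to the element size" is a simplified model of the relaxed layout.

```cpp
#include <cassert>
#include <cstddef>

// Assumed 64-bit layout constants (illustrative, not HotSpot's definitions).
constexpr std::size_t kWordSize = 8;         // bytes per heap word
constexpr std::size_t kLengthFieldSize = 4;  // the array length is a 32-bit int

// Round value up to the next multiple of alignment (a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Before: the first element always starts on a word boundary.
constexpr std::size_t word_aligned_base_offset(std::size_t length_offset) {
  return align_up(length_offset + kLengthFieldSize, kWordSize);
}

// After (simplified model): the first element only needs the alignment
// of its own element size, which must be 1, 2, 4 or 8 here.
constexpr std::size_t relaxed_base_offset(std::size_t length_offset,
                                          std::size_t element_size) {
  return align_up(length_offset + kLengthFieldSize, element_size);
}

// Hypothetical helper: byte[] without compressed class pointers (length at 16).
inline void check_byte_array_example() {
  assert(word_aligned_base_offset(16) == 24);  // 4-byte gap at offsets 20..23
  assert(relaxed_base_offset(16, 1) == 20);    // gap removed
}
```

In this model only arrays with elements smaller than a word benefit; a `long[]` still starts at the next 8-byte boundary.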

Testing:

  • runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390)
  • bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390)
  • tier1 (x86_64, x86_32, aarch64, riscv)
  • tier2 (x86_64, aarch64, riscv)
  • tier3 (x86_64, riscv)

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Change requires CSR request JDK-8314882 to be approved
  • Commit message must refer to an issue

Issues

  • JDK-8139457: Relax alignment of array elements (Enhancement - P4)
  • JDK-8314882: Relax alignment of array elements (CSR)

Reviewers

Contributors

  • Fei Yang <fyang@openjdk.org>
  • Thomas Stuefe <stuefe@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044
$ git checkout pull/11044

Update a local copy of the PR:
$ git checkout pull/11044
$ git pull https://git.openjdk.org/jdk.git pull/11044/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11044

View PR using the GUI difftool:
$ git pr show -t 11044

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11044.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 8, 2022

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 8, 2022

@rkennke The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Nov 8, 2022
@shipilev
Member

ARM32 seems to build well, passes runtime/FieldLayout tests, and bootcycles.

@shipilev
Member

RISC-V needs more work:

$  make images test TEST=runtime/FieldLayout
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (0xe0000000), pid=454832, tid=454835
#  stop: len is not a multiple of BytesPerWord
#
# JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.shade.shipilev-jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.shade.shipilev-jdk, mixed mode, tiered, compressed oops, g1 gc, linux-riscv64)
# Problematic frame:
# J 78 c1 java.util.Arrays.copyOfRange([BII)[B java.base@20-internal (64 bytes) @ 0x0000003f84c8bf2c [0x0000003f84c8bdc0+0x000000000000016c]

@rkennke
Contributor Author

rkennke commented Nov 10, 2022

RISC-V needs more work:

$  make images test TEST=runtime/FieldLayout
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (0xe0000000), pid=454832, tid=454835
#  stop: len is not a multiple of BytesPerWord
#
# JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.shade.shipilev-jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.shade.shipilev-jdk, mixed mode, tiered, compressed oops, g1 gc, linux-riscv64)
# Problematic frame:
# J 78 c1 java.util.Arrays.copyOfRange([BII)[B java.base@20-internal (64 bytes) @ 0x0000003f84c8bf2c [0x0000003f84c8bdc0+0x000000000000016c]

Thanks for trying this. It may be enough to change the assert to check for BytesPerInt multiple instead. Something like the following:

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 91833b662e2..107e4cfcedd 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -4215,9 +4215,9 @@ void MacroAssembler::zero_memory(Register addr, Register len, Register tmp) {
 #ifdef ASSERT
   {
     Label L;
-    andi(t0, len, BytesPerWord - 1);
+    andi(t0, len, BytesPerInt - 1);
     beqz(t0, L);
-    stop("len is not a multiple of BytesPerWord");
+    stop("len is not a multiple of BytesPerInt");
     bind(L);
   }
 #endif // ASSERT

@RealFYang
Member

Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java
But I haven't performed a full test of all these changes on riscv.

diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
index 5989d5ab809..9dced7c53e9 100644
--- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
@@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int
   sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes);
   beqz(len_in_bytes, done);

+  // Zero first 4 bytes, if start offset is not word aligned.
+  if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) {
+    sw(zr, Address(obj, hdr_size_in_bytes));
+    sub(len_in_bytes, len_in_bytes, BytesPerInt);
+    hdr_size_in_bytes += BytesPerInt;
+  }
+
   // Preserve obj
   if (hdr_size_in_bytes) {
     add(obj, obj, hdr_size_in_bytes);

@rkennke
Contributor Author

rkennke commented Nov 11, 2022

Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java But I haven't performed a full test of all these changes on riscv.

diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
index 5989d5ab809..9dced7c53e9 100644
--- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
@@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int
   sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes);
   beqz(len_in_bytes, done);

+  // Zero first 4 bytes, if start offset is not word aligned.
+  if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) {
+    sw(zr, Address(obj, hdr_size_in_bytes));
+    sub(len_in_bytes, len_in_bytes, BytesPerInt);
+    hdr_size_in_bytes += BytesPerInt;
+  }
+
   // Preserve obj
   if (hdr_size_in_bytes) {
     add(obj, obj, hdr_size_in_bytes);

Thanks for checking and providing the fix, Fei! I pushed those changes and updated the test matrix accordingly.

@rkennke
Contributor Author

rkennke commented Nov 11, 2022

/contributor add @RealFYang

@openjdk

openjdk bot commented Nov 11, 2022

@rkennke
Contributor Fei Yang <fyang@openjdk.org> successfully added.

@RealFYang
Member

RealFYang commented Nov 12, 2022

Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java But I haven't performed a full test of all these changes on riscv.

diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
index 5989d5ab809..9dced7c53e9 100644
--- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp
@@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int
   sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes);
   beqz(len_in_bytes, done);

+  // Zero first 4 bytes, if start offset is not word aligned.
+  if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) {
+    sw(zr, Address(obj, hdr_size_in_bytes));
+    sub(len_in_bytes, len_in_bytes, BytesPerInt);
+    hdr_size_in_bytes += BytesPerInt;
+  }
+
   // Preserve obj
   if (hdr_size_in_bytes) {
     add(obj, obj, hdr_size_in_bytes);

Thanks for checking and providing the fix, Fei! I pushed those changes and updated the test matrix accordingly.

With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

#ifdef ASSERT
  {
     Label L;
-    andi(t0, len, BytesPerWord - 1);
+    andi(t0, len, BytesPerInt - 1);
     beqz(t0, L);
-    stop("len is not a multiple of BytesPerWord");
+    stop("len is not a multiple of BytesPerInt");
     bind(L);
  }
#endif // ASSERT

Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform.

@rkennke
Contributor Author

rkennke commented Nov 17, 2022

With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform.

Ok, I reverted that part. Could you test that? Also, if you're running any of the tests mentioned in the PR, can you let me know and I'll update the test matrix.

Thanks,
Roman

@RealFYang
Member

With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform.

Ok, I reverted that part. Could you test that? Also, if you're running any of the tests mentioned in the PR, can you let me know and I'll update the test matrix.

Thanks, Roman

Hi, thanks for the update. This has passed tier1-3 tests on my linux-riscv64 HiFive Unmatched boards.

@rkennke
Contributor Author

rkennke commented Nov 17, 2022

Hi, Thanks for the update. This has passed tier1-3 tests on my linux-riscv64 hifive unmatched boards.

Thanks, Fei! This is very appreciated!

@tstuefe
Member

tstuefe commented Nov 17, 2022

This should make it work on ppc.

thomas@starfish:/shared/projects/openjdk/jdk-jdk/source$ git diff
diff --git a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
index 87b87e83e1a..4420c2ac4ca 100644
--- a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
@@ -361,6 +361,16 @@ void C1_MacroAssembler::allocate_array(
   const Register index = t3;
   addi(base, obj, base_offset_in_bytes);               // compute address of first element
   addi(index, arr_size, -(base_offset_in_bytes));      // compute index = number of bytes to clear
+
+  // Elements are not dword aligned. Zero out leading word.
+  if (!is_aligned(base_offset_in_bytes, BytesPerWord)) {
+    assert(is_aligned(base_offset_in_bytes, BytesPerInt), "weird alignment");
+    li(t1, 0);
+    stw(t1, 0, base);
+    addi(base, base, BytesPerInt);
+    // Note: initialize_body will align index down, no need to correct it here.
+  }
+
   initialize_body(base, index);
 
   if (CURRENT_ENV->dtrace_alloc_probes()) {

I did the zero-ing out up in allocate_array since I did not want to affect the object allocation path.
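The arithmetic behind the PPC patch can be sketched as follows. This is an illustrative model only: `plan_array_clear` and `ClearPlan` are hypothetical names, and the "align index down" step is modelled here as an integer division by the word size, following the note in the diff.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBytesPerWord = 8;
constexpr std::size_t kBytesPerInt = 4;

// Describes which stores would clear the array payload.
struct ClearPlan {
  bool leading_int_store;   // one 4-byte store at the original base offset
  std::size_t dword_start;  // offset where doubleword clearing begins
  std::size_t dword_count;  // number of 8-byte stores
};

// arr_size is assumed to be a multiple of 8, base_offset_in_bytes a multiple of 4.
inline ClearPlan plan_array_clear(std::size_t arr_size,
                                  std::size_t base_offset_in_bytes) {
  std::size_t base = base_offset_in_bytes;
  // Number of bytes to clear, computed before any correction of base.
  const std::size_t index = arr_size - base_offset_in_bytes;
  const bool unaligned = (base % kBytesPerWord) != 0;
  if (unaligned) {
    base += kBytesPerInt;  // the leading word is zeroed separately
  }
  // Aligning index down to whole doublewords drops the 4 bytes already cleared.
  return ClearPlan{unaligned, base, index / kBytesPerWord};
}
```

For a 32-byte array with a base offset of 20, the plan is one 4-byte store at offset 20 followed by one doubleword store at offset 24, which together cover offsets 20 to 31.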

I ran several tests manually with and without UseCCP. Your test case also runs through. Our hardware is a bottleneck though, and currently Richard is using our test queue with his PPC Loom port. Therefore it may take a while until I manage to run more tests.

@tstuefe
Member

tstuefe commented Nov 17, 2022

I did the zero-ing out up in allocate_array since I did not want to affect the object allocation path.

About that, I see that other platforms zero out the leading bytes in initialize_body, but does that not mean that we now do a pointless store whenever we initialize a variable-sized object in +UseCCP mode with 12-byte headers?

@rkennke
Contributor Author

rkennke commented Nov 17, 2022

I did the zero-ing out up in allocate_array since I did not want to affect the object allocation path.

About that, I see that other platforms zero out the leading bytes in initialize_body, but does that not mean that we now do a pointless store whenever we initialize a variable-sized object in +UseCCP mode with 12-byte headers?

I don't think so. initialize_object() always passes an aligned offset to initialize_body() because that gap at offset 12 is handled by initialize_header() already.

@rkennke
Contributor Author

rkennke commented Nov 17, 2022

/contributor add @tstuefe

@coleenp
Contributor

coleenp commented Feb 21, 2024

My testing for tiers 1-7 had 100% passing.

Member

@xmas92 xmas92 left a comment

These changes look correct. I tried to look around the code for indirect assumptions surrounding the klass_offset and the length_offset w.r.t. the base_offset. It seems to work out. (But some of these implicit assumptions might want to be cleaned up in the future.)

I only looked at changes since my last comments, seems like my nits from then have been fixed.

There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.

I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes. (Similarly to how it is done for the instanceOop where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

This seems preferable to adding these extra alignment shims in between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)

Maybe something similar for copying and cloning. But there are already so many shims and so much patching code surrounding copying and cloning. E.g. in ZGC (depending on UseCompressedClassPointers) we have to subtract 1 from the length node in C2 to apply the proper barriers. This is required because when C2 takes out the base offset it does not know the element type, so it starts the clone from the word aligned (up) length offset (which may or may not start in the header). It would be a larger project to overhaul, but we have seen a couple of bugs related to this.

Some things I think we should always strive for when introducing new code (that talks about heap memory sizes/offsets):

  • Any named property/variable ending in _size_in_bytes/_offset_in_bytes is not required to be word aligned.
  • Any named property/variable ending in just _size/_offset must be in words. And their name should not lie. i.e. aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; instead of header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; unless header_size * HeapWordSize == header_size_in_bytes() is invariant.
    Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
    An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes for ByteSize and WordSize. (We already have these in the code base but they use 32-bit int and only ByteSize sees any use). The idea is then that you can just use _size/_offset suffixes with the correct types.
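The typed-size idea from the last point could look roughly like this. It is a sketch under assumptions, not the existing HotSpot `ByteSize`/`WordSize` types: the enum classes below are redefined locally on top of `size_t`, and the helpers `raw`, `align_up_to_words`, `to_bytes` and `is_word_aligned` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kHeapWordSize = 8;  // assumed 64-bit heap word

// Distinct types: a byte count cannot be passed where a word count is expected.
enum class ByteSize : std::size_t {};
enum class WordSize : std::size_t {};

constexpr std::size_t raw(ByteSize b) { return static_cast<std::size_t>(b); }
constexpr std::size_t raw(WordSize w) { return static_cast<std::size_t>(w); }

// Conversion from bytes to words has to say how it rounds, so the name cannot lie.
constexpr WordSize align_up_to_words(ByteSize b) {
  return static_cast<WordSize>((raw(b) + kHeapWordSize - 1) / kHeapWordSize);
}

// Conversion from words to bytes is always exact.
constexpr ByteSize to_bytes(WordSize w) {
  return static_cast<ByteSize>(raw(w) * kHeapWordSize);
}

constexpr bool is_word_aligned(ByteSize b) {
  return raw(b) % kHeapWordSize == 0;
}
```

With such types, a 20-byte header converts to 3 words only through an explicitly named rounding step, which matches the `aligned_header_size` naming suggested above.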
Some thoughts on future naming

This is just one way that things could be named.

Note: padding and klass_gap may be 0 here.

arrayOop: After This PR

  Layout:
|<---------------header-------------->|<---------------payload---------------->|
|<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->|
  Names:
|<-header_size_in_bytes-->|
|<-------base_offset_in_bytes-------->|
|<----------------------object_size/object_size_in_bytes---------------------->|

The *** just means that the alignment will end up somewhere here.
base_offset_in_bytes + elements_size_in_bytes == aligned_base_offset + aligned_element_size is invariant.

arrayOop: Potential Future

  Layout:
|<--------header--------->|<---------------------payload---------------------->|
|<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->|
  Names:
|<-header_size_in_bytes-->|
|<-------base_offset_in_bytes-------->|<--elements_size_in_bytes-->|
|<--aligned_header_size---****************>| // Word Aligned (UP) May include elements
|<--aligned_base_offset---****************>| // Word Aligned (UP) May include elements
                                      |<****-aligned_element_size->| // Word Aligned (Down)
|<----------------------object_size/object_size_in_bytes---------------------->|
|<-Header Initialization--***************>|
                                      |<****-Body Initialization-->|

instanceOop: Current

  Layout:
|<--------header-------->|<-header/body->|<---------------body---------------->|
|<------mark/klass------>|<--klass_gap-->|
                         |<------------------fields/padding------------------->|
  Names:
|<---header_size/header_size_in_bytes--->|
|<-base_offset_in_bytes->|
|<----------------------object_size/object_size_in_bytes---------------------->|

Not sure what a good ambiguous name for the field and/or padding should be.
The *** has a similar role.

instanceOop: Potential Future

  Layout:
|<--------header-------->|<-----------------------body------------------------>|
|<------mark/klass------>|<------------------fields/padding------------------->|
                         |<--klass_gap-->|
   Names:
|<-header_size_in_bytes->|
|<-base_offset_in_bytes->|<------------------X_size_in_bytes------------------>|
|<----------------------object_size/object_size_in_bytes---------------------->|
|<--aligned_header_size---**************>| // Word Aligned (UP) May include elements
|<--aligned_base_offset---**************>| // Word Aligned (UP) May include elements
                         |<***************-----------aligned_X_size----------->|
|<-Header Initialization--**************>|
                         |<***************--------Body Initialization--------->|

@rkennke
Contributor Author

rkennke commented Feb 22, 2024

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes. (Similarly to how it is done for the instanceOop where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

Like what I did for x86 in latest commit? If you agree that should be the way to go, then I'll do the same for aarch64.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed ready Pull request is ready to be integrated labels Feb 22, 2024
@rkennke
Contributor Author

rkennke commented Feb 22, 2024

These changes look correct. I tried to look around the code for indirect assumptions surrounding the klass_offset and the length_offset w.r.t. the base_offset. It seems to work out. (But some of these implicit assumptions might want to be cleaned up in the future.)

I only looked at changes since my last comments, seems like my nits from then have been fixed.

There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.

I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)

Yes, there is still much to do. I just don't want to overload this PR with clutter.

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) header_size_in_bytes. (Similarly to how it is done for the instanceOop where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

This seems preferable to adding these extra alignment shims in between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)

This seems indeed the most pragmatic way to deal with it. In the (very) long run, I hope we can settle on a single fixed object layout (e.g. 4-byte header, 4-byte length, then payload for arrays and 4-byte headers, then payload for objects), until then we need to live with the shims. Aligning initialization on word-boundaries makes sense, though.

Maybe something similar for copying and cloning.

Ugh, yes. But not in this PR ;-)

Some things I think we should always strive for when introducing new code (that talks about heap memory sizes/offsets):

  • Any named property/variable ending in _size_in_bytes/_offset_in_bytes is not required to be word aligned.
  • Any named property/variable ending in just _size/_offset must be in words. And their name should not lie. i.e. aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; instead of header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize; unless header_size * HeapWordSize == header_size_in_bytes() is invariant.
    Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
    An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes for ByteSize and WordSize. (We already have these in the code base but they use 32-bit int and only ByteSize sees any use). The idea is then that you can just use _size/_offset suffixes with the correct types.

Agree with all of that.

Some thoughts on future naming

Agree on that, too.

Also, another thing that should be cleaned-up/gotten right is arrayOopDesc::max_array_length() (use sane return type, not int32_t, fix maximum lengths to be less pessimistic, esp on 32bit platforms). But not here, and perhaps not until we're done with the various planned layout changes.

Member

@tstuefe tstuefe left a comment

This looks good to me, Roman. Nice work.

src/hotspot/share/oops/arrayOop.hpp
@rkennke
Contributor Author

rkennke commented Feb 23, 2024

Thanks, all!

/integrate

@openjdk

openjdk bot commented Feb 23, 2024

Going to push as commit 336bbbe.
Since your change was applied there have been 98 commits pushed to the master branch:

  • cb809f8: 8325215: Incorrect not exhaustive switch error
  • c4409ea: 8325994: JFR: Examples in JFR.start help use incorrect separator
  • 54f09d7: 8278527: java/util/concurrent/tck/JSR166TestCase.java fails nanoTime test
  • 00ffc42: 8318761: MessageFormat pattern support for CompactNumberFormat, ListFormat, and DateTimeFormatter
  • d695af8: 8326376: java -version failed with CONF=fastdebug -XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k
  • 9f9a732: 8325752: Remove badMetaWordVal
  • 864cf22: 8325742: Remove MetaWord usage from MemRegion
  • 8e5c0ee: 8324832: JFR: Improve sorting of 'jfr summary'
  • 724a2a2: 8321192: j.a.PrintJob/ImageTest/ImageTest.java: Fail or skip the test if there's no printer
  • f365d80: 8325153: SEGV in stackChunkOopDesc::derelativize_address(int)
  • ... and 88 more: https://git.openjdk.org/jdk/compare/a0e5e16afbd19f6396f0af2cba954225a357eca8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 23, 2024
@openjdk openjdk bot closed this Feb 23, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 23, 2024
@openjdk

openjdk bot commented Feb 23, 2024

@rkennke Pushed as commit 336bbbe.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@@ -100,7 +100,7 @@ using MacroAssembler::null_check;
// header_size: size of object header in words
Contributor

@rkennke Should this be updated to base_offset_in_bytes too?

Contributor

And description updated?

Contributor

Same for x86

Contributor Author

Hi Galder, unfortunately this PR had already been integrated before your review comments. I've opened JDK-8327361 to track the additional fixes.

Contributor Author

@galderz please review #18120, thank you!

@mmyxym

mmyxym commented Mar 14, 2024

Hi Roman,

I found a potential bug but didn't realize this PR had already been integrated recently. Sorry for my negligence. It's a rare crash on aarch64 with G1 GC. The root cause is that the default behavior of MacroAssembler::arrays_equals is to blindly load a whole word before comparison. When array[0] is only aligned to 32 bits, the last word load will exceed the array limit and may touch the next word beyond the object layout in heap memory. If that next word, which doesn't belong to the object itself, happens to be at the boundary of pages and G1 heap regions, a segmentation fault will be triggered. Loading the last word blindly is benign for a 64-bit aligned array because the load is always inside the object itself. We proposed JDK-8328138 to optimize the aarch64 array equals implementation so that it handles both word-aligned and unaligned arrays correctly and performs better on ARM Neoverse N1 and N2 architectures. Apologies again for my delay. Please help review it.
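The over-read described above can be illustrated with offsets alone. The following is a simplified model, not the actual `MacroAssembler::arrays_equals` code: it assumes the comparison walks the payload in 8-byte steps starting at the base offset, and that the object size is the payload end rounded up to a word. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kWordSize = 8;

// Round v up to the next multiple of a (a power of two).
constexpr std::size_t align_up(std::size_t v, std::size_t a) {
  return (v + a - 1) & ~(a - 1);
}

// Assumed object size: end of the payload rounded up to a whole word.
constexpr std::size_t object_size_in_bytes(std::size_t base_offset,
                                           std::size_t payload_bytes) {
  return align_up(base_offset + payload_bytes, kWordSize);
}

// End offset of the last 8-byte load when the payload is read in whole
// words starting at base_offset (the "blind" load in this model).
constexpr std::size_t last_word_load_end(std::size_t base_offset,
                                         std::size_t payload_bytes) {
  return base_offset + align_up(payload_bytes, kWordSize);
}

// True if every whole-word load stays within the object's own memory.
constexpr bool word_loads_stay_inside_object(std::size_t base_offset,
                                             std::size_t payload_bytes) {
  return last_word_load_end(base_offset, payload_bytes) <=
         object_size_in_bytes(base_offset, payload_bytes);
}
```

In this model a 12-byte payload at base offset 24 is read up to offset 40 inside a 40-byte object, while the same payload at base offset 20 is read up to offset 36 although the object ends at offset 32.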

@rkennke
Contributor Author

rkennke commented Mar 14, 2024

Hi Roman,

I found a potential bug but didn't realize this PR had already been integrated recently. Sorry for my negligence. It's a rare crash on aarch64 with G1 GC. The root cause is that the default behavior of MacroAssembler::arrays_equals is to blindly load a whole word before comparison. When array[0] is only aligned to 32 bits, the last word load will exceed the array limit and may touch the next word beyond the object layout in heap memory. If that next word, which doesn't belong to the object itself, happens to be at the boundary of pages and G1 heap regions, a segmentation fault will be triggered. Loading the last word blindly is benign for a 64-bit aligned array because the load is always inside the object itself. We proposed JDK-8328138 to optimize the aarch64 array equals implementation so that it handles both word-aligned and unaligned arrays correctly and performs better on ARM Neoverse N1 and N2 architectures. Apologies again for my delay. Please help review it.

Thanks for the heads-up, this is a very good point. Wouldn't we get wrong results for array-equals if we blindly compare the last word, if it doesn't actually belong to the array contents?

@mmyxym

mmyxym commented Mar 14, 2024

Hi Roman,
I found a potential bug but didn't realize this PR had already been integrated recently. Sorry for my negligence. It's a rare crash on aarch64 with G1 GC. The root cause is that the default behavior of MacroAssembler::arrays_equals is to blindly load a whole word before comparison. When array[0] is only aligned to 32 bits, the last word load will exceed the array limit and may touch the next word beyond the object layout in heap memory. If that next word, which doesn't belong to the object itself, happens to be at the boundary of pages and G1 heap regions, a segmentation fault will be triggered. Loading the last word blindly is benign for a 64-bit aligned array because the load is always inside the object itself. We proposed JDK-8328138 to optimize the aarch64 array equals implementation so that it handles both word-aligned and unaligned arrays correctly and performs better on ARM Neoverse N1 and N2 architectures. Apologies again for my delay. Please help review it.

Thanks for the heads-up, this is a very good point. Wouldn't we get wrong results for array-equals if we blindly compare the last word, if it doesn't actually belong to the array contents?

No. We just blindly load for performance but the comparison is still precise.

Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated