
8305895: Implement JEP 450: Compact Object Headers (Experimental) #20677

Open
wants to merge 76 commits into master

Conversation

rkennke
Contributor

@rkennke rkennke commented Aug 22, 2024

This is the main body of JEP 450: Compact Object Headers (Experimental).

It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing.

Main changes:

  • Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are guarded by this flag. Its purpose is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that the flag remains experimental and opt-in for at least one release, then becomes on-by-default and diagnostic (?), and is eventually deprecated and obsoleted. However, there are a few unknowns in that plan: we may want to shrink compact headers further to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
  • The compressed Klass* can now be stored in the mark-word of objects (see the sketch after this list). To make this possible, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing the Klass* (or mark-word or size), so that it can be fetched from the forwardee in case the object is forwarded.
  • Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. It also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
  • Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
  • Instances can now have their base offset (the offset at which the field layouter starts placing fields) at 8 bytes (instead of 12 or 16).
  • Arrays will now store their length at offset 8.
  • CDS can now write and read archives with compact object headers. However, an archive written with one setting of UseCompactObjectHeaders cannot be read with the opposite setting. Some build machinery is added so that _coh variants of the CDS archives are generated, next to the _nocoops variants.
  • Note that oopDesc::klass_offset_in_bytes() is no longer used on +UCOH paths. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop, loads its mark-word, and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (or at least I could not find it), and I also fear that doing so could mess with optimizations. This may be worth revisiting. OTOH, the approach that I have taken works and is similar to DecodeNKlass and similar instructions.
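To make the header layout above more concrete, here is a small, self-contained C++ sketch; it is illustrative only and not the HotSpot implementation. The only facts taken from this PR are that the compressed Klass* occupies the upper 22 bits of the 64-bit mark-word, that self-forwarding is indicated by a dedicated bit, and that full-GC forwarding encodes a compressed-oops-style heap offset (up to 8TB) in the remaining bits; the names, masks, and shift values below are assumptions made for illustration.

#include <cstdint>

// Sketch only: the shift follows from "upper 22 bits", i.e. 64 - 22 = 42.
static const int      kNarrowKlassBits  = 22;
static const int      kNarrowKlassShift = 64 - kNarrowKlassBits;
static const uint64_t kNarrowKlassMask  = (uint64_t(1) << kNarrowKlassBits) - 1;

// Extract the narrow (compressed) Klass* from a mark-word. With compact
// headers this replaces the separate Klass* field in the object header,
// which is why instance fields can start at offset 8 and arrays can keep
// their length at offset 8.
inline uint32_t narrow_klass_from_mark(uint64_t mark) {
  return uint32_t((mark >> kNarrowKlassShift) & kNarrowKlassMask);
}

// Because the upper 22 bits must survive GC, a forwarding pointer cannot
// overwrite the whole mark-word; full-GC forwarding instead stores a
// compressed-oops-style heap offset in the remaining lower bits, and
// promotion failure is flagged with a self-forwarded bit rather than by
// preserving whole headers.

The real mark-word also carries locking and hashing state in its low bits, which the sketch ignores.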

Testing:
(+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.)
The testing below has been run many times, but not with this exact base version of the JDK. I want to hold off on the full testing until we also have the Tiny Class-Pointers PR lined up, and test with that.

  • tier1 (x86_64)
  • tier2 (x86_64)
  • tier3 (x86_64)
  • tier4 (x86_64)
  • tier1 (aarch64)
  • tier2 (aarch64)
  • tier3 (aarch64)
  • tier4 (aarch64)
  • tier1 (x86_64) +UseCompactObjectHeaders
  • tier2 (x86_64) +UseCompactObjectHeaders
  • tier3 (x86_64) +UseCompactObjectHeaders
  • tier4 (x86_64) +UseCompactObjectHeaders
  • tier1 (aarch64) +UseCompactObjectHeaders
  • tier2 (aarch64) +UseCompactObjectHeaders
  • tier3 (aarch64) +UseCompactObjectHeaders
  • tier4 (aarch64) +UseCompactObjectHeaders
  • Running as a backport in production for more than a year.

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires a JEP request to be targeted
  • Change must be properly reviewed (3 reviews required, with at least 1 Reviewer, 2 Authors)

Issues

  • JDK-8305895: Implement JEP 450: Compact Object Headers (Experimental) (Enhancement - P4)
  • JDK-8294992: JEP 450: Compact Object Headers (Experimental) (JEP)
  • JDK-8306000: Add experimental -XX:+UseCompactObjectHeaders flag (CSR) (Withdrawn)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677
$ git checkout pull/20677

Update a local copy of the PR:
$ git checkout pull/20677
$ git pull https://git.openjdk.org/jdk.git pull/20677/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20677

View PR using the GUI difftool:
$ git pr show -t 20677

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20677.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 22, 2024

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 22, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Aug 22, 2024
@openjdk

openjdk bot commented Aug 22, 2024

@rkennke The following labels will be automatically applied to this pull request:

  • build
  • core-libs
  • graal
  • hotspot
  • serviceability
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org serviceability serviceability-dev@openjdk.org build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Aug 22, 2024
@rkennke
Contributor Author

rkennke commented Aug 22, 2024

/label remove core-libs
/label add hotspot-gc
/label add hotspot-runtime

@rkennke
Contributor Author

rkennke commented Aug 22, 2024

/jep JEP-450

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Aug 22, 2024
@openjdk

openjdk bot commented Aug 22, 2024

@rkennke
The core-libs label was successfully removed.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Aug 22, 2024
@openjdk

openjdk bot commented Aug 22, 2024

@rkennke
The hotspot-gc label was successfully added.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Aug 22, 2024
@openjdk

openjdk bot commented Aug 22, 2024

@rkennke
The hotspot-runtime label was successfully added.

@openjdk

openjdk bot commented Aug 22, 2024

@rkennke
This pull request will not be integrated until the JEP-450 has been targeted.

@robcasloz
Contributor

I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on oopDesc::klass_offset_in_bytes()) being a pre-condition for a future, non-experimental version of compact headers?

@rkennke: Yes, that sounds like a good improvement! It would also clean up C2 considerably; right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but it also seems like a major effort. Could you file an issue to track that future work?

@robcasloz: Done: https://bugs.openjdk.org/browse/JDK-8340453.

@Hamlin-Li

In both aarch64.ad and x86_64.ad, MachUEPNode::format might need some change accordingly?

Contributor

@matias9927 matias9927 left a comment

CDS changes look good! I have two style comments, but otherwise this makes sense.

@@ -344,7 +345,7 @@ void ReadClosure::do_tag(int tag) {
  int old_tag;
  old_tag = (int)(intptr_t)nextPtr();
  // do_int(&old_tag);
-  assert(tag == old_tag, "old tag doesn't match");
+  assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag);
Contributor

Is this assert message change a leftover from debugging or is it meant to be this way?

Member

It's a leftover, but OTOH it does not hurt. I found myself re-adding it several times to analyze CDS issues during development, so I decided to just leave it in.

@@ -670,8 +672,19 @@ void ArchiveBuilder::make_shallow_copy(DumpRegion *dump_region, SourceObjInfo* s
SystemDictionaryShared::validate_before_archiving(InstanceKlass::cast(klass));
dump_region->allocate(sizeof(address));
}
// Allocate space for the future InstanceKlass with proper alignment
const size_t alignment =
#ifdef _LP64
Contributor

I think the text alignment here is a bit confusing. Should 678 and 682 be at the same indentation?

Contributor

@coleenp coleenp left a comment

I mostly reviewed the metaspace changes and suggest upstreaming the MetaBlock refactoring ahead of the rest of this patch.
Only one comment about the interpreter code (affecting 4 locations).

  __ sub(r3, r3, oopDesc::base_offset_in_bytes());
} else {
  __ sub(r3, r3, sizeof(oopDesc));
}
Contributor

This looks like something that could be buggy if we're not careful. We had a pass where we cleaned up sizeof(oopDesc) once. Can this be some appropriately named function in oopDesc (since this is not header_size() anymore)?

} else {
  __ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 1*oopSize), rcx);
  NOT_LP64(__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 2*oopSize), rcx));
}
Contributor

For this and the above, I'd rather oopDesc encapsulate the header size for the UseCompactObjectHeaders condition in C++ code, so that we never see sizeof(oopDesc).
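One possible shape for that encapsulation, as a rough sketch only; the helper name and the assumption that the compact header is just the 8-byte mark-word are mine, not taken from the patch, and the snippet presumes HotSpot-internal declarations (markWord, oopDesc, UseCompactObjectHeaders) are in scope:

// Sketch of a single helper that hides the UseCompactObjectHeaders branch.
static int object_header_size_in_bytes() {
  // With compact headers the whole header is the 8-byte mark-word (the
  // Klass* lives inside it); otherwise it is the legacy header covered by
  // sizeof(oopDesc).
  return UseCompactObjectHeaders ? (int)sizeof(markWord) : (int)sizeof(oopDesc);
}

Interpreter snippets like the aarch64 one quoted above could then subtract one helper value on both paths instead of choosing between oopDesc::base_offset_in_bytes() and sizeof(oopDesc) at every call site.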

Comment on lines +107 to +115
if (is_class) {
  assert(word_size >= (sizeof(Klass)/BytesPerWord), "weird size for klass: %zu", word_size);
  result = class_space_arena()->allocate(word_size, wastage);
} else {
-   return non_class_space_arena()->allocate(word_size);
+   result = non_class_space_arena()->allocate(word_size, wastage);
}
if (wastage.is_nonempty()) {
  non_class_space_arena()->deallocate(wastage);
}
Contributor

Yes, this definitely needs a comment explaining why, since this is how we allocate the small chunks of waste caused by hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class metaspace.

The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here.

Comment on lines +107 to +115
if (is_class) {
  assert(word_size >= (sizeof(Klass)/BytesPerWord), "weird size for klass: %zu", word_size);
  result = class_space_arena()->allocate(word_size, wastage);
} else {
-   return non_class_space_arena()->allocate(word_size);
+   result = non_class_space_arena()->allocate(word_size, wastage);
}
if (wastage.is_nonempty()) {
  non_class_space_arena()->deallocate(wastage);
}
Contributor

I think this should also assert or be conditionalized on UseCompactObjectHeaders.

@@ -778,6 +796,7 @@ void Metaspace::global_initialize() {
  Metaspace::initialize_class_space(rs);

  // Set up compressed class pointer encoding.
+  // In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination.
Contributor

I don't know why this comment is here. Seems out of place.

add_block(p + requested_word_size, waste);
}
}
return p;
Contributor

This answers my prior question. The waste is added back to the block list for non-class-arenas as well.

#define METABLOCKFORMAT "block (@" PTR_FORMAT " word size " SIZE_FORMAT ")"
#define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size()

} // namespace metaspace
Contributor

I am wondering if some of these metaspace changes, that is, the addition of MetaBlock, could be upstreamed ahead of CompactObjectHeaders. Some of it is refactoring so that the wastage can be used to allocate into the class arena, but a lot of this seems neutral to compact object headers; upstreaming it would reduce this patch and allow different people to focus on just that part.

Contributor

For the record, I am fine with these metaspace changes going in with this PR if the timing for that is better.

bool MetaspaceArena::is_valid_area(MetaWord* p, size_t word_size) const {
  assert(p != nullptr && word_size > 0, "Sanity");
  // Returns true if the given block is contained in this arena
  // Returns true if the given block is contained in this arena
Contributor

Here's the same comment twice.

@@ -4004,7 +4004,7 @@ void StubGenerator::generate_compiler_stubs() {
  generate_chacha_stubs();

#ifdef COMPILER2
-  if ((UseAVX == 2) && EnableX86ECoreOpts) {
+  if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
    generate_string_indexof(StubRoutines::_string_indexof_array);
Contributor

This stub routine should be re-enabled if UseCompactObjectHeaders is to become non-experimental and enabled by default in the future. Is there an RFE for this task?

Contributor Author

This comes from an assert in LibraryCallKit::inline_string_indexOfI, and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine. I also checked the code in LibraryCallKit::inline_string_indexOfI and generate_string_indexof_stubs() and could not find anything obvious that requires the base offset to be >= 16. I am not sure why that assert is there. I am now running tier1-4 with that change: rkennke@7001783

If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections.

Contributor

I am not familiar with the indexOf implementation, but here is a relevant comment that motivates the assertion: #16753 (comment).

Contributor Author

Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by copying only 8 bytes onto the stack: rkennke@097c2af

Does this look correct to you? Or better to do it as a follow-up?
(It passes a couple of indexOf tests, will run tier1-4 on it).

Contributor

Does this look correct to you? Or better to do it as a follow-up?

I do not feel confident enough to review this part. If you want to include rkennke@097c2af in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement.

Contributor

I believe the code in the patch is good enough as-is, especially if UseCompactObjectHeaders is slated to go away. The existing if will prevent the < 16 byte header code from being emitted, which is the desired behavior; i.e., if the header size is >= 16, no code will be emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed.

I'm good with a comment tying UseCompactObjectHeaders to the condition. The comment can be removed when the flag is removed. "Ship it" :-)

Contributor Author

Wait a second, I've probably not been clear. UseCompactObjectHeaders is slated to become on-by-default and then eventually go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like:

if (haystack_len <= 8) {
  // Copy 8 bytes onto stack
} else if (haystack_len <= 16) {
  // Copy 16 bytes onto stack
} else {
  // Copy 32 bytes onto stack
}

So that is 2 branches in this prologue code instead of the original 1.

However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault.

I think I need to mull over it some more to come up with a correct fix.

Contributor Author

@rkennke rkennke Sep 30, 2024

I changed the header<16 version to be a small loop: rkennke@bcba264

The idea is the same as before, except it is now a small loop with a maximum of 4 iterations (backward branches), and it copies 8 bytes at a time, such that (1) it may copy up to 7 bytes that precede the array and (2) it doesn't run over the end of the array (which could potentially crash).

I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers?

Also, this new implementation could simply replace the old one, instead of being an alternative. I am not sure if it would make any difference performance-wise.


@rkennke The small loop looks to me like it will run over the end of the array.
Say haystack_len is 7: the index below would be 0 after the shrq instruction, and the movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes, i.e. one byte past the end of the array:
// num_words (zero-based) = (haystack_len - 1) / 8;
__ movq(index, haystack_len);
__ subq(index, 1);
__ shrq(index, LogBytesPerWord);

  __ bind(L_loop);
  __ movq(XMM_TMP1, Address(haystack, index, Address::times_8));
  __ movq(Address(rsp, index, Address::times_8), XMM_TMP1);
  __ subq(index, 1);
  __ jcc(Assembler::positive, L_loop);

Contributor Author

Yes, and that is intentional.

Say haystack_len is 7; then the first block computes the adjustment of the haystack, which is 8 - (7 % 8) = 1. We adjust the haystack pointer one byte down, so that when we copy (a multiple of) 8 bytes, we land exactly on the last byte. We do copy a few bytes that precede the array, but those are part of the object header, which is guaranteed to be >= 8 bytes.

Then we compute the number of words to copy, but make it 0-based; that is, '0' means 1 word, '1' means 2 words, etc. It makes the loop nicer. In this example we get 0, which means we copy one word from the adjusted haystack, which is correct.

Then comes the actual loop.

Afterwards we adjust the haystack pointer so that it points to the first array element that we just copied onto the stack, ignoring the few garbage bytes that we also copied.
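For readers following along, here is a rough, standalone C++ rendering of the arithmetic just described. It is not the stub code: the function name, the byte-wise memcpy, and the wrap for haystack lengths that are exact multiples of 8 are my assumptions, added only to make the sketch self-contained.

#include <cstdint>
#include <cstddef>
#include <cstring>

// Sketch only (not the MacroAssembler stub): copy a short haystack onto a
// stack buffer in whole 8-byte words without reading past its end.
static void copy_haystack_to_stack(const uint8_t* haystack, size_t haystack_len,
                                   uint8_t* stack_buf) {
  // Move the start down so that whole 8-byte words end exactly on the last
  // byte. For haystack_len == 7 this is 8 - (7 % 8) = 1; the extra byte read
  // comes from the preceding object header, which is at least 8 bytes.
  size_t adjustment = (8 - (haystack_len % 8)) % 8;
  const uint8_t* adjusted = haystack - adjustment;

  // Zero-based word count: 0 means "copy 1 word", 1 means "copy 2 words", ...
  size_t num_words = (haystack_len - 1) / 8;

  // One 8-byte word per iteration, never running past the end of the array.
  for (size_t i = 0; i <= num_words; i++) {
    std::memcpy(stack_buf + i * 8, adjusted + i * 8, 8);
  }
  // The caller then treats stack_buf + adjustment as the first array element,
  // ignoring the few header bytes that were copied along.
}

The loop in the patch performs the same copies in descending index order; the direction does not affect which bytes are touched.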

@robcasloz
Contributor

Indeed, I could re-enable all tests in:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java

but unfortunately not those others:

test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset.

I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.

@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N with N <= 3 on an Intel Xeon Platinum 8358 machine:

  • test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
  • test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
  • test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java

Here are the failure details:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!


test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java:

1) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte1(byte[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

2) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte2(byte[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

3) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong1(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

4) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong2(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

5) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong3(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

6) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong5(byte[],long[],int,int)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!


test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java:

1) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndComplexExpression()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

2) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndInvariant()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

@rkennke
Contributor Author

rkennke commented Oct 1, 2024

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE > 3'?

@robcasloz
Contributor

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE > 3'?

I don't think so, due to a limitation in the IR framework precondition language: UseCompactObjectHeaders can only appear within a "flag precondition" whereas UseSSE>3 needs to be expressed as a "CPU feature precondition" for portability (UseSSE is not defined for aarch64), and these two cannot be combined with logical operators.

I suggest disabling the IR checks of the failing tests using applyIf = {"UseCompactObjectHeaders", "false"}, as you did for other similar tests (e.g. TestMulAddS2I.java), and documenting it in JDK-8340010. Maybe also add a comment in the tests that the failure happens only with -XX:UseSSE<=3.

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Oct 2, 2024
Contributor

@coleenp coleenp left a comment

Thanks for making this change. I've reviewed runtime, oops and metaspace code. It looks good.

@coleenp
Contributor

coleenp commented Oct 3, 2024

I posted a patch for JDK-8341044 for CDSPluginTest.java that was failing in our testing with the Lilliput patch.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Oct 4, 2024
@coleenp
Contributor

coleenp commented Oct 4, 2024

There's another test failure we're seeing when running with -XX:+UseCompactObjectHeaders on aarch64 that's similar to this bug in mainline: https://bugs.openjdk.org/browse/JDK-8340212.
I haven't been able to reproduce this myself yet though.
