@robcasloz
Contributor

@robcasloz robcasloz commented Mar 12, 2025

The array fill optimization replaces simple innermost loops that fill an array with copies of the same primitive value:

for (int i = 0; i < array.length; i++) {
    array[i] = 0;
}

with a call to an array filling intrinsic that is specialized for the array element type:

arrayof_jint_fill(array, 0, array.length)

The optimization retrieves the (basic) array element type by calling MemNode::memory_type() on the original filling store. This is incorrect for stores of short values, since these are represented by StoreC nodes whose memory_type() is T_CHAR. As a result, the optimization wrongly assigns the address type char[] to short array fill loops. This can cause miscompilations due to missing anti-dependences; see the issue description for further details.
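
To make the failure mode concrete, here is a minimal sketch of the kind of code shape involved: a load from a short[] followed by a fill loop over the same array. This is not the reproducer from the issue description. The class and method names (ShortFillSketch, readThenFill) are made up for illustration, and whether C2 actually replaces the loop with the intrinsic depends on the platform and flags.

```java
// Hypothetical sketch; class and method names are illustrative only.
class ShortFillSketch {
    // Reads one element, then overwrites the whole array with 'value'.
    // The load must observe the array contents from before the fill loop.
    static short readThenFill(short[] array, short value) {
        short before = array[0];          // load, anti-dependent on the stores below
        for (int i = 0; i < array.length; i++) {
            array[i] = value;             // loop shape targeted by the array fill optimization
        }
        return before;
    }

    public static void main(String[] args) {
        short[] array = new short[64];
        array[0] = 7;
        short before = readThenFill(array, (short) 42);
        System.out.println(before + " " + array[0] + " " + array[63]);
    }
}
```

By Java semantics, readThenFill must return the value that was in the array before the loop ran. If the fill call is typed as writing char[] memory while the load reads short[] memory, the anti-dependence between them could be missed and the load could be scheduled after the fill.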

This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations.
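
The difference between the two ways of deriving the fill type can be sketched with a toy model in plain Java. None of the types below are HotSpot classes: Store, memoryType, fillTypeFromStore and fillTypeFromAddress are hypothetical names that only mirror the concepts described above, under the assumption that one node kind (StoreC) serves both char and short stores.

```java
// Toy model in plain Java. None of these types are HotSpot classes; the names
// are hypothetical and only mirror the concepts discussed in this description.
class FillTypeSketch {
    enum BasicType { T_CHAR, T_SHORT, T_INT }

    static final class Store {
        final String nodeKind;        // "StoreC" covers both char and short stores
        final BasicType addressElem;  // element type of the destination array
        final boolean mismatched;     // value type differs from the array element type

        Store(String nodeKind, BasicType addressElem, boolean mismatched) {
            this.nodeKind = nodeKind;
            this.addressElem = addressElem;
            this.mismatched = mismatched;
        }
    }

    // Mimics a per-node-kind constant: StoreC always reports T_CHAR.
    static BasicType memoryType(Store s) {
        switch (s.nodeKind) {
            case "StoreC": return BasicType.T_CHAR;
            case "StoreI": return BasicType.T_INT;
            default: throw new IllegalArgumentException("unknown node kind: " + s.nodeKind);
        }
    }

    // Old approach: derive the fill type from the store node itself.
    static BasicType fillTypeFromStore(Store s) {
        return memoryType(s);
    }

    // Proposed approach: reject mismatched stores, otherwise use the address type.
    // Returns null when the loop should not be optimized.
    static BasicType fillTypeFromAddress(Store s) {
        return s.mismatched ? null : s.addressElem;
    }

    public static void main(String[] args) {
        Store shortFill = new Store("StoreC", BasicType.T_SHORT, false);
        System.out.println("from store:   " + fillTypeFromStore(shortFill));
        System.out.println("from address: " + fillTypeFromAddress(shortFill));
    }
}
```

In this model a short array fill is typed as T_CHAR when the type comes from the store node, and as T_SHORT when it comes from the address.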

Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate memory_type(). This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability.

Testing: tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64).


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8351468: C2: array fill optimization assigns wrong type to intrinsic call (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24005/head:pull/24005
$ git checkout pull/24005

Update a local copy of the PR:
$ git checkout pull/24005
$ git pull https://git.openjdk.org/jdk.git pull/24005/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24005

View PR using the GUI difftool:
$ git pr show -t 24005

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24005.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Mar 12, 2025

👋 Welcome back rcastanedalo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 12, 2025

@robcasloz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8351468: C2: array fill optimization assigns wrong type to intrinsic call

Reviewed-by: epeter, thartmann, qamai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 135 new commits pushed to the master branch:

  • a875733: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct
  • 5591f8a: 8351515: C2 incorrectly removes double negation for double and float
  • 56a4ffa: 8352597: [IR Framework] test bug: TestNotCompilable.java fails on product build
  • e23e0f8: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test
  • adfb120: 8351748: Add class init barrier to AOT-cached Method/Var Handles
  • ee1577b: 8352652: [BACKOUT] nsk/jvmti/ tests should fail when nsk_jvmti_setFailStatus() is called
  • df9210e: 8347706: jvmciEnv.cpp has jvmci includes out of order
  • 5dd0acb: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled
  • 334a1ee: 8351375: nsk/jvmti/ tests should fail when nsk_jvmti_setFailStatus() is called
  • 7442039: 8337279: Share StringBuilder to format instant
  • ... and 125 more: https://git.openjdk.org/jdk/compare/4412c079fccefbb235b22651206089f5bac47d18...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Mar 12, 2025

@robcasloz The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 12, 2025
@robcasloz robcasloz marked this pull request as ready for review March 12, 2025 09:51
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 12, 2025
Member

@shipilev shipilev left a comment

Wow, nice landmine.

(If you pull from current master, GHA should become clean)

@mlbridge

mlbridge bot commented Mar 12, 2025

Webrevs

@merykitty
Member

I think the issue here is the implementation of MemNode::memory_type(). It says that it returns the type of the value in memory, but it always returns T_CHAR for StoreC, which seems nonsensical. What if I StoreC to a long[]?

@robcasloz
Contributor Author

I think the issue here is the implementation of MemNode::memory_type().

I agree, in particular the fact that StoreC nodes are used to represent both short and char stores but always return T_CHAR as their memory_type(). That is why I propose to simply circumvent the usage of MemNode::memory_type() to compute the type of the array fill intrinsic in this changeset, and explore creating a dedicated StoreS node in a separate RFE.

what if I StoreC to a long[]?

A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no?

@merykitty
Member

A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no?

No, code such as MemorySegment.ofArray(longArray).set(ValueLayout.JAVA_SHORT, offset, c) would produce a StoreC into a long[].
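
For readers who want to try this, here is a minimal, self-contained sketch of such a mismatched access. It assumes a JDK where the java.lang.foreign API is final (JDK 22 or later). The class and method names (MismatchedStoreSketch, storeShortIntoLongArray) are made up for illustration, and the sketch only shows the Java-level access, not the IR that C2 produces for it.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch; requires JDK 22+ for the final java.lang.foreign API.
class MismatchedStoreSketch {
    // Writes a 2-byte value into the memory backing a long[] and reads it back.
    // 'byteOffset' is in bytes and must be a multiple of 2 for JAVA_SHORT.
    static short storeShortIntoLongArray(long[] longArray, long byteOffset, short c) {
        MemorySegment segment = MemorySegment.ofArray(longArray);
        segment.set(ValueLayout.JAVA_SHORT, byteOffset, c);
        return segment.get(ValueLayout.JAVA_SHORT, byteOffset);
    }

    public static void main(String[] args) {
        long[] longArray = new long[2];
        short back = storeShortIntoLongArray(longArray, 2L, (short) 0x1234);
        // The exact long value depends on the platform byte order, so print it in hex.
        System.out.println(Integer.toHexString(back) + " " + Long.toHexString(longArray[0]));
    }
}
```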

@robcasloz
Contributor Author

(If you pull from current master, GHA should become clean)

Done (commit 90fd766), thanks.

@robcasloz
Contributor Author

A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no?

No, code such as MemorySegment.ofArray(longArray).set(ValueLayout.JAVA_SHORT, offset, c) would produce a StoreC into a long[].

Right, in this case I interpret from the comment at the declaration of MemNode::memory_type() that the memory_type() of the StoreC node should be T_SHORT (the type of the value stored by the node), as opposed to the current T_CHAR. I propose to address this in a separate RFE.

@merykitty
Member

merykitty commented Mar 13, 2025

@robcasloz I disagree, I would expect the memory_type of a StoreC into a long[] to be something that means "a part of a long[]", which should be T_LONG if the store is guaranteed to be enclosed in a single long, or T_VOID otherwise. While we are trying to store 2 bytes into the memory, the thing in the memory is neither a short nor a char.

@galderz
Contributor

galderz commented Mar 13, 2025

... explore creating a dedicated StoreS node in a separate RFE.

Why not do this in this PR? Seems like the right approach to me.

@robcasloz
Contributor Author

... explore creating a dedicated StoreS node in a separate RFE.

Why not do this in this PR? Seems like the right approach to me.

My thinking is that this is a bug whose fix we might want to backport to several JDK Update releases. The fix proposed in this PR is minimal and local to the array fill optimization, whereas the alternative approach of defining a StoreS node (see prototype here)

  1. is more costly to apply due to its larger changeset, and
  2. incurs a significantly higher risk of introducing regressions, as it affects the entire C2 compilation chain (for example, I found while prototyping it that it affects the output of the store merging optimization).

See the OpenJDK Developers' Guide for a more elaborate discussion of the trade-offs involved in backporting.

Having said this, I still think we should consider introducing a StoreS node in a follow-up RFE, and perhaps also enforcing consistent type abbreviations across load and store node names, e.g. renaming LoadUSNode to LoadCNode.

@robcasloz
Contributor Author

I would expect the memory_type of a StoreC into a long[] to be something that means "a part of a long[]"

If that were the intended meaning of MemNode::memory_type(), wouldn't the function be redundant, since we can already retrieve that information from MemNode::adr_type()?

@merykitty
Member

@robcasloz Yes that's right. Then MemNode::memory_type() does not refer to the thing in memory at all, but the thing that is about to interact with the memory. I think:

  • We should rename it to MemNode::value_type() or MemNode::value_basic_type()
  • It is simply incorrect to use it to reason about the thing in the memory in this problem, and using adr_type is the correct fix.

To be clear, I don't think having StoreSNode would solve any issue. I can StoreS into a char[], and StoreC into a short[] and we are back at the same issue.

@robcasloz
Contributor Author

@robcasloz Yes that's right. Then MemNode::memory_type() does not refer to the thing in memory at all, but the thing that is about to interact with the memory.

Yes, that matches my understanding.

  • We should rename it to MemNode::value_type() or MemNode::value_basic_type()

I agree, it would be good to do this (in a follow-up RFE). I like MemNode::value_basic_type() best.

It is simply incorrect to use it to reason about the thing in the memory in this problem, and using adr_type is the correct fix.

To be clear, I don't think having StoreSNode would solve any issue. I can StoreS into a char[], and StoreC into a short[] and we are back at the same issue.

I agree that using adr_type() (the solution proposed in this changeset) seems more robust.

The alternative of using memory_type() and introducing a StoreS node assumes for correctness that the array fill optimization does not succeed for mismatched stores such as those you mention (e.g. StoreS into a char[]). If it did, I agree using memory_type() would be incorrect even after introducing a StoreS node. But so far, I haven't found any counterexample, i.e. any way to produce an array-filling loop with such a mismatched store that would be accepted by the array fill optimization. My attempts include using memory segments and Unsafe. In all cases, the array fill analysis in PhaseIdealLoop::match_fill_loop fails to recognize the loops due to different address computation patterns. Do you have any other idea/suggestion to trigger the array fill optimization using mismatched array stores?
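
As an illustration of the kind of loop meant here, the following sketch fills a char[] through a memory segment using a short layout. It is not one of the actual attempts, whose code is not shown in this thread. It assumes JDK 22 or later, and the names (MismatchedFillSketch, fillCharsAsShorts) are made up.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch; requires JDK 22+ for the final java.lang.foreign API.
class MismatchedFillSketch {
    // Fills a char[] by storing short values through a heap memory segment,
    // i.e. the stored value type (short) differs from the array element type (char).
    static void fillCharsAsShorts(char[] array, short value) {
        MemorySegment segment = MemorySegment.ofArray(array);
        for (int i = 0; i < array.length; i++) {
            segment.set(ValueLayout.JAVA_SHORT, 2L * i, value);  // byte offset of element i
        }
    }

    public static void main(String[] args) {
        char[] array = new char[8];
        fillCharsAsShorts(array, (short) 0x41);
        System.out.println(new String(array));
    }
}
```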

@robcasloz
Contributor Author

The alternative of using memory_type() and introducing a StoreS node assumes for correctness that the array fill optimization does not succeed for mismatched stores such as those you mention (e.g. StoreS into a char[]).

After some more thought, I lean towards just disabling the OptimizeFill optimization for mismatched stores. It does not succeed today anyway due to accidental reasons (brittleness in pattern matching), so disabling it for this case should not have any other impact than making us more confident in the correctness of the optimization.

@robcasloz
Contributor Author

The alternative of using memory_type() and introducing a StoreS node assumes for correctness that the array fill optimization does not succeed for mismatched stores such as those you mention (e.g. StoreS into a char[]).

After some more thought, I lean towards just disabling the OptimizeFill optimization for mismatched stores. It does not succeed today anyway due to accidental reasons (brittleness in pattern matching), so disabling it for this case should not have any other impact than making us more confident in the correctness of the optimization.

Done now, and also added a set of positive and negative test cases (commit 38c9b47) and updated the PR description. @merykitty hopefully this addresses your concerns, please let me know what you think.

Contributor

@eme64 eme64 left a comment

@robcasloz Nice catch, I'm glad you dug this up and found a reproducer 🥳

Yes, taking the element type from the address is the best, that way you actually depend on the array, not the type of the store.

* @summary Test that loads anti-dependent on array fill intrinsics are
* scheduled correctly, for different load and array fill types.
* See detailed comments in testShort() below.
* @requires vm.compiler2.enabled
Contributor

Is the test so expensive that C2 is required? Or can you just put -XX:+IgnoreUnrecognizedVMOptions in the run that has C2 flags?

// Disabling unrolling is necessary for test robustness, otherwise the
// compiler might decide to unroll the array-filling loop instead of
// replacing it with an intrinsic call even if OptimizeFill is enabled.
TestFramework.runWithFlags("-XX:LoopUnrollLimit=0", "-XX:+OptimizeFill");
Contributor

What about a run without flags just in case?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 20, 2025
Member

@TobiHartmann TobiHartmann left a comment

Good catch and nice tests! The fix looks good to me.

Member

@merykitty merykitty left a comment

@robcasloz My concern is that MemNode::memory_type does not do what it seems to do. I wonder if there are other places misusing this method. The concern is orthogonal to this issue, though.

@eme64
Contributor

eme64 commented Mar 21, 2025

@merykitty You are right, MemNode::memory_type is very easy to misunderstand. We could probably rename it, and while doing that check all usages. We have had bugs like this before, I think I had one in SuperWord as well some years ago... What would be a better name though?

Quickly looking at the cases, there are not even that many usages:

emanuel@emanuel-oracle:/oracle-work/jdk-fork0/open$ grep memory_type src/hotspot/share/opto/ -r
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const = 0;
src/hotspot/share/opto/memnode.hpp:    return type2aelembytes(memory_type(), true);
src/hotspot/share/opto/memnode.hpp:    return type2aelembytes(memory_type());
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_BYTE; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_BYTE; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_CHAR; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_SHORT; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_INT; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_LONG; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_FLOAT; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_DOUBLE; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_ADDRESS; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_NARROWOOP; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_NARROWKLASS; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_BYTE; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_CHAR; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_INT; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_LONG; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_FLOAT; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_DOUBLE; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_ADDRESS; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_NARROWOOP; }
src/hotspot/share/opto/memnode.hpp:  virtual BasicType memory_type() const { return T_NARROWKLASS; }
src/hotspot/share/opto/escape.cpp:          // StoreP::memory_type() == T_ADDRESS
src/hotspot/share/opto/escape.cpp:              store->as_Store()->memory_type() == ft) {
src/hotspot/share/opto/vectornode.hpp:  virtual BasicType memory_type() const { return T_VOID; }
src/hotspot/share/opto/vectornode.hpp:  virtual BasicType memory_type() const { return T_VOID; }
src/hotspot/share/opto/memnode.cpp:      if (memory_type() != T_VOID) {
src/hotspot/share/opto/memnode.cpp:          return phase->zerocon(memory_type());
src/hotspot/share/opto/memnode.cpp:                                                                      memory_type(), is_unsigned());
src/hotspot/share/opto/memnode.cpp:      const Type* con_type = Type::make_constant_from_field(const_oop->as_instance(), off, is_unsigned(), memory_type());
src/hotspot/share/opto/superword.cpp:      bt = n->as_Mem()->memory_type();
src/hotspot/share/opto/superword.cpp:        bt = n->as_Mem()->memory_type();
src/hotspot/share/opto/superword.cpp:        is_java_primitive(mem->memory_type())) {
src/hotspot/share/opto/superword.cpp:  if (!is_java_primitive(s1->as_Mem()->memory_type()) ||
src/hotspot/share/opto/superword.cpp:      !is_java_primitive(s2->as_Mem()->memory_type())) {
src/hotspot/share/opto/superword.cpp:    BasicType bt = n->as_Mem()->memory_type();
src/hotspot/share/opto/loopTransform.cpp:  BasicType t = store->as_Mem()->memory_type();
src/hotspot/share/opto/loopTransform.cpp:        if (type2aelembytes(store->as_Mem()->memory_type(), true) != (1 << n->in(2)->get_int())) {
src/hotspot/share/opto/loopTransform.cpp:  BasicType t = store->as_Mem()->memory_type();

Well, I looked through them, and I cannot see any issue with the other cases. But maybe someone else can give the usages a quick look too.

@robcasloz
Contributor Author

What would be a better name though?

@merykitty had the suggestions MemNode::value_type() or MemNode::value_basic_type() (see comment above), I like both better than the current name.

@robcasloz
Contributor Author

@TobiHartmann @merykitty @eme64 Thanks for reviewing! I will update the tests as suggested by @eme64 and re-run testing over the weekend.
@RealFYang I enabled the new IR tests in test/hotspot/jtreg/compiler/loopopts/TestArrayFillIntrinsic.java on riscv64 because this platform seems to handle array fill intrinsification similarly to x64 and aarch64. Would you like to test it before integration?

@eme64
Contributor

eme64 commented Mar 21, 2025

What would be a better name though?

@merykitty had the suggestions MemNode::value_type() or MemNode::value_basic_type() (see comment above), I like both better than the current name.

@merykitty @robcasloz MemNode::value_basic_type() sounds like the most descriptive and accurate. Great! @robcasloz , will you file an RFE for that?

@robcasloz
Contributor Author

What would be a better name though?

@merykitty had the suggestions MemNode::value_type() or MemNode::value_basic_type() (see comment above), I like both better than the current name.

@merykitty @robcasloz MemNode::value_basic_type() sounds like the most descriptive and accurate. Great! @robcasloz , will you file an RFE for that?

Done: JDK-8352620.

@RealFYang
Member

@RealFYang I enabled the new IR tests in test/hotspot/jtreg/compiler/loopopts/TestArrayFillIntrinsic.java on riscv64 because this platform seems to handle array fill intrinsification similarly to x64 and aarch64. Would you like to test it before integration?

Hi, thanks for the ping. Yes, both of the newly added tests are good on the linux-riscv64 platform with a fastdebug build. Great!

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Mar 24, 2025
@robcasloz
Contributor Author

Hi @eme64, I implemented your test suggestions (commit b59d2eb), please re-review.

Contributor

@eme64 eme64 left a comment

@robcasloz Thanks for the changes!

Some thoughts about future work on intrinsic fill.

  • It would be nice to enable mismatched cases.
  • And it would be nice to enable not just arrays, but also native memory. That would be especially good for MemorySegments. But I am not sure how easy this change would be.

Comment on lines 3577 to 3579
if (msg == nullptr && store->as_Mem()->is_mismatched_access()) {
  msg = "mismatched store";
}
Contributor

What effect does this have?

Ah, it seems to have to do with these comments in your PR:
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations.

It may be good to leave additional comments in the code here, saying that this is a limitation that may be lifted in the future. Up to you.

Contributor Author

I agree, done in commit c0b3cf9.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 24, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Mar 24, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 24, 2025
@robcasloz
Contributor Author

Thanks for re-reviewing and the additional suggestions @eme64!

@robcasloz
Contributor Author

/integrate

@openjdk

openjdk bot commented Mar 24, 2025

Going to push as commit de58009.
Since your change was applied there have been 135 commits pushed to the master branch:

  • a875733: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct
  • 5591f8a: 8351515: C2 incorrectly removes double negation for double and float
  • 56a4ffa: 8352597: [IR Framework] test bug: TestNotCompilable.java fails on product build
  • e23e0f8: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test
  • adfb120: 8351748: Add class init barrier to AOT-cached Method/Var Handles
  • ee1577b: 8352652: [BACKOUT] nsk/jvmti/ tests should fail when nsk_jvmti_setFailStatus() is called
  • df9210e: 8347706: jvmciEnv.cpp has jvmci includes out of order
  • 5dd0acb: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled
  • 334a1ee: 8351375: nsk/jvmti/ tests should fail when nsk_jvmti_setFailStatus() is called
  • 7442039: 8337279: Share StringBuilder to format instant
  • ... and 125 more: https://git.openjdk.org/jdk/compare/4412c079fccefbb235b22651206089f5bac47d18...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 24, 2025
@openjdk openjdk bot closed this Mar 24, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 24, 2025
@openjdk

openjdk bot commented Mar 24, 2025

@robcasloz Pushed as commit de58009.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

7 participants