@benoitmaillard benoitmaillard commented Sep 11, 2025

This PR introduces a fix for wrong results caused by missing Store nodes in C2 IR due to incorrect wiring in PhaseIdealLoop::insert_post_loop.

Context

The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After being compiled by C2, the following method behaved as if its last statement (x = 0) had been dropped:

    static public void test() {
        x = 0;
        for (int i = 0; i < 20000; i++) {
            x += i;
        }
        x = 0;
    }
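
As a minimal sketch, the reproducer can be wrapped in a harness like the one below. The class name, the declaration of `x`, and the number of warm-up runs are assumptions made for illustration; the PR text shows only the method body, and the regression test added by the PR may differ.

```java
public class MissingStoreRepro {
    // Assumed declaration: the snippet in the PR uses x without showing it.
    public static int x;

    public static void test() {
        x = 0;
        for (int i = 0; i < 20000; i++) {
            x += i;
        }
        x = 0;
    }

    public static void main(String[] args) {
        // Call test() repeatedly so the JIT has a chance to compile it.
        for (int run = 0; run < 10_000; run++) {
            test();
            if (x != 0) {
                throw new AssertionError("x should be 0 after test(), got " + x);
            }
        }
        System.out.println("ok");
    }
}
```

On a correct JVM the check never fires, since the last statement of `test()` always resets `x`.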

After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, which causes the loop to be transformed into a nested loop with an OuterStripMinedLoop node. Store nodes are moved out of the inner loop when encountering this pattern, and the associated Phi nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in JDK-8356708 by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.

This PR addresses the specific case of wrongly wired Store nodes in post loops. In the longer term, the missing Phi nodes need further investigation, as they are likely to cause other issues (cf. related JBS issues).

Detailed Analysis

In PhaseIdealLoop::create_outer_strip_mined_loop, a simple CountedLoop is turned into a nested loop with an OuterStripMinedLoop. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move Store nodes after the inner loop in PhaseIdealLoop::try_move_store_after_loop. When the Store node is moved to the outer loop, we also get rid of its input Phi node in order not to confuse loop optimizations happening later.
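
A source-level analogy of this transformation may help. The sketch below is not what C2 emits: it only mimics the shape of a strip-mined loop in plain Java, with a hypothetical strip length `STRIP` and a comment marking where the safepoint poll conceptually ends up.

```java
public class StripMiningSketch {
    // Hypothetical strip length, chosen only for illustration.
    static final int STRIP = 1000;

    // The simple counted loop before the transformation.
    static int plainLoop(int n) {
        int x = 0;
        for (int i = 0; i < n; i++) {
            x += i;
        }
        return x;
    }

    // The same computation as a nested loop.
    static int stripMinedLoop(int n) {
        int x = 0;
        int i = 0;
        while (i < n) {                          // outer loop
            int limit = Math.min(n, i + STRIP);
            for (; i < limit; i++) {             // inner loop: original body, no poll
                x += i;
            }
            // Conceptually, the safepoint poll sits here, right after the
            // exit of the inner loop. A store sunk out of the inner loop
            // would land between that exit and the poll.
        }
        return x;
    }
}
```

Both methods compute the same sum; only the placement of the poll, and therefore of any sunk store, changes.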

This only becomes a problem in PhaseIdealLoop::insert_post_loop, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use Phi nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have Phi nodes for the moved Store nodes, their memory inputs may end up being incorrect.
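
To illustrate what the post loop is for, here is a source-level sketch with an assumed unroll factor of 4. It is an analogy only; C2 performs this on the IR, and the method names are hypothetical.

```java
public class PostLoopSketch {
    static int original(int n) {
        int x = 0;
        for (int i = 0; i < n; i++) {
            x += i;
        }
        return x;
    }

    static int unrolledWithPostLoop(int n) {
        int x = 0;
        int i = 0;
        // Main loop: body unrolled by an assumed factor of 4.
        for (; i + 3 < n; i += 4) {
            x += i;
            x += i + 1;
            x += i + 2;
            x += i + 3;
        }
        // Post loop: a clone of the original body for the leftover iterations.
        // It must start from the value of x produced by the main loop,
        // not from the value x had before the main loop.
        for (; i < n; i++) {
            x += i;
        }
        return x;
    }
}
```

The bug described here corresponds to the cloned store taking the pre-loop memory state as its input, as if the post loop started from the state before the main loop.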

This is what the IR looks like after the creation of the post loop in our reproducer:

[image: C2 IR graph after the creation of the post loop]

On the screenshot, node 118 StoreI takes directly 24 StoreI as memory input, even though it is obvious that 96 CountedLoopEnd (to which 73 NodeI is attached) is a predecessor of 114 CountedLoopEnd in the CFG.

After that, we observe a succession of IGVN optimizations that eventually lead to the generation of wrong code:

  • The IfFalse projection of 128 If becomes dead, as the post loop is always executed (number of iterations is known)
  • 121 Region and 123 Phi are subsequently eliminated (as a result of the dead path)
  • Because the Phi disappeared, 118 StoreI becomes the memory input of 89 StoreI
  • 118 StoreI is eliminated because it is directly followed by a write at the same memory location
  • 89 StoreI is replaced by 24 StoreI by an Identity optimization, because it stores the same value at the same location

Node 89 StoreI corresponds to the last x = 0 assignment, and its elimination directly causes the wrong result (the store node from the OuterStripMinedLoop remains, as it is used by the safepoint).

Proposed Fix

As mentioned previously, the impact of the missing Phi nodes needs to be investigated further, as it is likely that this causes other bugs in the compilation process. This is a "local fix" for the specific issue of Store nodes moved out of the inner loop.

The approach here is to do the wiring directly in PhaseIdealLoop::insert_post_loop, right after having done the usual rewiring based on the Phi nodes. As the conditions for moving Store nodes out of the loop are quite restrictive, the pattern is predictable: Store nodes are attached to the false projection of the inner CountedLoopEnd, right before the safepoint in the CFG.

In the simplest case, the memory input of the new version of the store node is outside of the loop body. In the cloned node, we change it to point to its original version instead (as the original store is always executed before).

It may also be that the memory input of the new node points to another memory node in the loop body. This can happen in the case where we have:

    for (int i = 0; i < 20000; i++) {
        a1.field += i;
        a2.field += i;
    }

Here, the second store has the first one as memory input, as a1 and a2 may be aliases. In this case, we only need to change the memory input of the first store in the chain, and it needs to point to the last memory node in the chain in the original version of the loop.
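
The rewiring rule can be sketched on a toy memory graph. Everything below is a simplified model written for illustration: `MemNode`, `cloneBody`, and `rewireSunkStores` are hypothetical names, not HotSpot code, and the model tracks only the memory input of each store.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PostLoopRewiringSketch {
    /** Toy stand-in for a memory node; not the HotSpot class. */
    static final class MemNode {
        final String name;
        final boolean inLoop;
        MemNode mem; // memory input (previous memory state)

        MemNode(String name, boolean inLoop, MemNode mem) {
            this.name = name;
            this.inLoop = inLoop;
            this.mem = mem;
        }
    }

    /** Clones the body: inputs inside the body map to clones, outside inputs are kept. */
    static Map<MemNode, MemNode> cloneBody(List<MemNode> body) {
        Map<MemNode, MemNode> oldNew = new IdentityHashMap<>();
        for (MemNode n : body) {
            oldNew.put(n, new MemNode(n.name + "'", true, n.mem));
        }
        for (MemNode clone : oldNew.values()) {
            MemNode mapped = oldNew.get(clone.mem);
            if (mapped != null) {
                clone.mem = mapped;
            }
        }
        return oldNew;
    }

    /** Rewires only the head of the cloned chain to the tail of the original chain. */
    static void rewireSunkStores(List<MemNode> chain, Map<MemNode, MemNode> oldNew) {
        MemNode lastOriginal = chain.get(chain.size() - 1);
        for (MemNode store : chain) {
            if (!store.mem.inLoop) { // memory input comes from outside the loop
                oldNew.get(store).mem = lastOriginal;
            }
        }
    }

    /** Builds init <- store1 <- store2, clones it, rewires, and lists the final chain. */
    static String demo() {
        MemNode init = new MemNode("init", false, null);
        MemNode store1 = new MemNode("store1", true, init);
        MemNode store2 = new MemNode("store2", true, store1);
        List<MemNode> chain = List.of(store1, store2);
        Map<MemNode, MemNode> oldNew = cloneBody(chain);
        rewireSunkStores(chain, oldNew);
        StringBuilder sb = new StringBuilder();
        for (MemNode n = oldNew.get(store2); n != null; n = n.mem) {
            sb.append(sb.length() == 0 ? "" : " -> ").append(n.name);
        }
        return sb.toString();
    }
}
```

Reading `->` as "takes its memory input from", `demo()` yields `store2' -> store1' -> store2 -> store1 -> init`. Without the rewiring step, `store1'` would still point at `init` and the cloned chain would bypass the original stores.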

Testing

Thank you for reviewing!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225
$ git checkout pull/27225

Update a local copy of the PR:
$ git checkout pull/27225
$ git pull https://git.openjdk.org/jdk.git pull/27225/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27225

View PR using the GUI difftool:
$ git pr show -t 27225

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27225.diff

Clean up and add comments

Move the store rewiring out of the phi loop, use loop->tail() instead

Add comments in src
Add two more test cases

Rename test and improve headers

Remove useless space in test

openjdk bot commented Sep 11, 2025

@benoitmaillard This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop

Reviewed-by: mhaessig, roland

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 606 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eme64, @mhaessig, @rwestrel) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot changed the title 8364757 8364757: JavaFuzzer test Test_54.java fails with wrong result after 8280320 Sep 11, 2025

openjdk bot commented Sep 11, 2025

@benoitmaillard The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Sep 11, 2025
@benoitmaillard benoitmaillard changed the title 8364757: JavaFuzzer test Test_54.java fails with wrong result after 8280320 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop Sep 11, 2025
@benoitmaillard benoitmaillard changed the title 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop Sep 11, 2025
@benoitmaillard benoitmaillard marked this pull request as ready for review September 12, 2025 07:20
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 12, 2025

@rwestrel
Contributor

Not a review but a comment on the missing Phis. Your description makes it sound like if the OuterStripMinedLoop was created with Phis from the start, there would be no issue. That's not true AFAICT. The current logic for pre/main/post loops creation would simply not work because it doesn't expect the Phis and it would need to be extended so things are rewired correctly with the outer loop Phis. The inner loop would still have no Phi for the sunk store. So the existing logic, once fixed, would not find it either and you would need some new logic to find it maybe using the outer loop Phis. The current shape of the outer loop (without the Phis) is very simple and there's only one location where the Store can be (on the exit projection of the inner loop right above the safepoint which is right below the exit of the inner loop and can't be anywhere else). So you added logic to find the Store relying on the current shape of the outer loop. If the outer loop had Phis, some alternate version of that logic could be used. They seem like 2 ways of doing the same thing to me and nothing tells us one is better than the other. In short, I don't find this bug a good example of something that would work better if we had Phis on the outer loop. I wouldn't say the root cause is that we don't have Phis on the outer loop either.

Contributor

@eme64 eme64 left a comment

Thanks for working on this @benoitmaillard !

And thanks for all the explanations.

It seems the missing Phis at the OuterStripMinedLoop are a decision that implies that Stores will just sort of "hang" between loop exit and SafePoint. That is now the new "invariant". Fine for now, but we may want to reconsider adding the Phis for the OuterStripMinedLoop eventually.

I have read through the PR, and was a little confused about names, so bear with my comments 😅

On the algo level I was wondering if it is possible to have a chain of stores between the exit and SafePoint? Do you have such examples?

Comment on lines 1383 to 1384
    // Find the last memory node in the loop when following memory usages
    Node *find_mem_out_outer_strip_mined(Node* store, IdealLoopTree* outer_loop);
Contributor

The name of the method is a bit confusing. And the comment seems to suggest something different than what the code says.

Contributor Author

The name was really bad indeed, sorry for that. I have renamed it to find_last_store_in_outer_loop, and added a comment to explain why we have the guarantee of a linear graph here.

Comment on lines 1671 to 1692
    Node* PhaseIdealLoop::find_mem_out_outer_strip_mined(Node* store, IdealLoopTree* outer_loop) {
      Node* out = store;
      // Follow the memory uses until we get out of the loop
      while (true) {
        Node* unique_next = nullptr;
        for (DUIterator_Fast imax, l = out->fast_outs(imax); l < imax; l++) {
          Node* next = out->fast_out(l);
          if (next->is_Mem() && next->in(MemNode::Memory) == out) {
            IdealLoopTree* output_loop = get_loop(get_ctrl(next));
            if (outer_loop->is_member(output_loop)) {
              assert(unique_next == nullptr, "memory node should only have one usage in the loop body");
              unique_next = next;
            }
          }
        }
        if (unique_next == nullptr) {
          break;
        }
        out = unique_next;
      }
      return out;
    }
Contributor

Note from later me: I was quite confused here. I thought this was going to be some general function that should handle all sorts of memory flow in the loop, but that is not the case. I'll leave all my comments here just to show you what I as the reader thought when reading it ;)

Below, in a code comment you say that this method does:
Find the last memory node in the loop when following memory usages

What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?

store
 |
 +--------+
 |        |
store   store
 |        |
 +---+ +--+
     | |
     phi
      |
    store -> the last one in the loop

I wonder if this is somehow possible. There are surely some IGVN optimizations that would common the stores here, and so the graph would probably have to be even more complicated. But I'm simply wondering if it could be possible that we would have branches / phis in the memory graph. Or what guarantees us that the graph is really linear here?

I'm also not sure how to parse the method name:
find_mem_out_outer_strip_mined

  • find "mem out" outer-strip-mined <loop?>
  • find mem outside of outer-strip-mined loop?

Contributor

I suppose we would trigger your assert if we found a branch:
assert(unique_next == nullptr, "memory node should only have one usage in the loop body");

Now we usually only do pre-main-post for relatively small loop bodies, see LoopUnrollLimit. But I wonder if we ever decided to increase this limit, would we then encounter such more complicated memory graphs?

Contributor

Ok, I think I have been misled by the names / comments.
You are really looking for the last store in the outer_loop. And we do have the guarantee of a linear memory graph because it is the one between if_false and SafePoint.

Contributor

I think a better method name would help a lot ;)

Contributor Author

@benoitmaillard benoitmaillard Sep 23, 2025

What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?

This actually cannot happen because of the conditions in PhaseIdealLoop::try_move_store_after_loop. There, before moving the store, we make sure that any user of the store is either:

  • the Phi node attached to the loop head
  • outside of the loop body

This means we cannot have any branch (though we can have chains), and it guarantees that the memory subgraph is linear within the loop body.

Now we usually only do pre-main-post for relatively small loop bodies, see LoopUnrollLimit. But I wonder if we ever decided to increase this limit, would we then encounter such more complicated memory graphs?

I think the answer is the same here, it really depends on PhaseIdealLoop::try_move_store_after_loop.
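
The linearity argument can be illustrated with a toy model of the walk performed by the helper under review. This is a sketch under stated assumptions: `MemNode` and `findLastInLoop` are hypothetical names, loop membership is reduced to a boolean, and an exception stands in for the assert.

```java
import java.util.ArrayList;
import java.util.List;

public class LinearChainWalkSketch {
    /** Toy stand-in for a memory node with its memory uses; not the HotSpot class. */
    static final class MemNode {
        final String name;
        final boolean inLoop;
        final List<MemNode> uses = new ArrayList<>();

        MemNode(String name, boolean inLoop) {
            this.name = name;
            this.inLoop = inLoop;
        }
    }

    /** Follows memory uses inside the loop and returns the last node of the chain. */
    static MemNode findLastInLoop(MemNode store) {
        MemNode last = store;
        while (true) {
            MemNode uniqueNext = null;
            for (MemNode use : last.uses) {
                if (use.inLoop) {
                    if (uniqueNext != null) {
                        throw new IllegalStateException("memory graph branches inside the loop");
                    }
                    uniqueNext = use;
                }
            }
            if (uniqueNext == null) {
                return last; // no further use inside the loop
            }
            last = uniqueNext;
        }
    }

    /** Builds store1 -> store2 -> store3 -> afterLoop, optionally with a branch at store1. */
    static MemNode chain(boolean withBranch) {
        MemNode store1 = new MemNode("store1", true);
        MemNode store2 = new MemNode("store2", true);
        MemNode store3 = new MemNode("store3", true);
        MemNode afterLoop = new MemNode("afterLoop", false);
        store1.uses.add(store2);
        store2.uses.add(store3);
        store3.uses.add(afterLoop);
        if (withBranch) {
            store1.uses.add(new MemNode("extra", true)); // second use inside the loop
        }
        return store1;
    }
}
```

For the linear chain the walk ends at `store3`. A second in-loop use, which the conditions in `try_move_store_after_loop` are said to rule out, makes the walk fail instead of silently picking one branch.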

Comment on lines 1677 to 1679
    Node* next = out->fast_out(l);
    if (next->is_Mem() && next->in(MemNode::Memory) == out) {
      IdealLoopTree* output_loop = get_loop(get_ctrl(next));
Contributor

I would keep the names for next and output_loop consistent. Maybe next_loop? Or just call them use and use_loop?

Contributor Author

Good point, I have changed it to use and use_loop.

    for (DUIterator j = if_false->outs(); if_false->has_out(j); j++) {
      Node* store = if_false->out(j)->isa_Store();
      // We don't make changes if the memory input is in the loop body as well
      if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
Contributor

Suggested change
    - if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
    + if (store != nullptr && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {

No implicit null or zero checks, see hotspot style guide ;)

Contributor

The loop nesting check looks a bit convoluted. Consider refactoring a little. Could you get rid of the ! by swapping things around?
get_loop(get_ctrl(store->in(MemNode::Memory)))->is_member(outer_loop)
Does not look that much better either... hmm.

Contributor Author

@benoitmaillard benoitmaillard Sep 23, 2025

No implicit null or zero checks, see hotspot style guide ;)

Missed that, thanks for the reminder!

The loop nesting check looks a bit convoluted. Consider refactoring a little. Could you get rid of the ! by swapping things around?

I personally think it looks more intuitive with the !, but I agree it is a bit convoluted. I have added an intermediate variable to make it more readable.

    const Node* if_false = loop->tail()->in(0)->as_BaseCountedLoopEnd()->proj_out(false);
    for (DUIterator j = if_false->outs(); if_false->has_out(j); j++) {
      Node* store = if_false->out(j)->isa_Store();
      // We don't make changes if the memory input is in the loop body as well
Contributor

Why? I suppose that is because there must be a Phi in the loop then, right? Maybe state that in the comment here.

Contributor Author

@benoitmaillard benoitmaillard Sep 22, 2025

Having a memory input that is outside of the loop body is the situation where we would normally expect a Phi, and this is where we would like to intervene.

If the memory input is in the loop body as well, we can safely assume it is still correct, as the whole body gets cloned as a unit.

I have updated the comment, I hope it is clearer now.

Comment on lines 1795 to 1797
    Node* mem_out = find_mem_out_outer_strip_mined(store, outer_loop);
    Node* store_new = old_new[store->_idx];
    store_new->set_req(MemNode::Memory, mem_out);
Contributor

Could it be that there are multiple stores in a chain after the loop exit and before the SafePoint?

Loop
Exit
store1
store2
store3
SafePoint

If so, they all have the same control, namely at the if_false.
Their memory state should be ordered, where store2 depends on store1 and store3 on store2. Only store1 should then really have its memory input updated.

Your code now finds the store_new for each of store1, store2 and store3, and sets all of their memory inputs to mem_out. But that means that the "new" stores all have the same memory input, and are not in a chain any more. Did I see this right? Is that ok?

Contributor Author

@benoitmaillard benoitmaillard Sep 22, 2025

Yes, this can happen. This is actually what we test with the last test case (test3), and this is why we have the following:

    // We don't make changes if the memory input is in the loop body as well
    if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {

In this case, the condition is only true for store1 (as its memory input would be the last memory operation before the loop, or the memory Parm), but not for store2 or store3. We only end up rewiring store1 and leave store2 and store3 as they are.
Does that make sense?

Comment on lines 68 to 77
    static public void test3(A a1, A a2) {
        a1.field = 0;
        a2.field = 0;
        for (int i = 0; i < 20000; i++) {
            a1.field += i;
            a2.field += i;
        }
        a1.field = 0;
        a2.field = 0;
    }
Contributor

Do the field stores both float out of the loop, and end up in a chain between exit and safepoint? Might be nice to add some comments to these tests so we can see what examples you already cover and if we might need some more.

Contributor Author

Yes, the entire chain floats out of the loop (each store is moved successively). I have added some comments about the structure that we are trying to expose, and changed the test slightly as well.
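
For reference, the version of test3 quoted in this thread can be exercised with a small harness such as the following. The class `A`, the harness class name, and the number of runs are assumptions for illustration; the final test in the PR was changed afterwards and is not shown here.

```java
public class AliasedStoresRepro {
    // Assumed shape of A: the thread only shows that it has an int-like field.
    public static class A {
        public int field;
    }

    public static void test3(A a1, A a2) {
        a1.field = 0;
        a2.field = 0;
        for (int i = 0; i < 20000; i++) {
            a1.field += i;
            a2.field += i;
        }
        a1.field = 0;
        a2.field = 0;
    }

    public static void main(String[] args) {
        A a1 = new A();
        A a2 = new A();
        // Call test3() repeatedly so the JIT has a chance to compile it.
        for (int run = 0; run < 10_000; run++) {
            test3(a1, a2);
            if (a1.field != 0 || a2.field != 0) {
                throw new AssertionError("both fields should be 0 after test3()");
            }
        }
        System.out.println("ok");
    }
}
```

Passing the same object for both parameters is also valid, which is why the compiler has to treat the two stores as potentially aliased and keep them in one memory chain.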

@benoitmaillard
Contributor Author

@eme64 Thanks a lot for your detailed comments, this is really helpful. I have tried to address all of them, let me know what you think once you get the chance.

Contributor

@mhaessig mhaessig left a comment

Thank you for working on this and for the clear analysis of this tricky issue, @benoitmaillard!

Your solution seems good, but I have a few coding suggestions below.

@benoitmaillard
Contributor Author

Thank you for the review @mhaessig. I have addressed your comments, let me know what you think.

Contributor

@rwestrel rwestrel left a comment

Other than that, looks good to me.

    // is in the loop body as well, then we can safely assume it is still correct as the entire
    // body was cloned as a unit
    IdealLoopTree* input_loop = get_loop(get_ctrl(store->in(MemNode::Memory)));
    if (!outer_loop->is_member(input_loop)) {
Contributor

Same here. Actually, I wonder if a new method that also does the get_ctrl() (or ctrl_or_self()), wouldn't be useful given that pattern must be quite common.

Contributor Author

Makes sense. It seems there are a lot of occurrences indeed, maybe I should address this in a separate RFE. Btw, it seems we could also change the return type of PhaseIdealLoop::is_member from int to bool, to stay consistent with IdealLoopTree::is_member.

Contributor

Indeed, no reason for a return type of int. Sure, a separate RFE works.

Contributor Author

I have filed JDK-8369002.

Contributor

@mhaessig mhaessig left a comment

Thank you for addressing my comments. Looks good.

Contributor

@rwestrel rwestrel left a comment

Thanks for making the change. Looks good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 1, 2025
@benoitmaillard
Contributor Author

benoitmaillard commented Oct 1, 2025

Thank you for your review @rwestrel! And apologies for not replying to your comment earlier, I saw it right before leaving on vacation and then forgot. I agree with what you said, and I may have overlooked that aspect while writing my explanation. Thanks for clearing that up.

@benoitmaillard
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Oct 3, 2025

openjdk bot commented Oct 3, 2025

@benoitmaillard
Your change (at version 73ee954) is now ready to be sponsored by a Committer.

@mhaessig
Contributor

mhaessig commented Oct 3, 2025

/sponsor


openjdk bot commented Oct 3, 2025

Going to push as commit 7231916.
Since your change was applied there have been 606 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 3, 2025
@openjdk openjdk bot closed this Oct 3, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Oct 3, 2025

openjdk bot commented Oct 3, 2025

@mhaessig @benoitmaillard Pushed as commit 7231916.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
