@chhagedorn chhagedorn commented Jan 13, 2025

Failing Assert

The failing assert in PhaseCFG::schedule_late() checks the following:

// Assert that memory writers (e.g. stores) have a "home" block (the block
// given by their control input), and that this block corresponds to their
// earliest possible placement. This guarantees that
// hoist_to_cheaper_block() will always have at least one valid choice.
if (self->is_memory_writer()) {
  assert(find_block_for_node(self->in(0)) == early,
         "The home of a memory writer must also be its earliest placement");
}

In the test case, this is violated for 87 storeI:
image

The early block (early) of 87 storeI is bound by 115 loadI, which is pinned at 161 Region. This block is dominated by the control input 146 Region of 87 storeI, so the home block of the store lies above its early block and the assert fails.
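
To make the violated invariant more concrete, here is a minimal sketch. It is not HotSpot code: it assumes that all blocks lie on a single dominator chain, so a block is described only by its dominator depth, and it takes the early block to be the deepest block among a node's home block and the blocks of its inputs. ToyBlock, ToySchedNode, early_block() and home_is_early() are hypothetical names, and any depths used with them are made up.

```cpp
#include <cassert>
#include <vector>

// Toy model (assumption): all blocks lie on one dominator chain, so a larger
// dom_depth means "further down" (dominated by every smaller depth).
struct ToyBlock {
  int dom_depth;
};

struct ToySchedNode {
  const ToyBlock* home;                      // block of the control input, or nullptr
  std::vector<const ToySchedNode*> inputs;   // data/memory inputs
  const ToyBlock* placed;                    // block the node was placed in (for inputs)
};

// Earliest legal block: the deepest block among the node's home block and
// the blocks of its inputs, since a node cannot be placed above its inputs.
inline const ToyBlock* early_block(const ToySchedNode& n) {
  const ToyBlock* early = n.home;
  for (const ToySchedNode* in : n.inputs) {
    assert(in->placed != nullptr);
    if (early == nullptr || in->placed->dom_depth > early->dom_depth) {
      early = in->placed;
    }
  }
  return early;
}

// The invariant the failing assert checks for memory writers.
inline bool home_is_early(const ToySchedNode& store) {
  return store.home == early_block(store);
}
```

With a store whose home block has depth 1 and a load input placed in a block of depth 2, early_block() returns the deeper block and home_is_early() is false, which mirrors the situation of 87 storeI and 115 loadI.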

How Did 115 loadI End up Being Pinned below 87 storeI?

Before Pre/Main/Post Loop Creation

Before the creation of pre/main/post loops, we have the following graph:

image

Everything looks fine: The control input of 312 StoreI (which is eventually cloned and becomes 87 storeI in the Mach graph) corresponds to the early placement of the store. 415 LoadI was hoisted out of the loop during Loop Predication and is pinned above at a Template Assertion Predicate.

Pre/Main/Post Loop Creation

Post Loop Body Creation

During the creation of pre/main/post loops, we clone the main loop body for the post loop body:

image

We notice that 312 StoreI is pinned on the main loop backedge. When finishing the last iteration from the main loop and possibly continuing in the post loop, we need to feed everything on the loop backedge of the main loop to the post loop. However, the pinned nodes on the main loop backedge cannot float. Therefore, we need to create new copies of these pinned nodes with PhaseIdealLoop::clone_up_backedge_goo().

The pins are updated to the entry of the post loop. All inputs of these pinned nodes whose current control (fetched with get_ctrl()) is also on the main loop backedge are cloned as well, but they keep their control input (if any) unless it is the loop backedge.

In our example, this applies to 453 StoreI -> 479 StoreI, and some inputs recursively (454 AddI -> 482 AddI, 455 LoadI -> 481 LoadI):

image

Still, all looks fine. Notice that the clone 481 LoadI of 455 LoadI is currently still pinned at the same Template Assertion.
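
The cloning described above can be sketched as follows. This is a simplified toy model and does not mirror the actual implementation of PhaseIdealLoop::clone_up_backedge_goo(): it assumes an acyclic input graph, has no visited set, and always clones a node whose computed control is the backedge. ToyNode, ToyGraph and clone_up_backedge() are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct ToyNode {
  int id;
  const ToyNode* pin;             // explicit control input (in(0)), may be nullptr
  const ToyNode* ctrl;            // computed control, what get_ctrl() would return
  std::vector<ToyNode*> inputs;   // data inputs
};

// Owns all nodes so that clones stay alive.
struct ToyGraph {
  std::vector<std::unique_ptr<ToyNode>> nodes;

  ToyNode* make(const ToyNode* pin, const ToyNode* ctrl, std::vector<ToyNode*> inputs = {}) {
    const int id = static_cast<int>(nodes.size());
    nodes.push_back(std::make_unique<ToyNode>(ToyNode{id, pin, ctrl, std::move(inputs)}));
    return nodes.back().get();
  }
};

// Returns n itself if its control is not the backedge, otherwise a clone whose
// inputs were processed recursively. Assumption: the input graph is acyclic.
inline ToyNode* clone_up_backedge(ToyGraph& g, const ToyNode* back_ctrl,
                                  const ToyNode* entry, ToyNode* n) {
  if (n->ctrl != back_ctrl) {
    return n;  // not on the backedge: shared as-is
  }
  std::vector<ToyNode*> new_inputs;
  for (ToyNode* in : n->inputs) {
    new_inputs.push_back(clone_up_backedge(g, back_ctrl, entry, in));
  }
  // Only a pin on the backedge itself moves to the loop entry. Any other pin
  // (e.g. at a Template Assertion Predicate) is kept.
  const ToyNode* new_pin = (n->pin == back_ctrl) ? entry : n->pin;
  return g.make(new_pin, entry, std::move(new_inputs));
}
```

In this model, a store pinned on the backedge gets a clone pinned at the entry, while a cloned load that was pinned elsewhere keeps its old pin. That kept pin is the detail that matters for the next step.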

Assertion Predicate Creation

In the next step, we create new Assertion Predicates at the post loop and rewire any data nodes control dependent on Assertion Predicates down to the post loop - including the new 481 LoadI from PhaseIdealLoop::clone_up_backedge_goo():

image

This creates the graph shape that later fails during scheduling in the backend: the control input of 479 StoreI is further up in the graph than its actual early block, which is limited by 481 LoadI pinned at 493 IfTrue.

Same Problem with clone_up_backedge_goo() for Main Loop?

The very same problem could theoretically also be observed for the main loop when creating the pre loop. But it is not, because of how we implemented the rewiring of data nodes when creating new Assertion Predicates:

After the pre loop is created, the old Assertion Predicates are above the pre loop and actually need to be established at the main loop. Therefore, all data nodes that are control dependent on Assertion Predicates and belong to the main loop need to be rewired. In our test case, these are 415 LoadI (original node) and 540 LoadI (node cloned by clone_up_backedge_goo(), which actually belongs to the main loop):

image

Check If Data Belongs to Main Loop

Since the pre loop only contains cloned nodes we do the following trick to determine if a node belongs to the main loop (implemented here):

Store index IDX for the next newly created node just before pre loop creation.
For any data node dependency n:
  Is index of n < IDX? -> Not a node in the pre loop
    Is there a clone of n with index >= IDX? -> Clone is in pre loop and thus original node in main loop  
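
As a minimal sketch of the trick above (not the HotSpot code), the check can be written as a function over node indices. ToyOldNew and in_main_loop_by_index() are hypothetical names, and the clone mapping is modeled as a plain map from the original node index to the clone's index.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for the old -> new mapping recorded while cloning the
// loop body: original node index -> index of its clone.
using ToyOldNew = std::unordered_map<unsigned, unsigned>;

// Returns true if the node with index 'idx' is treated as part of the main
// loop body. 'first_pre_loop_idx' is IDX, stored just before pre loop creation.
inline bool in_main_loop_by_index(unsigned idx, unsigned first_pre_loop_idx,
                                  const ToyOldNew& old_new) {
  if (idx >= first_pre_loop_idx) {
    return false;  // created during or after pre loop creation: assumed pre loop
  }
  // An original node is in the main loop only if it has a clone in the pre loop.
  auto it = old_new.find(idx);
  return it != old_new.end() && it->second >= first_pre_loop_idx;
}
```

Note that any node created after IDX is classified as "not in the main loop", regardless of where it was created.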

Cloned Nodes with clone_up_backedge_goo() Mess with "Node inside Main Loop" Check

Since the cloned nodes in clone_up_backedge_goo() are originally from pre loop nodes, our check will fail and we do not rewire these nodes, even though they belong to the main loop:

"540 LoadI < IDX" does not hold
=> we conclude 540 LoadI is a cloned node belonging to the pre loop and not the main loop 

Applied to our test case, we have the following after clone_up_backedge_goo():

image

We can see that 540 LoadI, cloned by clone_up_backedge_goo(), is still pinned before the pre loop because we have not rewired it and thus scheduling does not fail with the assert.

Even though I could not trigger a failure, I think it is an incorrect pin since the 540 LoadI belongs to the main loop.

Proposed Fix

  • Rewire any nodes created by clone_up_backedge_goo() which are pinned to the original loop entry before Assertion Predication to the new loop entry after Assertion Predicate creation. The new loop entry will be the tail of the last Assertion Predicate (if any).
  • Update data node rewiring in Assertion Predication processing to also consider nodes from clone_up_backedge_goo() correctly. I've implemented a new NodeInMainLoopBody class for that purpose.
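
Both parts of the fix can be sketched in a toy model. This is an illustration under simplifying assumptions, not the patch itself: control nodes are plain integer ids, data nodes only carry an index and a pin, and ToyOldNew, ToyDataNode, rewire_to_new_entry() and ToyNodeInMainLoopBody are hypothetical names. Only the shape of check_node_in_loop_body() follows the patch.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical old -> clone index mapping recorded while cloning the loop body.
using ToyOldNew = std::unordered_map<unsigned, unsigned>;

// Toy data node: its index and the id of the control node it is pinned at.
struct ToyDataNode {
  unsigned idx;
  int pin;
};

// First bullet: move every data node still pinned at the old loop entry to
// the new entry (the tail of the last Assertion Predicate). Returns the
// number of rewired nodes.
inline int rewire_to_new_entry(std::vector<ToyDataNode>& nodes, int old_entry, int new_entry) {
  int rewired = 0;
  for (ToyDataNode& n : nodes) {
    if (n.pin == old_entry) {
      n.pin = new_entry;
      ++rewired;
    }
  }
  return rewired;
}

// Second bullet: index-based membership test that also accepts nodes created
// after the pre loop body.
class ToyNodeInMainLoopBody {
 public:
  ToyNodeInMainLoopBody(unsigned first_pre, unsigned last_pre, const ToyOldNew& old_new)
      : _first_pre(first_pre), _last_pre(last_pre), _old_new(old_new) {
    assert(first_pre <= last_pre);
  }

  bool check_node_in_loop_body(unsigned idx) const {
    if (idx < _first_pre) {
      // Original node: in the main loop only if it has a clone in the pre loop.
      auto it = _old_new.find(idx);
      return it != _old_new.end() && it->second >= _first_pre;
    }
    // Created after the pre loop body, e.g. by clone_up_backedge_goo()?
    return idx > _last_pre;
  }

 private:
  unsigned _first_pre;
  unsigned _last_pre;
  const ToyOldNew& _old_new;  // must outlive this object
};
```

With made-up index bounds of 500 and 530 for the pre loop body, a node with index 540 is now accepted as belonging to the main loop, while a pre loop clone with index 520 is still rejected.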

Why not just Add Assertion Predicates First?

This is not straightforward because we do not know the init value before applying clone_up_backedge_goo(), which is interleaved with updating the phi nodes. I've decided to go with the proposed fix instead.

Testing:

  • tier1-7
  • hs-precheckin-comp
  • hs-comp-stress

Deferring to JDK 25?

This seems to be an edge case (only found with fuzzing) and it's not entirely clear to me what the impact on product builds is. However, this is a regression in JDK 24 and should be considered for a fix in JDK 24. But this fix became somewhat complex to understand and implement. I think a possible option is to first apply the fix to JDK 25, let it bake, and only then consider it for an update release of JDK 24. Opinions are welcome.

Thanks,
Christian


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8347018: C2: Insertion of Assertion Predicates ignores the effects of PhaseIdealLoop::clone_up_backedge_goo() (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23071/head:pull/23071
$ git checkout pull/23071

Update a local copy of the PR:
$ git checkout pull/23071
$ git pull https://git.openjdk.org/jdk.git pull/23071/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23071

View PR using the GUI difftool:
$ git pr show -t 23071

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23071.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Jan 13, 2025

👋 Welcome back chagedorn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jan 13, 2025

@chhagedorn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8347018: C2: Insertion of Assertion Predicates ignores the effects of PhaseIdealLoop::clone_up_backedge_goo()

Reviewed-by: epeter, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 14 new commits pushed to the master branch:

  • c207cc7: 8347923: Parallel: Simplify compute_survivor_space_size_and_threshold
  • 4b4b1e9: 8347922: Remove runtime/cds/appcds/customLoader/HelloCustom_JFR.java from ProblemList.txt
  • e7a1c86: 8217914: java/net/httpclient/ConnectTimeoutHandshakeSync.java failed on connection refused while doing POST
  • 644d154: 8347474: Options singleton is used before options are parsed
  • 3804082: 8346123: [REDO] NMT should not use ThreadCritical
  • 1f0efc0: 8347343: RISC-V: Unchecked zicntr csr reads
  • ca8ba5c: 8347366: RISC-V: Add extension asserts for CMO instructions
  • 0ff6700: 8347987: Bad ifdef in 8330851
  • e1cf351: 8348013: [doc] fix typo in java.md caused by JDK-8347763
  • 6ef860c: 8332857: Test vmTestbase/nsk/jvmti/GetThreadCpuTime/thrcputime002/TestDescription.java failed
  • ... and 4 more: https://git.openjdk.org/jdk/compare/2c41f5adbfcebb057c2ffc8396729bdd1c100079...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk bot commented Jan 13, 2025

@chhagedorn The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jan 13, 2025
Comment on lines -1445 to -1453
// Nodes inside the loop may be control dependent on a predicate
// that was moved before the preloop. If the back branch of the main
// or post loops becomes dead, those nodes won't be dependent on the
// test that guards that loop nest anymore which could lead to an
// incorrect array access because it executes independently of the
// test that was guarding the loop nest. We add a special CastII on
// the if branch that enters the loop, between the input induction
// variable value and the induction variable Phi to preserve correct
// dependencies.
Member Author

@chhagedorn chhagedorn Jan 13, 2025

Noticed that this comment block should have been removed earlier with JDK-8334724 which removed the cast node. I squeezed this in here - probably not worth a separate task.

@chhagedorn chhagedorn marked this pull request as ready for review January 13, 2025 14:01
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 13, 2025
mlbridge bot commented Jan 13, 2025

Webrevs

// clone from 'node' (i.e. _old_new entry is non-null). Then we know that 'node' belongs to the original loop body.
// Additionally check if a node was cloned after the pre loop was created. This indicates that it was created by
// PhaseIdealLoop::clone_up_backedge_goo(). These nodes should also be pinned at the main loop entry.
bool check(Node* node) const override {
Contributor

Could the method have a more meaningful name? Also, I did not find where it is used.

Member Author

@chhagedorn chhagedorn Jan 14, 2025

Since the interface name is NodeInLoopBody, I went with NodeInLoopBody::check(). But could also rename it to check_node_in_loop_body() for better readability. Pushed an update.

It is used here:

if (!output->is_CFG() && data_in_loop_body.check(output)) {

Contributor

I was confused because it was not used in the changes, but I now see that it is a virtual method.

@vnkozlov
Contributor

I agree with deferring it to JDK 25 and backport into JDK 24 update release after some time.

@chhagedorn
Member Author

I agree with deferring it to JDK 25 and backport into JDK 24 update release after some time.

Thanks for your feedback Vladimir! Then let's target this to JDK 25.

Contributor

@vnkozlov vnkozlov left a comment

The fix seems fine to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 14, 2025
@chhagedorn
Member Author

Thanks Vladimir for your review!

Contributor

@eme64 eme64 left a comment

You should probably adjust the title of the PR / Bug.

Before I review, I have some understanding questions:
Why is it a problem that the control of the store dominates the control of the load? Could such a constellation not happen in other circumstances too?
Could we not come up with some case where the store has its control moved up above the control of the load somehow - or is that generally not supposed to happen?
I suppose I am wondering why can GCM not just handle such cases?

@chhagedorn chhagedorn changed the title 8347018: C2: assert(find_block_for_node(self->in(0)) == early) failed: The home of a memory writer must also be its earliest placement 8347018: C2: Insertion of Assertion Predicates ignores the effects of PhaseIdealLoop::clone_up_backedge_goo() Jan 17, 2025
@chhagedorn
Member Author

chhagedorn commented Jan 17, 2025

Thanks Emanuel for your questions, let's first take a step back. This patch essentially fixes two issues:

  1. "Cloned Nodes with clone_up_backedge_goo() Mess with "Node inside Main Loop" Check" (see section above):
    Regardless of the observed assertion failure, I think the cloned nodes only belonging to the main loop should not be pinned before the pre loop. I could not come up with a failing case but I think we should fix this (done with NodeInMainLoopBody class).
  2. The observed assert where the control input of a store does not match its early block in GCM:
    The main part of the fix ensures that the cloned nodes from clone_up_backedge_goo() actually end up at the loop entry as originally assumed by the method.

Why is it a problem that the control of the store dominates the control of the load? Could such a constellation not happen in other circumstances too? Could we not come up with some case where the store has its control moved up above the control of the load somehow - or is that generally not supposed to happen?

The assert was introduced around 3 years ago and never failed before. It sounds like a condition that is always met - until discovered now. But when looking more closely at the original intention of clone_up_backedge_goo(), which is to pin the cloned nodes at the loop entry, the method should still ensure that this assert holds. But Assertion Predicates now mess with that since we inject additional Ifs between the cloned nodes and the loop entry.

The comment at the failing assert suggests that this invariant is required such that hoist_to_cheaper_block() works properly. However, I'm unclear about the impact - whether it's just missing some optimization or possibly leading to wrong code or a crash. Maybe @robcasloz can comment on that who introduced the assert.

I've tried to play around to create such a situation without Assertion Predicates but could not find a case that violates the assert - but of course, that does not prove anything.

I suppose I am wondering why can GCM not just handle such cases?

I think that's also a possible option. I've decided to do the fix at the Assertion Predicates creation point for the following reasons:

  • Without Assertion Predicates, we have not seen this assert fail.
  • Assertion Predicates are violating the guarantees that clone_up_backedge_goo() wants to promise: The cloned nodes are not ending up at the loop entry. I'm not sure who else is indirectly relying on this. We do not seem to have traced anything back to this, yet, but we also cannot be sure if there are other hidden problems.
  • I'm not sure how difficult it will be to fix GCM and what the impact is. When assuming that Assertion Predicates should have been inserted correctly, without messing with the effects of clone_up_backedge_goo(), we have never seen a case where GCM should be able to support this case.
  • Since I first considered getting this into JDK 24, fixing the new Assertion Predicate code seemed more straightforward and less risky.

For these reasons, I've chosen the current fix idea without trying to change the current behavior of GCM. What are your thoughts about that?

You should probably adjust the title of the PR / Bug.

Good point, even though it's addressing two problems, I could just make it more obvious what the fixes are about. Changed!

…dAboveAssertionPredicatesAndUsingStore.java

Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jan 17, 2025
@eme64
Contributor

eme64 commented Jan 17, 2025

Regardless of the observed assertion failure, I think the cloned nodes only belonging to the main loop should not be pinned before the pre loop

That sounds like a good argument. I will review the code now ;)

Thanks for all the extra explanations 😊

@chhagedorn
Member Author

Regardless of the observed assertion failure, I think the cloned nodes only belonging to the main loop should not be pinned before the pre loop

That sounds like a good argument. I will review the code now ;)

Thanks for all the extra explanations 😊

Sure, you're welcome! :-)

Sounds good, thanks a lot!

Contributor

@eme64 eme64 left a comment

Looks reasonable :)

Comment on lines 1728 to 1730
const NodeInMainLoopBody node_in_original_loop_body(first_node_index_in_pre_loop_body,
last_node_index_in_pre_loop_body, old_new);
create_assertion_predicates_at_main_or_post_loop(pre_loop_head, main_loop_head, node_in_original_loop_body, true);
Contributor

Suggested change
- const NodeInMainLoopBody node_in_original_loop_body(first_node_index_in_pre_loop_body,
-                                                    last_node_index_in_pre_loop_body, old_new);
- create_assertion_predicates_at_main_or_post_loop(pre_loop_head, main_loop_head, node_in_original_loop_body, true);
+ const NodeInMainLoopBody node_in_main_loop_body(first_node_index_in_pre_loop_body,
+                                                 last_node_index_in_pre_loop_body, old_new);
+ create_assertion_predicates_at_main_or_post_loop(pre_loop_head, main_loop_head, node_in_main_loop_body, true);

Would that make sense now?

Member Author

Definitely! Updated.

// Rewire any control dependent nodes on the old target loop entry before adding Assertion Predicate related nodes.
// These have been added by PhaseIdealLoop::clone_up_backedge_goo() and assume to be ending up at the target loop entry
// which is no longer the case when adding additional Assertion Predicates. Fix this by rewiring these nodes to the new
// target loop entry which corresponds to the tail of the last Assertion Predicate before the target loop.
Contributor

Could this introduce any circular dependency? I.e. that the Assertion Predicates have dependencies on the control dependencies that we just moved down? -> You could add a comment here why that is not possible.

Member Author

I don't think that is possible. I've added an additional comment.

bool check_node_in_loop_body(Node* node) const override {
if (node->_idx < _first_node_index_in_cloned_loop_body) {
Node* cloned_node = _old_new[node->_idx];
return cloned_node != nullptr && cloned_node->_idx >= _first_node_index_in_cloned_loop_body;
Contributor

In what case would we return false here? Can you add a comment?

Member Author

Added!

return cloned_node != nullptr && cloned_node->_idx >= _first_node_index_in_pre_loop_body;
}
// Created in PhaseIdealLoop::clone_up_backedge_goo()?
return node->_idx > _last_node_index_in_pre_loop_body;
Contributor

This looks reasonable now, but I'm a little afraid that this could be fragile in the future.
Hmm, not sure what to do.

You put a lower bound here. Could we also have an upper bound? Just in case somebody decides to add more nodes in the meantime ... and then you would return true here as well, which would probably be wrong?
Maybe there could also be some assert, but I'm not sure.

Member Author

That's good input. Checking the origin of clones is difficult. But we could do it by storing node indices. I've added some assertion code that checks the last node index used in clone_up_backedge_goo(). That feels more robust. Let me know what you think.

Member Author

@chhagedorn chhagedorn left a comment

Thanks for your review! I've addressed your comments.

Comment on lines 1445 to 1446
const uint last_node_index_in_pre_loop_body = Compile::current()->unique() - 1;
assert(post_head->in(1)->is_IfProj(), "must be zero-trip guard If node projection of the post loop");
Member Author

I somehow moved this down after testing - I cannot remember why. It should be further up. Fixed here as well.

Contributor

@eme64 eme64 left a comment

Looks good now, thanks for the updates!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 20, 2025
@chhagedorn
Member Author

Thanks Emanuel for your review!

/integrate

openjdk bot commented Jan 20, 2025

Going to push as commit 8a83dc2.
Since your change was applied there have been 15 commits pushed to the master branch:

  • 85fdd2c: 8347434: Richer VM operations events logging
  • c207cc7: 8347923: Parallel: Simplify compute_survivor_space_size_and_threshold
  • 4b4b1e9: 8347922: Remove runtime/cds/appcds/customLoader/HelloCustom_JFR.java from ProblemList.txt
  • e7a1c86: 8217914: java/net/httpclient/ConnectTimeoutHandshakeSync.java failed on connection refused while doing POST
  • 644d154: 8347474: Options singleton is used before options are parsed
  • 3804082: 8346123: [REDO] NMT should not use ThreadCritical
  • 1f0efc0: 8347343: RISC-V: Unchecked zicntr csr reads
  • ca8ba5c: 8347366: RISC-V: Add extension asserts for CMO instructions
  • 0ff6700: 8347987: Bad ifdef in 8330851
  • e1cf351: 8348013: [doc] fix typo in java.md caused by JDK-8347763
  • ... and 5 more: https://git.openjdk.org/jdk/compare/2c41f5adbfcebb057c2ffc8396729bdd1c100079...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 20, 2025
@openjdk openjdk bot closed this Jan 20, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 20, 2025
openjdk bot commented Jan 20, 2025

@chhagedorn Pushed as commit 8a83dc2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
