8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() #3689
Hi Roland, I didn't look at this in detail yet but gave it a quick run through our testing. I'm seeing many of the following failures with the
Thanks for running the tests. I will work on the failures.
That issue should be fixed. I added a CastVV node for vectors.
Thanks, testing looks good now. I need some more time to review this though.
This is hard to review but looks reasonable to me. Performance and correctness testing also looks good.
src/hotspot/share/opto/loopopts.cpp
if (u_loop->_child) {
if (useblock == u_loop->_head && u_loop->_head->is_OuterStripMinedLoop()) {
return u_loop->_head->in(LoopNode::EntryControl);
Node* PhaseIdealLoop::place_near_use(Node* useblock, IdealLoopTree* loop) const {
Maybe the name and comment should be adjusted since we no longer place it next to the use but right outside of the loop.
Right. Updated.
@rwestrel This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 94 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
Thanks for reviewing it.
This looks reasonable to me.
Did we get performance results for it?
Yes, I've linked it to the bug.
@vnkozlov thanks for the review
/integrate
@rwestrel Since your change was applied there have been 141 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 9d305b9.
Sinking data nodes out of a loop when all uses are outside the loop has
several issues that this patch attempts to fix.
1- Only non-control uses are considered, which makes little sense (why
not sink if the data node is an argument to a call or a returned
value?)
2- Sinking of Loads is broken because of the handling of
anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
in the loop because it takes all uses into account.
3- For data nodes for which a control edge can't be set, commoning of
clones back in the loop is prevented with:
_igvn._worklist.yank(x);
which gives no guarantee that the clones will not be commoned later.
This patch tries to address all three issues:
1- it looks at all uses, not only non-control uses
2- anti-dependences are computed for each use independently
3- Cast nodes are used to pin clones outside the loop
Item 2 requires refactoring of the PhaseIdealLoop::get_late_ctrl()
logic. While working on this, I noticed a bug in anti-dependence
analysis: when the use is a cfg node, the code sometimes looks at uses
of the memory state of the cfg. The logic uses the use of the cfg,
which is a projection with an adr_type identical to that of the cfg.
It should instead look at the use of the memory projection.
The existing logic for sinking loads calls clear_dom_lca_tags() for
every load which seems like quite a waste. I added a
_dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
incrementing _dom_lca_tags_round, new tags that don't conflict with
existing ones are produced and there's no need for
clear_dom_lca_tags().
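The round-counter idea can be illustrated outside of HotSpot. The following is a minimal sketch, not the code from this patch: TagTable, mark(), is_marked() and next_round() are hypothetical names, and packing the round into the upper 32 bits of a 64-bit tag is an assumption about one way to combine the round with the tag node's index. The point is only that bumping the round makes every previously stored tag stale without touching the table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-node tag storage (not the HotSpot data structure).
class TagTable {
 public:
  // Round starts at 1 so that the zero-initialized entries never match a real tag.
  explicit TagTable(std::size_t node_count) : tags_(node_count, 0), round_(1) {}

  // Start a new query: all marks from earlier rounds become stale,
  // with no pass over tags_ to clear them.
  void next_round() { ++round_; }

  // Record that `node` was visited on behalf of `tag_idx` in the current round.
  void mark(std::uint32_t node, std::uint32_t tag_idx) {
    tags_[node] = make_tag(tag_idx);
  }

  // True only if `node` was marked with `tag_idx` during the current round.
  bool is_marked(std::uint32_t node, std::uint32_t tag_idx) const {
    return tags_[node] == make_tag(tag_idx);
  }

 private:
  // Assumption: round in the high 32 bits, tag node index in the low 32 bits.
  std::uint64_t make_tag(std::uint32_t tag_idx) const {
    return (static_cast<std::uint64_t>(round_) << 32) | tag_idx;
  }

  std::vector<std::uint64_t> tags_;
  std::uint32_t round_;
};
```

In this sketch a caller would invoke next_round() once per load instead of clearing the whole table, which is the saving the paragraph above describes.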
For anti-dependence analysis to return a correct result, early control
of the load is needed. The only way to get it at this stage, AFAICT,
is to compute it by following the load's input until a pinned node is
reached.
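As a rough illustration of that input walk, here is a toy sketch under stated assumptions. ToyNode and early_ctrl_depth() are hypothetical and do not correspond to HotSpot types; the sketch assumes each pinned node records the dominator depth of its control, and that the controls of all inputs lie on one dominator chain, so the deepest one is the early control.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical toy node: not a HotSpot Node.
struct ToyNode {
  bool pinned;                         // true if the node has a fixed control
  int ctrl_depth;                      // dominator depth of that control (pinned nodes only)
  std::vector<const ToyNode*> inputs;  // data inputs
};

// Follows inputs transitively and stops at pinned nodes. Returns the
// dominator depth of the deepest control found, or 0 (the root) if no
// pinned input is reached.
int early_ctrl_depth(const ToyNode* n) {
  int deepest = 0;
  std::vector<const ToyNode*> worklist{n};
  std::unordered_set<const ToyNode*> seen{n};
  while (!worklist.empty()) {
    const ToyNode* cur = worklist.back();
    worklist.pop_back();
    for (const ToyNode* in : cur->inputs) {
      if (in == nullptr || !seen.insert(in).second) {
        continue;  // missing input or already visited
      }
      if (in->pinned) {
        deepest = std::max(deepest, in->ctrl_depth);  // stop here, do not look past it
      } else {
        worklist.push_back(in);  // unpinned: keep following its inputs
      }
    }
  }
  return deepest;
}
```

A real implementation would return the control node itself rather than a depth; the depth is used here only to keep the example small.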
The existing logic pins cloned nodes next to their use. The logic I
propose pins them right outside the loop. This could possibly avoid
some redundant clones. It also makes some special handling for corner
cases with loop strip mining unnecessary.
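One way to picture "right outside the loop" is a walk up the dominator tree from the use's block that stops just before re-entering the loop. The sketch below is a toy model, not the function from this patch: ToyCfg and first_block_outside_loop() are hypothetical names, blocks are plain integers, and loop membership is a flat boolean per block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy CFG: not HotSpot's IdealLoopTree.
struct ToyCfg {
  std::vector<int> idom;      // idom[b] = immediate dominator of block b (root is its own idom)
  std::vector<bool> in_loop;  // in_loop[b] = block b belongs to the loop being exited
};

// Starting from a use block outside the loop, climbs the dominator tree
// and returns the highest dominator that is still outside the loop.
int first_block_outside_loop(const ToyCfg& cfg, int use_block) {
  assert(!cfg.in_loop[use_block] && "use must be outside the loop");
  int block = use_block;
  for (;;) {
    int dom = cfg.idom[block];
    if (dom == block || cfg.in_loop[dom]) {
      break;  // reached the root, or the next dominator is inside the loop
    }
    block = dom;
  }
  return block;
}
```

With this placement, two uses that sit below the same loop exit resolve to the same block, which is how pinning outside the loop can avoid some redundant clones in this toy model.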
For item 3, I added extra Cast nodes for float types. If a chain of
data nodes is sunk, the new logic tries to keep a single Cast for the
entire chain rather than one Cast per node.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3689/head:pull/3689
$ git checkout pull/3689
Update a local copy of the PR:
$ git checkout pull/3689
$ git pull https://git.openjdk.java.net/jdk pull/3689/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3689
View PR using the GUI difftool:
$ git pr show -t 3689
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3689.diff