New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8259609: C2: optimize long range checks in long counted loops #2045
Conversation
|
Webrevs
|
Hi Roland, I haven't looked at the code in detail yet but submitted some quick testing. Unfortunately, there are failures with an internal test that I'm unable to share. Some details:
Hope that helps. |
Thanks. I pushed a fix. |
I think you forgot to push the fix. |
Indeed. Pushed now. |
I'm now seeing this with the same test at a higher tier (and therefore different flags):
|
This reverts commit e234477.
Fixed and added a test case that covers this issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass over the changes and added some comments but I need more time to review.
bool IdealLoopTree::is_range_check_if(IfNode *iff, PhaseIdealLoop *phase, Invariance& invar) const { | ||
bool IdealLoopTree::is_range_check_if(IfNode *iff, PhaseIdealLoop *phase, BasicType bt, Node *iv, Node *&range, | ||
Node *&offset, | ||
jlong &scale) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newline should be removed.
return false; | ||
} | ||
if (offset && !invar.is_invariant(offset)) { // offset must be invariant | ||
return false; | ||
} | ||
return true; | ||
} | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is hard to read because the indentation of above lines is broken. Some *
could also be moved to the type.
if (p_offset != NULL) { | ||
if (*p_scale == min_signed_integer(bt)) { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it suspicious that this edge case needs to be handled here. Could you explain why and add a corresponding comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you forgot to address this comment?
src/hotspot/share/opto/loopnode.cpp
Outdated
} | ||
// Because RCE opportunities can be masked by split_thru_phi, | ||
// look for RCE candidates and inhibit split_thru_phi | ||
// on just their loop-phi's for this pass of loop opts | ||
if (SplitIfBlocks && do_split_ifs) { | ||
AutoNodeBudget node_budget(this, AutoNodeBudget::NO_BUDGET_CHECK); | ||
if (lpt->policy_range_check(this)) { | ||
if (lpt->may_have_range_check(this)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be merged with above if.
src/hotspot/share/opto/loopnode.cpp
Outdated
RangeCheckNode* rc = c->in(0)->as_RangeCheck(); | ||
if (loop->is_range_check_if(rc, this, T_LONG, phi, range, offset, scale) && | ||
loop->is_invariant(range) && loop->is_invariant(offset)) { | ||
if (iters_limit_as_long / ABS(scale * stride_con) > min_iters/* && UseNewCode*/) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UseNewCode
should be removed and if should be merged with if above.
@TobiHartmann thanks for the comments. I pushed a change that should address them. |
Partial review: The comment in PhaseTransform::integercon has a bug left over from JDK-8256655, (I also noticed also that typo: /had a change to be hoisted yet/s/change/chance/ Also there are unusually long comment lines there and inside This variable is unused: The argument Well done adapting The logic for turning a shift amount into a scale factor looks
I did a direct
Now for the biggest part: You are replacing long range checks with I suggest placing The variable IMO the expression This group of definitions is loop-invariant, and should probably
In You have an expression for accumulating the maximum scale: But, I suggest getting rid of
If you do this you won't need that delicate assert at the end. More later. I'm still working through the core logic in |
@rwestrel This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
comment to keep the PR open |
@rwestrel This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
comment to keep the PR open |
@rwestrel This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
comment to keep the PR open |
Do I understand right that you're suggesting modifying c2 so it handles long range checks with the existing range check optimizations (rc_predicate etc.)? The approach this PR has taken is to convert long range checks into int range checks so existing range check optimizations work as is. Ignoring the merit of either approach, what you're suggesting, if I understand right, is different so it can't help this particular PR. |
More or less I thought with constrain In any way I didn't want to interfere, and I hope it will be finished soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more comments from me. Ship it!
@rwestrel This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 114 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall but I did not verify the transform_long_range_checks
logic in detail.
I gave this a good amount of testing in our infra (tier 1-6). All green.
if (p_offset != NULL) { | ||
if (*p_scale == min_signed_integer(bt)) { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you forgot to address this comment?
src/hotspot/share/opto/loopnode.hpp
Outdated
@@ -1633,6 +1654,8 @@ class PhaseIdealLoop : public PhaseTransform { | |||
|
|||
void try_sink_out_of_loop(Node* n); | |||
|
|||
Node* clamp(Node* pNode, Node* pNode1, Node* pNode2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument naming is not consistent with the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix the argument names.
-min_jint is min_jint. So there's no way to handle a min_jint (or min_jlong) scale above. How else would you handle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a commit that addresses both comments.
/contributor add jrose |
@rwestrel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that looks good to me.
/integrate |
Going to push as commit 82f4aac.
Your commit was automatically rebased without conflicts. |
JDK-8255150 makes it possible for java code to explicitly perform a
range check on long values. JDK-8223051 provides a transformation of
long counted loops into loop nests with an inner int counted
loop. With this change I propose transforming long range checks that
operate on the iv of a long counted loop into range checks that
operate on the iv of the int inner loop once it has been
created. Existing range check eliminations can then kick in.
Transformation of range checks is piggy backed on the loop nest
creation for 2 reasons:
pattern matching range checks is easier right before the loop nest
is created
the number of iterations of the inner loop is adjusted so scale *
inner_iv doesn't overflow
C2 has logic to delay some split if transformations so they don't
break the scale * iv + offset pattern. I reused that logic for long
range checks and had to relax what's considered a range check because
initially a range check from Object.checkIndex() may include a test
for range > 0 that needs a round of loop opts to be hoisted. I realize
there's some code duplication but I didn't see a way to share logic
between IdealLoopTree::may_have_range_check()
IdealLoopTree::policy_range_check() that would feel right.
I realize the comment in PhaseIdealLoop::transform_long_range_checks()
is scary. FWIW, it's not as complicated as it looks. I found drawing
the range covered by the entire long loop and the range covered by the
inner loop help see how range checks can be transformed. Then the
comment helps make sure all cases are covered and verify the generated
code actually covers all of them.
One issue is overflow. I think the fact that inner_iv * scale doesn't
overflow helps simplify thing. One possible overflow is that of scale
case to deoptimize. I don't think other case of overflow needs special
handling.
This was tested with a Memory Segment micro benchmark (and patched
Memory Segment support to take advantage of the new checkIndex
intrinsic, both provided by Maurizio). Range checks in the micro
benchmark are properly optimized (and performance increases
significantly).
Progress
Issue
Reviewers
Contributors
<jrose@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045
$ git checkout pull/2045
Update a local copy of the PR:
$ git checkout pull/2045
$ git pull https://git.openjdk.java.net/jdk pull/2045/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 2045
View PR using the GUI difftool:
$ git pr show -t 2045
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/2045.diff