8276116: C2: optimize long range checks in int counted loops #6576
👋 Welcome back roland! A progress list of the required criteria for merging this PR into
No review yet. I just ran this through testing, and TestLongRangeCheck.java fails with:
java.lang.RuntimeException: should have been deoptimized
at TestLongRangeCheck.assertIsNotCompiled(TestLongRangeCheck.java:60)
at TestLongRangeCheck.test(TestLongRangeCheck.java:127)
at TestLongRangeCheck.main(TestLongRangeCheck.java:215)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:577)
at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
at java.base/java.lang.Thread.run(Thread.java:833)
Flags are -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation
@TobiHartmann thanks for running testing. That one should be fixed now.
The new round of testing passed.
In general looks good to me.
@@ -13129,6 +13129,24 @@ instruct cmovLL_mem_LTGE(cmpOp cmp, flagsReg_long_LTGE flags, eRegL dst, load_lo
  ins_pipe( pipe_cmov_reg_long );
%}

instruct cmovLL_reg_LTGE_U(cmpOpU cmp, flagsReg_ulong_LTGE flags, eRegL dst, eRegL src) %{
How is it related to these changes? It seems like an addition to the 8277324 changes. It could be pushed separately.
Thanks for reviewing this change.
That showed up in GitHub testing, because of the new unsigned_min I think. Not including it would break x86_32.
Okay then.
Testing and performance results look fine.
@rwestrel This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the
Tobias's tier6-7 passed. @rwestrel you can integrate.
/integrate
Going to push as commit b3faecf.
Your commit was automatically rebased without conflicts.
exp = exp->in(1);
bt = T_INT;
if (converted != NULL) {
  *converted = true;
It's best not to assign *converted until the function returns success.
It might also be wise to assign *converted to false as well as true, as the case may be.
I noticed that there are uses of the function on several different inputs but with the same converted pointer. If one use sets converted to true but returns false, and another use returns true, then the original caller can get a bad converted flag out of the deal.
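The out-parameter discipline suggested in the review can be sketched as follows. This is a Java analogue (a one-element boolean array standing in for the C++ bool* pointer), and the helper name recognize and its toy string matching are hypothetical; they only illustrate the two rules: keep the flag in a local until success, then write true or false explicitly.

```java
public class OutParamSketch {
    // Hypothetical helper in the spirit of the reviewed C++ code: it recognizes an
    // identifier, optionally looking through a "(long)" conversion, and reports
    // through converted[0] whether it did. A null array means "caller doesn't care".
    static boolean recognize(String expr, boolean[] converted) {
        boolean sawConversion = false;      // local state only, until success is known
        String e = expr;
        if (e.startsWith("(long)")) {
            sawConversion = true;
            e = e.substring("(long)".length());
        }
        if (!e.matches("[A-Za-z_]\\w*")) {
            return false;                   // failure: the caller's flag is left untouched
        }
        if (converted != null) {
            converted[0] = sawConversion;   // success: write true or false explicitly
        }
        return true;
    }
}
```

With this discipline, a caller that reuses one flag across several calls only ever observes the value written by the last successful call.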
Maurizio noticed that some of his Panama microbenchmarks don't
perform better with 8259609 (C2: optimize long range checks in long
counted loops). The reason is that 8259609 optimizes long range checks
in long counted loops, but some of his benchmarks include long range
checks in int counted loops:
for (int i = start; i < stop; i += inc) {
Objects.checkIndex(scale * ((long)i) + offset, length);
}
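The loop above can be made into a small runnable program as follows. The class name, the method signature and the use of the checked index to read from an array are illustrative assumptions; only the loop shape and the Objects.checkIndex(long, long) overload (available since JDK 16) come from the description.

```java
import java.util.Objects;

public class IntLoopLongCheck {
    // Int counted loop whose range check is done on a long index.
    // scale * ((long) i) is evaluated in 64-bit arithmetic, so for int scale
    // and int i the product itself cannot overflow.
    static long sum(long[] data, int start, int stop, int inc, int scale, long offset) {
        long length = data.length;
        long sum = 0;
        for (int i = start; i < stop; i += inc) {
            long index = Objects.checkIndex(scale * ((long) i) + offset, length);
            sum += data[(int) index];   // index < length <= Integer.MAX_VALUE
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[100];
        for (int k = 0; k < data.length; k++) {
            data[k] = k;
        }
        // Touches indices 10, 16, 22, ..., 94.
        System.out.println(sum(data, 0, 15, 1, 6, 10)); // prints 780
    }
}
```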
This change applies the transformation from 8259609 for long counted
loop/long range checks to int counted loop/long range checks. That
includes creating a loop nest and transforming the long range check to
an int range check that's subject to range elimination in the inner
loop.
The reason a loop nest is required is that the long range check
transformation logic depends on scale * i not overflowing for the
range of values that the transformed range check is applied to.
As a consequence, this change is mostly refactoring to make the loop
nest creation and range check transformation parameterized by the type
of the transformed loop.
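To give an intuition for the loop nest described above, here is a source-level analogue under simplifying assumptions (stride 1, int scale). It is not C2's implementation: C2 works on the IR, picks its own inner-loop bound, and turns the long check into an int range check that range check elimination later removes. In this sketch a per-chunk check of both ends of the index range plays that role. The chunk size formula and all names are hypothetical; the chunk is chosen so that scale * j cannot overflow an int in the inner loop, mirroring the no-overflow requirement.

```java
import java.util.Objects;

public class LoopNestSketch {
    // Reference: one long range check per iteration.
    static long sumChecked(long[] data, int start, int stop, int scale, long offset) {
        long length = data.length;
        long sum = 0;
        for (int i = start; i < stop; i++) {
            long index = Objects.checkIndex(scale * (long) i + offset, length);
            sum += data[(int) index];
        }
        return sum;
    }

    // Hypothetical loop nest: the outer loop walks chunks, the inner loop runs on
    // an int j starting at 0, with |scale| * chunk <= Integer.MAX_VALUE.
    static long sumNested(long[] data, int start, int stop, int scale, long offset) {
        long length = data.length;
        long sum = 0;
        long absScale = Math.max(1L, Math.abs((long) scale));
        int chunk = (int) Math.max(1L, Integer.MAX_VALUE / absScale);
        for (long outer = start; outer < stop; outer += chunk) {
            int trips = (int) Math.min(stop - outer, chunk);
            long base = scale * outer + offset;             // long index when j == 0
            long last = base + (long) scale * (trips - 1);  // long index on the last trip
            if (Math.min(base, last) >= 0 && Math.max(base, last) < length) {
                // The index is linear in j, so checking both ends covers the chunk:
                // the inner loop runs with no per-iteration check.
                for (int j = 0; j < trips; j++) {
                    sum += data[(int) (base + scale * j)];  // int multiply, cannot overflow
                }
            } else {
                // Some index in this chunk is out of bounds: keep the per-iteration
                // check so the exception is thrown at the same iteration as before.
                for (int j = 0; j < trips; j++) {
                    long index = Objects.checkIndex(base + (long) scale * j, length);
                    sum += data[(int) index];
                }
            }
        }
        return sum;
    }
}
```

Both methods compute the same indices in the same order; only where the checks happen differs.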
I think this transformation needs to be applied as late as possible
but, in the case of an int counted loop, before pre/main/post loops
are created. I had to move it to IdealLoopTree::iteration_split_impl()
because of that.
There's an alternate shape for a long range check in an int counted
loop that Maurizio insisted needs to be supported:
for (int i = start; i < stop; i += inc) {
Objects.checkIndex(((long)(scale * i)) + offset, length);
}
scale * i can overflow in that case. This is also supported but as a
corner case of the previous one. The code in
PhaseIdealLoop::transform_long_range_checks() has a comment about
that.
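The difference between the two shapes is only in where the multiplication is done, which a few lines of arithmetic make concrete. The class and method names and the values below are illustrative; how C2 handles the overflowing shape is what the comment in PhaseIdealLoop::transform_long_range_checks() describes.

```java
public class OverflowShapes {
    // First shape: the multiply is done in 64 bits.
    static long wideIndex(int scale, int i, long offset) {
        return scale * ((long) i) + offset;
    }

    // Alternate shape: scale * i is an int multiply that can wrap around
    // before the result is widened to long.
    static long narrowIndex(int scale, int i, long offset) {
        return ((long) (scale * i)) + offset;
    }

    public static void main(String[] args) {
        int scale = 1 << 20;
        // For small i both shapes agree.
        System.out.println(wideIndex(scale, 3, 5) == narrowIndex(scale, 3, 5)); // true
        // 2^20 * 2^12 == 2^32, which wraps to 0 in int arithmetic.
        System.out.println(wideIndex(scale, 1 << 12, 0));   // 4294967296
        System.out.println(narrowIndex(scale, 1 << 12, 0)); // 0
    }
}
```

Across the wrap-around point the narrow index is no longer a linear function of i, which is why that shape is only handled as a corner case.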
Note also that this transformation works best if loop strip mining is
enabled (which is the default for G1, ZGC and Shenandoah). The reason is
that it needs a safepoint, and when loop strip mining is enabled, the
outer loop contains one that's always available. A way to have this
work as well for all GCs would be to always construct the loop strip
mining loop nest (whether loop strip mining is enabled or not) and
then, only once loop opts are over, remove the outer loop when loop
strip mining is disabled. I'm looking for feedback on this.
BTW, something doesn't seem right in IdealLoopTree::iteration_split_impl():
https://github.com/rwestrel/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L3475
should_peel causes transformations to be skipped but peeling is never
applied AFAICT. Does it make sense to anyone?
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6576/head:pull/6576
$ git checkout pull/6576
Update a local copy of the PR:
$ git checkout pull/6576
$ git pull https://git.openjdk.java.net/jdk pull/6576/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6576
View PR using the GUI difftool:
$ git pr show -t 6576
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6576.diff