-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL #13269
Conversation
👋 Welcome back epeter! A progress list of the required criteria for merging this PR into |
Webrevs
|
That looks reasonable to me. |
At first I only fixed
I'll do that, will report back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@eme64 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
@rwestrel thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have several unrolls we will have chain of MaxL/MinL
nodes. Will the chain be folded by IGVN?
Node* stride_l = new ConvI2LNode(stride); | ||
register_new_node(stride_l, get_ctrl(limit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make Long constant (_igvn.longcon(stride)
) instead since stride
is constant? Similar to underflow_clamp_l
. My concern is you set control to constant which is not Root
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will replace it with constant. Yes, I had the ctrl wrong, it should be root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good to me and indeed much cleaner! I only have some minor code style comments.
There is a bit of an overhead here: We use longs even though we only want to have int values. But I think we should prefer a clean implementation here, with correct type computation. The performance impact is probably non-existent on 64-bit machines anyway.
I agree with that.
I have already recently fixed a bug around this CMoveI. 5a4945c I would now like to have a more satisfactory fix, that properly propagates the types.
I've had a feeling that we are revisiting this again at some point.
Should we add such an assert during IGVN? I think after IGVN, we should never have a MultiBranchNode that does not have the required number of outputs, right? We could add it to VerifyIterativeGVN.
That would be a good idea to investigate in an RFE - maybe also for other nodes to assert on well-known input/output patterns. We've had such problems before after CCP with If
nodes with only one out projection.
@vnkozlov I fear it would not fold currently. The CMove would not fold before either, but with repeated unrolling, the CMove was reused, and so there was only ever a single CMove (unless some RC got in between). I think in many cases, the type does not underflow, and the I think the performance impact is now insignificant, if it does not fold. Because the limits are only calculated once per loop. We can still improve the folding, if you want. I can also do that in a follow-up RFE, and try to add some IR tests that target type-limit underflow, and count the TLDR: @vnkozlov is it ok if I investingate & test |
Thanks for the suggestions! Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
I added the idea about verifying out-proj of |
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I would disagree. In many cases
Depending on result of investigation of generated code. |
I think you're running into an issue where some nodes created by counted loop expansion aren't properly passed onto the IGVN worklist- I found the same thing while trying to investigate some strange code generation from small loops. If you make that follow-up RFE I would be happy to attach the cases that I found as well. |
@jaskarth please send me those cases, if it is many then maybe better via email. I'm generally working on doing verification of that kind, see JDK-8298951. |
I only had a handful of cases so I've attached them in this gist. JDK-8298951 is exciting, it'll make reasoning about middle-end optimizations a lot easier :) |
@jaskarth I think your issues are not related, though I can look at them again once I get back to IGVN verification. @vnkozlov I thought about it a bit more. With a simple example like One idea would be to fold The example:
What to do with this?
I have an alternative proposal: But for the unroll-limits, we introduce a With this But I think I can do the same with just collapsing @vnkozlov What do you think? Do you have any other ideas? What solution would you prefer? |
|
I prefer this if you can do it. So you have sequence (after folding
Yes, I think it can be collapsed to:
If in any point of chain |
@vnkozlov I added some more IGVN optimizations that help to fold the
I added verification, that these optimizations are really taken: jdk/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitSubtractionsCollapse.java Lines 50 to 68 in 33e1ad5
Is this now ok? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just minor comments.
src/hotspot/share/opto/addnode.cpp
Outdated
// Collapse the "addition with overflow-protection" pattern, and the symetrical | ||
// "subtraction with underflow-protection" pattern. These are created during the | ||
// unrolling, when we have to adjust the limit by subtracting the stride, but want | ||
// to protect agains underflow: MaxL(SubL(limit, stride), min_jint). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May add note that SubL node is replaced with AddL and reversed stride ( I assume that is what happened here).
// | | | ||
// Max/MinL (n) | ||
// | ||
Node* fold_subI_no_underflow_pattern(Node* n, PhaseGVN* phase) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this method and it comment before MaxLNode::add_ring
so all MaxL and MinL method stay together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good to me.
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good!
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
Thanks @chhagedorn @TobiHartmann @vnkozlov @rwestrel for the reviews and suggestions! |
Going to push as commit cc894d8.
Your commit was automatically rebased without conflicts. |
Context
During
PhaseIdealLoop::do_unroll
, we hack the loop-limit, and subtractstride
from it. We have to prevent underflow on that subtract. Currently, we do this with aCMoveI
. The problem with this:CMoveI
is not smart enough to generate a precise type. For example, there are many cases where the input types get better, and underflow is not possible anymore. But theCMoveI
does not detect this, and still has typemin_jint..hi
.We have the same issue in
PhaseIdealLoop::adjust_limit
, where we useCMoveL
to implement long max/min. The types are not as precise as they could and should be.Problem
The imprecise type is used for the zero-trip-guard. It does not fold to false, even though the data-path into the post loop does constant fold to
TOP
. The graph breaks, and assertmalformed control flow
triggers.Details: In these cases, we have the super-unrolled main-loop (SuperWord'ed, then further unrolled) directly leading to a vectorized post-loop. The effect is that there is no
region/phi
merging main-exit and main-zero-trip-guard. So the types are already more narrow here. It may be possible that the values are such that we find out that we should never enter the vectorized post-loop. But if data finds out and control does not, we get a broken graph.Note: we have pre-loop. Then a main-loop and vectorized post loop. Then we merge the main-zero-trip-guard. And at the end we have the scalar post loop.
I have already recently fixed a bug around this
CMoveI
. 5a4945c I would now like to have a more satisfactory fix, that properly propagates the types.Solution
PhaseIdealLoop::adjust_limit
already converts the limit from int to long, and does all computations in long, including taking max/min with aCMoveL
. I now use the so far unusedMaxL/MinL
. I implemented some missingValue/Identity
components for it. SinceMaxL/MinL
is not implemented in the backend, I just expand it in macro-expansion to aCMoveL
. At that point the loop-opts are over, and it is most likely ok that we do not make the types more precise after this.I take the same approach for
PhaseIdealLoop::do_unroll
: convert limits to long, do subtraction in long, takeMinL/MaxL
to clamp it to the int-range (prevent subtraction underflow).Discussion
This solution seems much cleaner to me, and I hope that we will see less bugs because of imprecise types in the limit computation, which were often due to the
CMove
not being smart enough to analyze all inputs (it would have to recognize a multitude of patterns, for the Cmp inputs and the direct inputs to the CMove - we currently do not do that, but just take the union of the input types - this is very inprecise).There is a bit of an overhead here: We use longs even though we only want to have int values. But I think we should prefer a clean implementation here, with correct type computation. The performance impact is probably non-existent on 64-bit machines anyway.
Caveat
I found some cases with the same assert
malformed control flow
that are most likely skeleton/assertion predicate bugs JDK-8288981. Some of those cases were new patterns, for example where we PreMainPost a main loop.I hope that this fix here at least reduces the frequency of failures significantly.
Testing
I added 2 regression tests. Our fuzzer seems to spit out examples regularly, so that gives us extra coverage.
Tested up to
tier5
and stress testing. Performance testing running...Future Work
We should implement
MaxL/MinL
in the backend. We should also use them during parsing. This would also allow toSuperWord
the instruction, on the platforms that support it.Should we add such an assert during IGVN? I think after IGVN, we should never have a
MultiBranchNode
that does not have the required number of outputs, right? We could add it toVerifyIterativeGVN
.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13269/head:pull/13269
$ git checkout pull/13269
Update a local copy of the PR:
$ git checkout pull/13269
$ git pull https://git.openjdk.org/jdk.git pull/13269/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13269
View PR using the GUI difftool:
$ git pr show -t 13269
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13269.diff
Webrev
Link to Webrev Comment