8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop #19086
chhagedorn
left a comment
You could also add the regression tests from the duplicated issue JDK-8298851.
// ConvI2L may have type information on it which is unsafe to push up
if ((n->Opcode() == Op_ConvI2L && n->bottom_type() != TypeLong::LONG) ||
    (n->Opcode() == Op_ConvL2I && n->bottom_type() != TypeInt::INT)) {
  // ConvI2L/ConvL2I may have type information on it which is unsafe to push up
The fix looks good and we should probably move forward with that.
I'm still wondering, though, whether these bailouts are really needed in the general case. It seems like this problem is mainly for loop phis. Couldn't we check the types of the loop phi inputs and bail out if one includes zero? IIUC, the backedge should be an AddL with type [0..99], i.e. post-decremented. So pushing through seems wrong in this case, since the backedge type includes zero. But it could be detected and prevented. However, if the phi has type [5..100], for example, then it should be safe. We would probably then just need to update the type of the pushed-through ConvL2I to whatever the type of the input is.
This type checking approach could work in the general case. I'm not sure, though, whether it's beneficial to split these Conv nodes through phis in general. It seems the bailouts were only introduced because of correctness bugs and not for performance reasons. Anyway, this should be investigated separately, including benchmarking.
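A minimal sketch of the input-type check suggested above, assuming a hypothetical Range class as a stand-in for a C2 integer type. None of these names come from HotSpot; the sketch only illustrates the decision rule "bail out if any phi input may be zero".

```java
import java.util.List;

// Hypothetical model, not HotSpot code: decide whether a divisor may be
// pushed through a phi by inspecting the value range of every phi input.
class PhiInputZeroCheck {
    // Closed interval [lo..hi] standing in for a compiler integer type.
    static final class Range {
        final long lo;
        final long hi;

        Range(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean containsZero() {
            return lo <= 0 && 0 <= hi;
        }
    }

    // True when at least one input of the phi may be zero, in which case
    // pushing the division through the phi should be refused.
    static boolean anyInputMayBeZero(List<Range> phiInputs) {
        for (Range r : phiInputs) {
            if (r.containsZero()) {
                return true;
            }
        }
        return false;
    }
}
```

With an entry input of [100..100] and a backedge input of [0..99], the check reports that a zero is possible, so the split would be refused. With inputs that all stay within [5..100], it reports that no input can be zero.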
> But I'm still wondering though, if these bailouts are really needed in the general case. It seems like this problem is mainly for loop phis. Couldn't we check the types of loop phi inputs and bail out if one includes zero?
Are we sure divisions are the only cause of bugs? My understanding of this issue is that once pushed thru phi, the type of the ConvL2I is simply not correct, and that's the root cause. I wonder if we could get other failures because of this: maybe a node becoming top because of the incorrect type, or an out-of-bounds array access.
> Are we sure divisions are the only cause of bugs?
Not 100% sure. But the only cases I've observed so far are with division/mod, where they float above and end up being executed too early (the result is never actually observed, though).
> that once pushed thru phi, the type of the ConvL2I is simply not correct and that's the root cause.
Yes, that's my understanding, too. But since the AddL input into the loop iv phi contains zero, it raised the question of whether we could actually detect that and base our decision on whether the input contains zero, instead of simply disabling pushing ConvL2I (and ConvI2L) nodes through phis entirely.
It also seems that it's only a problem with loop iv phis because we improve the iv type in such a way that some of the possible values of the backedge are excluded. So, maybe a first step could be to allow splitting the Conv* nodes through non-loop-iv phi nodes. However, there might also be other non-loop-iv phi problems I'm currently not aware of. Nevertheless, it might be worth investigating further in a separate RFE.
> I wonder if we could get other failures because of this: maybe a node becoming top because of the incorrect type or an out of bound array access.
Could very well be.
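The remark above about loop iv phis can be illustrated with plain interval arithmetic. This is a sketch under stated assumptions: ranges are modelled as two-element long arrays {lo, hi}, and the class and method names are hypothetical, not HotSpot's. It compares the narrowed iv type [1..100] from the PR description with the union of the phi's inputs, entry [100..100] and backedge [0..99].

```java
// Hypothetical model, not HotSpot code: a range is a long[]{lo, hi}.
class IvPhiNarrowing {
    // Smallest interval containing both a and b.
    static long[] union(long[] a, long[] b) {
        return new long[] { Math.min(a[0], b[0]), Math.max(a[1], b[1]) };
    }

    // True when every value of inner also lies in outer.
    static boolean covers(long[] outer, long[] inner) {
        return outer[0] <= inner[0] && inner[1] <= outer[1];
    }

    public static void main(String[] args) {
        long[] entry = { 100, 100 };
        long[] backedge = { 0, 99 };
        long[] narrowedIvType = { 1, 100 };

        long[] inputs = union(entry, backedge);
        // The narrowed iv type leaves out a value (zero) that the backedge
        // input can still produce, which is what makes the split unsafe.
        System.out.println("iv type covers all inputs: " + covers(narrowedIvType, inputs));
    }
}
```

An ordinary phi whose type is simply the union of its inputs would pass this coverage test, which is consistent with the suggestion to treat non-loop-iv phis separately.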
> It also seems that it's only a problem with loop iv phis because we improve the iv type in such a way that some of the possible values of the backedge are excluded. So, maybe a first step could be to allow splitting the Conv* nodes through non-loop-iv phi nodes. However, there might also be other non-loop-iv phi problems I'm currently not aware of. Nevertheless, it might be worth to investigate further in a separate RFE.
I agree that it would be worth investigating further.
eme64
left a comment
Looks reasonable.
I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch.
But if the ConvI2L were not a type-node, then it would not restrict type, and you could simply push it through phis. Right?
Why do we have type restriction mixed into ConvI2L? Could that not be separated out into a CastII / CastLL?
Maybe we could generally separate ConvI2L, type restriction, and pinning? CastII also does multiple things, and it has hurt us many times in the past. Would this sort of maximal separation and specialization not be more "sea of nodes" style?
Anyway, this would be interesting to look into for a future RFE.
 * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement
 *                   -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I
 * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement
 *                   -XX:+StressGCM TestLongCountedLoopConvL2I
Would it make sense to have a run that allows OSR?
You should also add -XX:+UnlockDiagnosticVMOptions for the stress flag.
Done in new commit.
I added one of them because it doesn't seem to need |
That's not entirely true here. The |
So what exactly is it that guarantees the correctness of the I would now have to dive into the code and debug if the "type restriction" for counted loop phi happens purely because of the input values, or because of explicitly restricting the type of the |
|
Before split if: after split if: |
|
@rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right? I guess the issue is that the How exactly did we narrow the type to So I guess that is really a limitation: a trip count |
Yes.
The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs as proposed is simpler than trying a most general fix which can have hard to anticipate consequences. |
|
@rwestrel Yes, I'm totally fine with the fix. It simply applies the In a future RFE, we could at least restrict the "bailout" to trip-count Phis, and not all Phis. In even further RFEs, we could consider doing the type narrowing not in the trip-count phi, but via casts at the checks. That would be a more unified solution. Generally, I feel like we are struggling way too much with all the different ways one can pin and narrow types: it is all mixed into trip-count phis, Casts, Convs etc. Who really can understand all the complicated interactions? It seems we keep piling on special-case logic, but it is an endless whack-a-mole game. Every fix is "simple" but the sum of all those fixes is far from "simple" ;) |
chhagedorn
left a comment
Still looks good, thanks for adding the test!
|
FTR, I double checked that fuzzer test failures from JDK-8298851 are indeed the same issue and are fixed with this. |
|
/integrate |
|
Going to push as commit f398cd2.
Your commit was automatically rebased without conflicts. |
In the test case, the long counted loop phi has type [1..100]. As a consequence, the ConvL2I also has type [1..100]. The DivI node that follows can't fault: it is not guarded by a zero check and has no control set.
The ConvL2I is split through phi and so is the DivI node: PhaseIdealLoop::cannot_split_division() does not block the split because the value coming from the backedge into the DivI (when it is about to be split thru phi) is the result of the ConvL2I, which has type [1..100] and so is not zero as far as the compiler can tell.
On the last iteration of the loop, i is 1. Because the DivI was split thru phi, it computes the value for the following iteration, so for i = 0. This causes a crash when the compiled code runs.
The same problem can't happen with an int counted loop because logic in PhaseIdealLoop::split_thru_phi() prevents a ConvI2L from being split thru phi. I propose to fix this the same way: in the test case, it's not true that once the ConvL2I is split thru phi it keeps type [1..100].
The fix is fairly conservative because it's based on the existing logic for ConvI2L: ideally we would refuse to split a ConvL2I only at a counted loop, but the existing check is broader than that. I suppose the same is true for the ConvI2L, and I thought it would be best to revisit both together.
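The failure described above can be modelled at the source level. The sketch below is an assumption-laden illustration, not the PR's regression test: the class and method names are hypothetical, and afterSplit is a hand-written approximation of the loop once ConvL2I and DivI have been split thru phi. The division for the next iteration is computed on the backedge, before the exit test, so it runs with i equal to 0 on the last trip.

```java
// Hypothetical sketch, not the regression test from the PR.
class SplitThruPhiSketch {
    // Original shape: the divisor is the current value of i, always in [1..start].
    static int original(long start) {
        int res = 0;
        for (long i = start; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // Hand-written model of the loop after the split: one copy of the
    // division on the entry edge, one on the backedge.
    static int afterSplit(long start) {
        int res = 0;
        long i = start;
        if (i <= 0) {
            return res;
        }
        int quotient = 42 / ((int) i); // entry-edge copy
        while (true) {
            res += quotient;
            i--;
            quotient = 42 / ((int) i); // backedge copy: divides by zero when i reaches 0
            if (i <= 0) {
                break;
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println("original: " + original(100));
        try {
            afterSplit(100);
        } catch (ArithmeticException e) {
            System.out.println("afterSplit faulted: " + e.getMessage());
        }
    }
}
```

At the Java level the early division surfaces as an ArithmeticException. In the compiled code described in the PR there is no zero check on the DivI, which is why the same situation shows up as a crash instead.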
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19086/head:pull/19086
$ git checkout pull/19086
Update a local copy of the PR:
$ git checkout pull/19086
$ git pull https://git.openjdk.org/jdk.git pull/19086/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19086
View PR using the GUI difftool:
$ git pr show -t 19086
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19086.diff