-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8324517: C2: crash in compiled code because of dependency on removed range check CastIIs #18377
Conversation
👋 Welcome back roland! A progress list of the required criteria for merging this PR into |
@rwestrel This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 16 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rwestrel !
Generally makes sense, I have a few suggestions and questions.
src/hotspot/share/opto/compile.cpp
Outdated
// For memory access or integer divisions nodes that depend on the cast, record the dependency on the cast's control | ||
// as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a | ||
// range check or a null divisor check. | ||
assert(cast->in(0) != nullptr, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(cast->in(0) != nullptr, ""); | |
assert(cast->in(0) != nullptr, "All RangeCheck CastII must have a control dependency"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in new commit.
// the corresponding range check is not removed. | ||
if (!_range_check_dependency) { | ||
res = widen_type(phase, res, T_INT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you changed this, and why it is ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConvI2L
has a similar transformation. Let's say we have 2 ConvI2L
nodes with identical inputs but different types:
(ConvI2L _ input [2..max_jint])
(ConvI2L _ input [1..max_jint])
They are transformed to:
(ConvI2L _ input [0..max_jint])
(ConvI2L _ input [0..max_jint])
so they can common. With range checks, the pattern is:
(ConvI2L _ (CastII control input [2..max_jint]) [2..max_jint])
(ConvI2L _ (CastII control input [1..max_jint]) [1..max_jint]
Without this patch, the range checks CastII
are removed after loop opts so having the transformation be done only on ConvI2L
nodes is sufficient. With this change the CastII
are left in the IR so they need to be transformed the same way:
(ConvI2L _ (CastII control input [0..max_jint]) [0..max_jint])
(ConvI2L _ (CastII control input [0..max_jint]) [0..max_jint]
so they can common and the ConvI2L
then common.
// makes sure we run ::Value to potentially remove type assertion after loop opts | ||
phase->C->record_for_post_loop_opts_igvn(this); | ||
} | ||
if (!_range_check_dependency) { | ||
if (!_type->is_int()->empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also explain this change, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's similar to the case above. ConvI2L
has a similar transformation (Add
node pushed through ConvI2L
). For range checks, the CastII
gets in the way but it wasn't an issue without the patch I propose (the CastII
ends up going away). Leaving the CastII
requires that the transformation happens on the CastII
as well. The:
!_type->is_int()->empty()
test doesn't replace the
!_range_check_dependency
I removed the second one, ran some testing and then had a failure where the ConstraintCastNode::optimize_integer_cast()
crashes because it gets called with an empty type so I added the test for an empty type. I don't remember the details of the failure other than it seemed like a corner case.
* -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined | ||
* -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated | ||
* | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a "vanilla" run like this:
@run main TestArrayAccessAboveRCAfterRCCastIIEliminated
That would allow us to run the test with any flag combination from the outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe one that only does -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined
, since that seems important for reproducing the interesting patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in new commit.
* | ||
*/ | ||
|
||
public class TestArrayAccessAboveRCAfterRCCastIIEliminated { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this test is not in a package. Is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that is important, but it seems most other tests are in a package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never put tests in a package. So If there's an issue with that, then there are many more tests to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me. I was just curious, and have mostly seen tests that are in a package.
Thanks for reviewing this. |
@eme64 did you get a chance to look at the answers to your questions? |
@rwestrel It seems I only get notifications for new messages, not responses. Looking at the PR now... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates and answers, @rwestrel !
Looks reasonable, a second reviewer should also have a close look though.
* @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation | ||
* -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined | ||
* -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated | ||
* @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated | |
* @run main TestArrayAccessAboveRCAfterRCCastIIEliminated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detail, optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, shouldn't I have kept -XX:-BackgroundCompilation
for this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be great to have one run with absolutely no flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then it's not even guaranteed that the compilation completes? I'm not sure I understand the rational.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eme64 can you please explain why a run without flags make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel we internally have lots of different runs with different flags. Sometimes bugs only show under certain flag combinations. If you always have the flags on in the test already, then some combinations may not be effective. But if you think that some flags MUST always be on for the test to make any sense, then keep them, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job coming up with these tests, Roland!
Did you check if the other usages of _range_check_dependency
via CastIINode::has_range_check
are still needed? Seems to me as if at least the checks in PhaseIdealLoop::match_fill_loop
can be removed.
src/hotspot/share/opto/compile.cpp
Outdated
case Op_CastII: { | ||
remove_range_check_cast(n->as_CastII()); | ||
} | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is off.
src/hotspot/share/opto/compile.cpp
Outdated
if (cast->has_range_check()) { | ||
// Range check CastII nodes feed into an address computation subgraph. Remove them to let that subgraph float freely. | ||
// For memory access or integer divisions nodes that depend on the cast, record the dependency on the cast's control | ||
// as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a | |
// as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminate a |
src/hotspot/share/opto/compile.cpp
Outdated
Node* m = wq.at(next); | ||
for (DUIterator_Fast imax, i = m->fast_outs(imax); i < imax; i++) { | ||
Node* use = m->fast_out(i); | ||
if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Op_ModI
and Op_ModL
are missing here. And isn't this too strong in cases where we can prove that the operand is non-zero? Could you re-use PhaseIterGVN::no_dependent_zero_check
? Please also add corresponding tests.
Looking at PhaseIterGVN::no_dependent_zero_check
, I noticed that UDiv[I/L]Node
and UMod[I/L]Node
are not handled but I think they should. I think this was missed when these nodes where added by JDK-8282221. One can probably extend @chhagedorn's test from JDK-8259227 to trigger the same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Op_ModI
andOp_ModL
are missing here.
Good catch! I added test cases for Op_ModI
and Op_ModL
, the unsigned variants and the also the DivMod variants. I also fixed the patch so it handles all of them.
And isn't this too strong in cases where we can prove that the operand is non-zero?
I don't think it's too strong. The operand can be non zero because of a range check CastII
somewhere along the subgraph that starts at the node's second input. In that case, PhaseIterGVN::no_dependent_zero_check
would return true but removing the range CastII
would cause the bugs that are triggered by the test case.
Looking at
PhaseIterGVN::no_dependent_zero_check
, I noticed thatUDiv[I/L]Node
andUMod[I/L]Node
are not handled but I think they should. I think this was missed when these nodes where added by JDK-8282221. One can probably extend @chhagedorn's test from JDK-8259227 to trigger the same issue.
That seems like a different problem that out of the scope of this particular issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that I didn't understand your comment when I replied.
What you're saying, I think, is that if we have, say, a CastII
that's input to a DivI
node, if the input to that cast is non zero, then we don't need to add the CastII
control as dependency to the DivI
. The problem, I think, is that the CastII
could be input to say an AddI
node which would then be input to the DivI
. What we would then need to know is whether if we remove the CastII
, the AddI
is still non null or not. That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're saying, I think, is that if we have, say, a CastII that's input to a DivI node, if the input to that cast is non zero, then we don't need to add the CastII control as dependency to the DivI
Yes, that was my point.
That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix.
Right, we can still go down that path if it ever becomes necessary.
That seems like a different problem that out of the scope of this particular issue.
Could you please file a follow-up bug for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* -XX:+StressIGVN TestArrayAccessAboveRCAfterRCCastIIEliminated | ||
* @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation | ||
* -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined | ||
* -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in new commit.
Thanks for reviewing this.
I did but was fairly conservative. In the case of |
Right, maybe we can put that into the follow-up bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@eme64 do you need to review the updated change? Also can you answer the question you left unanswered about the test case: "can you please explain why a run without flags make sense?" |
src/hotspot/share/opto/compile.cpp
Outdated
if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL || | ||
use->Opcode() == Op_ModI || use->Opcode() == Op_ModL || use->Opcode() == Op_UDivI || | ||
use->Opcode() == Op_UDivL || use->Opcode() == Op_UModI || use->Opcode() == Op_UModL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kinda smells like it should be its own method. Seems we do not have a superclass for all Mod/Div nodes. Maybe we should have that? Or otherwise just a Node::is_div_or_mod()
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you refactor the DIV/MOD checks, but otherwise I'm ok with the updates.
What about the new commit with the Node::is_div_or_mod() method @eme64 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel thanks for the update, looks good!
Should there be another follow up bug then? Or did I not understand what the follow up bug was about? |
Right, feel free to file a new one but I think to just keep track of it, we can as well add it to JDK-8332268 for now. |
|
Thanks for the reviews @TobiHartmann and @eme64 |
/integrate |
Going to push as commit ab8d7b0.
Your commit was automatically rebased without conflicts. |
This is causing a crash in compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java Aarch64 only so far: A fatal error has been detected by the Java Runtime Environment:Internal Error (/opt/mach5/mesos/work_dir/slaves/a4a7850a-7c35-410a-b879-d77fbb2f6087-S6223/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c30fdcec-3ee7-44a9-bfc5-38869fd48e4b/runs/4e34865c-00ca-43a6-87eb-21db2db1c8ab/workspace/open/src/hotspot/share/opto/gcm.cpp:1423), pid=1811107, tid=1811123assert(false) failed: graph should be schedulablewill get a bug filed |
@rwestrel Filed JDK-8332369, can you have a look at it? |
Hmm seems we only ran test from our side for v01, and the test there had a crash too, though different. |
Right, I did run testing on an early draft and v01 and only saw the |
Range check
CastII
nodes are removed once loop opts are over. Thetest case for this change includes 3 cases where elimination of a
range check
CastII
causes a crash in compiled code because either aout of bounds array load or a division by zero happen.
In
test1
:the range checks for the
array[otherArray.length]
loads constantfold:
otherArray.length
is aCastII
of i at theotherArray
allocation.
i
is less than 9. TheCastII
at the allocationnarrows the type down further to
[0-9]
.the
array[otherArray.length]
loads are control dependent on theunrelated:
test. There's an identical dominating test which replaces that one. As
a consequence, the
array[otherArray.length]
loads become controldependent on the dominating test.
CastII
nodes at theotherArray
allocations are replaced by adominating range check
CastII
nodes for:CastII
nodes are removed and the2
array[otherArray.length]
loads common at the first:test before the:
and
that guarantee
i
is positive.test1
is called withi = -1
, the array load proceeds with an outof bounds index and the crash occurs.
test2
andtest3
are mostly identical except for the check that'seliminated (a null divisor check) and the instruction that causes a
fault (an integer division).
The fix I propose is to not eliminate range check
CastII
nodes afterloop opts. When range check
CastII
nodes were introduced, performancewas observed to regress. Removing them after loop opts was found to
preserve both correctness and performance. Today, the performance
regression still exists when
CastII
nodes are left in. So I proposewe keep them until the end of optimizations (so the 2 array loads
above don't lose a dependency and wrongly common) but remove them at
the end of all optimizations.
In the case of the array loads, they are dependent on a range check
for another array through a range check
CastII
and we must not losethat dependency otherwise the array loads could float above the range
check at gcm time. I propose we deal with that problem the way it's
handled for
CastPP
nodes: add the dependency to the load (ordivision)nodes as a precedence edge when the cast is removed.
@TobiHartmann ran performance testing for that patch (Thanks!) and reported
no regression.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377
$ git checkout pull/18377
Update a local copy of the PR:
$ git checkout pull/18377
$ git pull https://git.openjdk.org/jdk.git pull/18377/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18377
View PR using the GUI difftool:
$ git pr show -t 18377
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18377.diff
Webrev
Link to Webrev Comment