8324517: C2: crash in compiled code because of dependency on removed range check CastIIs#18377
8324517: C2: crash in compiled code because of dependency on removed range check CastIIs#18377rwestrel wants to merge 9 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back roland! A progress list of the required criteria for merging this PR into |
|
@rwestrel This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 16 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
src/hotspot/share/opto/compile.cpp
Outdated
| // For memory access or integer divisions nodes that depend on the cast, record the dependency on the cast's control | ||
| // as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a | ||
| // range check or a null divisor check. | ||
| assert(cast->in(0) != nullptr, ""); |
There was a problem hiding this comment.
| assert(cast->in(0) != nullptr, ""); | |
| assert(cast->in(0) != nullptr, "All RangeCheck CastII must have a control dependency"); |
There was a problem hiding this comment.
Done in new commit.
| // the corresponding range check is not removed. | ||
| if (!_range_check_dependency) { | ||
| res = widen_type(phase, res, T_INT); | ||
| } |
There was a problem hiding this comment.
Can you explain why you changed this, and why it is ok?
There was a problem hiding this comment.
ConvI2L has a similar transformation. Let's say we have 2 ConvI2L nodes with identical inputs but different types:
(ConvI2L _ input [2..max_jint])
(ConvI2L _ input [1..max_jint])
They are transformed to:
(ConvI2L _ input [0..max_jint])
(ConvI2L _ input [0..max_jint])
so they can common. With range checks, the pattern is:
(ConvI2L _ (CastII control input [2..max_jint]) [2..max_jint])
(ConvI2L _ (CastII control input [1..max_jint]) [1..max_jint]
Without this patch, the range checks CastII are removed after loop opts so having the transformation be done only on ConvI2L nodes is sufficient. With this change the CastII are left in the IR so they need to be transformed the same way:
(ConvI2L _ (CastII control input [0..max_jint]) [0..max_jint])
(ConvI2L _ (CastII control input [0..max_jint]) [0..max_jint]
so they can common and the ConvI2L then common.
| phase->C->record_for_post_loop_opts_igvn(this); | ||
| } | ||
| if (!_range_check_dependency) { | ||
| if (!_type->is_int()->empty()) { |
There was a problem hiding this comment.
Can you also explain this change, please?
There was a problem hiding this comment.
It's similar to the case above. ConvI2L has a similar transformation (Add node pushed through ConvI2L). For range checks, the CastII gets in the way but it wasn't an issue without the patch I propose (the CastII ends up going away). Leaving the CastII requires that the transformation happens on the CastII as well. The:
!_type->is_int()->empty()
test doesn't replace the
!_range_check_dependency
I removed the second one, ran some testing and then had a failure where the ConstraintCastNode::optimize_integer_cast() crashes because it gets called with an empty type so I added the test for an empty type. I don't remember the details of the failure other than it seemed like a corner case.
| * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined | ||
| * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated | ||
| * | ||
| */ |
There was a problem hiding this comment.
Can you please add a "vanilla" run like this:
@run main TestArrayAccessAboveRCAfterRCCastIIEliminated
That would allow us to run the test with any flag combination from the outside.
There was a problem hiding this comment.
And maybe one that only does -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined, since that seems important for reproducing the interesting patterns.
There was a problem hiding this comment.
Done in new commit.
| * | ||
| */ | ||
|
|
||
| public class TestArrayAccessAboveRCAfterRCCastIIEliminated { |
There was a problem hiding this comment.
It seems this test is not in a package. Is this on purpose?
There was a problem hiding this comment.
Not sure if that is important, but it seems most other tests are in a package
There was a problem hiding this comment.
I never put tests in a package. So If there's an issue with that, then there are many more tests to fix.
There was a problem hiding this comment.
Fine with me. I was just curious, and have mostly seen tests that are in a package.
Thanks for reviewing this. |
|
@eme64 did you get a chance to look at the answers to your questions? |
|
@rwestrel It seems I only get notifications for new messages, not responses. Looking at the PR now... |
| * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation | ||
| * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined | ||
| * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated | ||
| * @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated |
There was a problem hiding this comment.
| * @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated | |
| * @run main TestArrayAccessAboveRCAfterRCCastIIEliminated |
There was a problem hiding this comment.
Actually, shouldn't I have kept -XX:-BackgroundCompilation for this one?
There was a problem hiding this comment.
I think it would be great to have one run with absolutely no flags.
There was a problem hiding this comment.
But then it's not even guaranteed that the compilation completes? I'm not sure I understand the rational.
There was a problem hiding this comment.
@eme64 can you please explain why a run without flags make sense?
There was a problem hiding this comment.
@rwestrel we internally have lots of different runs with different flags. Sometimes bugs only show under certain flag combinations. If you always have the flags on in the test already, then some combinations may not be effective. But if you think that some flags MUST always be on for the test to make any sense, then keep them, I guess.
TobiHartmann
left a comment
There was a problem hiding this comment.
Great job coming up with these tests, Roland!
Did you check if the other usages of _range_check_dependency via CastIINode::has_range_check are still needed? Seems to me as if at least the checks in PhaseIdealLoop::match_fill_loop can be removed.
src/hotspot/share/opto/compile.cpp
Outdated
| case Op_CastII: { | ||
| remove_range_check_cast(n->as_CastII()); | ||
| } | ||
| break; |
src/hotspot/share/opto/compile.cpp
Outdated
| if (cast->has_range_check()) { | ||
| // Range check CastII nodes feed into an address computation subgraph. Remove them to let that subgraph float freely. | ||
| // For memory access or integer divisions nodes that depend on the cast, record the dependency on the cast's control | ||
| // as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a |
There was a problem hiding this comment.
| // as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a | |
| // as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminate a |
src/hotspot/share/opto/compile.cpp
Outdated
| Node* m = wq.at(next); | ||
| for (DUIterator_Fast imax, i = m->fast_outs(imax); i < imax; i++) { | ||
| Node* use = m->fast_out(i); | ||
| if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL) { |
There was a problem hiding this comment.
Op_ModI and Op_ModL are missing here. And isn't this too strong in cases where we can prove that the operand is non-zero? Could you re-use PhaseIterGVN::no_dependent_zero_check? Please also add corresponding tests.
Looking at PhaseIterGVN::no_dependent_zero_check, I noticed that UDiv[I/L]Node and UMod[I/L]Node are not handled but I think they should. I think this was missed when these nodes where added by JDK-8282221. One can probably extend @chhagedorn's test from JDK-8259227 to trigger the same issue.
There was a problem hiding this comment.
Op_ModIandOp_ModLare missing here.
Good catch! I added test cases for Op_ModI and Op_ModL , the unsigned variants and the also the DivMod variants. I also fixed the patch so it handles all of them.
And isn't this too strong in cases where we can prove that the operand is non-zero?
I don't think it's too strong. The operand can be non zero because of a range check CastII somewhere along the subgraph that starts at the node's second input. In that case, PhaseIterGVN::no_dependent_zero_check would return true but removing the range CastII would cause the bugs that are triggered by the test case.
Looking at
PhaseIterGVN::no_dependent_zero_check, I noticed thatUDiv[I/L]NodeandUMod[I/L]Nodeare not handled but I think they should. I think this was missed when these nodes where added by JDK-8282221. One can probably extend @chhagedorn's test from JDK-8259227 to trigger the same issue.
That seems like a different problem that out of the scope of this particular issue.
There was a problem hiding this comment.
I realized that I didn't understand your comment when I replied.
What you're saying, I think, is that if we have, say, a CastII that's input to a DivI node, if the input to that cast is non zero, then we don't need to add the CastII control as dependency to the DivI. The problem, I think, is that the CastII could be input to say an AddI node which would then be input to the DivI. What we would then need to know is whether if we remove the CastII, the AddI is still non null or not. That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix.
There was a problem hiding this comment.
What you're saying, I think, is that if we have, say, a CastII that's input to a DivI node, if the input to that cast is non zero, then we don't need to add the CastII control as dependency to the DivI
Yes, that was my point.
That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix.
Right, we can still go down that path if it ever becomes necessary.
That seems like a different problem that out of the scope of this particular issue.
Could you please file a follow-up bug for that?
There was a problem hiding this comment.
| * -XX:+StressIGVN TestArrayAccessAboveRCAfterRCCastIIEliminated | ||
| * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation | ||
| * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined | ||
| * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated |
There was a problem hiding this comment.
Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.
There was a problem hiding this comment.
Fixed in new commit.
|
Thanks for reviewing this.
I did but was fairly conservative. In the case of |
Right, maybe we can put that into the follow-up bug. |
|
@eme64 do you need to review the updated change? Also can you answer the question you left unanswered about the test case: "can you please explain why a run without flags make sense?" |
src/hotspot/share/opto/compile.cpp
Outdated
| if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL || | ||
| use->Opcode() == Op_ModI || use->Opcode() == Op_ModL || use->Opcode() == Op_UDivI || | ||
| use->Opcode() == Op_UDivL || use->Opcode() == Op_UModI || use->Opcode() == Op_UModL) { |
There was a problem hiding this comment.
This kinda smells like it should be its own method. Seems we do not have a superclass for all Mod/Div nodes. Maybe we should have that? Or otherwise just a Node::is_div_or_mod() method?
eme64
left a comment
There was a problem hiding this comment.
I suggest you refactor the DIV/MOD checks, but otherwise I'm ok with the updates.
|
What about the new commit with the Node::is_div_or_mod() method @eme64 ? |
Should there be another follow up bug then? Or did I not understand what the follow up bug was about? |
Right, feel free to file a new one but I think to just keep track of it, we can as well add it to JDK-8332268 for now. |
|
|
Thanks for the reviews @TobiHartmann and @eme64 |
|
/integrate |
|
Going to push as commit ab8d7b0.
Your commit was automatically rebased without conflicts. |
|
This is causing a crash in compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java Aarch64 only so far: A fatal error has been detected by the Java Runtime Environment:Internal Error (/opt/mach5/mesos/work_dir/slaves/a4a7850a-7c35-410a-b879-d77fbb2f6087-S6223/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c30fdcec-3ee7-44a9-bfc5-38869fd48e4b/runs/4e34865c-00ca-43a6-87eb-21db2db1c8ab/workspace/open/src/hotspot/share/opto/gcm.cpp:1423), pid=1811107, tid=1811123assert(false) failed: graph should be schedulablewill get a bug filed |
@rwestrel Filed JDK-8332369, can you have a look at it? |
|
Hmm seems we only ran test from our side for v01, and the test there had a crash too, though different. |
|
Right, I did run testing on an early draft and v01 and only saw the |
Range check
CastIInodes are removed once loop opts are over. Thetest case for this change includes 3 cases where elimination of a
range check
CastIIcauses a crash in compiled code because either aout of bounds array load or a division by zero happen.
In
test1:the range checks for the
array[otherArray.length]loads constantfold:
otherArray.lengthis aCastIIof i at theotherArrayallocation.
iis less than 9. TheCastIIat the allocationnarrows the type down further to
[0-9].the
array[otherArray.length]loads are control dependent on theunrelated:
test. There's an identical dominating test which replaces that one. As
a consequence, the
array[otherArray.length]loads become controldependent on the dominating test.
CastIInodes at theotherArrayallocations are replaced by adominating range check
CastIInodes for:CastIInodes are removed and the2
array[otherArray.length]loads common at the first:test before the:
and
that guarantee
iis positive.test1is called withi = -1, the array load proceeds with an outof bounds index and the crash occurs.
test2andtest3are mostly identical except for the check that'seliminated (a null divisor check) and the instruction that causes a
fault (an integer division).
The fix I propose is to not eliminate range check
CastIInodes afterloop opts. When range check
CastIInodes were introduced, performancewas observed to regress. Removing them after loop opts was found to
preserve both correctness and performance. Today, the performance
regression still exists when
CastIInodes are left in. So I proposewe keep them until the end of optimizations (so the 2 array loads
above don't lose a dependency and wrongly common) but remove them at
the end of all optimizations.
In the case of the array loads, they are dependent on a range check
for another array through a range check
CastIIand we must not losethat dependency otherwise the array loads could float above the range
check at gcm time. I propose we deal with that problem the way it's
handled for
CastPPnodes: add the dependency to the load (ordivision)nodes as a precedence edge when the cast is removed.
@TobiHartmann ran performance testing for that patch (Thanks!) and reported
no regression.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377$ git checkout pull/18377Update a local copy of the PR:
$ git checkout pull/18377$ git pull https://git.openjdk.org/jdk.git pull/18377/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18377View PR using the GUI difftool:
$ git pr show -t 18377Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18377.diff
Webrev
Link to Webrev Comment