-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8286182: C2: crash with SIGFPE when executing compiled code #8726
Conversation
👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into |
@TheRealMDoerr The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
So the fix for JDK-8257822 misses the new nodes added by JDK-8284742. I wonder if this code needs to be adjusted as well: jdk/src/hotspot/share/opto/loopnode.cpp Lines 5780 to 5785 in fa1ca98
@chhagedorn should have a look as well. |
I was wondering about that, too. Also why we have checks for Div/ModI nodes, but not for Div/ModL at some places. We can have loops with long trip count since JDK-8223051. |
Thanks a lot for taking a look at this. I am considering this option, too. The problem is that IIUC this trouble comes from the fact that on x86 a |
Right, I had forgotten that x86 also raises SIGFPE in case of overflow. Catching SIGFPE is possible. I had experimented with --- a/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
+++ b/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
@@ -278,6 +278,15 @@ bool PosixSignals::pd_hotspot_signal_handler(int sig, siginfo_t* info,
pc,
SharedRuntime::
IMPLICIT_DIVIDE_BY_ZERO);
+ if (stub == nullptr) {
+ int op = pc[0];
+ // TODO: make sure to handle all variants used by C2!
+ if (op == 0xF7) {
+ // ignore SIGPFE by speculative div
+ uc->uc_mcontext.gregs[REG_PC] += 4;
+ return true;
+ }
+ }
#else
if (sig == SIGFPE /* && info->si_code == FPE_INTDIV */) {
// HACK: si_code does not work on linux 2.2.12-20!!!
diff --git a/src/hotspot/share/code/compiledMethod.cpp b/src/hotspot/share/code/compiledMethod.cpp
index 83c33408ea3..1c5e197511d 100644
--- a/src/hotspot/share/code/compiledMethod.cpp
+++ b/src/hotspot/share/code/compiledMethod.cpp
@@ -746,7 +746,7 @@ address CompiledMethod::continuation_for_implicit_exception(address pc, bool for
int exception_offset = pc - code_begin();
int cont_offset = ImplicitExceptionTable(this).continuation_offset( exception_offset );
#ifdef ASSERT
- if (cont_offset == 0) {
+ if (cont_offset == 0 && !for_div0_check) {
Thread* thread = Thread::current();
ResourceMark rm(thread);
CodeBlob* cb = CodeCache::find_blob(pc); This works and behaves like PPC64. However, we need to make sure it doesn't show up on any hot path. Otherwise, performance will be terrible. Feel free to create a new PR if you want to propose a solution. I only opened this one to show my findings and can close it again if we decide for something else. |
Btw. it is possible to make the div node dependent on only one check by using the following scheme for a/b with unsigned comparison:
It could be transformed back later to enable implicit div by 0 checks. Sounds complicated and I don't know if this makes sense. |
The fix looks reasonable to address the same problems in JDK-8257822 and JDK-8248552 for the new nodes
Do you have a failing test for this case? We've been seeing a lot of SIGFPE failures lately with Java Fuzzer. I have to walk through them to see if I can find a case that is still failing with this fix. Will get back with the result of this analysis.
I'm not sure if we should rely on signal catching to fix the cases where a division is wrongly floating above its zero check. I think we should not intentionally leave a graph in a broken state with the intention to fix it later at runtime. |
Overflow only happens with dividend = MIN_VALUE and divisor = -1 so it would be hard to have a failing test for this case. I will try to come up with one.
IIUC the graph is broken only because the division raises SIGFPE for invalid inputs. If we choose to ignore the signal instead then we can treat the Div nodes similar to how we treat other nodes such as Add or Sub and let them freely float around. Thanks. |
I completely agree with Christian, we should not let the division float above its zero check for various reasons:
Given that this issue leads to massive failures in our fuzzer testing, I would suggest to back out JDK-8284742 for now and properly re-implement it. |
I agree, too. Catching SIGFPE was an interesting experiment, but I didn't want to propose it as fix. |
I've checked all our fuzzer crashes and the current fix works for all of them. But given that there is potentially another issue with the overflow check mentioned by @merykitty, we could indeed think about backing the changes out for JDK 19 as we are getting closer to the fork. I'd suggest to wait and see for now if @merykitty was able to find a case where we still crash. |
While evaluation is ongoing I have thought a bit about making the div node dependent on a single check which avoids all SIGFPE cases: diff --git a/src/hotspot/share/opto/parse3.cpp b/src/hotspot/share/opto/parse3.cpp
index 7e0f854436b..9d4226571ae 100644
--- a/src/hotspot/share/opto/parse3.cpp
+++ b/src/hotspot/share/opto/parse3.cpp
@@ -484,26 +484,30 @@ void Parse::do_divmod_fixup() {
return;
}
- // The generated graph is equivalent to (in2 == -1) ? -in1 : (in1 / in2)
- // we need to have a separate branch for in2 == -1 due to the special
- // case of min_jint / -1
- Node* cmp = _gvn.transform(CmpNode::make(in2, _gvn.integercon(-1, bt), bt));
- Node* bol = Bool(cmp, BoolTest::eq);
- IfNode* iff = create_and_map_if(control(), bol, PROB_UNLIKELY_MAG(3), COUNT_UNKNOWN);
+ // Check if in2 < -1 or > 1 by one CmpUNode (in2 + 1 >u 1). Reason:
+ // Div node will have to get pinned, but we can only pin to one region.
+ Node* inc = _gvn.transform(AddNode::make(in2, _gvn.integercon(1, bt), bt));
+ Node* cmp = _gvn.transform(CmpUNode::make(inc, _gvn.integercon(1, bt), bt));
+
+ Node* bol = Bool(cmp, BoolTest::gt);
+ IfNode* iff = create_and_map_if(control(), bol, PROB_MAX, COUNT_UNKNOWN);
Node* iff_true = IfTrue(iff);
Node* iff_false = IfFalse(iff);
+
+ Node* res_slow = generate_division(_gvn, iff_true, in1, in2, bc);
+
Node* res_fast = (bc == Bytecodes::_idiv || bc == Bytecodes::_ldiv)
? _gvn.transform(SubNode::make(_gvn.zerocon(bt), in1, bt))
: _gvn.zerocon(bt);
- Node* res_slow = generate_division(_gvn, iff_false, in1, in2, bc);
+
Node* merge = new RegionNode(3);
merge->init_req(1, iff_true);
merge->init_req(2, iff_false);
record_for_igvn(merge);
set_control(_gvn.transform(merge));
Node* res = new PhiNode(merge, Type::get_const_basic_type(bt));
- res->init_req(1, res_fast);
- res->init_req(2, res_slow);
+ res->init_req(1, res_slow);
+ res->init_req(2, res_fast);
res = _gvn.transform(res);
push_result(*this, res, bt);
} This effectively disables implicit div by 0 checks, but I believe they could get repaired. Are there any opinions about this idea? |
I have made a backout for the changes made by JDK-8285390 and JDK-8284742 in #8774 , a more appropriate fix would be submitted for jdk20 as suggested by Tobias and Christian. Thanks. |
Closing in favor of #8774. |
The bug is not assigned to me, but I have seen that the C2 code which checks for div by 0 is not aware of the new nodes from JDK-8284742.
This fixes the VM to pass the reproducer. I'm not sure if more opcode checks are required to get added.
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8726/head:pull/8726
$ git checkout pull/8726
Update a local copy of the PR:
$ git checkout pull/8726
$ git pull https://git.openjdk.java.net/jdk pull/8726/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8726
View PR using the GUI difftool:
$ git pr show -t 8726
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8726.diff