8261397: Try Catch Method Failing to Work When Dividing An Integer By 0 #2615

gerard-ziemski · 2021-02-17T19:06:45Z

On Mac ARM hardware running x86 JDK under Rosetta emulation, a div by 0 instruction causes the VM to crash.

The proposed fix (a workaround) for hotspot is to add FPE_FLTINV to the signal handler.

The actual fix needs to be done in macOS by Apple as the expected signal type here is FPE_FLTDIV
This issue has been filed with Apple and they are tracking it.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8261397: Try Catch Method Failing to Work When Dividing An Integer By 0

Reviewers

Thomas Stuefe (@tstuefe - Reviewer) ⚠️ Review applies to 0b81551
Phil Race (@prrace - Reviewer) ⚠️ Review applies to 8291664
Daniel D. Daugherty (@dcubed-ojdk - Reviewer) ⚠️ Review applies to 8291664
David Holmes (@dholmes-ora - Reviewer)

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2615/head:pull/2615
$ git checkout pull/2615

bridgekeeper · 2021-02-17T19:06:54Z

👋 Welcome back gziemski! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2021-02-17T19:08:00Z

@gerard-ziemski The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2021-02-17T19:19:48Z

Webrevs

gerard-ziemski · 2021-02-17T19:30:50Z

Simple native test case for those curious to try it out for themselves:

/*
 Natively on x86_64 we get:
   sigHandler caught sig: 8, info->si_code: 7 [FPE_INTDIV]
 Under Rossetta on M1 for x86_64 binary we get:
   sigHandler caught sig: 8, info->si_code: 5 [FPE_FLTINV]
 */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
static void sigHandler(int sig, siginfo_t* info, void* arg) {
  printf("sigHandler caught sig: %d, info->si_code: %d [", sig, info->si_code);
  switch (info->si_code) {
    case FPE_FLTDIV: printf("FPE_FLTDIV");break;
    case FPE_INTDIV: printf("FPE_INTDIV");break;
    case FPE_FLTINV: printf("FPE_FLTINV");break;
    default: printf("???");
  }
  printf("]\n");
  exit(sig);
}
int main(int argc, const char * argv[]) {
  struct sigaction sa;
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = sigHandler;
  sa.sa_flags = SA_SIGINFO|SA_RESETHAND;
  sigaction(SIGFPE, &sa, NULL);
  volatile int i = 0;
  volatile int j = 47 / i;
  return j;
}

tstuefe

Seems reasonable. Remark below. Looks good otherwise.

Cheers, Thomas

src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp

openjdk · 2021-02-17T19:41:16Z

@gerard-ziemski This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261397: Try Catch Method Failing to Work When Dividing An Integer By 0

Reviewed-by: stuefe, prr, dcubed, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1008 new commits pushed to the master branch:

7b924d8: 8261973: Shenandoah: Cleanup/simplify root verifier
63f8fc8: 8259662: Don't wrap SocketExceptions into SSLExceptions in SSLSocketImpl
cf0019d: 8259942: Enable customizations in CompileJavaModules.gmk and Main.gmk
a6a7e43: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions
d7eebda: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread
5b7b18c: 8259800: timeout in tck test testForkJoin(ForkJoinPool8Test)
419717d: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor
011f5a5: 8261799: Remove unnecessary cast in psParallelCompact.hpp
e9d7c07: 8248318: Remove superfluous use of boxing in ObjectStreamClass
6b7575b: 8228748: Remove GCLocker::_doing_gc
... and 998 more: https://git.openjdk.java.net/jdk/compare/bbc44f57c4714a49e72a09f0b9d05765d6b41e9b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

prrace

Looks reasonable to me. I suppose there is a BSD unix port that uses this code hence the MACOS_ONLY ?

gerard-ziemski · 2021-02-17T21:17:24Z

Looks reasonable to me. I suppose there is a BSD unix port that uses this code hence the MACOS_ONLY ?

Exactly, any BSD platform uses this implementation. macOS is just one of them.

Thanks Phil!

dcubed-ojdk

Thumbs up. Nice job on keeping the work around very focused.

src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp

gerard-ziemski · 2021-02-17T22:37:04Z

Thumbs up. Nice job on keeping the work around very focused.

Thank you Dan!

dholmes-ora

Hi Gerard,

I'm concerned by the general problem of encountering new bugs because we are running on an x64 emulation mode on ARM! Is there any way to detect this Rosetta environment? Is there a way to know from the hs_err file (we don't have one for this issue yet) that we executed under Rosetta?

How will we track this bug on Apple's side, such that we can eventually remove this workaround?

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Thanks,
David

gerard-ziemski · 2021-02-18T16:28:38Z

Hi Gerard,

I'm concerned by the general problem of encountering new bugs because we are running on an x64 emulation mode on ARM! Is there any way to detect this Rosetta environment?

We can detect if we are running under emulation using this code provided by Apple https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment

Is there a way to know from the hs_err file (we don't have one for this issue yet) that we executed under Rosetta?

One can infer such scenario by looking at the log under the system section:

OS:uname: Darwin 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101 x86_64

We see RELEASE_ARM64_T8101 type kernel running x86_64 code, but we should detect this and report it to users directly. Filed a followup issue JDK-8261966 to track it. Thank you for the bringing this point up!

How will we track this bug on Apple's side, such that we can eventually remove this workaround?

I have filed the issue a while ago with Apple here https://feedbackassistant.apple.com/feedback/8984455, but please note that you probably need Mac developer account to be able to see it.

Apple has set the resolution of this issue as Potential fix identified - For a future OS update, but we will always need it for the already released macOS version.

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Excellent question. According to https://en.wikipedia.org/wiki/IEEE_754 invalid operation is mathematically undefined, e.g., the square root of a negative number. By default, returns qNaN. I will test this scenario out and report back...

Regardless though, we absolutely need this workaround for the time being, though we might want to use only in emulation environment.

gerard-ziemski · 2021-02-18T18:59:51Z

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Excellent question. According to https://en.wikipedia.org/wiki/IEEE_754 invalid operation is mathematically undefined, e.g., the square root of a negative number. By default, returns qNaN. I will test this scenario out and report back...

According to the man pages for log() function:

log(x), log2(x), and log10(x) return a NaN and raise the "invalid" floating-point exception for x < 0.

and the man pages for sqrt() function:

sqrt(x) returns a NaN and generates a domain error for x < 0

but what I see in our signal handler is a signal of type FPE_FLTDIV returned not FPE_FLTINV for sqrt(-1.0) and log(-1.0)

I can't seem to be able to get FPE_FLTINV at all - it's seems that it's unused natively on macOS x86_64 so we are good I think.

mlbridge · 2021-02-19T04:47:38Z

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Gerard,

On 19/02/2021 5:02 am, Gerard Ziemski wrote:

On Thu, 18 Feb 2021 16:28:38 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Excellent question. According to https://en.wikipedia.org/wiki/IEEE_754 invalid operation is `mathematically undefined, e.g., the square root of a negative number. By default, returns qNaN.` I will test this scenario out and report back...

According to the **man pages** for `log()` function:

`log(x), log2(x), and log10(x) return a NaN and raise the "invalid" floating-point exception for x < 0.`

and the **man pages** for `sqrt()` function:

`sqrt(x) returns a NaN and generates a domain error for x < 0`

but what I see in our signal handler is a signal of type `FPE_FLTDIV` returned **not** `FPE_FLTINV` for `sqrt(-1.0)` and `log(-1.0)`

Thanks for the further investigation here.

Possibly more bugs in FP handling. :(

Though I'm unclear how to actually trigger SIGFPE in these cases - don't
we have to enable traps at the CPU level via the FPU control word? I
placed this in os::init_2() on Linux:

double d = ::sqrt(-1.0);
printf("%f\n", d);

and it just prints "-nan" with no signal generated.

I can't seem to be able to get `FPE_FLTINV` at all - it's seems that it's unused natively on `macOS x86_64` so we are good I think.

Ok, but I think we should fix JDK-8261966 and include
VM_Version::is_cpu_emulated() so that we use it to gate the workaround
at runtime. That at least minimises the exposure to the environments
that actually need the workaround.

Thanks,
David

gerard-ziemski · 2021-02-19T18:01:46Z

I can't seem to be able to get FPE_FLTINV at all - it's seems that it's unused natively on macOS x86_64 so we are good I think.

Ok, but I think we should fix JDK-8261966 and include
VM_Version::is_cpu_emulated() so that we use it to gate the workaround
at runtime. That at least minimises the exposure to the environments
that actually need the workaround.

I added the check in this fix - I don't think we should get this issue blocked by JDK-8261966

For JDK-8261966 we can further optimize the check to cache the results, if needed, and we can have a non rushed discussion on how exactly we want the API named (ex. I would prefer to follow Apple's usage and name it is_process_translated()) and how exactly we want it to look in the hs_err log files.

cheers

dholmes-ora

Hi Gerard,

The new VM_Version check should be in vm_version_bsd_x86.cpp so that we don't need to ifdef the code in the generic x86 file.

I don't think we need make is_cpu_emulated available on all platforms, jusyt bsd-x86.

Thanks,
David

gerard-ziemski · 2021-02-22T19:23:13Z

Hi Gerard,

The new VM_Version check should be in vm_version_bsd_x86.cpp so that we don't need to ifdef the code in the generic x86 file.

I don't think we need make is_cpu_emulated available on all platforms, jusyt bsd-x86.

Thanks,
David

I did not realize that there was vm_version_bsd_x86.cpp available till you pointed it out.

Fixed.

gerard-ziemski · 2021-02-22T19:28:17Z

Hi Gerard,
The new VM_Version check should be in vm_version_bsd_x86.cpp so that we don't need to ifdef the code in the generic x86 file.
I don't think we need make is_cpu_emulated available on all platforms, jusyt bsd-x86.
Thanks,
David

I did not realize that there was vm_version_bsd_x86.cpp available till you pointed it out.

Fixed.

Hmm, I still think we need #ifdef __APPLE__ here as well, there could be other BSD x86 OSes...

dholmes-ora

Hi Gerard,

Thanks for making the changes. I had hoped to completely isolate this to BSD/x86 (with Apple ifdefs) but it seems we are a bit inconsistent in platform management with VM_version code: we have os_cpu .cpp files but only cpu header files! Can't quite see the logic in that. :)

David

gerard-ziemski · 2021-02-23T16:38:08Z

/integrate

openjdk · 2021-02-23T16:39:16Z

@gerard-ziemski Since your change was applied there have been 1022 commits pushed to the master branch:

8a2f589: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation
67762de: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code
9d9bedd: 8262094: Handshake timeout scaled wrong
29c7263: 8252709: Enable JVMCI when building linux-aarch64 at Oracle
12f6ba0: 8262087: Use atomic boolean type in G1FullGCAdjustTask
a5c4b9a: 8260223: Handling of unnamed package in javadoc pages
8cfea7c: 8261921: ClassListParser::current should be used only by main thread
991f7c1: 8210373: Deadlock in libj2gss.so when loading "j2gss" and "net" libraries in parallel.
0217d69: 8261975: Missing "classpath exception" in VectorSupport.java
f2bde05: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229
... and 1012 more: https://git.openjdk.java.net/jdk/compare/bbc44f57c4714a49e72a09f0b9d05765d6b41e9b...master

Your commit was automatically rebased without conflicts.

Pushed as commit 0257caa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk · 2021-03-29T08:53:48Z

@VladimirKempik Unknown command backport - for a list of valid commands use /help.

add FPE_FLTINV to div by 0 signal handler

8615373

openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Feb 17, 2021

gerard-ziemski marked this pull request as ready for review February 17, 2021 19:14

openjdk bot added the rfr Pull request is ready for review label Feb 17, 2021

apply new check to macOS only

0b81551

tstuefe approved these changes Feb 17, 2021

View reviewed changes

src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp Outdated Show resolved Hide resolved

openjdk bot added the ready Pull request is ready to be integrated label Feb 17, 2021

add comment

8291664

prrace approved these changes Feb 17, 2021

View reviewed changes

dcubed-ojdk approved these changes Feb 17, 2021

View reviewed changes

src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp Outdated Show resolved Hide resolved

dholmes-ora reviewed Feb 18, 2021

View reviewed changes

check whether the process is translated

65dd76d

dholmes-ora suggested changes Feb 22, 2021

View reviewed changes

gerard-ziemski added 2 commits February 22, 2021 13:12

only add is_cpu_emulated() to bsd platform, where currently needed

3bfe8eb

move the API from vm_version_x86 to vm_version_bsd_x86

cadaa95

wrap macOS specific code in #ifdef __APPLE__

035adbf

dholmes-ora approved these changes Feb 22, 2021

View reviewed changes

openjdk bot closed this Feb 23, 2021

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 23, 2021

gerard-ziemski deleted the JDK-8261397 branch February 23, 2021 19:53

8261397: Try Catch Method Failing to Work When Dividing An Integer By 0 #2615

8261397: Try Catch Method Failing to Work When Dividing An Integer By 0 #2615

Uh oh!

Conversation

gerard-ziemski commented Feb 17, 2021 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Download

Uh oh!

bridgekeeper bot commented Feb 17, 2021

Uh oh!

openjdk bot commented Feb 17, 2021

Uh oh!

mlbridge bot commented Feb 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

gerard-ziemski commented Feb 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tstuefe left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

openjdk bot commented Feb 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

prrace left a comment

Choose a reason for hiding this comment

Uh oh!

gerard-ziemski commented Feb 17, 2021

Uh oh!

dcubed-ojdk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gerard-ziemski commented Feb 17, 2021

Uh oh!

dholmes-ora left a comment

Choose a reason for hiding this comment

Uh oh!

gerard-ziemski commented Feb 18, 2021

Uh oh!

gerard-ziemski commented Feb 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mlbridge bot commented Feb 19, 2021

Uh oh!

gerard-ziemski commented Feb 19, 2021

Uh oh!

dholmes-ora left a comment

Choose a reason for hiding this comment

Uh oh!

gerard-ziemski commented Feb 22, 2021

Uh oh!

gerard-ziemski commented Feb 22, 2021

Uh oh!

dholmes-ora left a comment

Choose a reason for hiding this comment

Uh oh!

gerard-ziemski commented Feb 23, 2021

Uh oh!

openjdk bot commented Feb 23, 2021

Uh oh!

openjdk bot commented Mar 29, 2021

Uh oh!

Uh oh!

gerard-ziemski commented Feb 17, 2021 •

edited by openjdk bot

Loading

mlbridge bot commented Feb 17, 2021 •

edited

Loading

gerard-ziemski commented Feb 17, 2021 •

edited

Loading

openjdk bot commented Feb 17, 2021 •

edited

Loading

gerard-ziemski commented Feb 18, 2021 •

edited

Loading