Skip to content

8261397: Try Catch Method Failing to Work When Dividing An Integer By 0 #2615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

8261397: Try Catch Method Failing to Work When Dividing An Integer By 0 #2615

wants to merge 7 commits into from

Conversation

gerard-ziemski
Copy link

@gerard-ziemski gerard-ziemski commented Feb 17, 2021

On Mac ARM hardware running x86 JDK under Rosetta emulation, a div by 0 instruction causes the VM to crash.

The proposed fix (a workaround) for hotspot is to add FPE_FLTINV to the signal handler.

The actual fix needs to be done in macOS by Apple as the expected signal type here is FPE_FLTDIV
This issue has been filed with Apple and they are tracking it.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8261397: Try Catch Method Failing to Work When Dividing An Integer By 0

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2615/head:pull/2615
$ git checkout pull/2615

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 17, 2021

👋 Welcome back gziemski! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 17, 2021

@gerard-ziemski The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Feb 17, 2021
@gerard-ziemski gerard-ziemski marked this pull request as ready for review February 17, 2021 19:14
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 17, 2021
@mlbridge
Copy link

mlbridge bot commented Feb 17, 2021

Webrevs

@gerard-ziemski
Copy link
Author

gerard-ziemski commented Feb 17, 2021

Simple native test case for those curious to try it out for themselves:

/*
 Natively on x86_64 we get:
   sigHandler caught sig: 8, info->si_code: 7 [FPE_INTDIV]
 Under Rossetta on M1 for x86_64 binary we get:
   sigHandler caught sig: 8, info->si_code: 5 [FPE_FLTINV]
 */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
static void sigHandler(int sig, siginfo_t* info, void* arg) {
  printf("sigHandler caught sig: %d, info->si_code: %d [", sig, info->si_code);
  switch (info->si_code) {
    case FPE_FLTDIV: printf("FPE_FLTDIV");break;
    case FPE_INTDIV: printf("FPE_INTDIV");break;
    case FPE_FLTINV: printf("FPE_FLTINV");break;
    default: printf("???");
  }
  printf("]\n");
  exit(sig);
}
int main(int argc, const char * argv[]) {
  struct sigaction sa;
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = sigHandler;
  sa.sa_flags = SA_SIGINFO|SA_RESETHAND;
  sigaction(SIGFPE, &sa, NULL);
  volatile int i = 0;
  volatile int j = 47 / i;
  return j;
}

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. Remark below. Looks good otherwise.

Cheers, Thomas

@openjdk
Copy link

openjdk bot commented Feb 17, 2021

@gerard-ziemski This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261397: Try Catch Method Failing to Work When Dividing An Integer By 0

Reviewed-by: stuefe, prr, dcubed, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1008 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 17, 2021
Copy link
Contributor

@prrace prrace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. I suppose there is a BSD unix port that uses this code hence the MACOS_ONLY ?

@gerard-ziemski
Copy link
Author

Looks reasonable to me. I suppose there is a BSD unix port that uses this code hence the MACOS_ONLY ?

Exactly, any BSD platform uses this implementation. macOS is just one of them.

Thanks Phil!

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbs up. Nice job on keeping the work around very focused.

@gerard-ziemski
Copy link
Author

Thumbs up. Nice job on keeping the work around very focused.

Thank you Dan!

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gerard,

I'm concerned by the general problem of encountering new bugs because we are running on an x64 emulation mode on ARM! Is there any way to detect this Rosetta environment? Is there a way to know from the hs_err file (we don't have one for this issue yet) that we executed under Rosetta?

How will we track this bug on Apple's side, such that we can eventually remove this workaround?

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Thanks,
David

@gerard-ziemski
Copy link
Author

Hi Gerard,

I'm concerned by the general problem of encountering new bugs because we are running on an x64 emulation mode on ARM! Is there any way to detect this Rosetta environment?

We can detect if we are running under emulation using this code provided by Apple https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment

Is there a way to know from the hs_err file (we don't have one for this issue yet) that we executed under Rosetta?

One can infer such scenario by looking at the log under the system section:

OS:uname: Darwin 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101 x86_64

We see RELEASE_ARM64_T8101 type kernel running x86_64 code, but we should detect this and report it to users directly. Filed a followup issue JDK-8261966 to track it. Thank you for the bringing this point up!

How will we track this bug on Apple's side, such that we can eventually remove this workaround?

I have filed the issue a while ago with Apple here https://feedbackassistant.apple.com/feedback/8984455, but please note that you probably need Mac developer account to be able to see it.

Apple has set the resolution of this issue as Potential fix identified - For a future OS update, but we will always need it for the already released macOS version.

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Excellent question. According to https://en.wikipedia.org/wiki/IEEE_754 invalid operation is mathematically undefined, e.g., the square root of a negative number. By default, returns qNaN. I will test this scenario out and report back...

Regardless though, we absolutely need this workaround for the time being, though we might want to use only in emulation environment.

@gerard-ziemski
Copy link
Author

gerard-ziemski commented Feb 18, 2021

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Excellent question. According to https://en.wikipedia.org/wiki/IEEE_754 invalid operation is mathematically undefined, e.g., the square root of a negative number. By default, returns qNaN. I will test this scenario out and report back...

According to the man pages for log() function:

log(x), log2(x), and log10(x) return a NaN and raise the "invalid" floating-point exception for x < 0.

and the man pages for sqrt() function:

sqrt(x) returns a NaN and generates a domain error for x < 0

but what I see in our signal handler is a signal of type FPE_FLTDIV returned not FPE_FLTINV for sqrt(-1.0) and log(-1.0)

I can't seem to be able to get FPE_FLTINV at all - it's seems that it's unused natively on macOS x86_64 so we are good I think.

@mlbridge
Copy link

mlbridge bot commented Feb 19, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Gerard,

On 19/02/2021 5:02 am, Gerard Ziemski wrote:

On Thu, 18 Feb 2021 16:28:38 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

The workaround itself seems okay as far as it goes but what happens if native code triggers a real FPE_FLTINV that we will now treat as "div by zero"?

Excellent question. According to https://en.wikipedia.org/wiki/IEEE_754 invalid operation is `mathematically undefined, e.g., the square root of a negative number. By default, returns qNaN.` I will test this scenario out and report back...

According to the **man pages** for `log()` function:

`log(x), log2(x), and log10(x) return a NaN and raise the "invalid" floating-point exception for x < 0.`

and the **man pages** for `sqrt()` function:

`sqrt(x) returns a NaN and generates a domain error for x < 0`

but what I see in our signal handler is a signal of type `FPE_FLTDIV` returned **not** `FPE_FLTINV` for `sqrt(-1.0)` and `log(-1.0)`

Thanks for the further investigation here.

Possibly more bugs in FP handling. :(

Though I'm unclear how to actually trigger SIGFPE in these cases - don't
we have to enable traps at the CPU level via the FPU control word? I
placed this in os::init_2() on Linux:

double d = ::sqrt(-1.0);
printf("%f\n", d);

and it just prints "-nan" with no signal generated.

I can't seem to be able to get `FPE_FLTINV` at all - it's seems that it's unused natively on `macOS x86_64` so we are good I think.

Ok, but I think we should fix JDK-8261966 and include
VM_Version::is_cpu_emulated() so that we use it to gate the workaround
at runtime. That at least minimises the exposure to the environments
that actually need the workaround.

Thanks,
David

@gerard-ziemski
Copy link
Author

I can't seem to be able to get FPE_FLTINV at all - it's seems that it's unused natively on macOS x86_64 so we are good I think.

Ok, but I think we should fix JDK-8261966 and include
VM_Version::is_cpu_emulated() so that we use it to gate the workaround
at runtime. That at least minimises the exposure to the environments
that actually need the workaround.

I added the check in this fix - I don't think we should get this issue blocked by JDK-8261966

For JDK-8261966 we can further optimize the check to cache the results, if needed, and we can have a non rushed discussion on how exactly we want the API named (ex. I would prefer to follow Apple's usage and name it is_process_translated()) and how exactly we want it to look in the hs_err log files.

cheers

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gerard,

The new VM_Version check should be in vm_version_bsd_x86.cpp so that we don't need to ifdef the code in the generic x86 file.

I don't think we need make is_cpu_emulated available on all platforms, jusyt bsd-x86.

Thanks,
David

@gerard-ziemski
Copy link
Author

Hi Gerard,

The new VM_Version check should be in vm_version_bsd_x86.cpp so that we don't need to ifdef the code in the generic x86 file.

I don't think we need make is_cpu_emulated available on all platforms, jusyt bsd-x86.

Thanks,
David

I did not realize that there was vm_version_bsd_x86.cpp available till you pointed it out.

Fixed.

@gerard-ziemski
Copy link
Author

Hi Gerard,
The new VM_Version check should be in vm_version_bsd_x86.cpp so that we don't need to ifdef the code in the generic x86 file.
I don't think we need make is_cpu_emulated available on all platforms, jusyt bsd-x86.
Thanks,
David

I did not realize that there was vm_version_bsd_x86.cpp available till you pointed it out.

Fixed.

Hmm, I still think we need #ifdef __APPLE__ here as well, there could be other BSD x86 OSes...

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gerard,

Thanks for making the changes. I had hoped to completely isolate this to BSD/x86 (with Apple ifdefs) but it seems we are a bit inconsistent in platform management with VM_version code: we have os_cpu .cpp files but only cpu header files! Can't quite see the logic in that. :)

David

@gerard-ziemski
Copy link
Author

/integrate

@openjdk openjdk bot closed this Feb 23, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 23, 2021
@openjdk
Copy link

openjdk bot commented Feb 23, 2021

@gerard-ziemski Since your change was applied there have been 1022 commits pushed to the master branch:

  • 8a2f589: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation
  • 67762de: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code
  • 9d9bedd: 8262094: Handshake timeout scaled wrong
  • 29c7263: 8252709: Enable JVMCI when building linux-aarch64 at Oracle
  • 12f6ba0: 8262087: Use atomic boolean type in G1FullGCAdjustTask
  • a5c4b9a: 8260223: Handling of unnamed package in javadoc pages
  • 8cfea7c: 8261921: ClassListParser::current should be used only by main thread
  • 991f7c1: 8210373: Deadlock in libj2gss.so when loading "j2gss" and "net" libraries in parallel.
  • 0217d69: 8261975: Missing "classpath exception" in VectorSupport.java
  • f2bde05: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229
  • ... and 1012 more: https://git.openjdk.java.net/jdk/compare/bbc44f57c4714a49e72a09f0b9d05765d6b41e9b...master

Your commit was automatically rebased without conflicts.

Pushed as commit 0257caa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@gerard-ziemski gerard-ziemski deleted the JDK-8261397 branch February 23, 2021 19:53
@openjdk
Copy link

openjdk bot commented Mar 29, 2021

@VladimirKempik Unknown command backport - for a list of valid commands use /help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants