-
Notifications
You must be signed in to change notification settings - Fork 76
8258384: AArch64: SVE verify_ptrue fails on some tests #50
Conversation
After applying [1], some Vector API tests fail with SIGILL on SVE system. The SIGILL was triggered by verify_ptrue before c2 compiled function returns, which means that the preserved p7 register (as ptrue) has been clobbered before returning to c2 compiled code. (p7 is not preserved cross function calls, and system calls [2]). Currently we try to reinitialize ptrue at each entrypoint of returning from non-c2 compiled code, which indicating possible C or system calls. However, there's still one entrypoint missing, exception handling, as we may jump to c2 compiled code for exception handler. See OptoRuntime::generate_exception_blob(). Adding reinitialize_ptrue before jumping back to c2 compiled code in generate_exception_blob() could solve those Vector API failures. Actually I had that in my initial test patch [3], I don't know why I missed that in final patch... I reran tests with the same approach of [3] and found that there's still something missing, the nmethod_entry_barrier() in c2 function prolog. The barrier may call to runtime code (see generate_method_entry_barrier()). To reduce the risk of missing such reinitialize_ptrue in newly added code in future, I think it would be better to do the reinitialize in pop_call_clobbered_registers(). P.S. the SIGILL message is also not clear, it should print detailed message as indicated by MacroAssembler::stop() call. This is caused by JDK-8255711 removing the message printing code. This patch also adds it back, so that it could print detailed message for abort. Tested with tier1-3 on SVE hardware. Also verified with the same approach of patch [3] with jtreg tests hotspot_all_no_apps and jdk:tier1-3 passed without incorrect ptrue value assertion failure. [1] openjdk/jdk#1621 [2] https://github.com/torvalds/linux/blob/master/Documentation/arm64/sve.rst [3] http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch
👋 Welcome back njian! A progress list of the required criteria for merging this PR into |
/label add hotspot-compiler |
@nsjian |
Webrevs
|
Can I get a review for jdk16 please? I think the x86 pre-submit test failure is not related to this patch, which looks like the issue fixed by: #64 |
va_list detail_args; | ||
VMError::report_and_die(INTERNAL_ERROR, msg, detail_msg, detail_args, thread, | ||
pc, info, NULL, NULL, 0, 0); | ||
va_end(detail_args); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is ok to revert this code. The fix this is part of (for JDK-8255711) was provided explicitly to re-organize the flow of control for handling of fatal errors. Reverting this code appears to undermine the goal of that issue. I would like to get Thomas Stuefe's (@tstuefe) opinion on whether it is appropriate to abort the JVM here vs returning false before accepting this specific change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I'll take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding that, and for pinging me. Please revert this part of the change, I opened a separate issue for it (also affects ppc64): https://bugs.openjdk.java.net/browse/JDK-8259539.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Thomas. I have removed the change from my patch. Would you mind taking another look and approving this patch @adinn ?
Hi Ningsheng, The SVE changes look ok. I'm just unsure about restoring the exit in the signal handler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine now.
@nsjian This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 48 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@nsjian Since your change was applied there have been 51 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit a7e5da2. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
After applying [1], some Vector API tests fail with SIGILL on SVE
system. The SIGILL was triggered by verify_ptrue before c2 compiled
function returns, which means that the preserved p7 register (as ptrue)
has been clobbered before returning to c2 compiled code. (p7 is not
preserved cross function calls, and system calls [2]).
Currently we try to reinitialize ptrue at each entrypoint of returning
from non-c2 compiled code, which indicating possible C or system calls.
However, there's still one entrypoint missing, exception handling, as
we may jump to c2 compiled code for exception handler. See
OptoRuntime::generate_exception_blob().
Adding reinitialize_ptrue before jumping back to c2 compiled code in
generate_exception_blob() could solve those Vector API test failures.
Actually I had that in my initial test patch [3], I don't know why I
missed that in final patch... I reran tests with the same approach of
[3] and found that there's still something missing, the
nmethod_entry_barrier() in c2 function prolog. The barrier may call to
runtime code (see generate_method_entry_barrier()). To reduce the risk
of missing such reinitialize_ptrue in newly added code in future, I
think it would be better to do the reinitialize in
pop_call_clobbered_registers().
P.S. the SIGILL message is also not clear, it should print detailed
message as indicated by MacroAssembler::stop() call. This is caused by
JDK-8255711 removing the message printing code. This will be fixed by JDK-8259539.
Tested with tier1-3 on SVE hardware. Also verified with the same
approach of patch [3] with jtreg tests hotspot_all_no_apps and
jdk:tier1-3 passed without incorrect ptrue value assertion failure.
[1] openjdk/jdk#1621
[2] https://github.com/torvalds/linux/blob/master/Documentation/arm64/sve.rst
[3] http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk16 pull/50/head:pull/50
$ git checkout pull/50