This repository has been archived by the owner. It is now read-only.

8258384: AArch64: SVE verify_ptrue fails on some tests #50

Closed
wants to merge 3 commits into from

Conversation

nsjian

@nsjian nsjian commented Dec 18, 2020

After applying [1], some Vector API tests fail with SIGILL on SVE
systems. The SIGILL is triggered by verify_ptrue before a C2-compiled
function returns, which means that the preserved p7 register (holding ptrue)
has been clobbered before control returns to C2-compiled code. (p7 is not
preserved across function calls or system calls [2].)
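To make the failure mode concrete, here is a minimal C++ model of the invariant. It is not HotSpot code: `CpuState`, `clobbering_call` and the 16-lane predicate width are assumptions made for illustration, and `verify_ptrue`/`reinitialize_ptrue` are simple stand-ins named after the HotSpot functions. C2 code expects p7 to hold an all-true predicate; a call that does not preserve p7 breaks that unless ptrue is reinitialized before control returns to C2 code.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical model, not HotSpot code. The 16-lane width is arbitrary;
// real SVE predicate width depends on the hardware vector length.
using Predicate = std::bitset<16>;

struct CpuState {
  Predicate p7;  // C2 keeps an all-true predicate ("ptrue") here
};

// Models verify_ptrue: every lane of p7 must be active.
bool verify_ptrue(const CpuState& cpu) { return cpu.p7.all(); }

// Models reinitialize_ptrue: set every lane of p7 active again.
void reinitialize_ptrue(CpuState& cpu) { cpu.p7.set(); }

// Models a C function or system call. p7 is not preserved across it, so
// on return it may hold any value; this model simply clears it.
void clobbering_call(CpuState& cpu) { cpu.p7.reset(); }

// Returns whether verify_ptrue passes when control comes back to
// "C2-compiled code" after a call, with or without reinitialization.
bool returns_with_valid_ptrue(bool reinitialize_after_call) {
  CpuState cpu;
  reinitialize_ptrue(cpu);  // C2 code starts with a valid ptrue
  clobbering_call(cpu);     // the callee may clobber p7
  if (reinitialize_after_call) {
    reinitialize_ptrue(cpu);  // what each return entry point must do
  }
  return verify_ptrue(cpu);
}
```

In this model, `returns_with_valid_ptrue(false)` reports the stale predicate that verify_ptrue traps on, while `returns_with_valid_ptrue(true)` passes.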

Currently we reinitialize ptrue at each entry point where control returns
from non-C2-compiled code, since such code may have made C or system calls.
However, one entry point is still missing: exception handling, because
we may jump to C2-compiled code for the exception handler. See
OptoRuntime::generate_exception_blob().

Adding reinitialize_ptrue before jumping back to C2-compiled code in
generate_exception_blob() solves those Vector API test failures.
I actually had that in my initial test patch [3]; I don't know why I
missed it in the final patch. I reran the tests with the same approach as
[3] and found one more missing case: the nmethod_entry_barrier() in the
C2 function prolog. The barrier may call into runtime code (see
generate_method_entry_barrier()). To reduce the risk of missing
reinitialize_ptrue in newly added code in the future, I think it is
better to do the reinitialization in pop_call_clobbered_registers().
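A minimal C++ sketch of that design choice follows. It is a hypothetical model rather than the actual MacroAssembler code: `CpuState`, `RuntimeCall` and `call_through_wrapper` are invented for illustration, and the push/pop functions only mimic the role of the HotSpot helpers of the same name. If the helper that restores registers after a runtime call also re-creates the all-true predicate, every call site bracketed by the push/pop pair gets a valid ptrue back, whatever the callee did to p7.

```cpp
#include <bitset>
#include <cassert>
#include <functional>

// Hypothetical model, not HotSpot code: p7 is C2's all-true predicate.
struct CpuState {
  std::bitset<16> p7;
};

void reinitialize_ptrue(CpuState& cpu) { cpu.p7.set(); }
bool verify_ptrue(const CpuState& cpu) { return cpu.p7.all(); }

// Stand-ins for the helpers that bracket a call into runtime code.
// p7 is not saved here; instead the "pop" side re-creates the all-true
// predicate, so every user of the pair gets a valid ptrue back.
void push_call_clobbered_registers(CpuState& /*cpu*/) {
  // Saving general-purpose and FP/SIMD registers is omitted in this model.
}

void pop_call_clobbered_registers(CpuState& cpu) {
  // Restoring saved registers is omitted in this model.
  reinitialize_ptrue(cpu);
}

// Any runtime call that may leave arbitrary bits in p7, for example the
// kind of call an nmethod entry barrier can make.
using RuntimeCall = std::function<void(CpuState&)>;

// Brackets `call` with the push/pop pair and reports whether ptrue is
// still valid when control returns to "C2-compiled code".
bool call_through_wrapper(const RuntimeCall& call) {
  CpuState cpu;
  reinitialize_ptrue(cpu);
  push_call_clobbered_registers(cpu);
  call(cpu);
  pop_call_clobbered_registers(cpu);
  return verify_ptrue(cpu);
}
```

In this model the cost is one extra reinitialization per wrapped call, traded for not having to remember it at each new call site.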

P.S. The SIGILL message is also unclear: it should print the detailed
message passed to the MacroAssembler::stop() call. This is caused by
JDK-8255711, which removed the message-printing code, and will be fixed by JDK-8259539.

Tested with tier1-3 on SVE hardware. Also verified, using the same
approach as patch [3], that the jtreg tests hotspot_all_no_apps and
jdk:tier1-3 pass without any incorrect-ptrue-value assertion failures.

[1] openjdk/jdk#1621
[2] https://github.com/torvalds/linux/blob/master/Documentation/arm64/sve.rst
[3] http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8258384: AArch64: SVE verify_ptrue fails on some tests

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk16 pull/50/head:pull/50
$ git checkout pull/50

@bridgekeeper

bridgekeeper bot commented Dec 18, 2020

👋 Welcome back njian! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 18, 2020
@openjdk

openjdk bot commented Dec 18, 2020

@nsjian The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Dec 18, 2020
@nsjian
Author

nsjian commented Dec 18, 2020

/label add hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.java.net label Dec 18, 2020
@openjdk

openjdk bot commented Dec 18, 2020

@nsjian
The hotspot-compiler label was successfully added.

@mlbridge

mlbridge bot commented Dec 18, 2020

Webrevs

@nsjian
Author

nsjian commented Jan 5, 2021

Can I get a review for jdk16 please?

I think the x86 pre-submit test failure is unrelated to this patch; it looks like the issue fixed by #64.

va_list detail_args;
VMError::report_and_die(INTERNAL_ERROR, msg, detail_msg, detail_args, thread,
                        pc, info, NULL, NULL, 0, 0);
va_end(detail_args);
}
Contributor

@adinn adinn Jan 5, 2021

I'm not sure it is ok to revert this code. The fix this is part of (for JDK-8255711) was provided explicitly to re-organize the flow of control for handling of fatal errors. Reverting this code appears to undermine the goal of that issue. I would like to get Thomas Stuefe's (@tstuefe) opinion on whether it is appropriate to abort the JVM here vs returning false before accepting this specific change.

Member

@tstuefe tstuefe Jan 11, 2021

Sorry for the delay. I'll take a look.

Member

@tstuefe tstuefe Jan 11, 2021

Thanks for finding that, and for pinging me. Please revert this part of the change, I opened a separate issue for it (also affects ppc64): https://bugs.openjdk.java.net/browse/JDK-8259539.

Author

@nsjian nsjian Jan 11, 2021

Thank you, Thomas. I have removed that change from my patch. @adinn, would you mind taking another look and approving this patch?

@adinn
Contributor

adinn commented Jan 5, 2021

Hi Ningsheng,

The SVE changes look ok. I'm just unsure about restoring the exit in the signal handler.

@nsjian
Author

nsjian commented Jan 5, 2021

Thank you @adinn for the review! @tstuefe would you mind helping to review the lines @adinn mentioned? Thanks!

adinn
adinn approved these changes Jan 11, 2021
Contributor

@adinn adinn left a comment

This looks fine now.

@openjdk

openjdk bot commented Jan 11, 2021

@nsjian This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8258384: AArch64: SVE verify_ptrue fails on some tests

Reviewed-by: adinn, ngasson

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 48 new commits pushed to the master branch:

  • 020ec84: 8259429: Update reference to README.md
  • fb68395: 8259014: (so) ServerSocketChannel.bind(UnixDomainSocketAddress)/SocketChannel.bind(UnixDomainSocketAddress) will have unknown user and group owner (win)
  • 677802d: 8258484: AIX build fails in Harfbuzz with XLC 16.01.0000.0006
  • 1973fbe: 8039278: console.sh failed Automatically with exit code 1
  • acdd90b: 8258972: unexpected compilation error with generic sealed interface
  • c1fb521: 8259227: C2 crashes with SIGFPE due to a division that floats above its zero check
  • 484e23b: 8258657: Doc build is broken by use of new language features
  • 4a478b8: 8250903: jdk/jfr/javaagent/TestLoadedAgent.java fails with Mismatch in TestEvent count
  • 4f914e2: 8249633: doclint reports missing javadoc for JavaFX property methods that have a property description
  • eef43be: 8251200: False positive messages about missing comments for serialization
  • ... and 38 more: https://git.openjdk.java.net/jdk16/compare/7afb01dce966e4c00880711ef232af12af755b3a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 11, 2021
@nsjian
Author

nsjian commented Jan 12, 2021

Thank you @adinn @nick-arm for the review!

/integrate

@openjdk openjdk bot closed this Jan 12, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 12, 2021
@openjdk

openjdk bot commented Jan 12, 2021

@nsjian Since your change was applied there have been 51 commits pushed to the master branch:

  • 2cb271e: 8253996: Javac error on jdk16 build 18: invalid flag: -Xdoclint:-missing
  • d60a937: 8259028: ClassCastException when using custom filesystem with wrapper FileChannel impl
  • e05f36f: 8259043: More Zero architectures need linkage with libatomic
  • 020ec84: 8259429: Update reference to README.md
  • fb68395: 8259014: (so) ServerSocketChannel.bind(UnixDomainSocketAddress)/SocketChannel.bind(UnixDomainSocketAddress) will have unknown user and group owner (win)
  • 677802d: 8258484: AIX build fails in Harfbuzz with XLC 16.01.0000.0006
  • 1973fbe: 8039278: console.sh failed Automatically with exit code 1
  • acdd90b: 8258972: unexpected compilation error with generic sealed interface
  • c1fb521: 8259227: C2 crashes with SIGFPE due to a division that floats above its zero check
  • 484e23b: 8258657: Doc build is broken by use of new language features
  • ... and 41 more: https://git.openjdk.java.net/jdk16/compare/7afb01dce966e4c00880711ef232af12af755b3a...master

Your commit was automatically rebased without conflicts.

Pushed as commit a7e5da2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot hotspot-compiler hotspot-compiler-dev@openjdk.java.net integrated Pull request has been integrated
4 participants