8264340: [lworld] [AArch64] TestLWorld.java assertion failure in OopFlow::build_oop_map #479

Closed

@nick-arm (Member) commented Jul 13, 2021

This happens reliably in TestLWorld::test9() scenario 0 on AArch64:

  # A fatal error has been detected by the Java Runtime Environment:
  #
  # Internal Error (/mnt/nicgas01-pc/valhalla/src/hotspot/share/opto/buildOopMap.cpp:360), pid=8866, tid=8882
  # assert(false) failed: there should be a oop in OopMap instead of a live raw oop at safepoint
  #

The crash can also be reproduced on x86 by running with -XX:+OptoScheduling (this is the default on AArch64).

The problem seems to be caused by a CheckCastPP node whose input is a raw pointer being scheduled after a SafePoint node such that the raw pointer is live in a register over the safepoint.

Before scheduling we have a basic block like:

  R0      73  Phi  ===  15  74  30  [[ 72  71  70  69  68  67  84 ]]  #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
  ...
  R0      84  checkCastPP  ===  11  73  [[ 2 ]] SchedCrash$MyValue1:NotNull:exact *  Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
  ...
          6  safePoint  ===  9  0  33  0  0  7  0  78  40  0  140  135  136  139  138  141  [[ 8  4 ]]  !jvms: SchedCrash::test9 @ bci:33 (line 37)

But after scheduling this is transformed into:

  R0      73  Phi  ===  15  74  30  [[ 72  71  70  69  68  67  84 ]]  #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
  ...
          6  safePoint  ===  9  0  33  0  0  7  0  78  40  0  140  135  136  139  138  141  | 164  [[ 8  4 ]]  !jvms: SchedCrash::test9 @ bci:33 (line 37)
  ...
  R0      84  checkCastPP  ===  11  73  | 67  68  69  70  71  72  [[ 2 ]] SchedCrash$MyValue1:NotNull:exact *  Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)

Here R0 holds the raw pointer live across the safepoint, which triggers the assertion failure.
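
As a rough illustration of the condition the assertion guards against, the sketch below models a basic block as an ordered list of nodes and reports raw-pointer values that are defined before the safepoint and read after it. This is not HotSpot code: `RawLiveSketch`, `Node`, `rawLiveAcross` and `block` are hypothetical names, and the model assumes only the three nodes shown in the dumps, ignoring the Phi's other users and the safepoint's own inputs.

```java
import java.util.ArrayList;
import java.util.List;

public class RawLiveSketch {
    // Hypothetical stand-in for a scheduled node; not a HotSpot type.
    static final class Node {
        final String name;
        final boolean rawPointer;   // true if the node's result is a raw (non-oop) pointer
        final List<String> inputs;  // names of the nodes this node reads

        Node(String name, boolean rawPointer, List<String> inputs) {
            this.name = name;
            this.rawPointer = rawPointer;
            this.inputs = inputs;
        }
    }

    // Returns the raw-pointer values defined before the safepoint and read after it.
    static List<String> rawLiveAcross(List<Node> block, String safepointName) {
        int sp = -1;
        for (int i = 0; i < block.size(); i++) {
            if (block.get(i).name.equals(safepointName)) {
                sp = i;
                break;
            }
        }
        if (sp < 0) {
            throw new IllegalArgumentException("no safepoint in block");
        }
        List<String> live = new ArrayList<>();
        for (int d = 0; d < sp; d++) {
            Node def = block.get(d);
            if (!def.rawPointer) {
                continue;
            }
            for (int u = sp + 1; u < block.size(); u++) {
                if (block.get(u).inputs.contains(def.name)) {
                    live.add(def.name);
                    break;
                }
            }
        }
        return live;
    }

    // Builds the simplified block in either of the two orders shown in the dumps.
    static List<Node> block(boolean castBeforeSafepoint) {
        Node phi = new Node("73 Phi", true, List.of());
        Node cast = new Node("84 checkCastPP", false, List.of("73 Phi"));
        Node sp = new Node("6 safePoint", false, List.of());
        return castBeforeSafepoint ? List.of(phi, cast, sp) : List.of(phi, sp, cast);
    }

    public static void main(String[] args) {
        System.out.println(rawLiveAcross(block(true), "6 safePoint"));   // cast before safepoint
        System.out.println(rawLiveAcross(block(false), "6 safePoint"));  // cast after safepoint
    }
}
```

In this model, the order before scheduling yields an empty list, while the order after scheduling reports `73 Phi` as a raw value live across the safepoint.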

The fix here is to add a precedence edge from any CheckCastPP with a raw pointer input to the following safepoint, which prevents the two from being reordered. I'm not very familiar with this code, so I can't be sure this is the correct solution, but the same logic exists in GCM's PhaseCFG::schedule_late().
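
To show the general idea of how a precedence edge constrains a scheduler, here is a toy list scheduler. It is a sketch only and does not reflect how C2's scheduler is implemented: `PrecedenceSketch`, `schedule` and `demo` are hypothetical names, and the fixed priority list merely stands in for whatever heuristic makes the real scheduler prefer to place the safepoint first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrecedenceSketch {
    // Emits nodes so that each one appears after all of its predecessors.
    // Among the ready nodes, the one listed first in `priority` wins.
    static List<String> schedule(List<String> priority, Map<String, Set<String>> preds) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (order.size() < priority.size()) {
            boolean progressed = false;
            for (String n : priority) {
                if (!done.contains(n) && done.containsAll(preds.getOrDefault(n, Set.of()))) {
                    order.add(n);
                    done.add(n);
                    progressed = true;
                    break;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cycle in dependencies");
            }
        }
        return order;
    }

    static List<String> demo(boolean withPrecedenceEdge) {
        // Assumed heuristic: the toy scheduler would rather emit the safepoint before the cast.
        List<String> priority = List.of("Phi", "SafePoint", "CheckCastPP");
        Map<String, Set<String>> preds = new HashMap<>();
        // Data input: the cast reads the raw pointer produced by the Phi.
        preds.put("CheckCastPP", Set.of("Phi"));
        // Precedence edge: the safepoint may not be emitted until the cast has been.
        preds.put("SafePoint", withPrecedenceEdge ? Set.of("CheckCastPP") : Set.of());
        return schedule(priority, preds);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // no edge: safepoint lands between Phi and cast
        System.out.println(demo(true));  // with edge: cast is forced ahead of the safepoint
    }
}
```

Without the extra edge the toy scheduler produces Phi, SafePoint, CheckCastPP; with it, the order becomes Phi, CheckCastPP, SafePoint, so the raw pointer is consumed before the safepoint.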


Progress

  • Change must not contain extraneous whitespace

Issue

  • JDK-8264340: [lworld] [AArch64] TestLWorld.java assertion failure in OopFlow::build_oop_map

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/valhalla pull/479/head:pull/479
$ git checkout pull/479

Update a local copy of the PR:
$ git checkout pull/479
$ git pull https://git.openjdk.java.net/valhalla pull/479/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 479

View PR using the GUI difftool:
$ git pr show -t 479

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/valhalla/pull/479.diff

@bridgekeeper (bot) commented Jul 13, 2021

👋 Welcome back ngasson! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Jul 13, 2021

@nick-arm This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264340: [lworld] [AArch64] TestLWorld.java assertion failure in OopFlow::build_oop_map

Reviewed-by: thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 597 new commits pushed to the lworld branch:

  • 9fe2304: 8271405: [lworld] Redo test/jdk/java/lang/invoke/VarHandles changes for JDK-8269956
  • ecca7a0: 8271536: [lworld] VerifyError in hotspot/jtreg/runtime/classFileParserBug/NameAndTypeSig.java
  • 44784e4: 8271508: [lworld] disallow primitive classes with super_class of 0
  • bdf2799: 8271544: [lworld] GraphBuilder::withfield should handle identity class holder
  • 2ca8eba: Merge jdk
  • a066c7b: 8270086: ARM32-softfp: Do not load CONSTANT_double using the condy helper methods in the interpreter
  • 072fe48: 8270901: Typo PHASE_CPP in CompilerPhaseType
  • d7b5cb6: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization
  • ecd4455: 8266510: Nimbus JTree default tree cell renderer does not use selected text color
  • d994b93: 8266054: VectorAPI rotate operation optimization
  • ... and 587 more: https://git.openjdk.java.net/valhalla/compare/c207165b8606d1b4505f52be6362681d61a9fc7e...lworld

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TobiHartmann) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@mlbridge (bot) commented Jul 13, 2021

Webrevs

@TobiHartmann (Member) left a comment

Is this specific to Valhalla and if so, do you know why?

@nick-arm (Member, Author) commented Jul 21, 2021

> Is this specific to Valhalla and if so, do you know why?

It seems to be specific to Valhalla, so this is probably the wrong fix.

    public Object test9() {
        Object o = valueField1;
        for (int i = 1; i < 100; i *= 2) {
            MyValue1 v = (MyValue1)o;
            o = MyValue1.setX(v, v.x + 1);
        }
        return o;
    }

The problem seems to be related to the call to InlineTypeNode::buffer() that happens in Parse::return_current() when we have an inline type but need to return an oop. The CheckCastPP node above is the buffered return value.

I uploaded a screenshot from IGV here:

https://bugs.openjdk.java.net/secure/attachment/95580/checkcast-igv.png

There doesn't seem to be anything stopping CheckCastPP node 321 from floating past SafePoint node 183 while its allocation in node 390 stays on the other side. If I try the above test but make MyValue1 a normal non-primitive class, the CheckCastPP node from the inlined allocation in MyValue1.setX() is an input to the loop safepoint, which prevents that.

@TobiHartmann (Member) commented Aug 3, 2021

I finally got a chance to debug this. Here's what I think is going on that makes this specific to Valhalla / inline types (based on TestLWorld::test9):

  1. We buffer the inline type returned by MyValue1.setX in the loop right before the safepoint (for example, because -XX:+AlwaysIncrementalInline -XX:-InlineTypeReturnedAsFields are set). The corresponding CheckCastPP is connected to the safepoint.
  2. On return, we re-use the CheckCastPP from that allocation instead of allocating again.
  3. Scalarization replaces the CheckCastPP safepoint usage, allowing it to flow below the safepoint during scheduling.

Therefore, I think your fix is correct. Maybe add a comment explaining the details of how this can happen.

Of course, it's unfortunate that the return keeps the allocation(s) in the loop alive when it would be sufficient to allocate only on return. However, I don't think we can easily fix this and it's hopefully an edge case.

@nick-arm (Member, Author) commented Aug 3, 2021

Thanks for looking into this @TobiHartmann. I've added some more explanation to the comment.

@TobiHartmann (Member) left a comment

Looks good.

@nick-arm (Member, Author) commented Aug 3, 2021

/integrate

@openjdk openjdk bot added the sponsor label Aug 3, 2021
@openjdk (bot) commented Aug 3, 2021

@nick-arm
Your change (at version a31c3ec) is now ready to be sponsored by a Committer.

@TobiHartmann (Member) commented Aug 3, 2021

/sponsor

@openjdk (bot) commented Aug 3, 2021

Going to push as commit ca9a0bc.
Since your change was applied there have been 597 commits pushed to the lworld branch; they are the same commits listed in the pre-integration comment above.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Aug 3, 2021
@openjdk (bot) commented Aug 3, 2021

@TobiHartmann @nick-arm Pushed as commit ca9a0bc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
