Skip to content
This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

8279654: jdk/incubator/vector/Vector256ConversionTests.java crashes randomly with SVE #98

Closed
wants to merge 1 commit into from

Conversation

fg1417
Copy link

@fg1417 fg1417 commented Jan 13, 2022

While testing some vector api cases on SVE system, we see a random failure:
"assert(C->node_arena()->contains(s->_leaf) || !has_new_node (s->_leaf)) failed: duplicating node that's already been matched".

The root cause is that the if-condition in pd_clone_node()[1] is not aligned with predicate conditions[2] of match rules in the backend. Not all StoreVector (VectorStoreMask src) node patterns can be matched and subsumed into one instruction in the backend. For example, there is no match rule to combine them when the element basic type is T_BYTE, as the cost of VectorStoreMask for byte type is relatively low and there is no need to do narrow operations.

Here is the analysis about root cause in detail. When a multi-use VectorStoreMask node, whose type is byte, showed as below,

image

is identified by pd_clone_node() successfully, it will not be set shared in the stage of find_shared() if only the root selector visits
the call node earlier than StoreVector node[3]. The preorder walk in xform()[4][5] visits the VectorStoreMask node and reduces it by ReduceInst() starting from the call node first, then visits the VectorStoreMask node again starting from the StoreVector node as mentioned before. During the second visit, in the stage of Label_Root(), the postorder walk along use-def edges starts from the StoreVector node and then visits the VectorStoreMask node. It takes the VectorStoreMask node as an interior of the subtree and generates the corresponding state tree[6]. But in the stage of ReduceInst_Interior(), there is no internal rule in the backend to reduce the VectorStoreMask node as the state tree guides. Thus, the reducer has to reduce the VectorStoreMask node independently and visit it by ReduceInst()[7]. Therefore, it tries to reduce the VectorStoreMask node twice by ReduceInst(), resulting in the assertion failure. Assuming that there was a match rule to combine these two byte nodes,
StoreVector (VectorStoreMask src), it could find an internal rule in the backend to help skip the interior VectorStoreMask node, do nothing but recurse[8], and no assertion happens. If we delete the check for VectorStoreMask in pd_clone_node(), the VectorStoreMask node is going to be set shared in find_shared() and could be definitely reduced only once with no assertion failure[9][10].

There are two different methods to fix this issue.

The first one is setting the condition in pd_clone_node() the same as matching rules, making sure that all instruction sequences declared in pd_clone_node() can be subsumed by matching rules. However, the condition code has to be revisited when matching rules change in the future. It's not easy to maintain.

The other one is that we can remove the if-condition code for VectorStoreMask in pd_clone_node(). As a result, when a VectorStoreMask node has multiple use, it can't be matched by any chained rules. But after JDK-8273949, a multi-use VectorStoreMask node only exists for safepoint[11], which is not very common. Furthermore, even if a multi-use VectorStoreMask node occurs, the pattern is identified in the pd_clone_node() and a match rule is ready to subsume it in the backend, the combination may not happen due to some unexpected matching order issue. For example, in this case,
image

if we visit the StoreVector node first then the call node, the VectorStoreMask node is still going to be set shared and the prevention from pd_clone_node() doesn't work based on the current logic of find_shared()[12]. Once the node is set shared, there is no code path to use internal combination rule to subsume the chained StoreVector-VectorStoreMask nodes into one machine instruction.

Based on the above mentioned considerations, to make the code more maintainable, we choose the second one, which decouples pd_clone_node() from predicate conditions of matching rules in the backend with minimal impact on performance.

I tested the patch using all vector cases under compiler/vectorization, compiler/vectorapi and jdk/incubator/vector on SVE for multiple times. All tests passed.

[1] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/cpu/aarch64/aarch64.ad#L2738
[2] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/cpu/aarch64/aarch64_sve.ad#L2101
[3] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L2151
[4] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1097
[5] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1119
[6] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1694
[7] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1977
[8] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1975
[9] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1692
[10] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1969
[11] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/vector.cpp#L262
[12] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L2125


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8279654: jdk/incubator/vector/Vector256ConversionTests.java crashes randomly with SVE

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk18 pull/98/head:pull/98
$ git checkout pull/98

Update a local copy of the PR:
$ git checkout pull/98
$ git pull https://git.openjdk.java.net/jdk18 pull/98/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 98

View PR using the GUI difftool:
$ git pr show -t 98

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk18/pull/98.diff

…andomly with SVE

While testing some vector api cases on SVE system, we see a random
failure:
"assert(C->node_arena()->contains(s->_leaf) || !has_new_node
(s->_leaf)) failed: duplicating node that's already been matched".

The root cause is that the if-condition in pd_clone_node()[1] is not
aligned with predicate conditions[2] of match rules in the backend.
Not all StoreVector (VectorStoreMask src) node patterns can be matched
and subsumed into one instruction in the backend. For example, there
is no match rule to combine them when the element basic type is
T_BYTE, as the cost of VectorStoreMask for byte type is relative
low and there is no need to do narrow operations.

Here is the analysis about root cause in detail. When a multi-use
VectorStoreMask node, whose type is byte, showed as below,

          |
  VectorStoreMask (T_BYTE)
      /         \
Call/Others     StoreVector (T_BYTE)

is identified by pd_clone_node() successfully, it will not be set
shared in the stage of find_shared() if only the root selector visits
the call node earlier than StoreVector node[3]. The preorder walk
in xform()[4][5] visits the VectorStoreMask node and reduces it by
ReduceInst() starting from the call node first, then visits the
VectorStoreMask node again starting from the StoreVector node as
mentioned before. During the second visit, in the stage of
Label_Root(), the postorder walk along use-def edges starts from the
StoreVector node and then visits the VectorStoreMask node. It takes
the VectorStoreMask node as an interior of the subtree and generates
the corresponding state tree[6]. But in the stage of
ReduceInst_Interior(), there is no internal rule in the backend to
reduce the VectorStoreMask node as the state tree guides. Thus, the
reducer has to reduce the VectorStoreMask node independently and visit
it by ReduceInst()[7]. Therefore, it tries to reduce the VectorStoreMask
node twice by ReduceInst(), resulting in the assertion failure. Assuming
that there was a match rule to combine these two byte nodes,
StoreVector (VectorStoreMask src), it could find an internal rule in the
backend to help skip the interior VectorStoreMask node, do nothing
but recurse[8], and no assertion happens. If we delete the check for
VectorStoreMask in pd_clone_node(), the VectorStoreMask node is going to
be set shared in find_shared() and could be definitely reduced only once
with no assertion failure[9][10].

There are two different methods to fix this issue.

The first one is setting the condition in pd_clone_node() the same as
matching rules, making sure that all instruction sequences declared in
pd_clone_node() can be subsumed by matching rules. However, the
condition code has to be revisited when matching rules change in the
future. It's not easy to maintain.

The other one is that we can remove the if-condition code for
VectorStoreMask in pd_clone_node(). As a result, when a VectorStoreMask
node has multiple use, it can't be matched by any chained rules. But
after JDK-8273949, a multi-use VectorStoreMask node only exists for
safepoint[11], which is not very common. Furthermore, even if a
multi-use VectorStoreMask node occurs, the pattern is identified in the
pd_clone_node() and a match rule is ready to subsume it in the backend,
the combination may not happen due to some unexpected matching order
issue. For example, in this case,

          |
  VectorStoreMask
      /         \
StoreVector  Call/Others

if we visit the StoreVector node first then the call node, the
VectorStoreMask node is still going to be set shared and the prevention
from pd_clone_node() doesn't work based on the current logic of
find_shared()[12]. Once the node is set shared, there is no code path to
use internal combination rule to subsume the chained
StoreVector-VectorStoreMask nodes into one machine instruction.

Based on the above mentioned considerations, to make the code more
maintainable, we choose the second one, which decouples pd_clone_node()
from predicate conditions of matching rules in the backend with minimal
impact on performance.

I tested the patch using all vector cases under compiler/vectorization,
compiler/vectorapi and jdk/incubator/vector on SVE for multiple times.
It passed all these tests.

[1] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/cpu/aarch64/aarch64.ad#L2738
[2] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/cpu/aarch64/aarch64_sve.ad#L2101
[3] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L2151
[4] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1097
[5] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1119
[6] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1694
[7] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1977
[8] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1975
[9] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1692
[10] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1969
[11] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/vector.cpp#L262
[12] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L2125

Change-Id: I0a1baba9c49e11813c65d28882b837f33cf478d9
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 13, 2022

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 13, 2022
@openjdk
Copy link

openjdk bot commented Jan 13, 2022

@fg1417 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.java.net label Jan 13, 2022
@mlbridge
Copy link

mlbridge bot commented Jan 13, 2022

Webrevs

Copy link

@nsjian nsjian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is my bug. The fix looks good to me. Thanks!

@chhagedorn
Copy link
Member

Would be great if someone else could do a second review to get this in before RDP 2 is starting on Thursday. Maybe @nick-arm or @dean-long? Thanks!

@nsjian
Copy link

nsjian commented Jan 18, 2022

Would be great if someone else could do a second review to get this in before RDP 2 is starting on Thursday. Maybe @nick-arm or @dean-long? Thanks!

@nick-arm may not be available to review the code recently. Can @dean-long or @vnkozlov @iwanowww help to review this? Thanks!

Copy link

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing ran by Christian are clean
Approved.

@openjdk
Copy link

openjdk bot commented Jan 18, 2022

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8279654: jdk/incubator/vector/Vector256ConversionTests.java crashes randomly with SVE

Reviewed-by: njian, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 43 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@nsjian, @vnkozlov) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 18, 2022
@fg1417
Copy link
Author

fg1417 commented Jan 19, 2022

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jan 19, 2022
@openjdk
Copy link

openjdk bot commented Jan 19, 2022

@fg1417
Your change (at version e643021) is now ready to be sponsored by a Committer.

@nsjian
Copy link

nsjian commented Jan 19, 2022

/sponsor

@openjdk
Copy link

openjdk bot commented Jan 19, 2022

Going to push as commit af6c9ab.
Since your change was applied there have been 43 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 19, 2022
@openjdk openjdk bot closed this Jan 19, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jan 19, 2022
@openjdk
Copy link

openjdk bot commented Jan 19, 2022

@nsjian @fg1417 Pushed as commit af6c9ab.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.java.net integrated Pull request has been integrated
4 participants