8279654: jdk/incubator/vector/Vector256ConversionTests.java crashes randomly with SVE #98
Conversation
While testing some Vector API cases on an SVE system, we see a random failure: "assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf)) failed: duplicating node that's already been matched".

The root cause is that the if-condition in pd_clone_node()[1] is not aligned with the predicate conditions[2] of the match rules in the backend. Not all StoreVector (VectorStoreMask src) node patterns can be matched and subsumed into one instruction in the backend. For example, there is no match rule to combine them when the element basic type is T_BYTE, as the cost of VectorStoreMask for byte type is relatively low and there is no need to do narrowing operations.

Here is the analysis of the root cause in detail. Consider a multi-use VectorStoreMask node of byte type, as shown below:

          VectorStoreMask (T_BYTE)
            /              \
      Call/Others     StoreVector (T_BYTE)

When it is identified by pd_clone_node() successfully, it will not be set shared in the stage of find_shared(), provided the root selector visits the call node earlier than the StoreVector node[3]. The preorder walk in xform()[4][5] visits the VectorStoreMask node and reduces it by ReduceInst() starting from the call node first, then visits the VectorStoreMask node again starting from the StoreVector node as mentioned before. During the second visit, in the stage of Label_Root(), the postorder walk along use-def edges starts from the StoreVector node and then visits the VectorStoreMask node. It takes the VectorStoreMask node as an interior node of the subtree and generates the corresponding state tree[6]. But in the stage of ReduceInst_Interior(), there is no internal rule in the backend to reduce the VectorStoreMask node as the state tree guides. Thus, the reducer has to reduce the VectorStoreMask node independently and visit it by ReduceInst()[7]. Therefore, it tries to reduce the VectorStoreMask node twice by ReduceInst(), resulting in the assertion failure.
Assuming there were a match rule to combine these two byte nodes, StoreVector (VectorStoreMask src), the matcher could find an internal rule in the backend to skip the interior VectorStoreMask node, doing nothing but recursing[8], and no assertion failure would happen. If we delete the check for VectorStoreMask in pd_clone_node(), the VectorStoreMask node is set shared in find_shared() and is definitely reduced only once, with no assertion failure[9][10].

There are two different ways to fix this issue.

The first one is to set the condition in pd_clone_node() exactly the same as the matching rules, making sure that all instruction sequences declared in pd_clone_node() can be subsumed by matching rules. However, the condition code would have to be revisited whenever matching rules change in the future, so it is not easy to maintain.

The other one is to remove the if-condition code for VectorStoreMask in pd_clone_node(). As a result, when a VectorStoreMask node has multiple uses, it can't be matched by any chained rules. But after JDK-8273949, a multi-use VectorStoreMask node only exists for safepoints[11], which is not very common. Furthermore, even if a multi-use VectorStoreMask node occurs, its pattern is identified in pd_clone_node(), and a match rule is ready to subsume it in the backend, the combination may still not happen due to a matching-order issue. For example, in this case:

          VectorStoreMask
            /          \
      StoreVector   Call/Others

if we visit the StoreVector node first and then the call node, the VectorStoreMask node is still going to be set shared, and the prevention from pd_clone_node() doesn't work, based on the current logic of find_shared()[12]. Once the node is set shared, there is no code path that uses an internal combination rule to subsume the chained StoreVector-VectorStoreMask nodes into one machine instruction.
Based on the above considerations, to make the code more maintainable, we choose the second option, which decouples pd_clone_node() from the predicate conditions of the matching rules in the backend with minimal impact on performance.

I tested the patch using all vector cases under compiler/vectorization, compiler/vectorapi and jdk/incubator/vector on SVE multiple times. It passed all these tests.

[1] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/cpu/aarch64/aarch64.ad#L2738
[2] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/cpu/aarch64/aarch64_sve.ad#L2101
[3] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L2151
[4] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1097
[5] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1119
[6] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1694
[7] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1977
[8] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1975
[9] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1692
[10] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L1969
[11] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/vector.cpp#L262
[12] https://github.com/openjdk/jdk/blob/8d1a1e83f40f7a147e033be6b2221c1bb1abd8ab/src/hotspot/share/opto/matcher.cpp#L2125

Change-Id: I0a1baba9c49e11813c65d28882b837f33cf478d9
👋 Welcome back fgao! A progress list of the required criteria for merging this PR is tracked below.
It is my bug. The fix looks good to me. Thanks!
Would be great if someone else could do a second review to get this in before RDP 2 starts on Thursday. Maybe @nick-arm or @dean-long? Thanks!
@nick-arm may not be available to review the code recently. Can @dean-long or @vnkozlov @iwanowww help to review this? Thanks!
Testing run by Christian is clean.
Approved.
@fg1417 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time this comment was updated, 43 new commits had been pushed to the target branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@nsjian, @vnkozlov), but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type /integrate.
/integrate
/sponsor
Going to push as commit af6c9ab.
Your commit was automatically rebased without conflicts.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk18 pull/98/head:pull/98
$ git checkout pull/98
Update a local copy of the PR:
$ git checkout pull/98
$ git pull https://git.openjdk.java.net/jdk18 pull/98/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 98
View PR using the GUI difftool:
$ git pr show -t 98
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk18/pull/98.diff