8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API #13862

Closed
wants to merge 22 commits into from

@DingliZhang DingliZhang commented May 8, 2023

Hi all,

This PR adds support for Extract, Compress, Expand and other Vector API nodes
on RISC-V. The implementation follows the RVV v1.0 specification [1]. Reviews
are welcome. Thanks a lot.

The newly supported nodes are:

CompressM/CompressV/ExpandV
LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked
Extract
VectorLongToMask/VectorMaskToLong
PopulateIndex
VectorMaskTrueCount/VectorMaskFirstTrue
VectorInsert

We also refactored methods such as match_rule_supported_vector_masked. All
implemented vector nodes now support masked operations by default, so we also
added the masked variants of every implemented node.

The VectorTest node will be implemented in a follow-up PR.

Most of the new nodes can be exercised with the tests under
test/jdk/jdk/incubator/vector. The following command prints the compilation
log of a jtreg test case:

$ jtreg \
-v:default \
-concurrency:16 -timeout:50 \
-javaoption:-XX:+UnlockExperimentalVMOptions \
-javaoption:-XX:+UseRVV \
-javaoption:-XX:+PrintOptoAssembly \
-javaoption:-XX:LogFile=log_name.log \
-jdk:build/linux-riscv64-server-fastdebug/jdk \
-compilejdk:build/linux-x86_64-server-release/images/jdk \
<test-case>

CompressM/CompressV/ExpandV

RVV provides no vdecompress instruction as the inverse of vcompress. ExpandV
therefore synthesizes the operation from iota (viota.m) and a masked vrgather.
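
As a rough illustration of the iota plus masked-gather idea, here is a scalar
Java model. It is only a sketch of the lane semantics, not the code emitted by
this PR. The class and method names are made up for the example, and the
zero-fill of unselected lanes is assumed to follow the Vector API's
compress/expand behaviour.

```java
import java.util.Arrays;

public class CompressExpandSketch {
    // iota step: out[i] = number of set mask lanes strictly before lane i.
    static int[] iota(boolean[] mask) {
        int[] out = new int[mask.length];
        int count = 0;
        for (int i = 0; i < mask.length; i++) {
            out[i] = count;
            if (mask[i]) {
                count++;
            }
        }
        return out;
    }

    // CompressV model: pack the selected lanes to the front, zero the rest.
    static int[] compress(int[] src, boolean[] mask) {
        int[] dst = new int[src.length];
        int next = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {
                dst[next++] = src[i];
            }
        }
        return dst;
    }

    // ExpandV model: masked gather dst[i] = src[iota[i]] for set lanes only.
    static int[] expand(int[] src, boolean[] mask) {
        int[] idx = iota(mask);
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {
                dst[i] = src[idx[i]];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        boolean[] mask = {false, true, false, true};
        System.out.println(Arrays.toString(compress(src, mask))); // [20, 40, 0, 0]
        System.out.println(Arrays.toString(expand(src, mask)));   // [0, 10, 0, 20]
    }
}
```

Expanding a compressed vector with the same mask restores the selected lanes
to their original positions, which is why a dedicated decompress instruction
is not required.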

We can use test/jdk/jdk/incubator/vector/Float256VectorTests.java to emit
these nodes and the compilation log is as follows:

## CompressM
2aa     addi  R29, R10, #16	# ptr, #@addP_reg_imm
2ae     mcompress V0, V30	# KILL R30
2c2     vstoremask V2, V0
2ce     storeV [R7], V2	# vector (rvv)
2d6     bgeu  R29, R28, B47	#@cmpP_branch  P=0.000100 C=-1.000000

## CompressV
0ee     addi  R29, R10, #16	# ptr, #@addP_reg_imm
0f2     vcompress V1, V2, V0
0fe     storeV [R7], V1	# vector (rvv)
106     bgeu  R29, R28, B10	#@cmpP_branch  P=0.000100 C=-1.000000

## ExpandV
0ee     addi  R29, R10, #16	# ptr, #@addP_reg_imm
0f2     vexpand V3, V2, V0
102     storeV [R7], V3	# vector (rvv)
10a     bgeu  R29, R28, B10	#@cmpP_branch  P=0.000100 C=-1.000000

LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked

We use the vsoxei32_v instruction regardless of the current SEW. The indexMap
passed to fromArray is an int array, so every index is 32 bits wide. The index
vector holds element indices, whereas the vs2 operand of vsoxei32_v expects
byte offsets, so each index is multiplied by the element size in bytes.
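
The index-to-offset scaling can be pictured with a small scalar Java model.
This is a sketch under assumptions: the class and helper names are
hypothetical, a ByteBuffer stands in for memory, and float (4-byte) elements
are used to match the Float256 tests mentioned below.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class GatherOffsetSketch {
    // Turn 32-bit element indices into byte offsets: offset = index * element size.
    static int[] toByteOffsets(int[] indexMap, int elemBytes) {
        int[] offsets = new int[indexMap.length];
        for (int i = 0; i < indexMap.length; i++) {
            offsets[i] = indexMap[i] * elemBytes;
        }
        return offsets;
    }

    // Gather model: read one float per lane from base + byte offset.
    static float[] gather(ByteBuffer mem, int[] indexMap) {
        int[] off = toByteOffsets(indexMap, Float.BYTES);
        float[] out = new float[indexMap.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = mem.getFloat(off[i]);
        }
        return out;
    }

    // Scatter model: write one float per lane to base + byte offset.
    static void scatter(ByteBuffer mem, int[] indexMap, float[] values) {
        int[] off = toByteOffsets(indexMap, Float.BYTES);
        for (int i = 0; i < values.length; i++) {
            mem.putFloat(off[i], values[i]);
        }
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocate(8 * Float.BYTES);
        for (int i = 0; i < 8; i++) {
            mem.putFloat(i * Float.BYTES, i * 1.5f);
        }
        System.out.println(Arrays.toString(gather(mem, new int[]{3, 0, 5})));
    }
}
```

The index width stays at 32 bits whatever the element type is. Only the
multiplier (the element size in bytes) changes.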

We can use test/jdk/jdk/incubator/vector/Float256VectorLoadStoreTests.java to
emit these nodes and the compilation log is as follows:

## LoadVectorGather
7ee     B56: #	out( B26 ) <- in( B55 )  Freq: 338.569
7ee     spill [sp, #144] -> R7	# spill size = 64
7f0     spill [sp, #192] -> V3	# vector spill size = 256
7f8     gather_load V1, [R7], V3	# KILL V2
808     j  B26	#@branch

## StoreVectorScatter
290     loadV V1, [R7]	# vector (rvv)
298     addi  R7, R8, #16	# ptr, #@addP_reg_imm
29c     spill [sp, #32] -> V3	# vector spill size = 256
2a4     scatter_store [R7], V3, V1	# KILL V2
2b4     # pop frame 208

## LoadVectorGatherMasked
41a     addi  R30, R10, #16	# ptr, #@addP_reg_imm
41e     spill [sp, #48] -> V3	# vector spill size = 256
426     gather_load_masked V1, [R7], V3, V0	# KILL V2
43a     storeV [R28], V1	# vector (rvv)
442     bgeu  R30, R29, B46	#@cmpP_branch  P=0.000100 C=-1.000000

## StoreVectorScatterMasked
2ae     vloadmask V0, V1
2b6     spill [sp, #8] -> R7	# spill size = 64
2b8     addi  R7, R7, #16	# ptr, #@addP_reg_imm
2ba     spill [sp, #48] -> V3	# vector spill size = 256
2c2     scatter_store_masked [R7], V3, V2, V0	# KILL V1
2d2     # pop frame 224

Extract

Extract returns the vector element at the given index.

We can use test/jdk/jdk/incubator/vector/*MaxVectorTests.java to emit these
nodes and the compilation log is as follows:

## Extract
0fa     loadV V1, [R11]	# vector (rvv)
102     add R11, R19, R30	# ptr, #@addP_reg_reg
106     extract R15, V1, #0	# KILL V2
112     extract R12, V1, #1	# KILL V2
122     extract R13, V1, #2	# KILL V2
132     bgeu  R14, R7, B44	#@cmpU_branch  P=0.000001 C=-1.000000

## ExtractL
0fa     loadV V1, [R11]	# vector (rvv)
102     add R11, R19, R28	# ptr, #@addP_reg_reg
106     extractL R15, V1, #0	# KILL V2
112     extractL R13, V1, #1	# KILL V2
122     extractL R14, V1, #2	# KILL V2
132     bgeu  R7, R10, B44	#@cmpU_branch  P=0.000001 C=-1.000000

## ExtractF
0fa     loadV V1, [R12]	# vector (rvv)
102     add R12, R19, R28	# ptr, #@addP_reg_reg
106     extractF F0, V1, #0	# KILL V2
112     extractF F2, V1, #1	# KILL V2
122     extractF F1, V1, #2	# KILL V2
132     bgeu  R7, R11, B44	#@cmpU_branch  P=0.000001 C=-1.000000

## ExtractD
0fa     loadV V1, [R13]	# vector (rvv)
102     add R13, R19, R28	# ptr, #@addP_reg_reg
106     extractD F0, V1, #0	# KILL V2
112     extractD F1, V1, #1	# KILL V2
122     extractD F2, V1, #2	# KILL V2
132     bgeu  R7, R12, B44	#@cmpU_branch  P=0.000001 C=-1.000000

AndV/OrV/XorV masked

We can use Byte128VectorTests.java to emit these nodes and the compilation
log is as follows:

## AndV masked
1d0     B30: #	out( B57 B31 ) <- in( B29 )  Freq: 75.1104
1d0     loadV V3, [R15]	# vector (rvv)
1d8     vloadmask V0, V1
1e0     vand_masked V2, V3, V0
1e8     spill [sp, #48] -> R14	# spill size = 64
1ea     add R14, R14, R31	# ptr, #@addP_reg_reg
1ec     addi  R31, R14, #16	# ptr, #@addP_reg_imm
1f0     bgeu  R9, R29, B57	#@cmpU_branch  P=0.000001 C=-1.000000

## OrV masked
1d0     B30: #	out( B57 B31 ) <- in( B29 )  Freq: 75.1104
1d0     loadV V3, [R15]	# vector (rvv)
1d8     vloadmask V0, V1
1e0     vor_masked V2, V3, V0
1e8     spill [sp, #48] -> R14	# spill size = 64
1ea     add R14, R14, R31	# ptr, #@addP_reg_reg
1ec     addi  R31, R14, #16	# ptr, #@addP_reg_imm
1f0     bgeu  R9, R29, B57	#@cmpU_branch  P=0.000001 C=-1.000000

## XorV masked
1d0     B30: #	out( B57 B31 ) <- in( B29 )  Freq: 75.1104
1d0     loadV V3, [R15]	# vector (rvv)
1d8     vloadmask V0, V1
1e0     vxor_masked V2, V3, V0
1e8     spill [sp, #48] -> R14	# spill size = 64
1ea     add R14, R14, R31	# ptr, #@addP_reg_reg
1ec     addi  R31, R14, #16	# ptr, #@addP_reg_imm
1f0     bgeu  R9, R29, B57	#@cmpU_branch  P=0.000001 C=-1.000000

VectorLongToMask/VectorMaskToLong

We can use VectorMaskLoadStoreTest.java and Float256VectorTests.java to
emit these nodes and the compilation log is as follows:

## VectorLongToMask
05e     B3: #	out( B29 B4 ) <- in( B22 B2 )  Freq: 1
05e     vmask_fromlong V0, R30
066     vstoremask V1, V0
072     addi  R7, R10, #16	# ptr, #@addP_reg_imm
076     storeV [R7], V1	# vector (rvv)

## VectorMaskToLong
064     addi  R7, R7, #16	# ptr, #@addP_reg_imm
066     loadV V1, [R7]	# vector (rvv)
06e     vloadmask V0, V1
076     vmask_tolong R7, V0
084     li R29, #8	# int, #@loadConI
086     bgeu  R12, R29, B5	#@cmpU_branch  P=0.000001 C=-1.000000
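
For readers less familiar with these two nodes, the conversion can be modelled
in scalar Java as below. This is a sketch of the lane-to-bit mapping used by
the Vector API's VectorMask.toLong() and VectorMask.fromLong(), not the RVV
code sequence. The class and method names are hypothetical, and at most 64
lanes are assumed.

```java
public class MaskLongSketch {
    // VectorMaskToLong model: lane i maps to bit i, least significant bit first.
    static long maskToLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // VectorLongToMask model: lane i is set when bit i of the long is set.
    static boolean[] longToMask(long bits, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true};
        long bits = maskToLong(mask);
        System.out.println(Long.toBinaryString(bits)); // 1101
        System.out.println(java.util.Arrays.toString(longToMask(bits, 4)));
    }
}
```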

PopulateIndex

PopulateIndexNode extends the scope of C2 superword vectorizable packs so that
operations on the loop induction variable can be vectorized, as in
JDK-8280510.

With it, loops that use the induction variable as an operand can be
vectorized, for example:

  for (int i = 0; i < count; i++) {
    b[i] = a[i] * i;
  }

The final compilation log for the loop above is:

loadV V1, [R17]	# vector (rvv)
add R15, R14, R15	# ptr, #@addP_reg_reg
addi  R17, R15, #16	# ptr, #@addP_reg_imm
addi  R16, R16, #48	# ptr, #@addP_reg_imm
addiw  R9, R30, #8	#@addI_reg_imm
addi  R15, R15, #48	# ptr, #@addP_reg_imm
populateindex V3, R30, R11	# KILL V2
vmul.vv V1, V3, V1	#@vmulI
storeV [R17], V1	# vector (rvv)

The existing HotSpot jtreg tests in compiler/c2/cr7192963/Test*Vect.java all
pass.
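
To make the role of the populateindex instruction more concrete, the
vectorized loop can be approximated in scalar Java as follows. This is an
illustrative sketch only: the class name, the helper names and the fixed lane
count are assumptions, and the (start, step) operands are inferred from the
two scalar registers in the log above.

```java
import java.util.Arrays;

public class PopulateIndexSketch {
    // PopulateIndex model: out[l] = start + l * step for every lane l.
    static int[] populateIndex(int start, int step, int lanes) {
        int[] out = new int[lanes];
        for (int l = 0; l < lanes; l++) {
            out[l] = start + l * step;
        }
        return out;
    }

    // Models b[i] = a[i] * i processed "lanes" elements at a time.
    static void mulByIndex(int[] a, int[] b, int lanes) {
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            int[] idx = populateIndex(i, 1, lanes); // induction variable as a vector
            for (int l = 0; l < lanes; l++) {       // stands in for one vmul.vv
                b[i + l] = a[i + l] * idx[l];
            }
        }
        for (; i < a.length; i++) {                 // scalar tail
            b[i] = a[i] * i;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] b = new int[a.length];
        mulByIndex(a, b, 8);
        System.out.println(Arrays.toString(b));
    }
}
```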

VectorMaskTrueCount/VectorMaskFirstTrue

We can use Double128VectorTests.java to emit these nodes and the compilation
log is as follows:

## VectorMaskTrueCount
050     addi  R7, R7, #16	# ptr, #@addP_reg_imm
052     loadV V1, [R7]	# vector (rvv)
05a     vloadmask V0, V1
062     vmask_truecount R10, V0
06a     # pop frame 32

## VectorMaskFirstTrue
070     loadV V1, [R7]	# vector (rvv)
078     vmv.v.i  V2, #0	#@replicateL_imm5
080     spill V1 -> V3	# vector spill size = 256
084     # reinterpret V3	# do nothing
084     vmaskcmp V0, V3, V2, #4
090     vmask_firsttrue R8, V0	# KILL V30
09c     li R28, #2	# int, #@loadConI
09e     bge  R8, R28, B42	#@cmpI_branch  P=0.000000 C=5952.000000
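
The expected results of these two nodes can be described with a short scalar
Java model. It is a sketch of the semantics of VectorMask.trueCount() and
VectorMask.firstTrue() rather than of the generated RVV code, and the class
and method names are hypothetical. Returning the lane count when no lane is
set follows the Vector API documentation for firstTrue().

```java
public class MaskQuerySketch {
    // VectorMaskTrueCount model: number of set lanes.
    static int trueCount(boolean[] mask) {
        int count = 0;
        for (boolean lane : mask) {
            if (lane) {
                count++;
            }
        }
        return count;
    }

    // VectorMaskFirstTrue model: index of the first set lane,
    // or the number of lanes when none is set.
    static int firstTrue(boolean[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                return i;
            }
        }
        return mask.length;
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true};
        System.out.println(trueCount(mask)); // 1
        System.out.println(firstTrue(mask)); // 1
    }
}
```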

VectorInsert

We can use test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java to
emit the insertI_index_lt32 node. The compilation log is as follows:

05e     B4: #	out( B13 B5 ) <- in( B3 )  Freq: 0.999997
05e     loadV V1, [R30]	# vector (rvv)
066     li R28, #0	# int, #@loadConI
068     lwu  R29, [R7, #120]	# loadN, compressed ptr, #@loadN ! Field: compiler/vectorapi/TestVectorInsertByte.rb
06c     decode_heap_oop  R29, R29	#@decodeHeapOop
06e     insertI_index_lt32 V1, V1, R28, #0
082     lwu  R7, [R29, #12]	# range, #@loadRange
086     NullCheck R29

To cover the case where the index is greater than 31, we modify
TestVectorInsertByte.java as follows:

diff --git a/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java b/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java
index 7969b7bea40..480d6bec074 100644
--- a/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java
+++ b/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java
@@ -51,7 +51,7 @@ public class TestVectorInsertByte {
 
     static void testByteVectorInsert() {
         ByteVector av = ByteVector.fromArray(SPECIESb, ab, 0);
-        av = av.withLane(0, (byte) (0));
+        av = av.withLane(32, (byte) (0));
         av.intoArray(rb, 0);
     }

Then the compilation log is as follows:

060     B4: #	out( B13 B5 ) <- in( B3 )  Freq: 0.999997
060     loadV V1, [R30]	# vector (rvv)
068     li R28, #0	# int, #@loadConI
06a     li R30, #32	# int, #@loadConI
06e     lwu  R7, [R7, #120]	# loadN, compressed ptr, #@loadN ! Field: compiler/vectorapi/TestVectorInsertByte.rb
072     decode_heap_oop  R7, R7	#@decodeHeapOop
074     insertI_index V1, V1, R28, R30	# KILL V2
088     lwu  R28, [R7, #12]	# range, #@loadRange
08c     NullCheck R7

MaskAll masked

On AArch64 SVE, the shuffleTest() case in Int64VectorTests.java emits
vmaskAllI_masked, because vector_needs_partial_operations decides that a masked
vmaskAllI node is needed. RISC-V sets the vector length with vsetvl, so it does
not need partial operations. To cover vmaskAllI_masked anyway, we can
temporarily make vector_needs_partial_operations return true for MaskAll nodes
with a non-constant input by applying this patch:

diff --git a/src/hotspot/cpu/riscv/riscv.ad b/src/hotspot/cpu/riscv/riscv.ad
index 6c5ceb9c359..b4ef13768fc 100644
--- a/src/hotspot/cpu/riscv/riscv.ad
+++ b/src/hotspot/cpu/riscv/riscv.ad
@@ -1968,7 +1968,19 @@ const bool Matcher::match_rule_supported_vector_masked(int opcode, int vlen, Bas
 }
 
 const bool Matcher::vector_needs_partial_operations(Node* node, const TypeVect* vt) {
-  return false;
+  if (UseRVV == 0) {
+    return false;
+  }
+  switch (node->Opcode()) {
+    case Op_MaskAll:
+      return !node->in(1)->is_Con();
+    default:
+      return false;
+  }
 }
 
 const bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) {

Then the compilation log is as follows:

0c8     B7: #	out( B13 B8 ) <- in( B12 B6 )  Freq: 0.999999
0c8     addi  R7, R30, #16	# ptr, #@addP_reg_imm
0cc     vmask_gen_imm V0, #2
0d4     vmaskAllI_masked V30, R31, V0	# KILL V1
0e4     spill V30 -> V0	# vmask spill size = 32
0e8     vstoremask V1, V0 # elem size is #4 byte[s]
0f4     storeV [R7], V1	# vector (rvv)

[1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc

Testing:

qemu with UseRVV:

  • Tier1 tests (release)
  • Tier2 tests (release)
  • Tier3 tests (release)
  • test/jdk/jdk/incubator/vector (fastdebug)
  • test/hotspot/jtreg/compiler/c2/cr7192963/Test*Vect.java

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API

Reviewers

Contributors

  • zifeihan <caogui@iscas.ac.cn>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862
$ git checkout pull/13862

Update a local copy of the PR:
$ git checkout pull/13862
$ git pull https://git.openjdk.org/jdk.git pull/13862/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13862

View PR using the GUI difftool:
$ git pr show -t 13862

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13862.diff

bridgekeeper bot commented May 8, 2023

👋 Welcome back dzhang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@DingliZhang DingliZhang marked this pull request as draft May 8, 2023 11:07
@openjdk openjdk bot added the rfr Pull request is ready for review label May 8, 2023
openjdk bot commented May 8, 2023

@DingliZhang The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org and removed rfr Pull request is ready for review labels May 8, 2023
@DingliZhang
Member Author

/contributor add zifeihan caogui@iscas.ac.cn

openjdk bot commented May 11, 2023

@DingliZhang
Contributor zifeihan <caogui@iscas.ac.cn> successfully added.

@DingliZhang DingliZhang marked this pull request as ready for review May 12, 2023 11:11
@openjdk openjdk bot added the rfr Pull request is ready for review label May 12, 2023
@RealFYang RealFYang left a comment

Some initial comments from a cursory look.

@feilongjiang feilongjiang left a comment

Overall looks good, with some comments.

@openjdk openjdk bot removed the rfr Pull request is ready for review label May 16, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label May 16, 2023
@feilongjiang feilongjiang left a comment

Looks good, thanks.

@RealFYang RealFYang left a comment

Thanks for the update. Would you mind a few more tweaks?

@RealFYang RealFYang left a comment

Some extra nit-picking suggestions. Otherwise, looks good. Thanks.

openjdk bot commented May 19, 2023

@DingliZhang This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API

Co-authored-by: zifeihan <caogui@iscas.ac.cn>
Reviewed-by: fyang, fjiang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@RealFYang) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 19, 2023
@DingliZhang
Member Author

@feilongjiang @RealFYang Thanks for the review!
/integrate

@RealFYang
Member

/sponsor

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 19, 2023
openjdk bot commented May 19, 2023

@DingliZhang
Your change (at version 75f8437) is now ready to be sponsored by a Committer.

openjdk bot commented May 19, 2023

Going to push as commit 97ade57.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 19, 2023
@openjdk openjdk bot closed this May 19, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 19, 2023
openjdk bot commented May 19, 2023

@RealFYang @DingliZhang Pushed as commit 97ade57.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants