
[RISCV] Move V0 to the end of register allocation order #82967

Merged
merged 2 commits into llvm:main
Mar 1, 2024

Conversation

wangpc-pp
Contributor

According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

The v0 register defined by the RISC-V vector extension is special in
that it can be used both as a general purpose vector register and also
as a mask register. As a preference, use registers other than v0 for
non-mask values. Otherwise data will have to be moved out of v0 when a
mask is required in an operation. v0 may be used when all other
registers are in use, and using v0 would avoid spilling register state
to memory.

And using the V0 register may stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups that contain it, as
much as possible. We achieve this by moving V0 to the end of the RA
order.
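
As a rough illustration (this snippet is hypothetical and not taken from the patch or its tests), keeping a non-mask value in v0 costs an extra whole-register move as soon as a mask is needed, because masked instructions can only take the mask from v0:

    vsetvli  t0, a1, e32, m1, ta, ma
    vle32.v  v0, (a0)            # non-mask data was allocated to v0
    vmseq.vi v8, v16, 0          # later, a mask is produced
    vmv1r.v  v2, v0              # the data must first be evacuated from v0
    vmv1r.v  v0, v8              # because the mask operand can only be v0
    vadd.vv  v4, v2, v16, v0.t   # masked operation

With v0 at the end of the allocation order, the data would normally land in some other register and the first vmv1r.v disappears.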

@llvmbot
Collaborator

llvmbot commented Feb 26, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Wang Pengcheng (wangpc-pp)

Changes

According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

> The v0 register defined by the RISC-V vector extension is special in
> that it can be used both as a general purpose vector register and also
> as a mask register. As a preference, use registers other than v0 for
> non-mask values. Otherwise data will have to be moved out of v0 when a
> mask is required in an operation. v0 may be used when all other
> registers are in use, and using v0 would avoid spilling register state
> to memory.

And using the V0 register may stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups that contain it, as
much as possible. We achieve this by moving V0 to the end of the RA
order.


Patch is 28.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82967.diff

10 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.td (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/pr69586.ll (+85-85)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmadd-sdnode.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwmacc-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmacc-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmsac-vp.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 381e0082c49b0b..bcf4f303ac4ca9 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -510,17 +510,17 @@ defvar VM8VTs = [vint8m8_t, vint16m8_t, vint32m8_t, vint64m8_t,
 
 def VR : VReg<!listconcat(VM1VTs, VMaskVTs),
               (add (sequence "V%u", 8, 31),
-                   (sequence "V%u", 0, 7)), 1>;
+                   (sequence "V%u", 1, 7), V0), 1>;
 
 def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;
 
 def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
-                             (sequence "V%uM2", 0, 7, 2)), 2>;
+                             (sequence "V%uM2", 2, 7, 2), V0M2), 2>;
 
 def VRM2NoV0 : VReg<VM2VTs, (sub VRM2, V0M2), 2>;
 
 def VRM4 : VReg<VM4VTs, (add V8M4, V12M4, V16M4, V20M4,
-                             V24M4, V28M4, V0M4, V4M4), 4>;
+                             V24M4, V28M4, V4M4, V0M4), 4>;
 
 def VRM4NoV0 : VReg<VM4VTs, (sub VRM4, V0M4), 4>;
 
diff --git a/llvm/test/CodeGen/RISCV/pr69586.ll b/llvm/test/CodeGen/RISCV/pr69586.ll
index 2d5fce2ca4970e..07ba2a5b3b1378 100644
--- a/llvm/test/CodeGen/RISCV/pr69586.ll
+++ b/llvm/test/CodeGen/RISCV/pr69586.ll
@@ -146,19 +146,19 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    slli a6, a3, 10
 ; NOREMAT-NEXT:    sd a6, 176(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a6, a0, a6
-; NOREMAT-NEXT:    vle32.v v0, (a6)
+; NOREMAT-NEXT:    vle32.v v2, (a6)
 ; NOREMAT-NEXT:    vle32.v v20, (a6)
 ; NOREMAT-NEXT:    li a6, 19
 ; NOREMAT-NEXT:    slli a6, a6, 9
 ; NOREMAT-NEXT:    sd a6, 168(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    li a7, 19
 ; NOREMAT-NEXT:    add a6, a0, a6
-; NOREMAT-NEXT:    vle32.v v2, (a6)
+; NOREMAT-NEXT:    vle32.v v4, (a6)
 ; NOREMAT-NEXT:    vle32.v v22, (a6)
 ; NOREMAT-NEXT:    slli a5, a5, 11
 ; NOREMAT-NEXT:    sd a5, 160(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v4, (a5)
+; NOREMAT-NEXT:    vle32.v v6, (a5)
 ; NOREMAT-NEXT:    vle32.v v12, (a5)
 ; NOREMAT-NEXT:    li s10, 21
 ; NOREMAT-NEXT:    slli a5, s10, 9
@@ -184,25 +184,25 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a5, a0, a5
 ; NOREMAT-NEXT:    vle32.v v30, (a5)
 ; NOREMAT-NEXT:    vle32.v v10, (a5)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v2
 ; NOREMAT-NEXT:    li s3, 25
 ; NOREMAT-NEXT:    slli a5, s3, 9
 ; NOREMAT-NEXT:    sd a5, 128(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v0, (a5)
+; NOREMAT-NEXT:    vle32.v v2, (a5)
 ; NOREMAT-NEXT:    vle32.v v18, (a5)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
 ; NOREMAT-NEXT:    slli a5, s2, 10
 ; NOREMAT-NEXT:    sd a5, 120(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v2, (a5)
+; NOREMAT-NEXT:    vle32.v v4, (a5)
 ; NOREMAT-NEXT:    vle32.v v20, (a5)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v6
 ; NOREMAT-NEXT:    li t5, 27
 ; NOREMAT-NEXT:    slli a5, t5, 9
 ; NOREMAT-NEXT:    sd a5, 112(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v4, (a5)
+; NOREMAT-NEXT:    vle32.v v6, (a5)
 ; NOREMAT-NEXT:    vle32.v v22, (a5)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v24
 ; NOREMAT-NEXT:    slli a4, a4, 11
@@ -235,33 +235,33 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a4, a0, a4
 ; NOREMAT-NEXT:    vle32.v v30, (a4)
 ; NOREMAT-NEXT:    vle32.v v16, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v10, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v10, v2
 ; NOREMAT-NEXT:    lui a6, 4
 ; NOREMAT-NEXT:    add a4, a0, a6
-; NOREMAT-NEXT:    vle32.v v0, (a4)
+; NOREMAT-NEXT:    vle32.v v2, (a4)
 ; NOREMAT-NEXT:    vle32.v v8, (a4)
 ; NOREMAT-NEXT:    csrr a4, vlenb
 ; NOREMAT-NEXT:    slli a4, a4, 1
 ; NOREMAT-NEXT:    add a4, sp, a4
 ; NOREMAT-NEXT:    addi a4, a4, 288
 ; NOREMAT-NEXT:    vs2r.v v8, (a4) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
 ; NOREMAT-NEXT:    addiw a4, a6, 512
 ; NOREMAT-NEXT:    sd a4, 72(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a4, a0, a4
-; NOREMAT-NEXT:    vle32.v v2, (a4)
+; NOREMAT-NEXT:    vle32.v v4, (a4)
 ; NOREMAT-NEXT:    vle32.v v18, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
 ; NOREMAT-NEXT:    slli a4, t0, 10
 ; NOREMAT-NEXT:    sd a4, 64(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a4, a0, a4
-; NOREMAT-NEXT:    vle32.v v4, (a4)
+; NOREMAT-NEXT:    vle32.v v6, (a4)
 ; NOREMAT-NEXT:    vle32.v v20, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v24
 ; NOREMAT-NEXT:    addiw a4, a6, 1536
 ; NOREMAT-NEXT:    sd a4, 56(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a4, a0, a4
-; NOREMAT-NEXT:    vle32.v v6, (a4)
+; NOREMAT-NEXT:    vle32.v v0, (a4)
 ; NOREMAT-NEXT:    vle32.v v22, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v26
 ; NOREMAT-NEXT:    slli a3, a3, 11
@@ -289,18 +289,18 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, a0, a3
 ; NOREMAT-NEXT:    vle32.v v10, (a3)
 ; NOREMAT-NEXT:    vle32.v v14, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v2
 ; NOREMAT-NEXT:    addiw a3, s1, -512
 ; NOREMAT-NEXT:    sd a3, 24(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a3, a0, a3
-; NOREMAT-NEXT:    vle32.v v0, (a3)
+; NOREMAT-NEXT:    vle32.v v2, (a3)
 ; NOREMAT-NEXT:    vle32.v v16, (a3)
 ; NOREMAT-NEXT:    csrr a3, vlenb
 ; NOREMAT-NEXT:    slli a3, a3, 1
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vl2r.v v26, (a3) # Unknown-size Folded Reload
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v4
 ; NOREMAT-NEXT:    add a3, a0, s1
 ; NOREMAT-NEXT:    vle32.v v26, (a3)
 ; NOREMAT-NEXT:    vle32.v v28, (a3)
@@ -309,27 +309,27 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vs2r.v v28, (a3) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v6
 ; NOREMAT-NEXT:    addiw ra, s1, 512
 ; NOREMAT-NEXT:    add a3, a0, ra
 ; NOREMAT-NEXT:    vle32.v v28, (a3)
 ; NOREMAT-NEXT:    vle32.v v30, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v0
 ; NOREMAT-NEXT:    slli s11, s10, 10
 ; NOREMAT-NEXT:    add a3, a0, s11
-; NOREMAT-NEXT:    vle32.v v2, (a3)
+; NOREMAT-NEXT:    vle32.v v4, (a3)
 ; NOREMAT-NEXT:    vle32.v v18, (a3)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v12
 ; NOREMAT-NEXT:    addiw s10, s1, 1536
 ; NOREMAT-NEXT:    add a3, a0, s10
-; NOREMAT-NEXT:    vle32.v v4, (a3)
+; NOREMAT-NEXT:    vle32.v v6, (a3)
 ; NOREMAT-NEXT:    vle32.v v20, (a3)
 ; NOREMAT-NEXT:    addi a3, sp, 288
 ; NOREMAT-NEXT:    vl2r.v v12, (a3) # Unknown-size Folded Reload
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v8
 ; NOREMAT-NEXT:    slli s9, s8, 11
 ; NOREMAT-NEXT:    add a3, a0, s9
-; NOREMAT-NEXT:    vle32.v v6, (a3)
+; NOREMAT-NEXT:    vle32.v v0, (a3)
 ; NOREMAT-NEXT:    vle32.v v12, (a3)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v24, v10
 ; NOREMAT-NEXT:    lui t0, 6
@@ -337,7 +337,7 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, a0, s8
 ; NOREMAT-NEXT:    vle32.v v8, (a3)
 ; NOREMAT-NEXT:    vle32.v v22, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; NOREMAT-NEXT:    slli s7, s6, 10
 ; NOREMAT-NEXT:    add a3, a0, s7
 ; NOREMAT-NEXT:    vle32.v v10, (a3)
@@ -345,7 +345,7 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v26
 ; NOREMAT-NEXT:    addiw s6, t0, -512
 ; NOREMAT-NEXT:    add a3, a0, s6
-; NOREMAT-NEXT:    vle32.v v0, (a3)
+; NOREMAT-NEXT:    vle32.v v2, (a3)
 ; NOREMAT-NEXT:    vle32.v v16, (a3)
 ; NOREMAT-NEXT:    csrr a3, vlenb
 ; NOREMAT-NEXT:    slli a3, a3, 2
@@ -361,7 +361,7 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vs2r.v v26, (a3) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v30, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v30, v4
 ; NOREMAT-NEXT:    addiw s5, t0, 512
 ; NOREMAT-NEXT:    add a3, a0, s5
 ; NOREMAT-NEXT:    vle32.v v26, (a3)
@@ -371,12 +371,12 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vs2r.v v28, (a3) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v6
 ; NOREMAT-NEXT:    slli s4, s3, 10
 ; NOREMAT-NEXT:    add a3, a0, s4
 ; NOREMAT-NEXT:    vle32.v v28, (a3)
 ; NOREMAT-NEXT:    vle32.v v18, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v0
 ; NOREMAT-NEXT:    addiw s3, t0, 1536
 ; NOREMAT-NEXT:    add a3, a0, s3
 ; NOREMAT-NEXT:    vle32.v v30, (a3)
@@ -384,23 +384,23 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v8
 ; NOREMAT-NEXT:    slli s2, s2, 11
 ; NOREMAT-NEXT:    add a3, a0, s2
-; NOREMAT-NEXT:    vle32.v v2, (a3)
+; NOREMAT-NEXT:    vle32.v v4, (a3)
 ; NOREMAT-NEXT:    vle32.v v12, (a3)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v10
 ; NOREMAT-NEXT:    lui a3, 7
 ; NOREMAT-NEXT:    addiw s0, a3, -1536
 ; NOREMAT-NEXT:    add a4, a0, s0
-; NOREMAT-NEXT:    vle32.v v4, (a4)
+; NOREMAT-NEXT:    vle32.v v6, (a4)
 ; NOREMAT-NEXT:    vle32.v v22, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; NOREMAT-NEXT:    slli t6, t5, 10
 ; NOREMAT-NEXT:    add a4, a0, t6
-; NOREMAT-NEXT:    vle32.v v6, (a4)
+; NOREMAT-NEXT:    vle32.v v0, (a4)
 ; NOREMAT-NEXT:    vle32.v v14, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v24
 ; NOREMAT-NEXT:    addiw t5, a3, -512
 ; NOREMAT-NEXT:    add a4, a0, t5
-; NOREMAT-NEXT:    vle32.v v0, (a4)
+; NOREMAT-NEXT:    vle32.v v2, (a4)
 ; NOREMAT-NEXT:    vle32.v v16, (a4)
 ; NOREMAT-NEXT:    csrr a4, vlenb
 ; NOREMAT-NEXT:    slli a4, a4, 2
@@ -426,42 +426,42 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a4, a0, t3
 ; NOREMAT-NEXT:    vle32.v v18, (a4)
 ; NOREMAT-NEXT:    vle32.v v28, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
 ; NOREMAT-NEXT:    addiw t2, a3, 1536
 ; NOREMAT-NEXT:    add a4, a0, t2
 ; NOREMAT-NEXT:    vle32.v v20, (a4)
 ; NOREMAT-NEXT:    vle32.v v30, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v6
 ; NOREMAT-NEXT:    slli t1, a2, 11
 ; NOREMAT-NEXT:    add a2, a0, t1
 ; NOREMAT-NEXT:    vle32.v v12, (a2)
-; NOREMAT-NEXT:    vle32.v v2, (a2)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v6
+; NOREMAT-NEXT:    vle32.v v4, (a2)
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v0
 ; NOREMAT-NEXT:    lui a2, 8
 ; NOREMAT-NEXT:    addiw a7, a2, -1536
 ; NOREMAT-NEXT:    add a4, a0, a7
 ; NOREMAT-NEXT:    vle32.v v22, (a4)
-; NOREMAT-NEXT:    vle32.v v4, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; NOREMAT-NEXT:    vle32.v v6, (a4)
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; NOREMAT-NEXT:    slli a6, a5, 10
 ; NOREMAT-NEXT:    add a4, a0, a6
 ; NOREMAT-NEXT:    vle32.v v14, (a4)
-; NOREMAT-NEXT:    vle32.v v0, (a4)
+; NOREMAT-NEXT:    vle32.v v2, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v26
 ; NOREMAT-NEXT:    addiw a5, a2, -512
 ; NOREMAT-NEXT:    add a4, a0, a5
 ; NOREMAT-NEXT:    vle32.v v16, (a4)
 ; NOREMAT-NEXT:    vle32.v v26, (a4)
 ; NOREMAT-NEXT:    add a0, a0, a2
-; NOREMAT-NEXT:    vle32.v v6, (a0)
+; NOREMAT-NEXT:    vle32.v v0, (a0)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v8, v10
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v24, v18
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v28, v20
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v30, v12
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v2, v22
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v4, v14
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v0, v16
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v6
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v4, v22
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v6, v14
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v2, v16
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v0
 ; NOREMAT-NEXT:    sf.vc.v.i 2, 0, v8, 0
 ; NOREMAT-NEXT:    addi a0, a1, 1024
 ; NOREMAT-NEXT:    vse32.v v8, (a0)
@@ -952,17 +952,17 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    li a2, 13
 ; REMAT-NEXT:    slli a2, a2, 10
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
 ; REMAT-NEXT:    li a2, 27
 ; REMAT-NEXT:    slli a2, a2, 9
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v20, (a2)
 ; REMAT-NEXT:    li a2, 7
 ; REMAT-NEXT:    slli a2, a2, 11
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v22, (a2)
 ; REMAT-NEXT:    li a2, 29
 ; REMAT-NEXT:    slli a2, a2, 9
@@ -986,23 +986,23 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, a2
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v14, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v2
 ; REMAT-NEXT:    lui a2, 4
 ; REMAT-NEXT:    addiw a2, a2, 512
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v16, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
 ; REMAT-NEXT:    li a2, 17
 ; REMAT-NEXT:    slli a2, a2, 10
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
 ; REMAT-NEXT:    lui a2, 4
 ; REMAT-NEXT:    addiw a2, a2, 1536
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v20, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v22, v24
 ; REMAT-NEXT:    li a2, 9
@@ -1028,22 +1028,22 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, ra
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; REMAT-NEXT:    lui s11, 5
 ; REMAT-NEXT:    add a2, a0, s11
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v14, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v4
 ; REMAT-NEXT:    lui s10, 5
 ; REMAT-NEXT:    addiw s10, s10, 512
 ; REMAT-NEXT:    add a2, a0, s10
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v16, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v6
 ; REMAT-NEXT:    li s9, 21
 ; REMAT-NEXT:    slli s9, s9, 10
 ; REMAT-NEXT:    add a2, a0, s9
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v24
 ; REMAT-NEXT:    lui s8, 5
@@ -1069,28 +1069,28 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, s5
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v10, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v2
 ; REMAT-NEXT:    lui s4, 6
 ; REMAT-NEXT:    addiw s4, s4, -512
 ; REMAT-NEXT:    add a2, a0, s4
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v4
 ; REMAT-NEXT:    lui s3, 6
 ; REMAT-NEXT:    add a2, a0, s3
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v14, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v6
 ; REMAT-NEXT:    lui s2, 6
 ; REMAT-NEXT:    addiw s2, s2, 512
 ; REMAT-NEXT:    add a2, a0, s2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v16, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v24
 ; REMAT-NEXT:    li s1, 25
 ; REMAT-NEXT:    slli s1, s1, 10
 ; REMAT-NEXT:    add a2, a0, s1
-; REMAT-NEXT:    vle32.v v6, (a2)
+; REMAT-NEXT:    vle32.v v0, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v26
 ; REMAT-NEXT:    lui s0, 6
@@ -1110,24 +1110,24 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, t5
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v24, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v2
 ; REMAT-NEXT:    li t4, 27
 ; REMAT-NEXT:    slli t4, t4, 10
 ; REMAT-NEXT:    add a2, a0, t4
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v10, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v4
 ; REMAT-NEXT:    lui t3, 7
 ; REMAT-NEXT:    addiw t3, t3, -512
 ; REMAT-NEXT:    add a2, a0, t3
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v6
 ; REMAT-NEXT:    lui t2, 7
 ; REMAT-NEXT:    add a2, a0, t2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v8, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v6
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v0
 ; REMAT-NEXT:    lui t1, 7
 ; REMAT-NEXT:    addiw t1, t1, 512
 ; REMAT-NEXT:    add a2, a0, t1
@@ -1151,35 +1151,35 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, a6
 ; REMAT-NEXT:    vle32.v v22, (a2)
 ; REMAT-NEXT:    vle32.v v30, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v24, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v24, v2
 ; REMAT-NEXT:    lui a5, 8
 ; REMAT-NEXT:    addiw a5, a5, -1536
 ; REMAT-NEXT:    add a2, a0, a5
 ; REMAT-NEXT:    vle32.v v24, (a2)
-; REMAT-NEXT:    vle32.v v0, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v2
+; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v4
 ; REMAT-NEXT:    li a4, 31
 ; REMAT-NEXT:    slli a4, a4, 10
 ; REMAT-NEXT:    add a2, a0, a4
 ; REMAT-NEXT:    vle32.v v10, (a2)
-; REMAT-NEXT:    vle32.v v2, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v4
+; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v6
 ; REMAT-NEXT:    lui a3, 8
 ; REMAT-NEXT:    addiw a3, a3, -512
 ; REMAT-NEXT:    add a2, a0, a3
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    lui a2, 8
 ; REMAT-NEXT:    add a0, a0, a2
-; REMAT-NEXT:    vle32.v v6, (a0)
+; REMAT-NEXT:    vle32.v v0, (a0)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v8, v14
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v18
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v26, v20
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v28, v22
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v30, v24
-; REMAT-NEXT:    sf.vc.vv 3, 0, v0, v10
-; REMAT-NEXT:    sf.vc.vv 3, 0, v2, v12
-; REMAT-NEXT:    sf.vc.vv 3, 0, v4, v6
+; REMAT-NEXT:    sf.vc.vv 3, 0, v2, v10
+; REMAT-NEXT:    sf.vc.vv 3, 0, v4, v12
+; REMAT-NEXT:    sf.vc.vv 3, 0, v6, v0
 ; REMAT-NEXT:    sf.vc.v.i 2, 0, v8, 0
 ; REMAT-NEXT:    addi a0, a1, 1024
 ; REMAT-NEXT:    vse32.v v8, (a0)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
index be0c68f443af59..c6c837ef8f5014 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
@@ -148,13 +148,13 @@ define <512 x i8> @two_source(<512 x i8> %a, <512 x i8> %b) {
 ; CHECK-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
 ; CHECK-NEXT:    vmv.v.x v24, a1
 ; CHECK-NEXT:    vsetivli zero, 8, e64, m1, ta, ma
-; CHECK-NEXT:    vmv.v.i v0, 0
+; CHECK-NEXT:    vmv.v.i v1, 0
 ; CHECK-NEXT:    lui a1, 1047552
 ; CHECK-NEXT:    addiw a1, a1, 1
 ; CHECK-NEXT:    slli a1, a1, 23
 ; CHECK-NEXT:    addi a1, a1, 1
 ; CHECK-NEXT:    slli a1, a1, 18
-; CHECK-NEXT:    vslide1down.vx v0, v0, a1
+; CHECK-NEXT:    vslide1down.vx v0, v1, a1
 ; CHECK-NEXT:    lui a1, 4
 ;...
[truncated]

Contributor

@lukel97 lukel97 left a comment


Makes sense to me.

As a side note, this reminds me of #69290, but I'm not sure where it stands with the vector calling convention now.

@wangpc-pp
Contributor Author

@4vtomat Would this change impact Vector Calling Convention?

@kito-cheng
Member

The allocation order should not matter to the calling convention; the compiler should move values to/from the right registers at the ABI boundary.
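
As a hypothetical sketch of that point (register choices here are illustrative, assuming the vector calling convention passes the first vector argument in v8): even if RA now prefers some other register internally, the compiler simply inserts a copy at the call boundary:

    vadd.vv  v2, v10, v12        # value was allocated to v2 by RA
    vmv1r.v  v8, v2              # copy into the ABI-mandated argument register
    call     callee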

@4vtomat
Member

4vtomat commented Feb 26, 2024

@4vtomat Would this change impact Vector Calling Convention?

It doesn't matter for now since the vector calling convention uses the order of these arrays~

Collaborator

@preames preames left a comment


I was surprised to see this patch; I'd thought we already did this.

Good idea, one suggestion inline.

llvm/lib/Target/RISCV/RISCVRegisterInfo.td (outdated inline review comment, resolved)
According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

> The v0 register defined by the RISC-V vector extension is special in
> that it can be used both as a general purpose vector register and also
> as a mask register. As a preference, use registers other than v0 for
> non-mask values. Otherwise data will have to be moved out of v0 when a
> mask is required in an operation. v0 may be used when all other
> registers are in use, and using v0 would avoid spilling register state
> to memory.

And using the V0 register may stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups that contain it, as
much as possible. We achieve this by moving V0 to the end of the RA
order.
; RV64-NEXT: vmv.v.v v4, v8
; RV64-NEXT: vrgather.vi v4, v16, 5
; RV64-NEXT: vmv1r.v v0, v28
; RV64-NEXT: vrgather.vi v4, v24, 3, v0.t
Contributor Author


This shows that we can reduce some spills/reloads. :-)

Collaborator

@preames preames left a comment


LGTM w/comment addressed, but please wait for confirmation from @topperc

llvm/lib/Target/RISCV/RISCVRegisterInfo.td (outdated inline review comment, resolved)
Collaborator

@topperc topperc left a comment


LGTM

@wangpc-pp wangpc-pp merged commit 2023a23 into llvm:main Mar 1, 2024
3 of 4 checks passed
@wangpc-pp wangpc-pp deleted the main-rvv-avoid-use-v0 branch March 1, 2024 04:19