
[RISCV] Move V0 to the end of register allocation order #82967

Merged
merged 2 commits into llvm:main
Mar 1, 2024

Conversation

wangpc-pp
Contributor

According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

The v0 register defined by the RISC-V vector extension is special in
that it can be used both as a general purpose vector register and also
as a mask register. As a preference, use registers other than v0 for
non-mask values. Otherwise data will have to be moved out of v0 when a
mask is required in an operation. v0 may be used when all other
registers are in use, and using v0 would avoid spilling register state
to memory.

And using the V0 register may stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups that contain it, as
much as possible. We achieve this by moving V0 to the end of the RA
order.
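
As a rough illustration (this snippet is hypothetical and not taken from the patch or its tests), keeping a non-mask value in v0 costs an extra whole-register move as soon as a mask is needed, because masked instructions can only take the mask from v0:

    vsetvli  t0, a1, e32, m1, ta, ma
    vle32.v  v0, (a0)            # non-mask data was allocated to v0
    vmseq.vi v8, v16, 0          # later, a mask is produced
    vmv1r.v  v2, v0              # the data must first be evacuated from v0
    vmv1r.v  v0, v8              # because the mask operand can only be v0
    vadd.vv  v4, v2, v16, v0.t   # masked operation

With v0 at the end of the allocation order, the data would normally land in some other register and the first vmv1r.v disappears.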

@llvmbot
Collaborator

llvmbot commented Feb 26, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Wang Pengcheng (wangpc-pp)

Changes

According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

> The v0 register defined by the RISC-V vector extension is special in
> that it can be used both as a general purpose vector register and also
> as a mask register. As a preference, use registers other than v0 for
> non-mask values. Otherwise data will have to be moved out of v0 when a
> mask is required in an operation. v0 may be used when all other
> registers are in use, and using v0 would avoid spilling register state
> to memory.

And using the V0 register may stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups that contain it, as
much as possible. We achieve this by moving V0 to the end of the RA
order.


Patch is 28.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82967.diff

10 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.td (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/pr69586.ll (+85-85)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmadd-sdnode.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwmacc-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmacc-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmsac-vp.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 381e0082c49b0b..bcf4f303ac4ca9 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -510,17 +510,17 @@ defvar VM8VTs = [vint8m8_t, vint16m8_t, vint32m8_t, vint64m8_t,
 
 def VR : VReg<!listconcat(VM1VTs, VMaskVTs),
               (add (sequence "V%u", 8, 31),
-                   (sequence "V%u", 0, 7)), 1>;
+                   (sequence "V%u", 1, 7), V0), 1>;
 
 def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;
 
 def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
-                             (sequence "V%uM2", 0, 7, 2)), 2>;
+                             (sequence "V%uM2", 2, 7, 2), V0M2), 2>;
 
 def VRM2NoV0 : VReg<VM2VTs, (sub VRM2, V0M2), 2>;
 
 def VRM4 : VReg<VM4VTs, (add V8M4, V12M4, V16M4, V20M4,
-                             V24M4, V28M4, V0M4, V4M4), 4>;
+                             V24M4, V28M4, V4M4, V0M4), 4>;
 
 def VRM4NoV0 : VReg<VM4VTs, (sub VRM4, V0M4), 4>;
 
diff --git a/llvm/test/CodeGen/RISCV/pr69586.ll b/llvm/test/CodeGen/RISCV/pr69586.ll
index 2d5fce2ca4970e..07ba2a5b3b1378 100644
--- a/llvm/test/CodeGen/RISCV/pr69586.ll
+++ b/llvm/test/CodeGen/RISCV/pr69586.ll
@@ -146,19 +146,19 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    slli a6, a3, 10
 ; NOREMAT-NEXT:    sd a6, 176(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a6, a0, a6
-; NOREMAT-NEXT:    vle32.v v0, (a6)
+; NOREMAT-NEXT:    vle32.v v2, (a6)
 ; NOREMAT-NEXT:    vle32.v v20, (a6)
 ; NOREMAT-NEXT:    li a6, 19
 ; NOREMAT-NEXT:    slli a6, a6, 9
 ; NOREMAT-NEXT:    sd a6, 168(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    li a7, 19
 ; NOREMAT-NEXT:    add a6, a0, a6
-; NOREMAT-NEXT:    vle32.v v2, (a6)
+; NOREMAT-NEXT:    vle32.v v4, (a6)
 ; NOREMAT-NEXT:    vle32.v v22, (a6)
 ; NOREMAT-NEXT:    slli a5, a5, 11
 ; NOREMAT-NEXT:    sd a5, 160(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v4, (a5)
+; NOREMAT-NEXT:    vle32.v v6, (a5)
 ; NOREMAT-NEXT:    vle32.v v12, (a5)
 ; NOREMAT-NEXT:    li s10, 21
 ; NOREMAT-NEXT:    slli a5, s10, 9
@@ -184,25 +184,25 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a5, a0, a5
 ; NOREMAT-NEXT:    vle32.v v30, (a5)
 ; NOREMAT-NEXT:    vle32.v v10, (a5)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v2
 ; NOREMAT-NEXT:    li s3, 25
 ; NOREMAT-NEXT:    slli a5, s3, 9
 ; NOREMAT-NEXT:    sd a5, 128(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v0, (a5)
+; NOREMAT-NEXT:    vle32.v v2, (a5)
 ; NOREMAT-NEXT:    vle32.v v18, (a5)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
 ; NOREMAT-NEXT:    slli a5, s2, 10
 ; NOREMAT-NEXT:    sd a5, 120(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v2, (a5)
+; NOREMAT-NEXT:    vle32.v v4, (a5)
 ; NOREMAT-NEXT:    vle32.v v20, (a5)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v6
 ; NOREMAT-NEXT:    li t5, 27
 ; NOREMAT-NEXT:    slli a5, t5, 9
 ; NOREMAT-NEXT:    sd a5, 112(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a5, a0, a5
-; NOREMAT-NEXT:    vle32.v v4, (a5)
+; NOREMAT-NEXT:    vle32.v v6, (a5)
 ; NOREMAT-NEXT:    vle32.v v22, (a5)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v24
 ; NOREMAT-NEXT:    slli a4, a4, 11
@@ -235,33 +235,33 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a4, a0, a4
 ; NOREMAT-NEXT:    vle32.v v30, (a4)
 ; NOREMAT-NEXT:    vle32.v v16, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v10, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v10, v2
 ; NOREMAT-NEXT:    lui a6, 4
 ; NOREMAT-NEXT:    add a4, a0, a6
-; NOREMAT-NEXT:    vle32.v v0, (a4)
+; NOREMAT-NEXT:    vle32.v v2, (a4)
 ; NOREMAT-NEXT:    vle32.v v8, (a4)
 ; NOREMAT-NEXT:    csrr a4, vlenb
 ; NOREMAT-NEXT:    slli a4, a4, 1
 ; NOREMAT-NEXT:    add a4, sp, a4
 ; NOREMAT-NEXT:    addi a4, a4, 288
 ; NOREMAT-NEXT:    vs2r.v v8, (a4) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
 ; NOREMAT-NEXT:    addiw a4, a6, 512
 ; NOREMAT-NEXT:    sd a4, 72(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a4, a0, a4
-; NOREMAT-NEXT:    vle32.v v2, (a4)
+; NOREMAT-NEXT:    vle32.v v4, (a4)
 ; NOREMAT-NEXT:    vle32.v v18, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
 ; NOREMAT-NEXT:    slli a4, t0, 10
 ; NOREMAT-NEXT:    sd a4, 64(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a4, a0, a4
-; NOREMAT-NEXT:    vle32.v v4, (a4)
+; NOREMAT-NEXT:    vle32.v v6, (a4)
 ; NOREMAT-NEXT:    vle32.v v20, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v24
 ; NOREMAT-NEXT:    addiw a4, a6, 1536
 ; NOREMAT-NEXT:    sd a4, 56(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a4, a0, a4
-; NOREMAT-NEXT:    vle32.v v6, (a4)
+; NOREMAT-NEXT:    vle32.v v0, (a4)
 ; NOREMAT-NEXT:    vle32.v v22, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v26
 ; NOREMAT-NEXT:    slli a3, a3, 11
@@ -289,18 +289,18 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, a0, a3
 ; NOREMAT-NEXT:    vle32.v v10, (a3)
 ; NOREMAT-NEXT:    vle32.v v14, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v2
 ; NOREMAT-NEXT:    addiw a3, s1, -512
 ; NOREMAT-NEXT:    sd a3, 24(sp) # 8-byte Folded Spill
 ; NOREMAT-NEXT:    add a3, a0, a3
-; NOREMAT-NEXT:    vle32.v v0, (a3)
+; NOREMAT-NEXT:    vle32.v v2, (a3)
 ; NOREMAT-NEXT:    vle32.v v16, (a3)
 ; NOREMAT-NEXT:    csrr a3, vlenb
 ; NOREMAT-NEXT:    slli a3, a3, 1
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vl2r.v v26, (a3) # Unknown-size Folded Reload
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v4
 ; NOREMAT-NEXT:    add a3, a0, s1
 ; NOREMAT-NEXT:    vle32.v v26, (a3)
 ; NOREMAT-NEXT:    vle32.v v28, (a3)
@@ -309,27 +309,27 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vs2r.v v28, (a3) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v6
 ; NOREMAT-NEXT:    addiw ra, s1, 512
 ; NOREMAT-NEXT:    add a3, a0, ra
 ; NOREMAT-NEXT:    vle32.v v28, (a3)
 ; NOREMAT-NEXT:    vle32.v v30, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v0
 ; NOREMAT-NEXT:    slli s11, s10, 10
 ; NOREMAT-NEXT:    add a3, a0, s11
-; NOREMAT-NEXT:    vle32.v v2, (a3)
+; NOREMAT-NEXT:    vle32.v v4, (a3)
 ; NOREMAT-NEXT:    vle32.v v18, (a3)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v12
 ; NOREMAT-NEXT:    addiw s10, s1, 1536
 ; NOREMAT-NEXT:    add a3, a0, s10
-; NOREMAT-NEXT:    vle32.v v4, (a3)
+; NOREMAT-NEXT:    vle32.v v6, (a3)
 ; NOREMAT-NEXT:    vle32.v v20, (a3)
 ; NOREMAT-NEXT:    addi a3, sp, 288
 ; NOREMAT-NEXT:    vl2r.v v12, (a3) # Unknown-size Folded Reload
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v8
 ; NOREMAT-NEXT:    slli s9, s8, 11
 ; NOREMAT-NEXT:    add a3, a0, s9
-; NOREMAT-NEXT:    vle32.v v6, (a3)
+; NOREMAT-NEXT:    vle32.v v0, (a3)
 ; NOREMAT-NEXT:    vle32.v v12, (a3)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v24, v10
 ; NOREMAT-NEXT:    lui t0, 6
@@ -337,7 +337,7 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, a0, s8
 ; NOREMAT-NEXT:    vle32.v v8, (a3)
 ; NOREMAT-NEXT:    vle32.v v22, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; NOREMAT-NEXT:    slli s7, s6, 10
 ; NOREMAT-NEXT:    add a3, a0, s7
 ; NOREMAT-NEXT:    vle32.v v10, (a3)
@@ -345,7 +345,7 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v26
 ; NOREMAT-NEXT:    addiw s6, t0, -512
 ; NOREMAT-NEXT:    add a3, a0, s6
-; NOREMAT-NEXT:    vle32.v v0, (a3)
+; NOREMAT-NEXT:    vle32.v v2, (a3)
 ; NOREMAT-NEXT:    vle32.v v16, (a3)
 ; NOREMAT-NEXT:    csrr a3, vlenb
 ; NOREMAT-NEXT:    slli a3, a3, 2
@@ -361,7 +361,7 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vs2r.v v26, (a3) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v30, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v30, v4
 ; NOREMAT-NEXT:    addiw s5, t0, 512
 ; NOREMAT-NEXT:    add a3, a0, s5
 ; NOREMAT-NEXT:    vle32.v v26, (a3)
@@ -371,12 +371,12 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a3, sp, a3
 ; NOREMAT-NEXT:    addi a3, a3, 288
 ; NOREMAT-NEXT:    vs2r.v v28, (a3) # Unknown-size Folded Spill
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v18, v6
 ; NOREMAT-NEXT:    slli s4, s3, 10
 ; NOREMAT-NEXT:    add a3, a0, s4
 ; NOREMAT-NEXT:    vle32.v v28, (a3)
 ; NOREMAT-NEXT:    vle32.v v18, (a3)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v0
 ; NOREMAT-NEXT:    addiw s3, t0, 1536
 ; NOREMAT-NEXT:    add a3, a0, s3
 ; NOREMAT-NEXT:    vle32.v v30, (a3)
@@ -384,23 +384,23 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v8
 ; NOREMAT-NEXT:    slli s2, s2, 11
 ; NOREMAT-NEXT:    add a3, a0, s2
-; NOREMAT-NEXT:    vle32.v v2, (a3)
+; NOREMAT-NEXT:    vle32.v v4, (a3)
 ; NOREMAT-NEXT:    vle32.v v12, (a3)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v10
 ; NOREMAT-NEXT:    lui a3, 7
 ; NOREMAT-NEXT:    addiw s0, a3, -1536
 ; NOREMAT-NEXT:    add a4, a0, s0
-; NOREMAT-NEXT:    vle32.v v4, (a4)
+; NOREMAT-NEXT:    vle32.v v6, (a4)
 ; NOREMAT-NEXT:    vle32.v v22, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; NOREMAT-NEXT:    slli t6, t5, 10
 ; NOREMAT-NEXT:    add a4, a0, t6
-; NOREMAT-NEXT:    vle32.v v6, (a4)
+; NOREMAT-NEXT:    vle32.v v0, (a4)
 ; NOREMAT-NEXT:    vle32.v v14, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v24
 ; NOREMAT-NEXT:    addiw t5, a3, -512
 ; NOREMAT-NEXT:    add a4, a0, t5
-; NOREMAT-NEXT:    vle32.v v0, (a4)
+; NOREMAT-NEXT:    vle32.v v2, (a4)
 ; NOREMAT-NEXT:    vle32.v v16, (a4)
 ; NOREMAT-NEXT:    csrr a4, vlenb
 ; NOREMAT-NEXT:    slli a4, a4, 2
@@ -426,42 +426,42 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; NOREMAT-NEXT:    add a4, a0, t3
 ; NOREMAT-NEXT:    vle32.v v18, (a4)
 ; NOREMAT-NEXT:    vle32.v v28, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v2
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
 ; NOREMAT-NEXT:    addiw t2, a3, 1536
 ; NOREMAT-NEXT:    add a4, a0, t2
 ; NOREMAT-NEXT:    vle32.v v20, (a4)
 ; NOREMAT-NEXT:    vle32.v v30, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v4
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v12, v6
 ; NOREMAT-NEXT:    slli t1, a2, 11
 ; NOREMAT-NEXT:    add a2, a0, t1
 ; NOREMAT-NEXT:    vle32.v v12, (a2)
-; NOREMAT-NEXT:    vle32.v v2, (a2)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v6
+; NOREMAT-NEXT:    vle32.v v4, (a2)
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v22, v0
 ; NOREMAT-NEXT:    lui a2, 8
 ; NOREMAT-NEXT:    addiw a7, a2, -1536
 ; NOREMAT-NEXT:    add a4, a0, a7
 ; NOREMAT-NEXT:    vle32.v v22, (a4)
-; NOREMAT-NEXT:    vle32.v v4, (a4)
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; NOREMAT-NEXT:    vle32.v v6, (a4)
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; NOREMAT-NEXT:    slli a6, a5, 10
 ; NOREMAT-NEXT:    add a4, a0, a6
 ; NOREMAT-NEXT:    vle32.v v14, (a4)
-; NOREMAT-NEXT:    vle32.v v0, (a4)
+; NOREMAT-NEXT:    vle32.v v2, (a4)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v16, v26
 ; NOREMAT-NEXT:    addiw a5, a2, -512
 ; NOREMAT-NEXT:    add a4, a0, a5
 ; NOREMAT-NEXT:    vle32.v v16, (a4)
 ; NOREMAT-NEXT:    vle32.v v26, (a4)
 ; NOREMAT-NEXT:    add a0, a0, a2
-; NOREMAT-NEXT:    vle32.v v6, (a0)
+; NOREMAT-NEXT:    vle32.v v0, (a0)
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v8, v10
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v24, v18
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v28, v20
 ; NOREMAT-NEXT:    sf.vc.vv 3, 0, v30, v12
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v2, v22
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v4, v14
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v0, v16
-; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v6
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v4, v22
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v6, v14
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v2, v16
+; NOREMAT-NEXT:    sf.vc.vv 3, 0, v26, v0
 ; NOREMAT-NEXT:    sf.vc.v.i 2, 0, v8, 0
 ; NOREMAT-NEXT:    addi a0, a1, 1024
 ; NOREMAT-NEXT:    vse32.v v8, (a0)
@@ -952,17 +952,17 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    li a2, 13
 ; REMAT-NEXT:    slli a2, a2, 10
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
 ; REMAT-NEXT:    li a2, 27
 ; REMAT-NEXT:    slli a2, a2, 9
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v20, (a2)
 ; REMAT-NEXT:    li a2, 7
 ; REMAT-NEXT:    slli a2, a2, 11
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v22, (a2)
 ; REMAT-NEXT:    li a2, 29
 ; REMAT-NEXT:    slli a2, a2, 9
@@ -986,23 +986,23 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, a2
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v14, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v2
 ; REMAT-NEXT:    lui a2, 4
 ; REMAT-NEXT:    addiw a2, a2, 512
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v16, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
 ; REMAT-NEXT:    li a2, 17
 ; REMAT-NEXT:    slli a2, a2, 10
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v6
 ; REMAT-NEXT:    lui a2, 4
 ; REMAT-NEXT:    addiw a2, a2, 1536
 ; REMAT-NEXT:    add a2, a0, a2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v20, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v22, v24
 ; REMAT-NEXT:    li a2, 9
@@ -1028,22 +1028,22 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, ra
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
 ; REMAT-NEXT:    lui s11, 5
 ; REMAT-NEXT:    add a2, a0, s11
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v14, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v4
 ; REMAT-NEXT:    lui s10, 5
 ; REMAT-NEXT:    addiw s10, s10, 512
 ; REMAT-NEXT:    add a2, a0, s10
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v16, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v6
 ; REMAT-NEXT:    li s9, 21
 ; REMAT-NEXT:    slli s9, s9, 10
 ; REMAT-NEXT:    add a2, a0, s9
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v24
 ; REMAT-NEXT:    lui s8, 5
@@ -1069,28 +1069,28 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, s5
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v10, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v2
 ; REMAT-NEXT:    lui s4, 6
 ; REMAT-NEXT:    addiw s4, s4, -512
 ; REMAT-NEXT:    add a2, a0, s4
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v4
 ; REMAT-NEXT:    lui s3, 6
 ; REMAT-NEXT:    add a2, a0, s3
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v14, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v6
 ; REMAT-NEXT:    lui s2, 6
 ; REMAT-NEXT:    addiw s2, s2, 512
 ; REMAT-NEXT:    add a2, a0, s2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v16, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v18, v24
 ; REMAT-NEXT:    li s1, 25
 ; REMAT-NEXT:    slli s1, s1, 10
 ; REMAT-NEXT:    add a2, a0, s1
-; REMAT-NEXT:    vle32.v v6, (a2)
+; REMAT-NEXT:    vle32.v v0, (a2)
 ; REMAT-NEXT:    vle32.v v18, (a2)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v20, v26
 ; REMAT-NEXT:    lui s0, 6
@@ -1110,24 +1110,24 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, t5
 ; REMAT-NEXT:    vle32.v v30, (a2)
 ; REMAT-NEXT:    vle32.v v24, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v2
 ; REMAT-NEXT:    li t4, 27
 ; REMAT-NEXT:    slli t4, t4, 10
 ; REMAT-NEXT:    add a2, a0, t4
-; REMAT-NEXT:    vle32.v v0, (a2)
+; REMAT-NEXT:    vle32.v v2, (a2)
 ; REMAT-NEXT:    vle32.v v10, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v2
+; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v4
 ; REMAT-NEXT:    lui t3, 7
 ; REMAT-NEXT:    addiw t3, t3, -512
 ; REMAT-NEXT:    add a2, a0, t3
-; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    vle32.v v4, (a2)
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v4
+; REMAT-NEXT:    sf.vc.vv 3, 0, v14, v6
 ; REMAT-NEXT:    lui t2, 7
 ; REMAT-NEXT:    add a2, a0, t2
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    vle32.v v8, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v6
+; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v0
 ; REMAT-NEXT:    lui t1, 7
 ; REMAT-NEXT:    addiw t1, t1, 512
 ; REMAT-NEXT:    add a2, a0, t1
@@ -1151,35 +1151,35 @@ define void @test(ptr %0, ptr %1, i64 %2) {
 ; REMAT-NEXT:    add a2, a0, a6
 ; REMAT-NEXT:    vle32.v v22, (a2)
 ; REMAT-NEXT:    vle32.v v30, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v24, v0
+; REMAT-NEXT:    sf.vc.vv 3, 0, v24, v2
 ; REMAT-NEXT:    lui a5, 8
 ; REMAT-NEXT:    addiw a5, a5, -1536
 ; REMAT-NEXT:    add a2, a0, a5
 ; REMAT-NEXT:    vle32.v v24, (a2)
-; REMAT-NEXT:    vle32.v v0, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v2
+; REMAT-NEXT:    vle32.v v2, (a2)
+; REMAT-NEXT:    sf.vc.vv 3, 0, v10, v4
 ; REMAT-NEXT:    li a4, 31
 ; REMAT-NEXT:    slli a4, a4, 10
 ; REMAT-NEXT:    add a2, a0, a4
 ; REMAT-NEXT:    vle32.v v10, (a2)
-; REMAT-NEXT:    vle32.v v2, (a2)
-; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v4
+; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    sf.vc.vv 3, 0, v12, v6
 ; REMAT-NEXT:    lui a3, 8
 ; REMAT-NEXT:    addiw a3, a3, -512
 ; REMAT-NEXT:    add a2, a0, a3
 ; REMAT-NEXT:    vle32.v v12, (a2)
-; REMAT-NEXT:    vle32.v v4, (a2)
+; REMAT-NEXT:    vle32.v v6, (a2)
 ; REMAT-NEXT:    lui a2, 8
 ; REMAT-NEXT:    add a0, a0, a2
-; REMAT-NEXT:    vle32.v v6, (a0)
+; REMAT-NEXT:    vle32.v v0, (a0)
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v8, v14
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v16, v18
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v26, v20
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v28, v22
 ; REMAT-NEXT:    sf.vc.vv 3, 0, v30, v24
-; REMAT-NEXT:    sf.vc.vv 3, 0, v0, v10
-; REMAT-NEXT:    sf.vc.vv 3, 0, v2, v12
-; REMAT-NEXT:    sf.vc.vv 3, 0, v4, v6
+; REMAT-NEXT:    sf.vc.vv 3, 0, v2, v10
+; REMAT-NEXT:    sf.vc.vv 3, 0, v4, v12
+; REMAT-NEXT:    sf.vc.vv 3, 0, v6, v0
 ; REMAT-NEXT:    sf.vc.v.i 2, 0, v8, 0
 ; REMAT-NEXT:    addi a0, a1, 1024
 ; REMAT-NEXT:    vse32.v v8, (a0)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
index be0c68f443af59..c6c837ef8f5014 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
@@ -148,13 +148,13 @@ define <512 x i8> @two_source(<512 x i8> %a, <512 x i8> %b) {
 ; CHECK-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
 ; CHECK-NEXT:    vmv.v.x v24, a1
 ; CHECK-NEXT:    vsetivli zero, 8, e64, m1, ta, ma
-; CHECK-NEXT:    vmv.v.i v0, 0
+; CHECK-NEXT:    vmv.v.i v1, 0
 ; CHECK-NEXT:    lui a1, 1047552
 ; CHECK-NEXT:    addiw a1, a1, 1
 ; CHECK-NEXT:    slli a1, a1, 23
 ; CHECK-NEXT:    addi a1, a1, 1
 ; CHECK-NEXT:    slli a1, a1, 18
-; CHECK-NEXT:    vslide1down.vx v0, v0, a1
+; CHECK-NEXT:    vslide1down.vx v0, v1, a1
 ; CHECK-NEXT:    lui a1, 4
 ;...
[truncated]

Contributor

@lukel97 lukel97 left a comment


Makes sense to me.

As a side note, this reminds me of #69290, but I'm not sure where it stands with the vector calling convention now.

@wangpc-pp
Contributor Author

@4vtomat Would this change impact Vector Calling Convention?

@kito-cheng
Member

The allocation order should not matter to the calling convention; the compiler should move values to/from the right registers at the ABI boundary.
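
As a hypothetical sketch of that point (register choices here are illustrative, assuming the vector calling convention passes the first vector argument in v8): even if RA now prefers some other register internally, the compiler simply inserts a copy at the call boundary:

    vadd.vv  v2, v10, v12        # value was allocated to v2 by RA
    vmv1r.v  v8, v2              # copy into the ABI-mandated argument register
    call     callee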

@4vtomat
Member

4vtomat commented Feb 26, 2024

@4vtomat Would this change impact Vector Calling Convention?

It doesn't matter for now since the vector calling convention uses the order of these arrays~

Collaborator

@preames preames left a comment


I was surprised to see this patch; I'd thought we already did this.

Good idea, one suggestion inline.

llvm/lib/Target/RISCV/RISCVRegisterInfo.td (outdated inline review comment, resolved)
According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

> The v0 register defined by the RISC-V vector extension is special in
> that it can be used both as a general purpose vector register and also
> as a mask register. As a preference, use registers other than v0 for
> non-mask values. Otherwise data will have to be moved out of v0 when a
> mask is required in an operation. v0 may be used when all other
> registers are in use, and using v0 would avoid spilling register state
> to memory.

And using the V0 register may stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups that contain it, as
much as possible. We achieve this by moving V0 to the end of the RA
order.
; RV64-NEXT: vmv.v.v v4, v8
; RV64-NEXT: vrgather.vi v4, v16, 5
; RV64-NEXT: vmv1r.v v0, v28
; RV64-NEXT: vrgather.vi v4, v24, 3, v0.t
Contributor Author


This shows that we can reduce some spills/reloads. :-)

Collaborator

@preames preames left a comment


LGTM w/comment addressed, but please wait for confirmation from @topperc

llvm/lib/Target/RISCV/RISCVRegisterInfo.td (outdated inline review comment, resolved)
Collaborator

@topperc topperc left a comment


LGTM

@wangpc-pp wangpc-pp merged commit 2023a23 into llvm:main Mar 1, 2024
3 of 4 checks passed
@wangpc-pp wangpc-pp deleted the main-rvv-avoid-use-v0 branch March 1, 2024 04:19