[RISCV] Make X5 allocatable for JALR on CPUs without RAS #78417

Closed
wants to merge 1 commit

Conversation

wangpc-pp
Contributor

Some microarchitectures may not support RAS (Return Address Stack), in which
case we don't need to reserve the X5 register for JALR.

If RAS is supported, we select the register allocation order without X5
(because alternative orders must be subsets of the default order).
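For context, the RISC-V unprivileged ISA defines JAL/JALR operands as return-address-stack hints: x1 (ra) and x5 (t0) are the two "link" registers, and their appearance as rd/rs1 tells the predictor to push, pop, or both. A minimal Python model of that hint table (the function name is illustrative, not LLVM API):

```python
# RAS hint behavior for JALR, per the RISC-V unprivileged ISA:
# x1 (ra) and x5 (t0) are the "link" registers used for
# return-address prediction.
LINK_REGS = {"x1", "x5"}

def ras_hint(rd: str, rs1: str) -> str:
    """Return the return-address-stack action a JALR with these
    operands hints to the microarchitecture."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link and not rs1_link:
        return "none"            # plain indirect jump
    if not rd_link and rs1_link:
        return "pop"             # e.g. ret = jalr x0, x1
    if rd_link and not rs1_link:
        return "push"            # e.g. call = jalr x1, rs1
    # both operands are link registers
    return "push" if rd == rs1 else "pop-then-push"
```

This is why X5 is normally excluded from GPRJALR: `jalr ra, t0` hints a pop-then-push, so using x5 as the target of an ordinary call corrupts the RAS on CPUs that implement one. On CPUs without a RAS the hint is meaningless, which is the case this patch exploits.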

@llvmbot
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Wang Pengcheng (wangpc-pp)



Full diff: https://github.com/llvm/llvm-project/pull/78417.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+3)
  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.td (+17-3)
  • (modified) llvm/test/CodeGen/RISCV/calls.ll (+118-206)
  • (modified) llvm/test/CodeGen/RISCV/tail-calls.ll (+21)
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index fa334c69ddc982b..a6a23f63df4e825 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -970,6 +970,9 @@ def FeatureFastUnalignedAccess
 def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
     "UsePostRAScheduler", "true", "Schedule again after register allocation">;
 
+def FeatureNoRAS : SubtargetFeature<"no-ras", "HasRAS", "false",
+                                    "Hasn't RAS (Return Address Stack)">;
+
 def TuneNoOptimizedZeroStrideLoad
    : SubtargetFeature<"no-optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
                       "false", "Hasn't optimized (perform fewer memory operations)"
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 5a4d8c4cfece7ff..40284967135afd6 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -153,7 +153,13 @@ def GPRNoX0X2 : GPRRegisterClass<(sub GPR, X0, X2)>;
 // stack on some microarchitectures. Also remove the reserved registers X0, X2,
 // X3, and X4 as it reduces the number of register classes that get synthesized
 // by tablegen.
-def GPRJALR : GPRRegisterClass<(sub GPR, (sequence "X%u", 0, 5))>;
+// If RAS is supported, we select the alternative register order without X5. 
+def GPRJALR : GPRRegisterClass<(sub GPR, (sequence "X%u", 0, 4))> {
+  list<dag> AltOrders = [(sub GPR, (sequence "X%u", 0, 5))];
+  code AltOrderSelect = [{
+    return MF.getSubtarget<RISCVSubtarget>().hasRAS();
+  }];
+}
 
 def GPRC : GPRRegisterClass<(add (sequence "X%u", 10, 15),
                                  (sequence "X%u", 8, 9))>;
@@ -162,9 +168,17 @@ def GPRC : GPRRegisterClass<(add (sequence "X%u", 10, 15),
 // restored to the saved value before the tail call, which would clobber a call
 // address. We shouldn't use x5 since that is a hint for to pop the return
 // address stack on some microarchitectures.
-def GPRTC : GPRRegisterClass<(add (sequence "X%u", 6, 7),
+// If RAS is supported, we select the alternative register order without X5.
+def GPRTC : GPRRegisterClass<(add (sequence "X%u", 5, 7),
                                   (sequence "X%u", 10, 17),
-                                  (sequence "X%u", 28, 31))>;
+                                  (sequence "X%u", 28, 31))> {
+  list<dag> AltOrders = [(add (sequence "X%u", 6, 7),
+                              (sequence "X%u", 10, 17),
+                              (sequence "X%u", 28, 31))];
+  code AltOrderSelect = [{
+    return MF.getSubtarget<RISCVSubtarget>().hasRAS();
+  }];
+}
 
 def SP : GPRRegisterClass<(add X2)>;
 
diff --git a/llvm/test/CodeGen/RISCV/calls.ll b/llvm/test/CodeGen/RISCV/calls.ll
index 365f255dd82447b..b83991d3eaff36c 100644
--- a/llvm/test/CodeGen/RISCV/calls.ll
+++ b/llvm/test/CodeGen/RISCV/calls.ll
@@ -1,29 +1,22 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefix=RV32I %s
+; RUN:   | FileCheck -check-prefixes=CHECK,RV32I %s
 ; RUN: llc -relocation-model=pic -mtriple=riscv32 -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefix=RV32I-PIC %s
+; RUN:   | FileCheck -check-prefixes=CHECK,RV32I-PIC %s
+; RUN: llc -mtriple=riscv32 -mattr=+no-ras -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=CHECK,RV32I-NO-RAS %s
 
 declare i32 @external_function(i32)
 
 define i32 @test_call_external(i32 %a) nounwind {
-; RV32I-LABEL: test_call_external:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    call external_function
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_external:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    call external_function
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_external:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call external_function
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @external_function(i32 %a)
   ret i32 %1
 }
@@ -31,85 +24,51 @@ define i32 @test_call_external(i32 %a) nounwind {
 declare dso_local i32 @dso_local_function(i32)
 
 define i32 @test_call_dso_local(i32 %a) nounwind {
-; RV32I-LABEL: test_call_dso_local:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    call dso_local_function
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_dso_local:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    call dso_local_function
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_dso_local:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call dso_local_function
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @dso_local_function(i32 %a)
   ret i32 %1
 }
 
 define i32 @defined_function(i32 %a) nounwind {
-; RV32I-LABEL: defined_function:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi a0, a0, 1
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: defined_function:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi a0, a0, 1
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: defined_function:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a0, a0, 1
+; CHECK-NEXT:    ret
   %1 = add i32 %a, 1
   ret i32 %1
 }
 
 define i32 @test_call_defined(i32 %a) nounwind {
-; RV32I-LABEL: test_call_defined:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    call defined_function
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_defined:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    call defined_function
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_defined:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call defined_function
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @defined_function(i32 %a)
   ret i32 %1
 }
 
 define i32 @test_call_indirect(ptr %a, i32 %b) nounwind {
-; RV32I-LABEL: test_call_indirect:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv a2, a0
-; RV32I-NEXT:    mv a0, a1
-; RV32I-NEXT:    jalr a2
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_indirect:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv a2, a0
-; RV32I-PIC-NEXT:    mv a0, a1
-; RV32I-PIC-NEXT:    jalr a2
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_indirect:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    mv a0, a1
+; CHECK-NEXT:    jalr a2
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 %a(i32 %b)
   ret i32 %1
 }
@@ -150,6 +109,23 @@ define i32 @test_call_indirect_no_t0(ptr %a, i32 %b, i32 %c, i32 %d, i32 %e, i32
 ; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32I-PIC-NEXT:    addi sp, sp, 16
 ; RV32I-PIC-NEXT:    ret
+;
+; RV32I-NO-RAS-LABEL: test_call_indirect_no_t0:
+; RV32I-NO-RAS:       # %bb.0:
+; RV32I-NO-RAS-NEXT:    addi sp, sp, -16
+; RV32I-NO-RAS-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; RV32I-NO-RAS-NEXT:    mv t0, a0
+; RV32I-NO-RAS-NEXT:    mv a0, a1
+; RV32I-NO-RAS-NEXT:    mv a1, a2
+; RV32I-NO-RAS-NEXT:    mv a2, a3
+; RV32I-NO-RAS-NEXT:    mv a3, a4
+; RV32I-NO-RAS-NEXT:    mv a4, a5
+; RV32I-NO-RAS-NEXT:    mv a5, a6
+; RV32I-NO-RAS-NEXT:    mv a6, a7
+; RV32I-NO-RAS-NEXT:    jalr t0
+; RV32I-NO-RAS-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; RV32I-NO-RAS-NEXT:    addi sp, sp, 16
+; RV32I-NO-RAS-NEXT:    ret
   %1 = call i32 %a(i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h)
   ret i32 %1
 }
@@ -158,45 +134,27 @@ define i32 @test_call_indirect_no_t0(ptr %a, i32 %b, i32 %c, i32 %d, i32 %e, i32
 ; introduced when compiling with optimisation.
 
 define fastcc i32 @fastcc_function(i32 %a, i32 %b) nounwind {
-; RV32I-LABEL: fastcc_function:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    add a0, a0, a1
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: fastcc_function:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    add a0, a0, a1
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: fastcc_function:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    add a0, a0, a1
+; CHECK-NEXT:    ret
  %1 = add i32 %a, %b
  ret i32 %1
 }
 
 define i32 @test_call_fastcc(i32 %a, i32 %b) nounwind {
-; RV32I-LABEL: test_call_fastcc:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv s0, a0
-; RV32I-NEXT:    call fastcc_function
-; RV32I-NEXT:    mv a0, s0
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_fastcc:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv s0, a0
-; RV32I-PIC-NEXT:    call fastcc_function
-; RV32I-PIC-NEXT:    mv a0, s0
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_fastcc:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv s0, a0
+; CHECK-NEXT:    call fastcc_function
+; CHECK-NEXT:    mv a0, s0
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call fastcc i32 @fastcc_function(i32 %a, i32 %b)
   ret i32 %a
 }
@@ -204,106 +162,60 @@ define i32 @test_call_fastcc(i32 %a, i32 %b) nounwind {
 declare i32 @external_many_args(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) nounwind
 
 define i32 @test_call_external_many_args(i32 %a) nounwind {
-; RV32I-LABEL: test_call_external_many_args:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv s0, a0
-; RV32I-NEXT:    sw a0, 4(sp)
-; RV32I-NEXT:    sw a0, 0(sp)
-; RV32I-NEXT:    mv a1, a0
-; RV32I-NEXT:    mv a2, a0
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    mv a4, a0
-; RV32I-NEXT:    mv a5, a0
-; RV32I-NEXT:    mv a6, a0
-; RV32I-NEXT:    mv a7, a0
-; RV32I-NEXT:    call external_many_args
-; RV32I-NEXT:    mv a0, s0
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_external_many_args:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv s0, a0
-; RV32I-PIC-NEXT:    sw a0, 4(sp)
-; RV32I-PIC-NEXT:    sw a0, 0(sp)
-; RV32I-PIC-NEXT:    mv a1, a0
-; RV32I-PIC-NEXT:    mv a2, a0
-; RV32I-PIC-NEXT:    mv a3, a0
-; RV32I-PIC-NEXT:    mv a4, a0
-; RV32I-PIC-NEXT:    mv a5, a0
-; RV32I-PIC-NEXT:    mv a6, a0
-; RV32I-PIC-NEXT:    mv a7, a0
-; RV32I-PIC-NEXT:    call external_many_args
-; RV32I-PIC-NEXT:    mv a0, s0
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_external_many_args:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv s0, a0
+; CHECK-NEXT:    sw a0, 4(sp)
+; CHECK-NEXT:    sw a0, 0(sp)
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a4, a0
+; CHECK-NEXT:    mv a5, a0
+; CHECK-NEXT:    mv a6, a0
+; CHECK-NEXT:    mv a7, a0
+; CHECK-NEXT:    call external_many_args
+; CHECK-NEXT:    mv a0, s0
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @external_many_args(i32 %a, i32 %a, i32 %a, i32 %a, i32 %a,
                                     i32 %a, i32 %a, i32 %a, i32 %a, i32 %a)
   ret i32 %a
 }
 
 define i32 @defined_many_args(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 %j) nounwind {
-; RV32I-LABEL: defined_many_args:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    lw a0, 4(sp)
-; RV32I-NEXT:    addi a0, a0, 1
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: defined_many_args:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    lw a0, 4(sp)
-; RV32I-PIC-NEXT:    addi a0, a0, 1
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: defined_many_args:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lw a0, 4(sp)
+; CHECK-NEXT:    addi a0, a0, 1
+; CHECK-NEXT:    ret
   %added = add i32 %j, 1
   ret i32 %added
 }
 
 define i32 @test_call_defined_many_args(i32 %a) nounwind {
-; RV32I-LABEL: test_call_defined_many_args:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw a0, 4(sp)
-; RV32I-NEXT:    sw a0, 0(sp)
-; RV32I-NEXT:    mv a1, a0
-; RV32I-NEXT:    mv a2, a0
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    mv a4, a0
-; RV32I-NEXT:    mv a5, a0
-; RV32I-NEXT:    mv a6, a0
-; RV32I-NEXT:    mv a7, a0
-; RV32I-NEXT:    call defined_many_args
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_defined_many_args:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    sw a0, 4(sp)
-; RV32I-PIC-NEXT:    sw a0, 0(sp)
-; RV32I-PIC-NEXT:    mv a1, a0
-; RV32I-PIC-NEXT:    mv a2, a0
-; RV32I-PIC-NEXT:    mv a3, a0
-; RV32I-PIC-NEXT:    mv a4, a0
-; RV32I-PIC-NEXT:    mv a5, a0
-; RV32I-PIC-NEXT:    mv a6, a0
-; RV32I-PIC-NEXT:    mv a7, a0
-; RV32I-PIC-NEXT:    call defined_many_args
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_defined_many_args:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a0, 4(sp)
+; CHECK-NEXT:    sw a0, 0(sp)
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a4, a0
+; CHECK-NEXT:    mv a5, a0
+; CHECK-NEXT:    mv a6, a0
+; CHECK-NEXT:    mv a7, a0
+; CHECK-NEXT:    call defined_many_args
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @defined_many_args(i32 %a, i32 %a, i32 %a, i32 %a, i32 %a,
                                    i32 %a, i32 %a, i32 %a, i32 %a, i32 %a)
   ret i32 %1
diff --git a/llvm/test/CodeGen/RISCV/tail-calls.ll b/llvm/test/CodeGen/RISCV/tail-calls.ll
index e3079424230bcc3..2cc978260e49aa3 100644
--- a/llvm/test/CodeGen/RISCV/tail-calls.ll
+++ b/llvm/test/CodeGen/RISCV/tail-calls.ll
@@ -1,5 +1,6 @@
 ; RUN: llc -mtriple riscv32-unknown-linux-gnu -o - %s | FileCheck %s
 ; RUN: llc -mtriple riscv32-unknown-elf       -o - %s | FileCheck %s
+; RUN: llc -mtriple riscv32 -mattr=+no-ras    -o - %s | FileCheck --check-prefix=CHECK-NO-RAS %s 
 
 ; Perform tail call optimization for global address.
 declare i32 @callee_tail(i32 %i)
@@ -51,6 +52,14 @@ define void @caller_indirect_tail(i32 %a) nounwind {
 ; CHECK: lui a0, %hi(callee_indirect1)
 ; CHECK-NEXT: addi t1, a0, %lo(callee_indirect1)
 ; CHECK-NEXT: jr t1
+
+; CHECK-NO-RAS: lui a0, %hi(callee_indirect2)
+; CHECK-NO-RAS-NEXT: addi t0, a0, %lo(callee_indirect2)
+; CHECK-NO-RAS-NEXT: jr t0
+
+; CHECK-NO-RAS: lui a0, %hi(callee_indirect1)
+; CHECK-NO-RAS-NEXT: addi t0, a0, %lo(callee_indirect1)
+; CHECK-NO-RAS-NEXT: jr t0
 entry:
   %tobool = icmp eq i32 %a, 0
   %callee = select i1 %tobool, ptr @callee_indirect1, ptr @callee_indirect2
@@ -72,6 +81,18 @@ define i32 @caller_indirect_no_t0(ptr %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5
 ; CHECK-NEXT:    mv a5, a6
 ; CHECK-NEXT:    mv a6, a7
 ; CHECK-NEXT:    jr t1
+
+; CHECK-NO-RAS-LABEL: caller_indirect_no_t0:
+; CHECK-NO-RAS:       # %bb.0:
+; CHECK-NO-RAS-NEXT:    mv t0, a0
+; CHECK-NO-RAS-NEXT:    mv a0, a1
+; CHECK-NO-RAS-NEXT:    mv a1, a2
+; CHECK-NO-RAS-NEXT:    mv a2, a3
+; CHECK-NO-RAS-NEXT:    mv a3, a4
+; CHECK-NO-RAS-NEXT:    mv a4, a5
+; CHECK-NO-RAS-NEXT:    mv a5, a6
+; CHECK-NO-RAS-NEXT:    mv a6, a7
+; CHECK-NO-RAS-NEXT:    jr t0
   %9 = tail call i32 %0(i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7)
   ret i32 %9
 }

@topperc
Collaborator

topperc commented Jan 17, 2024

Do you have any performance or code size data to show that this change is a benefit to CPUs with RAS?

@wangpc-pp
Contributor Author

Do you have any performance or code size data to show that this change is a benefit to CPUs with RAS?

I don't have performance data, as I don't have access to such a hardware implementation, and I don't see any code size change on llvm-test-suite.
The idea for this PR came from a discussion with a colleague, and ARM supports a similar feature. So I think some low-end products may need it.

@topperc
Collaborator

topperc commented Jan 22, 2024

Do you have any performance or code size data to show that this change is a benefit to CPUs with RAS?

I don't have performance data as I don't have such hardware implementation. And I don't see code size change on llvm-test-suite. The thought of this PR came from a random discussion with my colleague, and ARM supports such feature. So I think maybe some low-end products need it.

It doesn't look like ARM uses it for register allocation though. This is the only code I can find on ARM that uses it

    if (!isDirect && !Subtarget->hasV5TOps())                                    
      CallOpc = ARMISD::CALL_NOLINK;                                             
    else if (doesNotRet && isDirect && Subtarget->hasRetAddrStack() &&           
             // Emit regular call when code size is the priority                 
             !Subtarget->hasMinSize())                                           
      // "mov lr, pc; b _foo" to avoid confusing the RSP                         
      CallOpc = ARMISD::CALL_NOLINK;                                             
    else                                                                         
      CallOpc = isLocalARMFunc ? ARMISD::CALL_PRED : ARMISD::CALL;

@wangpc-pp
Contributor Author

Do you have any performance or code size data to show that this change is a benefit to CPUs with RAS?

I don't have performance data as I don't have such hardware implementation. And I don't see code size change on llvm-test-suite. The thought of this PR came from a random discussion with my colleague, and ARM supports such feature. So I think maybe some low-end products need it.

It doesn't look like ARM uses it for register allocation though. This is the only code I can find on ARM that uses it

    if (!isDirect && !Subtarget->hasV5TOps())                                    
      CallOpc = ARMISD::CALL_NOLINK;                                             
    else if (doesNotRet && isDirect && Subtarget->hasRetAddrStack() &&           
             // Emit regular call when code size is the priority                 
             !Subtarget->hasMinSize())                                           
      // "mov lr, pc; b _foo" to avoid confusing the RSP                         
      CallOpc = ARMISD::CALL_NOLINK;                                             
    else                                                                         
      CallOpc = isLocalARMFunc ? ARMISD::CALL_PRED : ARMISD::CALL;

This is effectively a partial reversion of your patch https://reviews.llvm.org/D105875; it seems that we used to use GPR for JALR. The benefit is that such an implementation has one more register available.
I don't have more data to support this configuration, so I'll let you decide whether we should support it. :-)
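Mechanically, the patch uses TableGen's `AltOrders`/`AltOrderSelect` so one register class carries two allocation orders, picked per subtarget. A hypothetical Python model of the GPRJALR selection (a simplified sketch using numeric register order, not the backend's real preference order or the generated C++):

```python
def jalr_alloc_order(has_ras: bool) -> list[str]:
    """Model the GPRJALR allocation-order choice from this patch:
    the default order includes x5; the alternative order, selected
    when the subtarget has a RAS, drops it.  (Real GPR order in the
    RISC-V backend is a preference order, not numeric; this sketch
    only models which registers are allocatable.)"""
    gpr = [f"x{i}" for i in range(32)]
    # Default class: GPR minus X0-X4 (zero, ra, sp, gp, tp), X5 kept.
    default = [r for r in gpr if r not in {"x0", "x1", "x2", "x3", "x4"}]
    # Alternative order: additionally drop x5 -- a subset of the
    # default order, as TableGen requires.
    alt = [r for r in default if r != "x5"]
    return alt if has_ras else default
```

The subset constraint is why the patch makes the X5-included order the default and selects the X5-free order via `AltOrderSelect` when `hasRAS()` is true, rather than the other way around.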

wangpc-pp added a commit that referenced this pull request Feb 1, 2024
Split out from #78417.

Reviewers: topperc, asb, kito-cheng

Reviewed By: asb

Pull Request: #79248
@wangpc-pp
Contributor Author

Ping for comments. :-)

@topperc
Collaborator

topperc commented Feb 5, 2024

Ping for comments. :-)

If this didn't improve code size on llvm-test-suite, I think it just adds an extra configuration option for no benefit.

The ARM code specifically seems to use the feature to generate a longer code sequence for noreturn calls on CPUs that support RAS to avoid corrupting the predictor. So that seems different than this patch.

@wangpc-pp
Contributor Author

Ping for comments. :-)

If this didn't improve code size on llvm-test-suite, I think it just adds an extra configuration option for no benefit.

The ARM code specifically seems to use the feature to generate a longer code sequence for noreturn calls on CPUs that support RAS to avoid corrupting the predictor. So that seems different than this patch.

Yeah, makes sense to me. We may revisit this in the future if it turns out to be profitable.

@dtcxzyw
Member

dtcxzyw commented Feb 5, 2024

Ping for comments. :-)

If this didn't improve code size on llvm-test-suite, I think it just adds an extra configuration option for no benefit.

Result: dtcxzyw/llvm-ci#1006 (comment)

@wangpc-pp
Contributor Author

Ping for comments. :-)

If this didn't improve code size on llvm-test-suite, I think it just adds an extra configuration option for no benefit.

Result: dtcxzyw/llvm-ci#1006 (comment)

Thanks! The result is the same as my previous one.

wangpc-pp deleted the main-riscv-feature-no-ras branch March 29, 2024 04:38