[RISCV] Enable (non trivial) remat for most scalar instructions #162311
Conversation
This is a follow-up to the recent infrastructure work to generally support non-trivial rematerialization. This is the first in a small series of patches to enable non-trivial rematerialization aggressively in the RISC-V backend. It deliberately avoids both vector instructions and loads, as those seem most likely to expose unexpected interactions. Note that this isn't ready to land just yet. We need to collect both compile-time numbers (in progress) and more perf numbers/stats on at least e.g. SPEC 2017 and the LLVM test-suite. I'm posting it mostly as a placeholder, since multiple people were talking about this and I want us to avoid duplicating work.
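For context, rematerialization lets the register allocator recompute a cheap value at its point of use instead of spilling it to the stack and reloading it. "Trivial" remat covers instructions whose inputs are always available (e.g. an `addi` from a constant); the non-trivial case enabled here also covers instructions with register operands, which can only be rematerialized where those operands are still live. A simplified, hypothetical RISC-V sketch (not taken from this patch; register choices and the `foo` callee are illustrative):

```asm
# Without rematerialization: the constant in t0 is spilled across the
# call (t0 is caller-saved) and reloaded afterwards.
    addi  t0, zero, 42        # materialize constant
    sw    t0, 8(sp)           # spill: live range crosses the call
    call  foo
    lw    t0, 8(sp)           # reload from the stack slot
    add   a0, a0, t0

# With rematerialization: the allocator recomputes the value instead,
# trading one cheap ALU op for a store/load pair and a stack slot.
    addi  t0, zero, 42
    call  foo
    addi  t0, zero, 42        # rematerialized: recompute, don't reload
    add   a0, a0, t0
```

This is why the patch flips `isReMaterializable` on the scalar ALU instructions in the diff below: each one is cheap to re-execute relative to a memory round trip.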
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-risc-v Author: Philip Reames (preames) Patch is 419.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162311.diff 6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 9855c47a63392..f1ac3a5b7e9a5 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -780,21 +780,18 @@ def SB : Store_rri<0b000, "sb">, Sched<[WriteSTB, ReadStoreData, ReadMemBase]>;
def SH : Store_rri<0b001, "sh">, Sched<[WriteSTH, ReadStoreData, ReadMemBase]>;
def SW : Store_rri<0b010, "sw">, Sched<[WriteSTW, ReadStoreData, ReadMemBase]>;
-// ADDI isn't always rematerializable, but isReMaterializable will be used as
-// a hint which is verified in isReMaterializableImpl.
-let isReMaterializable = 1, isAsCheapAsAMove = 1 in
+let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
def ADDI : ALU_ri<0b000, "addi">;
+def XORI : ALU_ri<0b100, "xori">;
+def ORI : ALU_ri<0b110, "ori">;
+}
-let IsSignExtendingOpW = 1 in {
+let IsSignExtendingOpW = 1, isReMaterializable = 1 in {
def SLTI : ALU_ri<0b010, "slti">;
def SLTIU : ALU_ri<0b011, "sltiu">;
}
-let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
-def XORI : ALU_ri<0b100, "xori">;
-def ORI : ALU_ri<0b110, "ori">;
-}
-
+let isReMaterializable = 1 in {
def ANDI : ALU_ri<0b111, "andi">;
def SLLI : Shift_ri<0b00000, 0b001, "slli">,
@@ -826,6 +823,7 @@ def OR : ALU_rr<0b0000000, 0b110, "or", Commutable=1>,
Sched<[WriteIALU, ReadIALU, ReadIALU]>;
def AND : ALU_rr<0b0000000, 0b111, "and", Commutable=1>,
Sched<[WriteIALU, ReadIALU, ReadIALU]>;
+}
let hasSideEffects = 1, mayLoad = 0, mayStore = 0 in {
def FENCE : RVInstI<0b000, OPC_MISC_MEM, (outs),
@@ -893,7 +891,7 @@ def LWU : Load_ri<0b110, "lwu">, Sched<[WriteLDW, ReadMemBase]>;
def LD : Load_ri<0b011, "ld">, Sched<[WriteLDD, ReadMemBase]>;
def SD : Store_rri<0b011, "sd">, Sched<[WriteSTD, ReadStoreData, ReadMemBase]>;
-let IsSignExtendingOpW = 1 in {
+let IsSignExtendingOpW = 1, isReMaterializable = 1 in {
let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
def ADDIW : RVInstI<0b000, OPC_OP_IMM_32, (outs GPR:$rd),
(ins GPR:$rs1, simm12_lo:$imm12),
@@ -917,7 +915,7 @@ def SRLW : ALUW_rr<0b0000000, 0b101, "srlw">,
Sched<[WriteShiftReg32, ReadShiftReg32, ReadShiftReg32]>;
def SRAW : ALUW_rr<0b0100000, 0b101, "sraw">,
Sched<[WriteShiftReg32, ReadShiftReg32, ReadShiftReg32]>;
-} // IsSignExtendingOpW = 1
+} // IsSignExtendingOpW = 1, isReMaterializable = 1
} // Predicates = [IsRV64]
//===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
index ca9f7637388f7..74c31a229dad4 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
@@ -3000,9 +3000,9 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
; RV32I-NEXT: sw s9, 20(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s10, 16(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s11, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT: li a4, 0
+; RV32I-NEXT: li a5, 0
; RV32I-NEXT: lbu a3, 0(a0)
-; RV32I-NEXT: lbu a5, 1(a0)
+; RV32I-NEXT: lbu a4, 1(a0)
; RV32I-NEXT: lbu a6, 2(a0)
; RV32I-NEXT: lbu a7, 3(a0)
; RV32I-NEXT: lbu t0, 4(a0)
@@ -3013,736 +3013,750 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
; RV32I-NEXT: lbu t5, 9(a0)
; RV32I-NEXT: lbu t6, 10(a0)
; RV32I-NEXT: lbu s0, 11(a0)
-; RV32I-NEXT: slli a5, a5, 8
+; RV32I-NEXT: slli a4, a4, 8
; RV32I-NEXT: slli a7, a7, 8
; RV32I-NEXT: slli t1, t1, 8
-; RV32I-NEXT: or a3, a5, a3
-; RV32I-NEXT: or a7, a7, a6
-; RV32I-NEXT: or t1, t1, t0
-; RV32I-NEXT: lbu a6, 13(a0)
-; RV32I-NEXT: lbu a5, 14(a0)
-; RV32I-NEXT: lbu s1, 15(a0)
+; RV32I-NEXT: or a3, a4, a3
+; RV32I-NEXT: or a4, a7, a6
+; RV32I-NEXT: or a7, t1, t0
+; RV32I-NEXT: lbu t0, 13(a0)
+; RV32I-NEXT: lbu a6, 14(a0)
+; RV32I-NEXT: lbu t1, 15(a0)
; RV32I-NEXT: slli t3, t3, 8
; RV32I-NEXT: slli t5, t5, 8
; RV32I-NEXT: slli s0, s0, 8
-; RV32I-NEXT: or t3, t3, t2
-; RV32I-NEXT: or t0, t5, t4
-; RV32I-NEXT: or t5, s0, t6
-; RV32I-NEXT: lbu t2, 1(a1)
-; RV32I-NEXT: lbu t4, 0(a1)
+; RV32I-NEXT: or s1, t3, t2
+; RV32I-NEXT: or t2, t5, t4
+; RV32I-NEXT: or t4, s0, t6
+; RV32I-NEXT: lbu t3, 1(a1)
+; RV32I-NEXT: lbu t5, 0(a1)
; RV32I-NEXT: lbu t6, 2(a1)
; RV32I-NEXT: lbu a1, 3(a1)
-; RV32I-NEXT: slli t2, t2, 8
-; RV32I-NEXT: or s0, t2, t4
-; RV32I-NEXT: slli t2, s1, 8
+; RV32I-NEXT: slli t3, t3, 8
+; RV32I-NEXT: or t5, t3, t5
+; RV32I-NEXT: slli t3, t1, 8
; RV32I-NEXT: slli a1, a1, 8
; RV32I-NEXT: or a1, a1, t6
-; RV32I-NEXT: slli t4, a7, 16
-; RV32I-NEXT: slli a7, t3, 16
-; RV32I-NEXT: slli t3, t5, 16
-; RV32I-NEXT: slli t5, a1, 16
-; RV32I-NEXT: or a1, a7, t1
-; RV32I-NEXT: or a7, t5, s0
+; RV32I-NEXT: slli a4, a4, 16
+; RV32I-NEXT: slli s1, s1, 16
+; RV32I-NEXT: slli t4, t4, 16
+; RV32I-NEXT: slli t1, a1, 16
+; RV32I-NEXT: or s5, s1, a7
+; RV32I-NEXT: or a7, t1, t5
; RV32I-NEXT: slli a7, a7, 3
; RV32I-NEXT: srli t1, a7, 5
; RV32I-NEXT: andi t5, a7, 31
; RV32I-NEXT: neg s3, t5
; RV32I-NEXT: beqz t5, .LBB12_2
; RV32I-NEXT: # %bb.1:
-; RV32I-NEXT: sll a4, a1, s3
+; RV32I-NEXT: sll a5, s5, s3
; RV32I-NEXT: .LBB12_2:
-; RV32I-NEXT: or s7, t4, a3
-; RV32I-NEXT: lbu t4, 12(a0)
-; RV32I-NEXT: lbu t6, 19(a0)
-; RV32I-NEXT: slli s1, a6, 8
-; RV32I-NEXT: or a5, t2, a5
-; RV32I-NEXT: or a3, t3, t0
+; RV32I-NEXT: or a4, a4, a3
+; RV32I-NEXT: lbu t6, 12(a0)
+; RV32I-NEXT: lbu s0, 19(a0)
+; RV32I-NEXT: slli s1, t0, 8
+; RV32I-NEXT: or t0, t3, a6
+; RV32I-NEXT: or a1, t4, t2
; RV32I-NEXT: beqz t1, .LBB12_4
; RV32I-NEXT: # %bb.3:
-; RV32I-NEXT: li s0, 0
+; RV32I-NEXT: mv s11, a4
+; RV32I-NEXT: li a4, 0
; RV32I-NEXT: j .LBB12_5
; RV32I-NEXT: .LBB12_4:
-; RV32I-NEXT: srl s0, s7, a7
-; RV32I-NEXT: or s0, s0, a4
+; RV32I-NEXT: mv s11, a4
+; RV32I-NEXT: srl a6, a4, a7
+; RV32I-NEXT: or a4, a6, a5
; RV32I-NEXT: .LBB12_5:
; RV32I-NEXT: li a6, 0
-; RV32I-NEXT: lbu t0, 17(a0)
-; RV32I-NEXT: lbu a4, 18(a0)
-; RV32I-NEXT: slli s4, t6, 8
-; RV32I-NEXT: or s2, s1, t4
-; RV32I-NEXT: slli a5, a5, 16
-; RV32I-NEXT: li s5, 1
-; RV32I-NEXT: sll t6, a3, s3
+; RV32I-NEXT: lbu s2, 17(a0)
+; RV32I-NEXT: lbu a5, 18(a0)
+; RV32I-NEXT: slli s4, s0, 8
+; RV32I-NEXT: or s1, s1, t6
+; RV32I-NEXT: slli t0, t0, 16
+; RV32I-NEXT: li t3, 1
+; RV32I-NEXT: sll s6, a1, s3
; RV32I-NEXT: beqz t5, .LBB12_7
; RV32I-NEXT: # %bb.6:
-; RV32I-NEXT: mv a6, t6
+; RV32I-NEXT: mv a6, s6
; RV32I-NEXT: .LBB12_7:
; RV32I-NEXT: lbu t2, 16(a0)
-; RV32I-NEXT: lbu t3, 23(a0)
-; RV32I-NEXT: slli s1, t0, 8
-; RV32I-NEXT: or t4, s4, a4
-; RV32I-NEXT: srl a4, a1, a7
-; RV32I-NEXT: or a5, a5, s2
-; RV32I-NEXT: bne t1, s5, .LBB12_9
+; RV32I-NEXT: lbu t4, 23(a0)
+; RV32I-NEXT: slli s0, s2, 8
+; RV32I-NEXT: or t6, s4, a5
+; RV32I-NEXT: srl a3, s5, a7
+; RV32I-NEXT: or a5, t0, s1
+; RV32I-NEXT: sw a3, 0(sp) # 4-byte Folded Spill
+; RV32I-NEXT: bne t1, t3, .LBB12_9
; RV32I-NEXT: # %bb.8:
-; RV32I-NEXT: or s0, a4, a6
+; RV32I-NEXT: or a4, a3, a6
; RV32I-NEXT: .LBB12_9:
; RV32I-NEXT: li t0, 0
-; RV32I-NEXT: lbu s5, 21(a0)
+; RV32I-NEXT: lbu s2, 21(a0)
; RV32I-NEXT: lbu a6, 22(a0)
-; RV32I-NEXT: slli s4, t3, 8
-; RV32I-NEXT: or t2, s1, t2
-; RV32I-NEXT: slli s6, t4, 16
-; RV32I-NEXT: li s8, 2
-; RV32I-NEXT: sll t3, a5, s3
+; RV32I-NEXT: slli s1, t4, 8
+; RV32I-NEXT: or t2, s0, t2
+; RV32I-NEXT: slli s4, t6, 16
+; RV32I-NEXT: li a3, 2
+; RV32I-NEXT: sll s8, a5, s3
; RV32I-NEXT: beqz t5, .LBB12_11
; RV32I-NEXT: # %bb.10:
-; RV32I-NEXT: mv t0, t3
+; RV32I-NEXT: mv t0, s8
; RV32I-NEXT: .LBB12_11:
-; RV32I-NEXT: lbu s1, 20(a0)
-; RV32I-NEXT: lbu s2, 27(a0)
-; RV32I-NEXT: slli s5, s5, 8
-; RV32I-NEXT: or s4, s4, a6
-; RV32I-NEXT: srl t4, a3, a7
-; RV32I-NEXT: or a6, s6, t2
-; RV32I-NEXT: bne t1, s8, .LBB12_13
+; RV32I-NEXT: lbu t6, 20(a0)
+; RV32I-NEXT: lbu s0, 27(a0)
+; RV32I-NEXT: slli s2, s2, 8
+; RV32I-NEXT: or s1, s1, a6
+; RV32I-NEXT: srl t3, a1, a7
+; RV32I-NEXT: or a6, s4, t2
+; RV32I-NEXT: sw s5, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT: bne t1, a3, .LBB12_13
; RV32I-NEXT: # %bb.12:
-; RV32I-NEXT: or s0, t4, t0
+; RV32I-NEXT: or a4, t3, t0
; RV32I-NEXT: .LBB12_13:
-; RV32I-NEXT: sw s7, 4(sp) # 4-byte Folded Spill
; RV32I-NEXT: li t2, 0
-; RV32I-NEXT: lbu s6, 25(a0)
+; RV32I-NEXT: lbu s4, 25(a0)
; RV32I-NEXT: lbu t0, 26(a0)
-; RV32I-NEXT: slli s8, s2, 8
-; RV32I-NEXT: or s7, s5, s1
-; RV32I-NEXT: slli s9, s4, 16
-; RV32I-NEXT: sll s11, a6, s3
+; RV32I-NEXT: slli s7, s0, 8
+; RV32I-NEXT: or s5, s2, t6
+; RV32I-NEXT: slli s9, s1, 16
+; RV32I-NEXT: li t6, 3
+; RV32I-NEXT: sll t4, a6, s3
; RV32I-NEXT: beqz t5, .LBB12_15
; RV32I-NEXT: # %bb.14:
-; RV32I-NEXT: mv t2, s11
+; RV32I-NEXT: mv t2, t4
; RV32I-NEXT: .LBB12_15:
-; RV32I-NEXT: lbu s1, 24(a0)
-; RV32I-NEXT: lbu s2, 31(a0)
-; RV32I-NEXT: slli s5, s6, 8
-; RV32I-NEXT: or s4, s8, t0
-; RV32I-NEXT: srl ra, a5, a7
-; RV32I-NEXT: or t0, s9, s7
-; RV32I-NEXT: li s6, 3
-; RV32I-NEXT: bne t1, s6, .LBB12_17
+; RV32I-NEXT: lbu s0, 24(a0)
+; RV32I-NEXT: lbu s1, 31(a0)
+; RV32I-NEXT: slli s4, s4, 8
+; RV32I-NEXT: or s2, s7, t0
+; RV32I-NEXT: srl a3, a5, a7
+; RV32I-NEXT: or t0, s9, s5
+; RV32I-NEXT: li s9, 3
+; RV32I-NEXT: bne t1, t6, .LBB12_17
; RV32I-NEXT: # %bb.16:
-; RV32I-NEXT: or s0, ra, t2
+; RV32I-NEXT: or a4, a3, t2
; RV32I-NEXT: .LBB12_17:
+; RV32I-NEXT: mv t6, t3
; RV32I-NEXT: li t2, 0
; RV32I-NEXT: lbu s7, 29(a0)
-; RV32I-NEXT: lbu s6, 30(a0)
-; RV32I-NEXT: slli s8, s2, 8
-; RV32I-NEXT: or s2, s5, s1
-; RV32I-NEXT: slli s5, s4, 16
-; RV32I-NEXT: li s9, 4
-; RV32I-NEXT: sll s1, t0, s3
-; RV32I-NEXT: sw s1, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT: lbu s5, 30(a0)
+; RV32I-NEXT: slli s1, s1, 8
+; RV32I-NEXT: or s10, s4, s0
+; RV32I-NEXT: slli s2, s2, 16
+; RV32I-NEXT: li a3, 4
+; RV32I-NEXT: sll s0, t0, s3
; RV32I-NEXT: beqz t5, .LBB12_19
; RV32I-NEXT: # %bb.18:
-; RV32I-NEXT: lw t2, 8(sp) # 4-byte Folded Reload
+; RV32I-NEXT: mv t2, s0
; RV32I-NEXT: .LBB12_19:
-; RV32I-NEXT: lbu s1, 28(a0)
+; RV32I-NEXT: lbu t3, 28(a0)
; RV32I-NEXT: slli s7, s7, 8
-; RV32I-NEXT: or s4, s8, s6
-; RV32I-NEXT: srl s10, a6, a7
-; RV32I-NEXT: or a0, s5, s2
-; RV32I-NEXT: bne t1, s9, .LBB12_21
+; RV32I-NEXT: or s4, s1, s5
+; RV32I-NEXT: srl s1, a6, a7
+; RV32I-NEXT: or a0, s2, s10
+; RV32I-NEXT: beq t1, a3, .LBB12_21
; RV32I-NEXT: # %bb.20:
-; RV32I-NEXT: or s0, s10, t2
+; RV32I-NEXT: mv a3, s1
+; RV32I-NEXT: j .LBB12_22
; RV32I-NEXT: .LBB12_21:
+; RV32I-NEXT: mv a3, s1
+; RV32I-NEXT: or a4, s1, t2
+; RV32I-NEXT: .LBB12_22:
+; RV32I-NEXT: li s10, 1
; RV32I-NEXT: li s2, 0
-; RV32I-NEXT: or t2, s7, s1
+; RV32I-NEXT: or t2, s7, t3
; RV32I-NEXT: slli s4, s4, 16
-; RV32I-NEXT: li s9, 5
+; RV32I-NEXT: li s1, 5
; RV32I-NEXT: sll s7, a0, s3
-; RV32I-NEXT: beqz t5, .LBB12_23
-; RV32I-NEXT: # %bb.22:
+; RV32I-NEXT: beqz t5, .LBB12_24
+; RV32I-NEXT: # %bb.23:
; RV32I-NEXT: mv s2, s7
-; RV32I-NEXT: .LBB12_23:
-; RV32I-NEXT: srl s8, t0, a7
+; RV32I-NEXT: .LBB12_24:
+; RV32I-NEXT: sw a1, 4(sp) # 4-byte Folded Spill
+; RV32I-NEXT: srl t3, t0, a7
; RV32I-NEXT: or t2, s4, t2
-; RV32I-NEXT: bne t1, s9, .LBB12_25
-; RV32I-NEXT: # %bb.24:
-; RV32I-NEXT: or s0, s8, s2
-; RV32I-NEXT: .LBB12_25:
-; RV32I-NEXT: li s4, 0
+; RV32I-NEXT: beq t1, s1, .LBB12_26
+; RV32I-NEXT: # %bb.25:
+; RV32I-NEXT: mv a1, t3
+; RV32I-NEXT: j .LBB12_27
+; RV32I-NEXT: .LBB12_26:
+; RV32I-NEXT: mv a1, t3
+; RV32I-NEXT: or a4, t3, s2
+; RV32I-NEXT: .LBB12_27:
+; RV32I-NEXT: li t3, 0
; RV32I-NEXT: li s2, 6
; RV32I-NEXT: sll s5, t2, s3
-; RV32I-NEXT: beqz t5, .LBB12_27
-; RV32I-NEXT: # %bb.26:
-; RV32I-NEXT: mv s4, s5
-; RV32I-NEXT: .LBB12_27:
-; RV32I-NEXT: srl s6, a0, a7
-; RV32I-NEXT: bne t1, s2, .LBB12_29
+; RV32I-NEXT: beqz t5, .LBB12_29
; RV32I-NEXT: # %bb.28:
-; RV32I-NEXT: or s0, s6, s4
+; RV32I-NEXT: mv t3, s5
; RV32I-NEXT: .LBB12_29:
-; RV32I-NEXT: li s3, 7
-; RV32I-NEXT: srl s1, t2, a7
-; RV32I-NEXT: mv s4, s1
-; RV32I-NEXT: bne t1, s3, .LBB12_34
+; RV32I-NEXT: srl s3, a0, a7
+; RV32I-NEXT: beq t1, s2, .LBB12_31
; RV32I-NEXT: # %bb.30:
-; RV32I-NEXT: bnez a7, .LBB12_35
+; RV32I-NEXT: mv ra, s3
+; RV32I-NEXT: j .LBB12_32
; RV32I-NEXT: .LBB12_31:
-; RV32I-NEXT: li s0, 0
-; RV32I-NEXT: bnez t5, .LBB12_36
+; RV32I-NEXT: mv ra, s3
+; RV32I-NEXT: or a4, s3, t3
; RV32I-NEXT: .LBB12_32:
-; RV32I-NEXT: li s4, 2
-; RV32I-NEXT: beqz t1, .LBB12_37
-; RV32I-NEXT: .LBB12_33:
-; RV32I-NEXT: li a4, 0
-; RV32I-NEXT: j .LBB12_38
+; RV32I-NEXT: li s3, 7
+; RV32I-NEXT: srl s4, t2, a7
+; RV32I-NEXT: mv t3, s4
+; RV32I-NEXT: beq t1, s3, .LBB12_34
+; RV32I-NEXT: # %bb.33:
+; RV32I-NEXT: mv t3, a4
; RV32I-NEXT: .LBB12_34:
-; RV32I-NEXT: mv s4, s0
-; RV32I-NEXT: beqz a7, .LBB12_31
-; RV32I-NEXT: .LBB12_35:
-; RV32I-NEXT: sw s4, 4(sp) # 4-byte Folded Spill
-; RV32I-NEXT: li s0, 0
-; RV32I-NEXT: beqz t5, .LBB12_32
+; RV32I-NEXT: mv a4, s11
+; RV32I-NEXT: beqz a7, .LBB12_36
+; RV32I-NEXT: # %bb.35:
+; RV32I-NEXT: mv a4, t3
; RV32I-NEXT: .LBB12_36:
-; RV32I-NEXT: mv s0, t6
-; RV32I-NEXT: li s4, 2
-; RV32I-NEXT: bnez t1, .LBB12_33
-; RV32I-NEXT: .LBB12_37:
-; RV32I-NEXT: or a4, a4, s0
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: li s11, 2
+; RV32I-NEXT: beqz t5, .LBB12_38
+; RV32I-NEXT: # %bb.37:
+; RV32I-NEXT: mv t3, s6
; RV32I-NEXT: .LBB12_38:
-; RV32I-NEXT: li s0, 1
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: bnez t5, .LBB12_57
+; RV32I-NEXT: beqz t1, .LBB12_40
; RV32I-NEXT: # %bb.39:
-; RV32I-NEXT: beq t1, s0, .LBB12_58
+; RV32I-NEXT: li s6, 0
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: bnez t5, .LBB12_41
+; RV32I-NEXT: j .LBB12_42
; RV32I-NEXT: .LBB12_40:
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: bnez t5, .LBB12_59
+; RV32I-NEXT: lw s6, 0(sp) # 4-byte Folded Reload
+; RV32I-NEXT: or s6, s6, t3
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: beqz t5, .LBB12_42
; RV32I-NEXT: .LBB12_41:
-; RV32I-NEXT: beq t1, s4, .LBB12_60
+; RV32I-NEXT: mv t3, s8
; RV32I-NEXT: .LBB12_42:
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: bnez t5, .LBB12_61
-; RV32I-NEXT: .LBB12_43:
-; RV32I-NEXT: li s4, 3
-; RV32I-NEXT: bne t1, s4, .LBB12_45
+; RV32I-NEXT: beq t1, s10, .LBB12_58
+; RV32I-NEXT: # %bb.43:
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: bnez t5, .LBB12_59
; RV32I-NEXT: .LBB12_44:
-; RV32I-NEXT: or a4, s10, t6
+; RV32I-NEXT: beq t1, s11, .LBB12_60
; RV32I-NEXT: .LBB12_45:
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: li s4, 4
-; RV32I-NEXT: bnez t5, .LBB12_62
-; RV32I-NEXT: # %bb.46:
-; RV32I-NEXT: beq t1, s4, .LBB12_63
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: bnez t5, .LBB12_61
+; RV32I-NEXT: .LBB12_46:
+; RV32I-NEXT: bne t1, s9, .LBB12_48
; RV32I-NEXT: .LBB12_47:
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: bnez t5, .LBB12_64
+; RV32I-NEXT: or s6, a3, t3
; RV32I-NEXT: .LBB12_48:
-; RV32I-NEXT: beq t1, s9, .LBB12_65
-; RV32I-NEXT: .LBB12_49:
-; RV32I-NEXT: mv t6, s1
-; RV32I-NEXT: bne t1, s2, .LBB12_66
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: li s9, 4
+; RV32I-NEXT: bnez t5, .LBB12_62
+; RV32I-NEXT: # %bb.49:
+; RV32I-NEXT: beq t1, s9, .LBB12_63
; RV32I-NEXT: .LBB12_50:
-; RV32I-NEXT: li a4, 0
-; RV32I-NEXT: bne t1, s3, .LBB12_67
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: bnez t5, .LBB12_64
; RV32I-NEXT: .LBB12_51:
-; RV32I-NEXT: beqz a7, .LBB12_53
+; RV32I-NEXT: beq t1, s1, .LBB12_65
; RV32I-NEXT: .LBB12_52:
-; RV32I-NEXT: mv a1, a4
+; RV32I-NEXT: mv t3, s4
+; RV32I-NEXT: bne t1, s2, .LBB12_66
; RV32I-NEXT: .LBB12_53:
-; RV32I-NEXT: li a4, 0
-; RV32I-NEXT: li t6, 2
-; RV32I-NEXT: beqz t5, .LBB12_55
-; RV32I-NEXT: # %bb.54:
-; RV32I-NEXT: mv a4, t3
+; RV32I-NEXT: li s6, 0
+; RV32I-NEXT: bne t1, s3, .LBB12_67
+; RV32I-NEXT: .LBB12_54:
+; RV32I-NEXT: bnez a7, .LBB12_68
; RV32I-NEXT: .LBB12_55:
-; RV32I-NEXT: beqz t1, .LBB12_68
-; RV32I-NEXT: # %bb.56:
-; RV32I-NEXT: li a4, 0
-; RV32I-NEXT: j .LBB12_69
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: bnez t5, .LBB12_69
+; RV32I-NEXT: .LBB12_56:
+; RV32I-NEXT: beqz t1, .LBB12_70
; RV32I-NEXT: .LBB12_57:
-; RV32I-NEXT: mv t6, t3
-; RV32I-NEXT: bne t1, s0, .LBB12_40
+; RV32I-NEXT: li s6, 0
+; RV32I-NEXT: j .LBB12_71
; RV32I-NEXT: .LBB12_58:
-; RV32I-NEXT: or a4, t4, t6
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: beqz t5, .LBB12_41
+; RV32I-NEXT: or s6, t6, t3
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: beqz t5, .LBB12_44
; RV32I-NEXT: .LBB12_59:
-; RV32I-NEXT: mv t6, s11
-; RV32I-NEXT: bne t1, s4, .LBB12_42
+; RV32I-NEXT: mv t3, t4
+; RV32I-NEXT: bne t1, s11, .LBB12_45
; RV32I-NEXT: .LBB12_60:
-; RV32I-NEXT: or a4, ra, t6
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: beqz t5, .LBB12_43
+; RV32I-NEXT: srl s6, a5, a7
+; RV32I-NEXT: or s6, s6, t3
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: beqz t5, .LBB12_46
; RV32I-NEXT: .LBB12_61:
-; RV32I-NEXT: lw t6, 8(sp) # 4-byte Folded Reload
-; RV32I-NEXT: li s4, 3
-; RV32I-NEXT: beq t1, s4, .LBB12_44
-; RV32I-NEXT: j .LBB12_45
+; RV32I-NEXT: mv t3, s0
+; RV32I-NEXT: beq t1, s9, .LBB12_47
+; RV32I-NEXT: j .LBB12_48
; RV32I-NEXT: .LBB12_62:
-; RV32I-NEXT: mv t6, s7
-; RV32I-NEXT: bne t1, s4, .LBB12_47
+; RV32I-NEXT: mv t3, s7
+; RV32I-NEXT: bne t1, s9, .LBB12_50
; RV32I-NEXT: .LBB12_63:
-; RV32I-NEXT: or a4, s8, t6
-; RV32I-NEXT: li t6, 0
-; RV32I-NEXT: beqz t5, .LBB12_48
+; RV32I-NEXT: or s6, a1, t3
+; RV32I-NEXT: li t3, 0
+; RV32I-NEXT: beqz t5, .LBB12_51
; RV32I-NEXT: .LBB12_64:
-; RV32I-NEXT: mv t6, s5
-; RV32I-NEXT: bne t1, s9, .LBB12_49
+; RV32I-NEXT: mv t3, s5
+; RV32I-NEXT: bne t1, s1, .LBB12_52
; RV32I-NEXT: .LBB12_65:
-; RV32I-NEXT: or a4, s6, t6
-; RV32I-NEXT: mv t6, s1
-; RV32I-NEXT: beq t1, s2, .LBB12_50
+; RV32I-NEXT: or s6, ra, t3
+; RV32I-NEXT: mv t3, s4
+; RV32I-NEXT: beq t1, s2, .LBB12_53
; RV32I-NEXT: .LBB12_66:
-; RV32I-NEXT: mv t6, a4
-; RV32I-NEXT: li a4, 0
-; RV32I-NEXT: beq t1, s3, .LBB12_51
+; RV32I-NEXT: mv t3, s6
+; RV32I-NEXT: li s6, 0
+; RV32I-NEXT: beq t1, s3, .LBB12_54
; RV32I-NEXT: .LBB12_67:
-; RV32I-NEXT: mv a4, t6
-; RV32I-NEXT: bnez a7, .LBB12_52
-; RV32I-NEXT: j .LBB12_53
+; RV32I-NEXT: mv s6, t3
+; RV32I-NEXT: beqz a7, .LBB12_55
; RV32I-NEXT: .LBB12_68:
-; RV32I-NEXT: or a4, t4, a4
-; RV32I-NEXT: .LBB12_69:
-; RV32I-NEXT: li t4, 3
+; RV32I-NEXT: sw s6, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: li t3, 0
-; RV32I-NEXT: bnez t5, .LBB12_84
-; RV32I-NEXT: # %bb.70:
-; RV32I-NEXT: beq t1, s0, .LBB12_85
+; RV32I-NEXT: beqz t5, .LBB12_56
+; RV32I-NEXT: .LBB12_69:
+; RV32I-NEXT: mv t3, s8
+; RV32I-NEXT: bnez t1, .LBB12_57
+; RV32I-NEXT: .LBB12_70:
+; RV32I-NEXT: or s6, t6, t3
; RV32I-NEXT: .LBB12_71:
+; RV32I-NEXT: li t6, 3
; RV32I-NEXT: li t3, 0
; RV32I-NEXT: bnez t5, .LBB12_86
-; RV32I-NEXT: .LBB12_72:
-; RV32I-NEXT: beq t1, t6, .LBB12_87
+; RV32I-NEXT: # %bb.72:
+; RV32I-NEXT: beq t1, s10, .LBB12_87
; RV32I-NEXT: .LBB12_73:
; RV32I-NEXT: li t3, 0
; RV32I-NEXT: bnez t5, .LBB12_88
; RV32I-NEXT: .LBB12_74:
-; RV32I-NEXT: beq t1, t4, .LBB12_89
+; RV32I-NEXT: beq t1, s11, .LBB12_89
; RV32I-NEXT: .LBB12_75:
; RV32I-NEXT: li t3, 0
; RV32I-NEXT: bnez t5, .LBB12_90
; RV32I-NEXT: .LBB12_76:
-; RV32I...
[truncated]
; RV32I-NEXT: .LBB12_206:
; RV32I-NEXT: mv t3, t4
; RV32I-NEXT: bnez a7, .LBB12_189
; RV32I-NEXT: j .LBB12_190
This code got quite a bit longer. Is it better?
; RV32I-NEXT: .LBB13_206:
; RV32I-NEXT: mv t3, t4
; RV32I-NEXT: bnez a7, .LBB13_189
; RV32I-NEXT: j .LBB13_190
Longer
Here are dyn instcount diffs for an rva22 build of SPEC 2017:
Looking at the static assembly diff, it is large due to lots of very tiny regalloc changes. The obvious outlier is mcf, which I'll need to report back on after having a closer look.
Some quick static results of this on llvm-test-suite, -march=rva23u64 -O3:
And SPEC CPU 2017:
Overall this seems to be an improvement, but I'm definitely surprised to see that some cases have an increase in the number of reloads. The results in 505.mcf_r match @asb's dynamic results. Would be good to get to the bottom of that.
I spent some time having a closer look. There's a very specific hot block in spec_qsort that gets a move and a negate, which seems to account for a good chunk of the dynamic instcount diff. New:
vs old:
I'll get a minimal reproducer so we can decide whether to put this down to bad luck or something we can address in the context of this patch.