[RISCV] Remove setJumpIsExpensive(). #74647
Conversation
Middle-end optimizations can speculate away the short-circuit behavior of C/C++ && and ||, using i1 and/or (or logical select) instructions and a single branch. SelectionDAGBuilder can turn i1 and/or/select back into multiple branches, but this is disabled when jumps are marked expensive.

RISC-V can use slt(u)(i) to evaluate a condition into any GPR, which makes it better suited to this than targets that evaluate conditions into a flag register. RISC-V also has single-instruction compare-and-branch, so it is not clear from a code-size perspective that compare+and/or is better. If the full condition depends on multiple loads, using logic ops delays branch resolution until all the loads have resolved, even when a cheap condition would have made the loads unnecessary.

PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive; NVPTX and AMDGPU also use it, but they are GPU targets. PowerPC appears to have a MachineIR pass that turns AND/OR of CR bits back into multiple branches. I don't know anything about Lanai or its reason for using setJumpIsExpensive.

I think the decision to use logic vs. branches is much more nuanced than this big hammer, so I propose to make RISC-V match the other CPU targets. Anyone who wants the old behavior can still pass -mllvm -jump-is-expensive=true.
@llvm/pr-subscribers-backend-risc-v Author: Craig Topper (topperc). Changes: as described above. Patch is 41.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74647.diff 6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f2ec422b54a92..b06f9b35f7314 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1363,9 +1363,6 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setPrefFunctionAlignment(Subtarget.getPrefFunctionAlignment());
setPrefLoopAlignment(Subtarget.getPrefLoopAlignment());
- // Jumps are expensive, compared to logic
- setJumpIsExpensive();
-
setTargetDAGCombine({ISD::INTRINSIC_VOID, ISD::INTRINSIC_W_CHAIN,
ISD::INTRINSIC_WO_CHAIN, ISD::ADD, ISD::SUB, ISD::AND,
ISD::OR, ISD::XOR, ISD::SETCC, ISD::SELECT});
diff --git a/llvm/test/CodeGen/RISCV/double-previous-failure.ll b/llvm/test/CodeGen/RISCV/double-previous-failure.ll
index d9d3442043a90..aec27b58e1fc4 100644
--- a/llvm/test/CodeGen/RISCV/double-previous-failure.ll
+++ b/llvm/test/CodeGen/RISCV/double-previous-failure.ll
@@ -31,16 +31,17 @@ define i32 @main() nounwind {
; RV32IFD-NEXT: fld fa5, 0(sp)
; RV32IFD-NEXT: lui a0, %hi(.LCPI1_0)
; RV32IFD-NEXT: fld fa4, %lo(.LCPI1_0)(a0)
-; RV32IFD-NEXT: lui a0, %hi(.LCPI1_1)
-; RV32IFD-NEXT: fld fa3, %lo(.LCPI1_1)(a0)
; RV32IFD-NEXT: flt.d a0, fa5, fa4
-; RV32IFD-NEXT: flt.d a1, fa3, fa5
-; RV32IFD-NEXT: or a0, a0, a1
-; RV32IFD-NEXT: beqz a0, .LBB1_2
-; RV32IFD-NEXT: # %bb.1: # %if.then
-; RV32IFD-NEXT: call abort@plt
-; RV32IFD-NEXT: .LBB1_2: # %if.end
+; RV32IFD-NEXT: bnez a0, .LBB1_3
+; RV32IFD-NEXT: # %bb.1: # %entry
+; RV32IFD-NEXT: lui a0, %hi(.LCPI1_1)
+; RV32IFD-NEXT: fld fa4, %lo(.LCPI1_1)(a0)
+; RV32IFD-NEXT: flt.d a0, fa4, fa5
+; RV32IFD-NEXT: bnez a0, .LBB1_3
+; RV32IFD-NEXT: # %bb.2: # %if.end
; RV32IFD-NEXT: call exit@plt
+; RV32IFD-NEXT: .LBB1_3: # %if.then
+; RV32IFD-NEXT: call abort@plt
;
; RV32IZFINXZDINX-LABEL: main:
; RV32IZFINXZDINX: # %bb.0: # %entry
@@ -56,17 +57,18 @@ define i32 @main() nounwind {
; RV32IZFINXZDINX-NEXT: lui a2, %hi(.LCPI1_0)
; RV32IZFINXZDINX-NEXT: lw a3, %lo(.LCPI1_0+4)(a2)
; RV32IZFINXZDINX-NEXT: lw a2, %lo(.LCPI1_0)(a2)
-; RV32IZFINXZDINX-NEXT: lui a4, %hi(.LCPI1_1)
-; RV32IZFINXZDINX-NEXT: lw a5, %lo(.LCPI1_1+4)(a4)
-; RV32IZFINXZDINX-NEXT: lw a4, %lo(.LCPI1_1)(a4)
; RV32IZFINXZDINX-NEXT: flt.d a2, a0, a2
-; RV32IZFINXZDINX-NEXT: flt.d a0, a4, a0
-; RV32IZFINXZDINX-NEXT: or a0, a2, a0
-; RV32IZFINXZDINX-NEXT: beqz a0, .LBB1_2
-; RV32IZFINXZDINX-NEXT: # %bb.1: # %if.then
-; RV32IZFINXZDINX-NEXT: call abort@plt
-; RV32IZFINXZDINX-NEXT: .LBB1_2: # %if.end
+; RV32IZFINXZDINX-NEXT: bnez a2, .LBB1_3
+; RV32IZFINXZDINX-NEXT: # %bb.1: # %entry
+; RV32IZFINXZDINX-NEXT: lui a2, %hi(.LCPI1_1)
+; RV32IZFINXZDINX-NEXT: lw a3, %lo(.LCPI1_1+4)(a2)
+; RV32IZFINXZDINX-NEXT: lw a2, %lo(.LCPI1_1)(a2)
+; RV32IZFINXZDINX-NEXT: flt.d a0, a2, a0
+; RV32IZFINXZDINX-NEXT: bnez a0, .LBB1_3
+; RV32IZFINXZDINX-NEXT: # %bb.2: # %if.end
; RV32IZFINXZDINX-NEXT: call exit@plt
+; RV32IZFINXZDINX-NEXT: .LBB1_3: # %if.then
+; RV32IZFINXZDINX-NEXT: call abort@plt
entry:
%call = call double @test(double 2.000000e+00)
%cmp = fcmp olt double %call, 2.400000e-01
diff --git a/llvm/test/CodeGen/RISCV/select-and.ll b/llvm/test/CodeGen/RISCV/select-and.ll
index 51f411bd96548..5ba489043f0b4 100644
--- a/llvm/test/CodeGen/RISCV/select-and.ll
+++ b/llvm/test/CodeGen/RISCV/select-and.ll
@@ -40,14 +40,15 @@ define signext i32 @if_of_and(i1 zeroext %a, i1 zeroext %b) nounwind {
; RV32I: # %bb.0:
; RV32I-NEXT: addi sp, sp, -16
; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT: and a0, a0, a1
-; RV32I-NEXT: beqz a0, .LBB1_2
-; RV32I-NEXT: # %bb.1: # %if.then
+; RV32I-NEXT: beqz a0, .LBB1_3
+; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: beqz a1, .LBB1_3
+; RV32I-NEXT: # %bb.2: # %if.then
; RV32I-NEXT: call both@plt
-; RV32I-NEXT: j .LBB1_3
-; RV32I-NEXT: .LBB1_2: # %if.else
+; RV32I-NEXT: j .LBB1_4
+; RV32I-NEXT: .LBB1_3: # %if.else
; RV32I-NEXT: call neither@plt
-; RV32I-NEXT: .LBB1_3: # %if.end
+; RV32I-NEXT: .LBB1_4: # %if.end
; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
@@ -56,14 +57,15 @@ define signext i32 @if_of_and(i1 zeroext %a, i1 zeroext %b) nounwind {
; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
-; RV64I-NEXT: and a0, a0, a1
-; RV64I-NEXT: beqz a0, .LBB1_2
-; RV64I-NEXT: # %bb.1: # %if.then
+; RV64I-NEXT: beqz a0, .LBB1_3
+; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: beqz a1, .LBB1_3
+; RV64I-NEXT: # %bb.2: # %if.then
; RV64I-NEXT: call both@plt
-; RV64I-NEXT: j .LBB1_3
-; RV64I-NEXT: .LBB1_2: # %if.else
+; RV64I-NEXT: j .LBB1_4
+; RV64I-NEXT: .LBB1_3: # %if.else
; RV64I-NEXT: call neither@plt
-; RV64I-NEXT: .LBB1_3: # %if.end
+; RV64I-NEXT: .LBB1_4: # %if.end
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/select-or.ll b/llvm/test/CodeGen/RISCV/select-or.ll
index 9ba284c611a6b..d378bb4dd563d 100644
--- a/llvm/test/CodeGen/RISCV/select-or.ll
+++ b/llvm/test/CodeGen/RISCV/select-or.ll
@@ -40,14 +40,15 @@ define signext i32 @if_of_or(i1 zeroext %a, i1 zeroext %b) nounwind {
; RV32I: # %bb.0:
; RV32I-NEXT: addi sp, sp, -16
; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT: or a0, a0, a1
-; RV32I-NEXT: beqz a0, .LBB1_2
-; RV32I-NEXT: # %bb.1: # %if.then
-; RV32I-NEXT: call either@plt
-; RV32I-NEXT: j .LBB1_3
-; RV32I-NEXT: .LBB1_2: # %if.else
+; RV32I-NEXT: bnez a0, .LBB1_3
+; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: bnez a1, .LBB1_3
+; RV32I-NEXT: # %bb.2: # %if.else
; RV32I-NEXT: call neither@plt
-; RV32I-NEXT: .LBB1_3: # %if.end
+; RV32I-NEXT: j .LBB1_4
+; RV32I-NEXT: .LBB1_3: # %if.then
+; RV32I-NEXT: call either@plt
+; RV32I-NEXT: .LBB1_4: # %if.end
; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
@@ -56,14 +57,15 @@ define signext i32 @if_of_or(i1 zeroext %a, i1 zeroext %b) nounwind {
; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
-; RV64I-NEXT: or a0, a0, a1
-; RV64I-NEXT: beqz a0, .LBB1_2
-; RV64I-NEXT: # %bb.1: # %if.then
-; RV64I-NEXT: call either@plt
-; RV64I-NEXT: j .LBB1_3
-; RV64I-NEXT: .LBB1_2: # %if.else
+; RV64I-NEXT: bnez a0, .LBB1_3
+; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: bnez a1, .LBB1_3
+; RV64I-NEXT: # %bb.2: # %if.else
; RV64I-NEXT: call neither@plt
-; RV64I-NEXT: .LBB1_3: # %if.end
+; RV64I-NEXT: j .LBB1_4
+; RV64I-NEXT: .LBB1_3: # %if.then
+; RV64I-NEXT: call either@plt
+; RV64I-NEXT: .LBB1_4: # %if.end
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/setcc-logic.ll b/llvm/test/CodeGen/RISCV/setcc-logic.ll
index 6a6c1bcd8dec1..14e76d1ff1728 100644
--- a/llvm/test/CodeGen/RISCV/setcc-logic.ll
+++ b/llvm/test/CodeGen/RISCV/setcc-logic.ll
@@ -298,26 +298,22 @@ declare void @bar(...)
define void @and_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_sge_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: slt a0, a0, a1
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB13_2
+; RV32I-NEXT: blt a0, a1, .LBB13_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: bne a2, a3, .LBB13_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB13_2:
+; RV32I-NEXT: .LBB13_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_sge_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: slt a0, a0, a1
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB13_2
+; RV64I-NEXT: blt a0, a1, .LBB13_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: bne a2, a3, .LBB13_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB13_2:
+; RV64I-NEXT: .LBB13_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp sge i32 %0, %1
%6 = icmp eq i32 %2, %3
@@ -335,26 +331,22 @@ define void @and_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_sle_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: slt a0, a1, a0
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB14_2
+; RV32I-NEXT: blt a1, a0, .LBB14_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: bne a2, a3, .LBB14_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB14_2:
+; RV32I-NEXT: .LBB14_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_sle_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: slt a0, a1, a0
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB14_2
+; RV64I-NEXT: blt a1, a0, .LBB14_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: bne a2, a3, .LBB14_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB14_2:
+; RV64I-NEXT: .LBB14_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp sle i32 %0, %1
%6 = icmp eq i32 %2, %3
@@ -372,26 +364,22 @@ define void @and_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_uge_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: sltu a0, a0, a1
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB15_2
+; RV32I-NEXT: bltu a0, a1, .LBB15_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: bne a2, a3, .LBB15_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB15_2:
+; RV32I-NEXT: .LBB15_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_uge_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: sltu a0, a0, a1
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB15_2
+; RV64I-NEXT: bltu a0, a1, .LBB15_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: bne a2, a3, .LBB15_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB15_2:
+; RV64I-NEXT: .LBB15_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp uge i32 %0, %1
%6 = icmp eq i32 %2, %3
@@ -409,26 +397,22 @@ define void @and_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_ule_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_ule_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: sltu a0, a1, a0
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB16_2
+; RV32I-NEXT: bltu a1, a0, .LBB16_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: bne a2, a3, .LBB16_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB16_2:
+; RV32I-NEXT: .LBB16_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_ule_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: sltu a0, a1, a0
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB16_2
+; RV64I-NEXT: bltu a1, a0, .LBB16_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: bne a2, a3, .LBB16_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB16_2:
+; RV64I-NEXT: .LBB16_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp ule i32 %0, %1
%6 = icmp eq i32 %2, %3
@@ -446,26 +430,22 @@ define void @and_ule_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_sge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_sge_ne:
; RV32I: # %bb.0:
-; RV32I-NEXT: slt a0, a0, a1
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: seqz a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB17_2
+; RV32I-NEXT: blt a0, a1, .LBB17_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: beq a2, a3, .LBB17_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB17_2:
+; RV32I-NEXT: .LBB17_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_sge_ne:
; RV64I: # %bb.0:
-; RV64I-NEXT: slt a0, a0, a1
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: seqz a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB17_2
+; RV64I-NEXT: blt a0, a1, .LBB17_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: beq a2, a3, .LBB17_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB17_2:
+; RV64I-NEXT: .LBB17_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp sge i32 %0, %1
%6 = icmp ne i32 %2, %3
@@ -483,26 +463,22 @@ define void @and_sge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_sle_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_sle_ne:
; RV32I: # %bb.0:
-; RV32I-NEXT: slt a0, a1, a0
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: seqz a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB18_2
+; RV32I-NEXT: blt a1, a0, .LBB18_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: beq a2, a3, .LBB18_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB18_2:
+; RV32I-NEXT: .LBB18_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_sle_ne:
; RV64I: # %bb.0:
-; RV64I-NEXT: slt a0, a1, a0
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: seqz a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB18_2
+; RV64I-NEXT: blt a1, a0, .LBB18_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: beq a2, a3, .LBB18_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB18_2:
+; RV64I-NEXT: .LBB18_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp sle i32 %0, %1
%6 = icmp ne i32 %2, %3
@@ -520,26 +496,22 @@ define void @and_sle_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_uge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_uge_ne:
; RV32I: # %bb.0:
-; RV32I-NEXT: sltu a0, a0, a1
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: seqz a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB19_2
+; RV32I-NEXT: bltu a0, a1, .LBB19_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: beq a2, a3, .LBB19_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB19_2:
+; RV32I-NEXT: .LBB19_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_uge_ne:
; RV64I: # %bb.0:
-; RV64I-NEXT: sltu a0, a0, a1
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: seqz a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB19_2
+; RV64I-NEXT: bltu a0, a1, .LBB19_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: beq a2, a3, .LBB19_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB19_2:
+; RV64I-NEXT: .LBB19_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp uge i32 %0, %1
%6 = icmp ne i32 %2, %3
@@ -557,26 +529,22 @@ define void @and_uge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @and_ule_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: and_ule_ne:
; RV32I: # %bb.0:
-; RV32I-NEXT: sltu a0, a1, a0
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: seqz a1, a2
-; RV32I-NEXT: or a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB20_2
+; RV32I-NEXT: bltu a1, a0, .LBB20_3
; RV32I-NEXT: # %bb.1:
+; RV32I-NEXT: beq a2, a3, .LBB20_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB20_2:
+; RV32I-NEXT: .LBB20_3:
; RV32I-NEXT: tail bar@plt
;
; RV64I-LABEL: and_ule_ne:
; RV64I: # %bb.0:
-; RV64I-NEXT: sltu a0, a1, a0
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: seqz a1, a2
-; RV64I-NEXT: or a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB20_2
+; RV64I-NEXT: bltu a1, a0, .LBB20_3
; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: beq a2, a3, .LBB20_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB20_2:
+; RV64I-NEXT: .LBB20_3:
; RV64I-NEXT: tail bar@plt
%5 = icmp ule i32 %0, %1
%6 = icmp ne i32 %2, %3
@@ -594,27 +562,23 @@ define void @and_ule_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
define void @or_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: or_sge_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: slt a0, a0, a1
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: and a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB21_2
+; RV32I-NEXT: bge a0, a1, .LBB21_3
; RV32I-NEXT: # %bb.1:
-; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB21_2:
+; RV32I-NEXT: beq a2, a3, .LBB21_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: tail bar@plt
+; RV32I-NEXT: .LBB21_3:
+; RV32I-NEXT: ret
;
; RV64I-LABEL: or_sge_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: slt a0, a0, a1
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: and a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB21_2
+; RV64I-NEXT: bge a0, a1, .LBB21_3
; RV64I-NEXT: # %bb.1:
-; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB21_2:
+; RV64I-NEXT: beq a2, a3, .LBB21_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: tail bar@plt
+; RV64I-NEXT: .LBB21_3:
+; RV64I-NEXT: ret
%5 = icmp sge i32 %0, %1
%6 = icmp eq i32 %2, %3
%7 = or i1 %5, %6
@@ -631,27 +595,23 @@ define void @or_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signe
define void @or_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: or_sle_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: slt a0, a1, a0
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: and a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB22_2
+; RV32I-NEXT: bge a1, a0, .LBB22_3
; RV32I-NEXT: # %bb.1:
-; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB22_2:
+; RV32I-NEXT: beq a2, a3, .LBB22_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: tail bar@plt
+; RV32I-NEXT: .LBB22_3:
+; RV32I-NEXT: ret
;
; RV64I-LABEL: or_sle_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: slt a0, a1, a0
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: and a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB22_2
+; RV64I-NEXT: bge a1, a0, .LBB22_3
; RV64I-NEXT: # %bb.1:
-; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB22_2:
+; RV64I-NEXT: beq a2, a3, .LBB22_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: tail bar@plt
+; RV64I-NEXT: .LBB22_3:
+; RV64I-NEXT: ret
%5 = icmp sle i32 %0, %1
%6 = icmp eq i32 %2, %3
%7 = or i1 %5, %6
@@ -668,27 +628,23 @@ define void @or_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signe
define void @or_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: or_uge_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: sltu a0, a0, a1
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: and a0, a1, a0
-; RV32I-NEXT: bnez a0, .LBB23_2
+; RV32I-NEXT: bgeu a0, a1, .LBB23_3
; RV32I-NEXT: # %bb.1:
-; RV32I-NEXT: ret
-; RV32I-NEXT: .LBB23_2:
+; RV32I-NEXT: beq a2, a3, .LBB23_3
+; RV32I-NEXT: # %bb.2:
; RV32I-NEXT: tail bar@plt
+; RV32I-NEXT: .LBB23_3:
+; RV32I-NEXT: ret
;
; RV64I-LABEL: or_uge_eq:
; RV64I: # %bb.0:
-; RV64I-NEXT: sltu a0, a0, a1
-; RV64I-NEXT: xor a2, a2, a3
-; RV64I-NEXT: snez a1, a2
-; RV64I-NEXT: and a0, a1, a0
-; RV64I-NEXT: bnez a0, .LBB23_2
+; RV64I-NEXT: bgeu a0, a1, .LBB23_3
; RV64I-NEXT: # %bb.1:
-; RV64I-NEXT: ret
-; RV64I-NEXT: .LBB23_2:
+; RV64I-NEXT: beq a2, a3, .LBB23_3
+; RV64I-NEXT: # %bb.2:
; RV64I-NEXT: tail bar@plt
+; RV64I-NEXT: .LBB23_3:
+; RV64I-NEXT: ret
%5 = icmp uge i32 %0, %1
%6 = icmp eq i32 %2, %3
%7 = or i1 %5, %6
@@ -705,27 +661,23 @@ define void @or_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signe
define void @or_ule_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
; RV32I-LABEL: or_ule_eq:
; RV32I: # %bb.0:
-; RV32I-NEXT: sltu a0, a1, a0
-; RV32I-NEXT: xor a2, a2, a3
-; RV32I-NEXT: snez a1, a2
-; RV32I-NEXT: an...
[truncated]
LGTM - I suspect we're going to want a more fine grained decision here, but I'm fine flipping the default in the meantime.
Can this be considered systematically as part of the discussion in https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040?
How about adding setJumpIsExpensive(Subtarget.isJumpExpensive()) so that processors can tune it?
Do you have a target where you actively want to support that configuration? I agree this is something that could be made configurable, but we're probably best off focusing our efforts around a smaller set of configuration options unless we know there's a case where we really need to diverge.
@topperc how does this look for you on, e.g., SPEC? Running it across the GCC torture suite, it's a very common pattern to have a long sequence of these:
--- a/output_rv64imafdc_lp64d_O3/20120808-1.s
+++ b/output_rv64imafdc_lp64d_O3/20120808-1.s
@@ -63,33 +63,37 @@ main: # @main
j .LBB0_2
.LBB0_8: # %for.end
lbu a0, 8(sp)
- lbu a1, 9(sp)
- xori a0, a0, 255
- lbu a2, 10(sp)
- lbu a3, 11(sp)
- xori a1, a1, 253
- or a0, a0, a1
- xori a1, a2, 251
- xori a2, a3, 255
- lbu a3, 12(sp)
- lbu a4, 33(sp)
- or a1, a1, a2
- or a0, a0, a1
- xori a1, a3, 255
- xori a2, a4, 254
- or a1, a1, a2
- or a0, a0, a1
- bnez a0, .LBB0_11
-# %bb.9: # %lor.lhs.false34
+ li a1, 255
+ bne a0, a1, .LBB0_16
+# %bb.9: # %for.end
+ lbu a0, 9(sp)
+ li a1, 253
+ bne a0, a1, .LBB0_16
+# %bb.10: # %for.end
+ lbu a0, 10(sp)
+ li a1, 251
+ bne a0, a1, .LBB0_16
+# %bb.11: # %for.end
+ lbu a1, 11(sp)
+ li a0, 255
+ bne a1, a0, .LBB0_16
+# %bb.12: # %for.end
+ lbu a1, 12(sp)
+ bne a1, a0, .LBB0_16
+# %bb.13: # %for.end
+ lbu a0, 33(sp)
+ li a1, 254
+ bne a0, a1, .LBB0_16
+# %bb.14: # %lor.lhs.false34
lui a0, %hi(cp)
ld a0, %lo(cp)(a0)
lui a1, %hi(d+30)
addi a1, a1, %lo(d+30)
- bne a0, a1, .LBB0_11
-# %bb.10: # %if.end
+ bne a0, a1, .LBB0_16
+# %bb.15: # %if.end
li a0, 0
call exit
-.LBB0_11: # %if.then
+.LBB0_16: # %if.then
call abort
.Lfunc_end0: But of course I wouldn't try to argue the torture suite inputs represent typical code, and of course the tradeoff is different if one of those branches may be taken. Another example would be: -- a/output_rv64imafdc_lp64d_O3/20000622-1.s
+++ b/output_rv64imafdc_lp64d_O3/20000622-1.s
@@ -8,15 +8,18 @@
foo: # @foo
.cfi_startproc
# %bb.0: # %entry
- xori a0, a0, 12
- xori a1, a1, 1
- or a0, a0, a1
- xori a1, a2, 11
- or a0, a0, a1
- bnez a0, .LBB0_2
-# %bb.1: # %if.end
+ li a3, 12
+ bne a0, a3, .LBB0_4
+# %bb.1: # %entry
+ li a0, 1
+ bne a1, a0, .LBB0_4
+# %bb.2: # %entry
+ li a0, 11
+ bne a2, a0, .LBB0_4
+# %bb.3: # %if.end
+ li a0, 0
ret
-.LBB0_2: # %if.then
+.LBB0_4: # %if.then
addi sp, sp, -16
.cfi_def_cfa_offset 16
sd ra, 8(sp) # 8-byte Folded Spill
@@ -44,13 +47,14 @@ bar: # @bar
baz: # @baz
.cfi_startproc
# %bb.0: # %entry
- xori a0, a2, 12
- xori a1, a1, 11
- or a0, a0, a1
- bnez a0, .LBB2_2
-# %bb.1: # %foo.exit
+ li a0, 11
+ bne a1, a0, .LBB2_3
+# %bb.1: # %entry
+ li a0, 12
+ bne a2, a0, .LBB2_3
+# %bb.2: # %foo.exit
ret
-.LBB2_2: # %if.then.i
+.LBB2_3: # %if.then.i
Which, especially for the diff in the baz function, makes me wonder if there's some other tunable somewhere that might control this.
No, I don't have one actually, but some downstream optimizations are based on the assumption that jumps are expensive.