[RISCV] Remove setJumpIsExpensive(). #74647

Merged

topperc merged 1 commit into llvm:main from pr/jump-is-not-expensive on Dec 13, 2023

Conversation

topperc
Collaborator

@topperc topperc commented Dec 6, 2023

Middle-end optimizations can speculate away the short-circuit behavior of C/C++ && and ||, using i1 and/or or logical select instructions and a single branch.

SelectionDAGBuilder can turn i1 and/or/select back into multiple branches, but this is disabled when the target marks jumps as expensive.
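
As a rough illustration (a minimal, made-up example rather than code from this patch or its tests), the pattern looks like this at the source level:

// Minimal sketch of the pattern under discussion; not taken from the patch.
extern void bar();

void test(int a, int b, int c, int d) {
  // The source has short-circuit semantics, but the middle end may fold both
  // comparisons into a single i1 'or' feeding one branch.
  //
  // With setJumpIsExpensive, RISC-V kept the folded form: a chain like
  // slt/xor/seqz/or producing one value for a single bnez/beqz.
  // Without it, SelectionDAGBuilder may split the i1 'or' back into two
  // compare-and-branch instructions (e.g. blt and beq), as in the updated tests.
  if (a < b || c == d)
    bar();
}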

RISC-V can use slt(u)(i) to evaluate a condition into any GPR, which puts us in a better position than targets that use a flag register. RISC-V also has single-instruction compare-and-branch. So it's not clear from a code-size perspective that using compare+and/or is better.

If the full condition depends on multiple loads, using logic delays branch resolution until all of the loads have resolved, even if there is a cheap condition that would make the loads unnecessary.
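
For instance, in a sketch like the following (an assumed example, not from the patch), the folded form has to wait for both loads before the single branch can resolve, whereas short-circuit branches could skip the second load entirely whenever the first field is zero:

// Assumed example for illustration; not part of the patch.
struct Node { int x; };

bool eitherIsZero(const Node *p, const Node *q) {
  // Folded into an i1 'or': both p->x and q->x must be loaded before the one
  // branch on the combined condition can resolve.
  // As separate branches: if p->x == 0 the result is known and the load of
  // q->x is never needed.
  return p->x == 0 || q->x == 0;
}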

PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive. NVPTX and AMDGPU also use it, but they are GPU targets. PowerPC appears to have a MachineIR pass that turns AND/OR of CR bits into multiple branches. I don't know anything about Lanai or its reason for using setJumpIsExpensive.

I think the decision to use logic vs. branches is much more nuanced than this big hammer, so I propose making RISC-V match other CPU targets.

Anyone who wants the old behavior can still pass -mllvm -jump-is-expensive=true.

@llvmbot
Collaborator

llvmbot commented Dec 6, 2023

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Patch is 41.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74647.diff

6 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (-3)
  • (modified) llvm/test/CodeGen/RISCV/double-previous-failure.ll (+19-17)
  • (modified) llvm/test/CodeGen/RISCV/select-and.ll (+14-12)
  • (modified) llvm/test/CodeGen/RISCV/select-or.ll (+16-14)
  • (modified) llvm/test/CodeGen/RISCV/setcc-logic.ll (+244-348)
  • (modified) llvm/test/CodeGen/RISCV/zext-with-load-is-free.ll (+23-18)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f2ec422b54a92..b06f9b35f7314 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1363,9 +1363,6 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
   setPrefFunctionAlignment(Subtarget.getPrefFunctionAlignment());
   setPrefLoopAlignment(Subtarget.getPrefLoopAlignment());
 
-  // Jumps are expensive, compared to logic
-  setJumpIsExpensive();
-
   setTargetDAGCombine({ISD::INTRINSIC_VOID, ISD::INTRINSIC_W_CHAIN,
                        ISD::INTRINSIC_WO_CHAIN, ISD::ADD, ISD::SUB, ISD::AND,
                        ISD::OR, ISD::XOR, ISD::SETCC, ISD::SELECT});
diff --git a/llvm/test/CodeGen/RISCV/double-previous-failure.ll b/llvm/test/CodeGen/RISCV/double-previous-failure.ll
index d9d3442043a90..aec27b58e1fc4 100644
--- a/llvm/test/CodeGen/RISCV/double-previous-failure.ll
+++ b/llvm/test/CodeGen/RISCV/double-previous-failure.ll
@@ -31,16 +31,17 @@ define i32 @main() nounwind {
 ; RV32IFD-NEXT:    fld fa5, 0(sp)
 ; RV32IFD-NEXT:    lui a0, %hi(.LCPI1_0)
 ; RV32IFD-NEXT:    fld fa4, %lo(.LCPI1_0)(a0)
-; RV32IFD-NEXT:    lui a0, %hi(.LCPI1_1)
-; RV32IFD-NEXT:    fld fa3, %lo(.LCPI1_1)(a0)
 ; RV32IFD-NEXT:    flt.d a0, fa5, fa4
-; RV32IFD-NEXT:    flt.d a1, fa3, fa5
-; RV32IFD-NEXT:    or a0, a0, a1
-; RV32IFD-NEXT:    beqz a0, .LBB1_2
-; RV32IFD-NEXT:  # %bb.1: # %if.then
-; RV32IFD-NEXT:    call abort@plt
-; RV32IFD-NEXT:  .LBB1_2: # %if.end
+; RV32IFD-NEXT:    bnez a0, .LBB1_3
+; RV32IFD-NEXT:  # %bb.1: # %entry
+; RV32IFD-NEXT:    lui a0, %hi(.LCPI1_1)
+; RV32IFD-NEXT:    fld fa4, %lo(.LCPI1_1)(a0)
+; RV32IFD-NEXT:    flt.d a0, fa4, fa5
+; RV32IFD-NEXT:    bnez a0, .LBB1_3
+; RV32IFD-NEXT:  # %bb.2: # %if.end
 ; RV32IFD-NEXT:    call exit@plt
+; RV32IFD-NEXT:  .LBB1_3: # %if.then
+; RV32IFD-NEXT:    call abort@plt
 ;
 ; RV32IZFINXZDINX-LABEL: main:
 ; RV32IZFINXZDINX:       # %bb.0: # %entry
@@ -56,17 +57,18 @@ define i32 @main() nounwind {
 ; RV32IZFINXZDINX-NEXT:    lui a2, %hi(.LCPI1_0)
 ; RV32IZFINXZDINX-NEXT:    lw a3, %lo(.LCPI1_0+4)(a2)
 ; RV32IZFINXZDINX-NEXT:    lw a2, %lo(.LCPI1_0)(a2)
-; RV32IZFINXZDINX-NEXT:    lui a4, %hi(.LCPI1_1)
-; RV32IZFINXZDINX-NEXT:    lw a5, %lo(.LCPI1_1+4)(a4)
-; RV32IZFINXZDINX-NEXT:    lw a4, %lo(.LCPI1_1)(a4)
 ; RV32IZFINXZDINX-NEXT:    flt.d a2, a0, a2
-; RV32IZFINXZDINX-NEXT:    flt.d a0, a4, a0
-; RV32IZFINXZDINX-NEXT:    or a0, a2, a0
-; RV32IZFINXZDINX-NEXT:    beqz a0, .LBB1_2
-; RV32IZFINXZDINX-NEXT:  # %bb.1: # %if.then
-; RV32IZFINXZDINX-NEXT:    call abort@plt
-; RV32IZFINXZDINX-NEXT:  .LBB1_2: # %if.end
+; RV32IZFINXZDINX-NEXT:    bnez a2, .LBB1_3
+; RV32IZFINXZDINX-NEXT:  # %bb.1: # %entry
+; RV32IZFINXZDINX-NEXT:    lui a2, %hi(.LCPI1_1)
+; RV32IZFINXZDINX-NEXT:    lw a3, %lo(.LCPI1_1+4)(a2)
+; RV32IZFINXZDINX-NEXT:    lw a2, %lo(.LCPI1_1)(a2)
+; RV32IZFINXZDINX-NEXT:    flt.d a0, a2, a0
+; RV32IZFINXZDINX-NEXT:    bnez a0, .LBB1_3
+; RV32IZFINXZDINX-NEXT:  # %bb.2: # %if.end
 ; RV32IZFINXZDINX-NEXT:    call exit@plt
+; RV32IZFINXZDINX-NEXT:  .LBB1_3: # %if.then
+; RV32IZFINXZDINX-NEXT:    call abort@plt
 entry:
   %call = call double @test(double 2.000000e+00)
   %cmp = fcmp olt double %call, 2.400000e-01
diff --git a/llvm/test/CodeGen/RISCV/select-and.ll b/llvm/test/CodeGen/RISCV/select-and.ll
index 51f411bd96548..5ba489043f0b4 100644
--- a/llvm/test/CodeGen/RISCV/select-and.ll
+++ b/llvm/test/CodeGen/RISCV/select-and.ll
@@ -40,14 +40,15 @@ define signext i32 @if_of_and(i1 zeroext %a, i1 zeroext %b) nounwind {
 ; RV32I:       # %bb.0:
 ; RV32I-NEXT:    addi sp, sp, -16
 ; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    and a0, a0, a1
-; RV32I-NEXT:    beqz a0, .LBB1_2
-; RV32I-NEXT:  # %bb.1: # %if.then
+; RV32I-NEXT:    beqz a0, .LBB1_3
+; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    beqz a1, .LBB1_3
+; RV32I-NEXT:  # %bb.2: # %if.then
 ; RV32I-NEXT:    call both@plt
-; RV32I-NEXT:    j .LBB1_3
-; RV32I-NEXT:  .LBB1_2: # %if.else
+; RV32I-NEXT:    j .LBB1_4
+; RV32I-NEXT:  .LBB1_3: # %if.else
 ; RV32I-NEXT:    call neither@plt
-; RV32I-NEXT:  .LBB1_3: # %if.end
+; RV32I-NEXT:  .LBB1_4: # %if.end
 ; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32I-NEXT:    addi sp, sp, 16
 ; RV32I-NEXT:    ret
@@ -56,14 +57,15 @@ define signext i32 @if_of_and(i1 zeroext %a, i1 zeroext %b) nounwind {
 ; RV64I:       # %bb.0:
 ; RV64I-NEXT:    addi sp, sp, -16
 ; RV64I-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    and a0, a0, a1
-; RV64I-NEXT:    beqz a0, .LBB1_2
-; RV64I-NEXT:  # %bb.1: # %if.then
+; RV64I-NEXT:    beqz a0, .LBB1_3
+; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    beqz a1, .LBB1_3
+; RV64I-NEXT:  # %bb.2: # %if.then
 ; RV64I-NEXT:    call both@plt
-; RV64I-NEXT:    j .LBB1_3
-; RV64I-NEXT:  .LBB1_2: # %if.else
+; RV64I-NEXT:    j .LBB1_4
+; RV64I-NEXT:  .LBB1_3: # %if.else
 ; RV64I-NEXT:    call neither@plt
-; RV64I-NEXT:  .LBB1_3: # %if.end
+; RV64I-NEXT:  .LBB1_4: # %if.end
 ; RV64I-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/select-or.ll b/llvm/test/CodeGen/RISCV/select-or.ll
index 9ba284c611a6b..d378bb4dd563d 100644
--- a/llvm/test/CodeGen/RISCV/select-or.ll
+++ b/llvm/test/CodeGen/RISCV/select-or.ll
@@ -40,14 +40,15 @@ define signext i32 @if_of_or(i1 zeroext %a, i1 zeroext %b) nounwind {
 ; RV32I:       # %bb.0:
 ; RV32I-NEXT:    addi sp, sp, -16
 ; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    or a0, a0, a1
-; RV32I-NEXT:    beqz a0, .LBB1_2
-; RV32I-NEXT:  # %bb.1: # %if.then
-; RV32I-NEXT:    call either@plt
-; RV32I-NEXT:    j .LBB1_3
-; RV32I-NEXT:  .LBB1_2: # %if.else
+; RV32I-NEXT:    bnez a0, .LBB1_3
+; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    bnez a1, .LBB1_3
+; RV32I-NEXT:  # %bb.2: # %if.else
 ; RV32I-NEXT:    call neither@plt
-; RV32I-NEXT:  .LBB1_3: # %if.end
+; RV32I-NEXT:    j .LBB1_4
+; RV32I-NEXT:  .LBB1_3: # %if.then
+; RV32I-NEXT:    call either@plt
+; RV32I-NEXT:  .LBB1_4: # %if.end
 ; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32I-NEXT:    addi sp, sp, 16
 ; RV32I-NEXT:    ret
@@ -56,14 +57,15 @@ define signext i32 @if_of_or(i1 zeroext %a, i1 zeroext %b) nounwind {
 ; RV64I:       # %bb.0:
 ; RV64I-NEXT:    addi sp, sp, -16
 ; RV64I-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    or a0, a0, a1
-; RV64I-NEXT:    beqz a0, .LBB1_2
-; RV64I-NEXT:  # %bb.1: # %if.then
-; RV64I-NEXT:    call either@plt
-; RV64I-NEXT:    j .LBB1_3
-; RV64I-NEXT:  .LBB1_2: # %if.else
+; RV64I-NEXT:    bnez a0, .LBB1_3
+; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    bnez a1, .LBB1_3
+; RV64I-NEXT:  # %bb.2: # %if.else
 ; RV64I-NEXT:    call neither@plt
-; RV64I-NEXT:  .LBB1_3: # %if.end
+; RV64I-NEXT:    j .LBB1_4
+; RV64I-NEXT:  .LBB1_3: # %if.then
+; RV64I-NEXT:    call either@plt
+; RV64I-NEXT:  .LBB1_4: # %if.end
 ; RV64I-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/setcc-logic.ll b/llvm/test/CodeGen/RISCV/setcc-logic.ll
index 6a6c1bcd8dec1..14e76d1ff1728 100644
--- a/llvm/test/CodeGen/RISCV/setcc-logic.ll
+++ b/llvm/test/CodeGen/RISCV/setcc-logic.ll
@@ -298,26 +298,22 @@ declare void @bar(...)
 define void @and_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_sge_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    slt a0, a0, a1
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB13_2
+; RV32I-NEXT:    blt a0, a1, .LBB13_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    bne a2, a3, .LBB13_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB13_2:
+; RV32I-NEXT:  .LBB13_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_sge_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    slt a0, a0, a1
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB13_2
+; RV64I-NEXT:    blt a0, a1, .LBB13_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    bne a2, a3, .LBB13_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB13_2:
+; RV64I-NEXT:  .LBB13_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp sge i32 %0, %1
   %6 = icmp eq i32 %2, %3
@@ -335,26 +331,22 @@ define void @and_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_sle_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    slt a0, a1, a0
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB14_2
+; RV32I-NEXT:    blt a1, a0, .LBB14_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    bne a2, a3, .LBB14_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB14_2:
+; RV32I-NEXT:  .LBB14_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_sle_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    slt a0, a1, a0
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB14_2
+; RV64I-NEXT:    blt a1, a0, .LBB14_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    bne a2, a3, .LBB14_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB14_2:
+; RV64I-NEXT:  .LBB14_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp sle i32 %0, %1
   %6 = icmp eq i32 %2, %3
@@ -372,26 +364,22 @@ define void @and_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_uge_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    sltu a0, a0, a1
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB15_2
+; RV32I-NEXT:    bltu a0, a1, .LBB15_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    bne a2, a3, .LBB15_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB15_2:
+; RV32I-NEXT:  .LBB15_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_uge_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    sltu a0, a0, a1
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB15_2
+; RV64I-NEXT:    bltu a0, a1, .LBB15_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    bne a2, a3, .LBB15_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB15_2:
+; RV64I-NEXT:  .LBB15_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp uge i32 %0, %1
   %6 = icmp eq i32 %2, %3
@@ -409,26 +397,22 @@ define void @and_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_ule_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_ule_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    sltu a0, a1, a0
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB16_2
+; RV32I-NEXT:    bltu a1, a0, .LBB16_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    bne a2, a3, .LBB16_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB16_2:
+; RV32I-NEXT:  .LBB16_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_ule_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    sltu a0, a1, a0
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB16_2
+; RV64I-NEXT:    bltu a1, a0, .LBB16_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    bne a2, a3, .LBB16_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB16_2:
+; RV64I-NEXT:  .LBB16_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp ule i32 %0, %1
   %6 = icmp eq i32 %2, %3
@@ -446,26 +430,22 @@ define void @and_ule_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_sge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_sge_ne:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    slt a0, a0, a1
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    seqz a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB17_2
+; RV32I-NEXT:    blt a0, a1, .LBB17_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    beq a2, a3, .LBB17_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB17_2:
+; RV32I-NEXT:  .LBB17_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_sge_ne:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    slt a0, a0, a1
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    seqz a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB17_2
+; RV64I-NEXT:    blt a0, a1, .LBB17_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    beq a2, a3, .LBB17_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB17_2:
+; RV64I-NEXT:  .LBB17_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp sge i32 %0, %1
   %6 = icmp ne i32 %2, %3
@@ -483,26 +463,22 @@ define void @and_sge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_sle_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_sle_ne:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    slt a0, a1, a0
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    seqz a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB18_2
+; RV32I-NEXT:    blt a1, a0, .LBB18_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    beq a2, a3, .LBB18_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB18_2:
+; RV32I-NEXT:  .LBB18_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_sle_ne:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    slt a0, a1, a0
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    seqz a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB18_2
+; RV64I-NEXT:    blt a1, a0, .LBB18_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    beq a2, a3, .LBB18_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB18_2:
+; RV64I-NEXT:  .LBB18_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp sle i32 %0, %1
   %6 = icmp ne i32 %2, %3
@@ -520,26 +496,22 @@ define void @and_sle_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_uge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_uge_ne:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    sltu a0, a0, a1
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    seqz a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB19_2
+; RV32I-NEXT:    bltu a0, a1, .LBB19_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    beq a2, a3, .LBB19_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB19_2:
+; RV32I-NEXT:  .LBB19_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_uge_ne:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    sltu a0, a0, a1
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    seqz a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB19_2
+; RV64I-NEXT:    bltu a0, a1, .LBB19_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    beq a2, a3, .LBB19_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB19_2:
+; RV64I-NEXT:  .LBB19_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp uge i32 %0, %1
   %6 = icmp ne i32 %2, %3
@@ -557,26 +529,22 @@ define void @and_uge_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @and_ule_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: and_ule_ne:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    sltu a0, a1, a0
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    seqz a1, a2
-; RV32I-NEXT:    or a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB20_2
+; RV32I-NEXT:    bltu a1, a0, .LBB20_3
 ; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    beq a2, a3, .LBB20_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB20_2:
+; RV32I-NEXT:  .LBB20_3:
 ; RV32I-NEXT:    tail bar@plt
 ;
 ; RV64I-LABEL: and_ule_ne:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    sltu a0, a1, a0
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    seqz a1, a2
-; RV64I-NEXT:    or a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB20_2
+; RV64I-NEXT:    bltu a1, a0, .LBB20_3
 ; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    beq a2, a3, .LBB20_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB20_2:
+; RV64I-NEXT:  .LBB20_3:
 ; RV64I-NEXT:    tail bar@plt
   %5 = icmp ule i32 %0, %1
   %6 = icmp ne i32 %2, %3
@@ -594,27 +562,23 @@ define void @and_ule_ne(i32 signext %0, i32 signext %1, i32 signext %2, i32 sign
 define void @or_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: or_sge_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    slt a0, a0, a1
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    and a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB21_2
+; RV32I-NEXT:    bge a0, a1, .LBB21_3
 ; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB21_2:
+; RV32I-NEXT:    beq a2, a3, .LBB21_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    tail bar@plt
+; RV32I-NEXT:  .LBB21_3:
+; RV32I-NEXT:    ret
 ;
 ; RV64I-LABEL: or_sge_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    slt a0, a0, a1
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    and a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB21_2
+; RV64I-NEXT:    bge a0, a1, .LBB21_3
 ; RV64I-NEXT:  # %bb.1:
-; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB21_2:
+; RV64I-NEXT:    beq a2, a3, .LBB21_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    tail bar@plt
+; RV64I-NEXT:  .LBB21_3:
+; RV64I-NEXT:    ret
   %5 = icmp sge i32 %0, %1
   %6 = icmp eq i32 %2, %3
   %7 = or i1 %5, %6
@@ -631,27 +595,23 @@ define void @or_sge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signe
 define void @or_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: or_sle_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    slt a0, a1, a0
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    and a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB22_2
+; RV32I-NEXT:    bge a1, a0, .LBB22_3
 ; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB22_2:
+; RV32I-NEXT:    beq a2, a3, .LBB22_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    tail bar@plt
+; RV32I-NEXT:  .LBB22_3:
+; RV32I-NEXT:    ret
 ;
 ; RV64I-LABEL: or_sle_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    slt a0, a1, a0
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    and a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB22_2
+; RV64I-NEXT:    bge a1, a0, .LBB22_3
 ; RV64I-NEXT:  # %bb.1:
-; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB22_2:
+; RV64I-NEXT:    beq a2, a3, .LBB22_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    tail bar@plt
+; RV64I-NEXT:  .LBB22_3:
+; RV64I-NEXT:    ret
   %5 = icmp sle i32 %0, %1
   %6 = icmp eq i32 %2, %3
   %7 = or i1 %5, %6
@@ -668,27 +628,23 @@ define void @or_sle_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signe
 define void @or_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: or_uge_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    sltu a0, a0, a1
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    and a0, a1, a0
-; RV32I-NEXT:    bnez a0, .LBB23_2
+; RV32I-NEXT:    bgeu a0, a1, .LBB23_3
 ; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB23_2:
+; RV32I-NEXT:    beq a2, a3, .LBB23_3
+; RV32I-NEXT:  # %bb.2:
 ; RV32I-NEXT:    tail bar@plt
+; RV32I-NEXT:  .LBB23_3:
+; RV32I-NEXT:    ret
 ;
 ; RV64I-LABEL: or_uge_eq:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    sltu a0, a0, a1
-; RV64I-NEXT:    xor a2, a2, a3
-; RV64I-NEXT:    snez a1, a2
-; RV64I-NEXT:    and a0, a1, a0
-; RV64I-NEXT:    bnez a0, .LBB23_2
+; RV64I-NEXT:    bgeu a0, a1, .LBB23_3
 ; RV64I-NEXT:  # %bb.1:
-; RV64I-NEXT:    ret
-; RV64I-NEXT:  .LBB23_2:
+; RV64I-NEXT:    beq a2, a3, .LBB23_3
+; RV64I-NEXT:  # %bb.2:
 ; RV64I-NEXT:    tail bar@plt
+; RV64I-NEXT:  .LBB23_3:
+; RV64I-NEXT:    ret
   %5 = icmp uge i32 %0, %1
   %6 = icmp eq i32 %2, %3
   %7 = or i1 %5, %6
@@ -705,27 +661,23 @@ define void @or_uge_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signe
 define void @or_ule_eq(i32 signext %0, i32 signext %1, i32 signext %2, i32 signext %3) {
 ; RV32I-LABEL: or_ule_eq:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    sltu a0, a1, a0
-; RV32I-NEXT:    xor a2, a2, a3
-; RV32I-NEXT:    snez a1, a2
-; RV32I-NEXT:    an...
[truncated]

Collaborator

@preames preames left a comment

LGTM - I suspect we're going to want a more fine grained decision here, but I'm fine flipping the default in the meantime.

@zixuan-wu
Contributor

Can this be considered systematically as part of the discussion in https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040?

@wangpc-pp
Contributor

How about adding it to RISCVSubtarget and changing it to something like:

setJumpIsExpensive(Subtarget.isJumpExpensive())

So that processors can tune it.
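
A minimal sketch of what that plumbing could look like, assuming a hypothetical tune flag (none of these names exist in this patch or upstream):

// Hypothetical sketch only: a per-processor tune bit consulted by the lowering
// constructor instead of calling setJumpIsExpensive() unconditionally.
class RISCVSubtargetSketch {
  bool TuneJumpIsExpensive = false; // would be driven by a processor tune feature
public:
  bool isJumpExpensive() const { return TuneJumpIsExpensive; }
};

// In RISCVTargetLowering's constructor (sketch):
//   if (Subtarget.isJumpExpensive())
//     setJumpIsExpensive();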

@asb
Contributor

asb commented Dec 7, 2023

How about adding it to RISCVSubtarget and changing it to something like:

setJumpIsExpensive(Subtarget.isJumpExpensive())

So that processors can tune it.

Do you have a target where you actively want to support that configuration? I agree this is something that could be made configurable, but we're probably best off focusing our efforts around a smaller set of configuration options unless we know there's a case where we really need to diverge.

@asb
Contributor

asb commented Dec 7, 2023

@topperc how does this look for you on e.g. SPEC? Running it across the GCC torture suite, a very common pattern is a long sequence of if (foo_expr) abort(); if (foo2_expr) abort(); ... and there are a number of examples where a (perhaps longish) block of loads and ALU operations is turned into a series of branches. The single-BB version seems better if all branches are not taken (i.e. all the bne are evaluated).

--- a/output_rv64imafdc_lp64d_O3/20120808-1.s
+++ b/output_rv64imafdc_lp64d_O3/20120808-1.s
@@ -63,33 +63,37 @@ main:                                   # @main
        j       .LBB0_2
 .LBB0_8:                                # %for.end
        lbu     a0, 8(sp)
-       lbu     a1, 9(sp)
-       xori    a0, a0, 255
-       lbu     a2, 10(sp)
-       lbu     a3, 11(sp)
-       xori    a1, a1, 253
-       or      a0, a0, a1
-       xori    a1, a2, 251
-       xori    a2, a3, 255
-       lbu     a3, 12(sp)
-       lbu     a4, 33(sp)
-       or      a1, a1, a2
-       or      a0, a0, a1
-       xori    a1, a3, 255
-       xori    a2, a4, 254
-       or      a1, a1, a2
-       or      a0, a0, a1
-       bnez    a0, .LBB0_11
-# %bb.9:                                # %lor.lhs.false34
+       li      a1, 255
+       bne     a0, a1, .LBB0_16
+# %bb.9:                                # %for.end
+       lbu     a0, 9(sp)
+       li      a1, 253
+       bne     a0, a1, .LBB0_16
+# %bb.10:                               # %for.end
+       lbu     a0, 10(sp)
+       li      a1, 251
+       bne     a0, a1, .LBB0_16
+# %bb.11:                               # %for.end
+       lbu     a1, 11(sp)
+       li      a0, 255
+       bne     a1, a0, .LBB0_16
+# %bb.12:                               # %for.end
+       lbu     a1, 12(sp)
+       bne     a1, a0, .LBB0_16
+# %bb.13:                               # %for.end
+       lbu     a0, 33(sp)
+       li      a1, 254
+       bne     a0, a1, .LBB0_16
+# %bb.14:                               # %lor.lhs.false34
        lui     a0, %hi(cp)
        ld      a0, %lo(cp)(a0)
        lui     a1, %hi(d+30)
        addi    a1, a1, %lo(d+30)
-       bne     a0, a1, .LBB0_11
-# %bb.10:                               # %if.end
+       bne     a0, a1, .LBB0_16
+# %bb.15:                               # %if.end
        li      a0, 0
        call    exit
-.LBB0_11:                               # %if.then
+.LBB0_16:                               # %if.then
        call    abort
 .Lfunc_end0:

But of course I wouldn't try to argue the torture suite inputs represent typical code, and of course the tradeoff is different if one of those branches may be taken.

Another example would be:

--- a/output_rv64imafdc_lp64d_O3/20000622-1.s
+++ b/output_rv64imafdc_lp64d_O3/20000622-1.s
@@ -8,15 +8,18 @@
 foo:                                    # @foo
        .cfi_startproc
 # %bb.0:                                # %entry
-       xori    a0, a0, 12
-       xori    a1, a1, 1
-       or      a0, a0, a1
-       xori    a1, a2, 11
-       or      a0, a0, a1
-       bnez    a0, .LBB0_2
-# %bb.1:                                # %if.end
+       li      a3, 12
+       bne     a0, a3, .LBB0_4
+# %bb.1:                                # %entry
+       li      a0, 1
+       bne     a1, a0, .LBB0_4
+# %bb.2:                                # %entry
+       li      a0, 11
+       bne     a2, a0, .LBB0_4
+# %bb.3:                                # %if.end
+       li      a0, 0
        ret
-.LBB0_2:                                # %if.then
+.LBB0_4:                                # %if.then
        addi    sp, sp, -16
        .cfi_def_cfa_offset 16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
@@ -44,13 +47,14 @@ bar:                                    # @bar
 baz:                                    # @baz
        .cfi_startproc
 # %bb.0:                                # %entry
-       xori    a0, a2, 12
-       xori    a1, a1, 11
-       or      a0, a0, a1
-       bnez    a0, .LBB2_2
-# %bb.1:                                # %foo.exit
+       li      a0, 11
+       bne     a1, a0, .LBB2_3
+# %bb.1:                                # %entry
+       li      a0, 12
+       bne     a2, a0, .LBB2_3
+# %bb.2:                                # %foo.exit
        ret
-.LBB2_2:                                # %if.then.i
+.LBB2_3:                                # %if.then.i

The diff in the baz function in particular makes me wonder whether there's some other tunable somewhere that might control this.

@wangpc-pp
Contributor

wangpc-pp commented Dec 7, 2023

How about adding it to RISCVSubtarget and changing it to something like:

setJumpIsExpensive(Subtarget.isJumpExpensive())

So that processors can tune it.

Do you have a target where you actively want to support that configuration? I agree this is something that could be made configurable, but we're probably best off focusing our efforts around a smaller set of configuration options unless we know there's a case where we really need to diverge.

No, I don't actually, but some downstream optimizations are based on the assumption that we have setJumpIsExpensive(true). I'm not opposed to this change; it just makes me nervous about breaking some previous optimizations.

@topperc topperc merged commit 2c18570 into llvm:main Dec 13, 2023
4 of 5 checks passed
@topperc topperc deleted the pr/jump-is-not-expensive branch December 13, 2023 17:37