[RISCV] Set riscv-fpimm-cost threshold to 3 by default #159352
`-riscv-fp-imm-cost` controls the threshold at which the constant pool is used for float constants rather than materialising them directly (typically into a GPR followed by an `fmv`). The value used for this knob indicates the number of instructions that can be used to produce the value (otherwise we fall back to the constant pool). Upping it to 3 covers a huge number of additional constants (see <llvm#153402>), e.g. most whole numbers, which can be generated through lui+shift+fmv. As in general we struggle with efficient code generation for constant pool accesses, reducing the number of constant pool accesses is beneficial. We are typically replacing a two-instruction sequence (which includes a load) with a three-instruction sequence (two simple arithmetic operations plus an fmv), which avoids a memory access. The CHECK prefixes for various tests had to be updated to avoid conflicts leading to check lines being dropped altogether (see <llvm#159321> for a change to update_llc_test_checks to aid diagnosing this).
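The lui+addi materialisation the description relies on can be sanity-checked outside the compiler. The sketch below (illustrative Python, not part of the patch; `hi_lo_split` is a made-up helper name) splits a float's IEEE-754 single-precision bit pattern into the 20-bit `lui` immediate and the signed 12-bit `addi` immediate, using the standard RISC-V carry adjustment, and reproduces the `lui a1, 290816` / `addi a1, a1, -512` pair seen in the updated check lines (the i16 saturation bound 32767.0):

```python
import struct

def hi_lo_split(value: float) -> tuple[int, int]:
    """Split the float32 bit pattern of `value` into (lui, addi) immediates.

    hi is the upper 20 bits after the usual +0x800 carry adjustment;
    lo is the sign-extended low 12 bits. lui hi; addi lo; fmv.w.x then
    rebuilds the exact bit pattern in an FPR.
    """
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    hi = ((bits + 0x800) >> 12) & 0xFFFFF  # lui immediate
    lo = bits & 0xFFF
    if lo >= 0x800:                        # sign-extend the low 12 bits
        lo -= 0x1000
    return hi, lo

# Saturation bound for i16: matches `lui a1, 290816` + `addi a1, a1, -512`.
print(hi_lo_split(32767.0))  # (290816, -512)
```

With this split, any float whose bit pattern fits lui+addi costs three instructions including the final `fmv.w.x`, which is why raising the threshold to 3 captures so many more constants.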
@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes
The CHECK prefixes for various tests had to be updated to avoid conflicts leading to check lines being dropped altogether (see <#159321> for a change to update_llc_test_checks to aid diagnosing this).

Patch is 1.34 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159352.diff

72 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 10b3f0b213811..9de57a2879d5b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -79,7 +79,7 @@ static cl::opt<int>
FPImmCost(DEBUG_TYPE "-fpimm-cost", cl::Hidden,
cl::desc("Give the maximum number of instructions that we will "
"use for creating a floating-point immediate value"),
- cl::init(2));
+ cl::init(3));
static cl::opt<bool>
ReassocShlAddiAdd("reassoc-shl-addi-add", cl::Hidden,
diff --git a/llvm/test/CodeGen/RISCV/bfloat-convert.ll b/llvm/test/CodeGen/RISCV/bfloat-convert.ll
index 6207a17734d62..73ff888e44b3b 100644
--- a/llvm/test/CodeGen/RISCV/bfloat-convert.ll
+++ b/llvm/test/CodeGen/RISCV/bfloat-convert.ll
@@ -51,13 +51,14 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
; CHECK32ZFBFMIN-LABEL: fcvt_si_bf16_sat:
; CHECK32ZFBFMIN: # %bb.0: # %start
; CHECK32ZFBFMIN-NEXT: fcvt.s.bf16 fa5, fa0
-; CHECK32ZFBFMIN-NEXT: lui a0, %hi(.LCPI1_0)
-; CHECK32ZFBFMIN-NEXT: feq.s a1, fa5, fa5
-; CHECK32ZFBFMIN-NEXT: flw fa4, %lo(.LCPI1_0)(a0)
; CHECK32ZFBFMIN-NEXT: lui a0, 815104
-; CHECK32ZFBFMIN-NEXT: fmv.w.x fa3, a0
-; CHECK32ZFBFMIN-NEXT: fmax.s fa5, fa5, fa3
-; CHECK32ZFBFMIN-NEXT: neg a0, a1
+; CHECK32ZFBFMIN-NEXT: lui a1, 290816
+; CHECK32ZFBFMIN-NEXT: fmv.w.x fa4, a0
+; CHECK32ZFBFMIN-NEXT: feq.s a0, fa5, fa5
+; CHECK32ZFBFMIN-NEXT: addi a1, a1, -512
+; CHECK32ZFBFMIN-NEXT: neg a0, a0
+; CHECK32ZFBFMIN-NEXT: fmax.s fa5, fa5, fa4
+; CHECK32ZFBFMIN-NEXT: fmv.w.x fa4, a1
; CHECK32ZFBFMIN-NEXT: fmin.s fa5, fa5, fa4
; CHECK32ZFBFMIN-NEXT: fcvt.w.s a1, fa5, rtz
; CHECK32ZFBFMIN-NEXT: and a0, a0, a1
@@ -68,12 +69,13 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
; RV32ID-NEXT: fmv.x.w a0, fa0
; RV32ID-NEXT: lui a1, 815104
; RV32ID-NEXT: fmv.w.x fa5, a1
-; RV32ID-NEXT: lui a1, %hi(.LCPI1_0)
+; RV32ID-NEXT: lui a1, 290816
; RV32ID-NEXT: slli a0, a0, 16
-; RV32ID-NEXT: flw fa4, %lo(.LCPI1_0)(a1)
-; RV32ID-NEXT: fmv.w.x fa3, a0
-; RV32ID-NEXT: feq.s a0, fa3, fa3
-; RV32ID-NEXT: fmax.s fa5, fa3, fa5
+; RV32ID-NEXT: addi a1, a1, -512
+; RV32ID-NEXT: fmv.w.x fa4, a0
+; RV32ID-NEXT: feq.s a0, fa4, fa4
+; RV32ID-NEXT: fmax.s fa5, fa4, fa5
+; RV32ID-NEXT: fmv.w.x fa4, a1
; RV32ID-NEXT: neg a0, a0
; RV32ID-NEXT: fmin.s fa5, fa5, fa4
; RV32ID-NEXT: fcvt.w.s a1, fa5, rtz
@@ -83,13 +85,14 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
; CHECK64ZFBFMIN-LABEL: fcvt_si_bf16_sat:
; CHECK64ZFBFMIN: # %bb.0: # %start
; CHECK64ZFBFMIN-NEXT: fcvt.s.bf16 fa5, fa0
-; CHECK64ZFBFMIN-NEXT: lui a0, %hi(.LCPI1_0)
-; CHECK64ZFBFMIN-NEXT: feq.s a1, fa5, fa5
-; CHECK64ZFBFMIN-NEXT: flw fa4, %lo(.LCPI1_0)(a0)
; CHECK64ZFBFMIN-NEXT: lui a0, 815104
-; CHECK64ZFBFMIN-NEXT: fmv.w.x fa3, a0
-; CHECK64ZFBFMIN-NEXT: fmax.s fa5, fa5, fa3
-; CHECK64ZFBFMIN-NEXT: neg a0, a1
+; CHECK64ZFBFMIN-NEXT: lui a1, 290816
+; CHECK64ZFBFMIN-NEXT: fmv.w.x fa4, a0
+; CHECK64ZFBFMIN-NEXT: feq.s a0, fa5, fa5
+; CHECK64ZFBFMIN-NEXT: addi a1, a1, -512
+; CHECK64ZFBFMIN-NEXT: neg a0, a0
+; CHECK64ZFBFMIN-NEXT: fmax.s fa5, fa5, fa4
+; CHECK64ZFBFMIN-NEXT: fmv.w.x fa4, a1
; CHECK64ZFBFMIN-NEXT: fmin.s fa5, fa5, fa4
; CHECK64ZFBFMIN-NEXT: fcvt.l.s a1, fa5, rtz
; CHECK64ZFBFMIN-NEXT: and a0, a0, a1
@@ -100,12 +103,13 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
; RV64ID-NEXT: fmv.x.w a0, fa0
; RV64ID-NEXT: lui a1, 815104
; RV64ID-NEXT: fmv.w.x fa5, a1
-; RV64ID-NEXT: lui a1, %hi(.LCPI1_0)
+; RV64ID-NEXT: lui a1, 290816
; RV64ID-NEXT: slli a0, a0, 16
-; RV64ID-NEXT: flw fa4, %lo(.LCPI1_0)(a1)
-; RV64ID-NEXT: fmv.w.x fa3, a0
-; RV64ID-NEXT: feq.s a0, fa3, fa3
-; RV64ID-NEXT: fmax.s fa5, fa3, fa5
+; RV64ID-NEXT: addi a1, a1, -512
+; RV64ID-NEXT: fmv.w.x fa4, a0
+; RV64ID-NEXT: feq.s a0, fa4, fa4
+; RV64ID-NEXT: fmax.s fa5, fa4, fa5
+; RV64ID-NEXT: fmv.w.x fa4, a1
; RV64ID-NEXT: neg a0, a0
; RV64ID-NEXT: fmin.s fa5, fa5, fa4
; RV64ID-NEXT: fcvt.l.s a1, fa5, rtz
@@ -152,49 +156,53 @@ define i16 @fcvt_ui_bf16(bfloat %a) nounwind {
define i16 @fcvt_ui_bf16_sat(bfloat %a) nounwind {
; CHECK32ZFBFMIN-LABEL: fcvt_ui_bf16_sat:
; CHECK32ZFBFMIN: # %bb.0: # %start
-; CHECK32ZFBFMIN-NEXT: lui a0, %hi(.LCPI3_0)
-; CHECK32ZFBFMIN-NEXT: flw fa5, %lo(.LCPI3_0)(a0)
-; CHECK32ZFBFMIN-NEXT: fcvt.s.bf16 fa4, fa0
-; CHECK32ZFBFMIN-NEXT: fmv.w.x fa3, zero
-; CHECK32ZFBFMIN-NEXT: fmax.s fa4, fa4, fa3
-; CHECK32ZFBFMIN-NEXT: fmin.s fa5, fa4, fa5
+; CHECK32ZFBFMIN-NEXT: fcvt.s.bf16 fa5, fa0
+; CHECK32ZFBFMIN-NEXT: fmv.w.x fa4, zero
+; CHECK32ZFBFMIN-NEXT: lui a0, 292864
+; CHECK32ZFBFMIN-NEXT: fmax.s fa5, fa5, fa4
+; CHECK32ZFBFMIN-NEXT: addi a0, a0, -256
+; CHECK32ZFBFMIN-NEXT: fmv.w.x fa4, a0
+; CHECK32ZFBFMIN-NEXT: fmin.s fa5, fa5, fa4
; CHECK32ZFBFMIN-NEXT: fcvt.wu.s a0, fa5, rtz
; CHECK32ZFBFMIN-NEXT: ret
;
; RV32ID-LABEL: fcvt_ui_bf16_sat:
; RV32ID: # %bb.0: # %start
-; RV32ID-NEXT: lui a0, %hi(.LCPI3_0)
-; RV32ID-NEXT: flw fa5, %lo(.LCPI3_0)(a0)
; RV32ID-NEXT: fmv.x.w a0, fa0
+; RV32ID-NEXT: fmv.w.x fa5, zero
; RV32ID-NEXT: slli a0, a0, 16
; RV32ID-NEXT: fmv.w.x fa4, a0
-; RV32ID-NEXT: fmv.w.x fa3, zero
-; RV32ID-NEXT: fmax.s fa4, fa4, fa3
-; RV32ID-NEXT: fmin.s fa5, fa4, fa5
+; RV32ID-NEXT: lui a0, 292864
+; RV32ID-NEXT: addi a0, a0, -256
+; RV32ID-NEXT: fmax.s fa5, fa4, fa5
+; RV32ID-NEXT: fmv.w.x fa4, a0
+; RV32ID-NEXT: fmin.s fa5, fa5, fa4
; RV32ID-NEXT: fcvt.wu.s a0, fa5, rtz
; RV32ID-NEXT: ret
;
; CHECK64ZFBFMIN-LABEL: fcvt_ui_bf16_sat:
; CHECK64ZFBFMIN: # %bb.0: # %start
-; CHECK64ZFBFMIN-NEXT: lui a0, %hi(.LCPI3_0)
-; CHECK64ZFBFMIN-NEXT: flw fa5, %lo(.LCPI3_0)(a0)
-; CHECK64ZFBFMIN-NEXT: fcvt.s.bf16 fa4, fa0
-; CHECK64ZFBFMIN-NEXT: fmv.w.x fa3, zero
-; CHECK64ZFBFMIN-NEXT: fmax.s fa4, fa4, fa3
-; CHECK64ZFBFMIN-NEXT: fmin.s fa5, fa4, fa5
+; CHECK64ZFBFMIN-NEXT: fcvt.s.bf16 fa5, fa0
+; CHECK64ZFBFMIN-NEXT: fmv.w.x fa4, zero
+; CHECK64ZFBFMIN-NEXT: lui a0, 292864
+; CHECK64ZFBFMIN-NEXT: fmax.s fa5, fa5, fa4
+; CHECK64ZFBFMIN-NEXT: addi a0, a0, -256
+; CHECK64ZFBFMIN-NEXT: fmv.w.x fa4, a0
+; CHECK64ZFBFMIN-NEXT: fmin.s fa5, fa5, fa4
; CHECK64ZFBFMIN-NEXT: fcvt.lu.s a0, fa5, rtz
; CHECK64ZFBFMIN-NEXT: ret
;
; RV64ID-LABEL: fcvt_ui_bf16_sat:
; RV64ID: # %bb.0: # %start
-; RV64ID-NEXT: lui a0, %hi(.LCPI3_0)
-; RV64ID-NEXT: flw fa5, %lo(.LCPI3_0)(a0)
; RV64ID-NEXT: fmv.x.w a0, fa0
+; RV64ID-NEXT: fmv.w.x fa5, zero
; RV64ID-NEXT: slli a0, a0, 16
; RV64ID-NEXT: fmv.w.x fa4, a0
-; RV64ID-NEXT: fmv.w.x fa3, zero
-; RV64ID-NEXT: fmax.s fa4, fa4, fa3
-; RV64ID-NEXT: fmin.s fa5, fa4, fa5
+; RV64ID-NEXT: lui a0, 292864
+; RV64ID-NEXT: addi a0, a0, -256
+; RV64ID-NEXT: fmax.s fa5, fa4, fa5
+; RV64ID-NEXT: fmv.w.x fa4, a0
+; RV64ID-NEXT: fmin.s fa5, fa5, fa4
; RV64ID-NEXT: fcvt.lu.s a0, fa5, rtz
; RV64ID-NEXT: ret
start:
@@ -472,20 +480,21 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
; RV32IZFBFMIN-NEXT: # %bb.1: # %start
; RV32IZFBFMIN-NEXT: mv a2, a1
; RV32IZFBFMIN-NEXT: .LBB10_2: # %start
-; RV32IZFBFMIN-NEXT: lui a1, %hi(.LCPI10_0)
-; RV32IZFBFMIN-NEXT: flw fa5, %lo(.LCPI10_0)(a1)
+; RV32IZFBFMIN-NEXT: lui a1, 389120
+; RV32IZFBFMIN-NEXT: addi a1, a1, -1
+; RV32IZFBFMIN-NEXT: fmv.w.x fa5, a1
; RV32IZFBFMIN-NEXT: flt.s a1, fa5, fs0
; RV32IZFBFMIN-NEXT: beqz a1, .LBB10_4
; RV32IZFBFMIN-NEXT: # %bb.3:
; RV32IZFBFMIN-NEXT: addi a2, a3, -1
; RV32IZFBFMIN-NEXT: .LBB10_4: # %start
; RV32IZFBFMIN-NEXT: feq.s a3, fs0, fs0
-; RV32IZFBFMIN-NEXT: neg a4, a1
-; RV32IZFBFMIN-NEXT: neg a1, s0
+; RV32IZFBFMIN-NEXT: neg a4, s0
+; RV32IZFBFMIN-NEXT: neg a5, a1
; RV32IZFBFMIN-NEXT: neg a3, a3
-; RV32IZFBFMIN-NEXT: and a0, a1, a0
+; RV32IZFBFMIN-NEXT: and a0, a4, a0
; RV32IZFBFMIN-NEXT: and a1, a3, a2
-; RV32IZFBFMIN-NEXT: or a0, a4, a0
+; RV32IZFBFMIN-NEXT: or a0, a5, a0
; RV32IZFBFMIN-NEXT: and a0, a3, a0
; RV32IZFBFMIN-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
; RV32IZFBFMIN-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
@@ -511,20 +520,21 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
; R32IDZFBFMIN-NEXT: # %bb.1: # %start
; R32IDZFBFMIN-NEXT: mv a2, a1
; R32IDZFBFMIN-NEXT: .LBB10_2: # %start
-; R32IDZFBFMIN-NEXT: lui a1, %hi(.LCPI10_0)
-; R32IDZFBFMIN-NEXT: flw fa5, %lo(.LCPI10_0)(a1)
+; R32IDZFBFMIN-NEXT: lui a1, 389120
+; R32IDZFBFMIN-NEXT: addi a1, a1, -1
+; R32IDZFBFMIN-NEXT: fmv.w.x fa5, a1
; R32IDZFBFMIN-NEXT: flt.s a1, fa5, fs0
; R32IDZFBFMIN-NEXT: beqz a1, .LBB10_4
; R32IDZFBFMIN-NEXT: # %bb.3:
; R32IDZFBFMIN-NEXT: addi a2, a3, -1
; R32IDZFBFMIN-NEXT: .LBB10_4: # %start
; R32IDZFBFMIN-NEXT: feq.s a3, fs0, fs0
-; R32IDZFBFMIN-NEXT: neg a4, a1
-; R32IDZFBFMIN-NEXT: neg a1, s0
+; R32IDZFBFMIN-NEXT: neg a4, s0
+; R32IDZFBFMIN-NEXT: neg a5, a1
; R32IDZFBFMIN-NEXT: neg a3, a3
-; R32IDZFBFMIN-NEXT: and a0, a1, a0
+; R32IDZFBFMIN-NEXT: and a0, a4, a0
; R32IDZFBFMIN-NEXT: and a1, a3, a2
-; R32IDZFBFMIN-NEXT: or a0, a4, a0
+; R32IDZFBFMIN-NEXT: or a0, a5, a0
; R32IDZFBFMIN-NEXT: and a0, a3, a0
; R32IDZFBFMIN-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
; R32IDZFBFMIN-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
@@ -552,8 +562,9 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
; RV32ID-NEXT: # %bb.1: # %start
; RV32ID-NEXT: mv a2, a1
; RV32ID-NEXT: .LBB10_2: # %start
-; RV32ID-NEXT: lui a1, %hi(.LCPI10_0)
-; RV32ID-NEXT: flw fa5, %lo(.LCPI10_0)(a1)
+; RV32ID-NEXT: lui a1, 389120
+; RV32ID-NEXT: addi a1, a1, -1
+; RV32ID-NEXT: fmv.w.x fa5, a1
; RV32ID-NEXT: flt.s a1, fa5, fs0
; RV32ID-NEXT: beqz a1, .LBB10_4
; RV32ID-NEXT: # %bb.3:
@@ -641,30 +652,59 @@ define i64 @fcvt_lu_bf16(bfloat %a) nounwind {
}
define i64 @fcvt_lu_bf16_sat(bfloat %a) nounwind {
-; CHECK32ZFBFMIN-LABEL: fcvt_lu_bf16_sat:
-; CHECK32ZFBFMIN: # %bb.0: # %start
-; CHECK32ZFBFMIN-NEXT: addi sp, sp, -16
-; CHECK32ZFBFMIN-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32ZFBFMIN-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
-; CHECK32ZFBFMIN-NEXT: sw s1, 4(sp) # 4-byte Folded Spill
-; CHECK32ZFBFMIN-NEXT: lui a0, %hi(.LCPI12_0)
-; CHECK32ZFBFMIN-NEXT: flw fa5, %lo(.LCPI12_0)(a0)
-; CHECK32ZFBFMIN-NEXT: fcvt.s.bf16 fa0, fa0
-; CHECK32ZFBFMIN-NEXT: fmv.w.x fa4, zero
-; CHECK32ZFBFMIN-NEXT: fle.s a0, fa4, fa0
-; CHECK32ZFBFMIN-NEXT: flt.s a1, fa5, fa0
-; CHECK32ZFBFMIN-NEXT: neg s0, a1
-; CHECK32ZFBFMIN-NEXT: neg s1, a0
-; CHECK32ZFBFMIN-NEXT: call __fixunssfdi
-; CHECK32ZFBFMIN-NEXT: and a0, s1, a0
-; CHECK32ZFBFMIN-NEXT: and a1, s1, a1
-; CHECK32ZFBFMIN-NEXT: or a0, s0, a0
-; CHECK32ZFBFMIN-NEXT: or a1, s0, a1
-; CHECK32ZFBFMIN-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; CHECK32ZFBFMIN-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
-; CHECK32ZFBFMIN-NEXT: lw s1, 4(sp) # 4-byte Folded Reload
-; CHECK32ZFBFMIN-NEXT: addi sp, sp, 16
-; CHECK32ZFBFMIN-NEXT: ret
+; RV32IZFBFMIN-LABEL: fcvt_lu_bf16_sat:
+; RV32IZFBFMIN: # %bb.0: # %start
+; RV32IZFBFMIN-NEXT: addi sp, sp, -16
+; RV32IZFBFMIN-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IZFBFMIN-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
+; RV32IZFBFMIN-NEXT: fsw fs0, 4(sp) # 4-byte Folded Spill
+; RV32IZFBFMIN-NEXT: fcvt.s.bf16 fs0, fa0
+; RV32IZFBFMIN-NEXT: fmv.w.x fa5, zero
+; RV32IZFBFMIN-NEXT: fle.s a0, fa5, fs0
+; RV32IZFBFMIN-NEXT: neg s0, a0
+; RV32IZFBFMIN-NEXT: fmv.s fa0, fs0
+; RV32IZFBFMIN-NEXT: call __fixunssfdi
+; RV32IZFBFMIN-NEXT: and a0, s0, a0
+; RV32IZFBFMIN-NEXT: lui a2, 391168
+; RV32IZFBFMIN-NEXT: and a1, s0, a1
+; RV32IZFBFMIN-NEXT: addi a2, a2, -1
+; RV32IZFBFMIN-NEXT: fmv.w.x fa5, a2
+; RV32IZFBFMIN-NEXT: flt.s a2, fa5, fs0
+; RV32IZFBFMIN-NEXT: neg a2, a2
+; RV32IZFBFMIN-NEXT: or a0, a2, a0
+; RV32IZFBFMIN-NEXT: or a1, a2, a1
+; RV32IZFBFMIN-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IZFBFMIN-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
+; RV32IZFBFMIN-NEXT: flw fs0, 4(sp) # 4-byte Folded Reload
+; RV32IZFBFMIN-NEXT: addi sp, sp, 16
+; RV32IZFBFMIN-NEXT: ret
+;
+; R32IDZFBFMIN-LABEL: fcvt_lu_bf16_sat:
+; R32IDZFBFMIN: # %bb.0: # %start
+; R32IDZFBFMIN-NEXT: addi sp, sp, -16
+; R32IDZFBFMIN-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; R32IDZFBFMIN-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
+; R32IDZFBFMIN-NEXT: fsd fs0, 0(sp) # 8-byte Folded Spill
+; R32IDZFBFMIN-NEXT: fcvt.s.bf16 fs0, fa0
+; R32IDZFBFMIN-NEXT: fmv.w.x fa5, zero
+; R32IDZFBFMIN-NEXT: fle.s a0, fa5, fs0
+; R32IDZFBFMIN-NEXT: neg s0, a0
+; R32IDZFBFMIN-NEXT: fmv.s fa0, fs0
+; R32IDZFBFMIN-NEXT: call __fixunssfdi
+; R32IDZFBFMIN-NEXT: and a0, s0, a0
+; R32IDZFBFMIN-NEXT: lui a2, 391168
+; R32IDZFBFMIN-NEXT: and a1, s0, a1
+; R32IDZFBFMIN-NEXT: addi a2, a2, -1
+; R32IDZFBFMIN-NEXT: fmv.w.x fa5, a2
+; R32IDZFBFMIN-NEXT: flt.s a2, fa5, fs0
+; R32IDZFBFMIN-NEXT: neg a2, a2
+; R32IDZFBFMIN-NEXT: or a0, a2, a0
+; R32IDZFBFMIN-NEXT: or a1, a2, a1
+; R32IDZFBFMIN-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; R32IDZFBFMIN-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
+; R32IDZFBFMIN-NEXT: fld fs0, 0(sp) # 8-byte Folded Reload
+; R32IDZFBFMIN-NEXT: addi sp, sp, 16
+; R32IDZFBFMIN-NEXT: ret
;
; RV32ID-LABEL: fcvt_lu_bf16_sat:
; RV32ID: # %bb.0: # %start
@@ -673,15 +713,16 @@ define i64 @fcvt_lu_bf16_sat(bfloat %a) nounwind {
; RV32ID-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
; RV32ID-NEXT: sw s1, 4(sp) # 4-byte Folded Spill
; RV32ID-NEXT: fmv.x.w a0, fa0
-; RV32ID-NEXT: lui a1, %hi(.LCPI12_0)
-; RV32ID-NEXT: fmv.w.x fa5, zero
-; RV32ID-NEXT: flw fa4, %lo(.LCPI12_0)(a1)
+; RV32ID-NEXT: lui a1, 391168
; RV32ID-NEXT: slli a0, a0, 16
+; RV32ID-NEXT: addi a1, a1, -1
; RV32ID-NEXT: fmv.w.x fa0, a0
-; RV32ID-NEXT: fle.s a0, fa5, fa0
-; RV32ID-NEXT: flt.s a1, fa4, fa0
-; RV32ID-NEXT: neg s0, a1
-; RV32ID-NEXT: neg s1, a0
+; RV32ID-NEXT: fmv.w.x fa5, a1
+; RV32ID-NEXT: flt.s a0, fa5, fa0
+; RV32ID-NEXT: fmv.w.x fa5, zero
+; RV32ID-NEXT: fle.s a1, fa5, fa0
+; RV32ID-NEXT: neg s0, a0
+; RV32ID-NEXT: neg s1, a1
; RV32ID-NEXT: call __fixunssfdi
; RV32ID-NEXT: and a0, s1, a0
; RV32ID-NEXT: and a1, s1, a1
diff --git a/llvm/test/CodeGen/RISCV/bfloat-imm.ll b/llvm/test/CodeGen/RISCV/bfloat-imm.ll
index 76ff720b1c268..61014891414d8 100644
--- a/llvm/test/CodeGen/RISCV/bfloat-imm.ll
+++ b/llvm/test/CodeGen/RISCV/bfloat-imm.ll
@@ -7,8 +7,9 @@
define bfloat @bfloat_imm() nounwind {
; CHECK-LABEL: bfloat_imm:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI0_0)
-; CHECK-NEXT: flh fa0, %lo(.LCPI0_0)(a0)
+; CHECK-NEXT: lui a0, 4
+; CHECK-NEXT: addi a0, a0, 64
+; CHECK-NEXT: fmv.h.x fa0, a0
; CHECK-NEXT: ret
ret bfloat 3.0
}
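The new check lines above materialise bfloat 3.0 as `lui a0, 4` + `addi a0, a0, 64` + `fmv.h.x`. A quick illustrative check of the encoding (assuming the standard bfloat16 representation, i.e. the top 16 bits of float32):

```python
import struct

# float32 bit pattern of 3.0 is 0x40400000; bfloat16 keeps the top 16 bits.
bits32 = struct.unpack(">I", struct.pack(">f", 3.0))[0]
bf16 = bits32 >> 16

# `lui a0, 4` (4 << 12) followed by `addi a0, a0, 64` builds the same value.
materialised = (4 << 12) + 64

print(hex(bf16), hex(materialised))  # 0x4040 0x4040
```

So the three-instruction sequence replaces the previous `lui %hi` + `flh %lo` constant pool load.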
diff --git a/llvm/test/CodeGen/RISCV/calling-conv-half.ll b/llvm/test/CodeGen/RISCV/calling-conv-half.ll
index d7957540d1b29..d8e6b7f3ede9a 100644
--- a/llvm/test/CodeGen/RISCV/calling-conv-half.ll
+++ b/llvm/test/CodeGen/RISCV/calling-conv-half.ll
@@ -519,15 +519,16 @@ define i32 @caller_half_on_stack() nounwind {
; RV32-ILP32F: # %bb.0:
; RV32-ILP32F-NEXT: addi sp, sp, -16
; RV32-ILP32F-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-ILP32F-NEXT: lui a4, %hi(.LCPI3_0)
+; RV32-ILP32F-NEXT: lui a7, 1048565
; RV32-ILP32F-NEXT: li a0, 1
; RV32-ILP32F-NEXT: li a1, 2
; RV32-ILP32F-NEXT: li a2, 3
; RV32-ILP32F-NEXT: li a3, 4
-; RV32-ILP32F-NEXT: flw fa0, %lo(.LCPI3_0)(a4)
; RV32-ILP32F-NEXT: li a4, 5
; RV32-ILP32F-NEXT: li a5, 6
; RV32-ILP32F-NEXT: li a6, 7
+; RV32-ILP32F-NEXT: addi a7, a7, -1792
+; RV32-ILP32F-NEXT: fmv.w.x fa0, a7
; RV32-ILP32F-NEXT: li a7, 8
; RV32-ILP32F-NEXT: call callee_half_on_stack
; RV32-ILP32F-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
@@ -538,15 +539,16 @@ define i32 @caller_half_on_stack() nounwind {
; RV64-LP64F: # %bb.0:
; RV64-LP64F-NEXT: addi sp, sp, -16
; RV64-LP64F-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
-; RV64-LP64F-NEXT: lui a4, %hi(.LCPI3_0)
+; RV64-LP64F-NEXT: lui a7, 1048565
; RV64-LP64F-NEXT: li a0, 1
; RV64-LP64F-NEXT: li a1, 2
; RV64-LP64F-NEXT: li a2, 3
; RV64-LP64F-NEXT: li a3, 4
-; RV64-LP64F-NEXT: flw fa0, %lo(.LCPI3_0)(a4)
; RV64-LP64F-NEXT: li a4, 5
; RV64-LP64F-NEXT: li a5, 6
; RV64-LP64F-NEXT: li a6, 7
+; RV64-LP64F-NEXT: addi a7, a7, -1792
+; RV64-LP64F-NEXT: fmv.w.x fa0, a7
; RV64-LP64F-NEXT: li a7, 8
; RV64-LP64F-NEXT: call callee_half_on_stack
; RV64-LP64F-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
@@ -557,15 +559,16 @@ define i32 @caller_half_on_stack() nounwind {
; RV32-ILP32ZFHMIN: # %bb.0:
; RV32-ILP32ZFHMIN-NEXT: addi sp, sp, -16
; RV32-ILP32ZFHMIN-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-ILP32ZFHMIN-NEXT: lui a4, %hi(.LCPI3_0)
+; RV32-ILP32ZFHMIN-NEXT: lui a7, 5
; RV32-ILP32ZFHMIN-NEXT: li a0, 1
; RV32-ILP32ZFHMIN-NEXT: li a1, 2
; RV32-ILP32ZFHMIN-NEXT: li a2, 3
; RV32-ILP32ZFHMIN-NEXT: li a3, 4
-; RV32-ILP32ZFHMIN-NEXT: flh fa0, %lo(.LCPI3_0)(a4)
; RV32-ILP32ZFHMIN-NEXT: li a4, 5
; RV32-ILP32ZFHMIN-NEXT: li a5, 6
; RV32-ILP32ZFHMIN-NEXT: li a6, 7
+; RV32-ILP32ZFHMIN-NEXT: addi a7, a7, -1792
+; RV32-ILP32ZFHMIN-NEXT: fmv.h.x fa0, a7
; RV32-ILP32ZFHMIN-NEXT: li a7, 8
; RV32-ILP32ZFHMIN-NEXT: call callee_half_on_stack
; RV32-ILP32ZFHMIN-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
@@ -576,15 +579,16 @@ define i32 @caller_half_on_stack() nounwind {
; RV64-LP64ZFHMIN: # %bb.0:
; RV64-LP64ZFHMIN-NEXT: addi sp, sp, -16
; RV64-LP64ZFHMIN-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
-; RV64-LP64ZFHMIN-NEXT: lui a4, %hi(.LCPI3_0)
+; RV64-LP64ZFHMIN-NEXT: lui a7, 5
; RV64-LP64ZFHMIN-NEXT: li a0, 1
; RV64-LP64ZFHMIN-NEXT: li a1, 2
; RV64-LP64ZFHMIN-NEXT: li a2, 3
; RV64-LP64ZFHMIN-NEXT: li a3, 4
-; RV64-LP64ZFHMIN-NEXT: flh fa0, %lo(.LCPI3_0)(a4)
; RV64-LP64ZFHMIN-NEXT: li a4, 5
; RV64-LP64ZFHMIN-NEXT: li a5, 6
; RV64-LP64ZFHMIN-NEXT: li a6, 7
+; RV64-LP64ZFHMIN-NEXT: addi a7, a7, -1792
+; RV64-LP64ZFHMIN-NEXT: fmv.h.x fa0, a7
; RV64-LP64ZFHMIN-NEXT: li a7, 8
; RV64-LP64ZFHMIN-NEXT: call callee_half_on_stack
; RV64-LP64ZFHMIN-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
@@ -595,15 +599,16 @@ define i32 @caller_half_on_stack() nounwind {
; RV32-ZFH-ILP32: # %bb.0:
; RV32-ZFH-ILP32-NEXT: addi sp, sp, -16
; RV32-ZFH-ILP32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-ZFH-ILP32-NEXT: lui a4, %hi(.LCPI3_0)
+; RV32-ZFH-ILP32-NEXT: lui a7, 5
; RV32-ZFH-ILP32-NEXT: li a0, 1
; RV32-ZFH-ILP32-NEXT: li a1, 2
; RV32-ZFH-ILP32-NEXT: li a2, 3
; RV32-ZFH-ILP32-NEXT: li a3, 4
-; RV32-ZFH-ILP32-NEXT: flh fa5, %lo(.LCPI3_0)(a4)
; RV32-ZFH-ILP32-NEXT: li a4, 5
; RV32-ZFH-ILP32-NEXT: li a5, 6
; RV32-ZFH-ILP32-NEXT: li a6, 7
+; RV32-ZFH-ILP32-NEXT: addi a7, a7, -1792
+; RV32-ZFH-ILP32-NEXT: fmv.h.x fa5, a7
; RV32-ZFH-ILP32-NEXT: li a7, 8
; RV32-ZFH-ILP32-NEXT: fsh fa5, 0(sp)
; RV32-ZFH-ILP32-NEXT: call callee_half_on_stack
@@ -615,15 +620,16 @@ define i32 @caller_half_on_stack() nounwind {
; RV32-ZFH-ILP32F: # %bb.0:
; RV32-ZFH-ILP32F-NEXT: addi sp, sp, -16
; RV32-ZFH-ILP32F-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-ZFH-ILP32F-NEXT: lui a4, %hi(.LCPI3_0)
+; RV32-ZFH-ILP32F-NEXT: lui a7, 5
; RV32-ZFH-ILP32F-NEXT: li a0, 1
; RV32-ZFH-ILP32F-NEXT: li a1, 2
; RV32-ZFH-ILP32F-NEXT: li a2, 3
; RV32-ZFH-ILP32F-NEXT: li a3, 4
-; RV32-ZFH-ILP32F-NEXT: flh fa0, %lo(.LCPI3_0)(a4)
; RV32-ZFH-ILP32F-NEXT: li a4, 5
; RV32-ZFH-ILP32F-NEXT: li a5, 6
; RV32-ZFH-ILP32F-NEXT: li a6, 7
+; RV32-ZFH-ILP32F-NEXT: addi a7, a7, -1792
+; RV32-ZFH-ILP32F-NEXT: fmv.h.x fa0, a7
; RV32-ZFH-ILP32F-...
[truncated]
LGTM
LGTM
    cl::desc("Give the maximum number of instructions that we will "
             "use for creating a floating-point immediate value"),
    cl::init(3));
I was wondering why we can't just use `RISCVSubtarget::getMaxBuildIntsCost() - 1` (minus 1 because we need a conversion from int to float)?
We'd need to account for Zfinx/Zdinx as well which doesn't need the conversion. I see this as a good suggestion for what a more "principled" constant to use might be, which might be worth exploring as a followup.
For completeness, the dynamic instruction count impact on SPEC (rva22u64) is as follows: always less than 1%, up a little in some benchmarks and down in others. In general you'd expect a small increase, given constant pool accesses are typically two instructions (but, as noted elsewhere, the fact we have issues with constant pool accesses means that reducing the number of constant pool entries can be a win).
As discussed in llvm#153402, we have inefficiencies in handling constant pool accesses that are difficult to address. Using an IR pass to promote double constants to a global allows a higher degree of control of code generation for these accesses, resulting in improved performance on benchmarks that might otherwise have high register pressure due to accessing constant pool values separately rather than via a common base.

Directly promoting double constants to separate global values and relying on the global merger to do a sensible thing would be one potential avenue to explore, but it is _not_ done in this version of the patch because:

* The global merger pass needs fixes. For instance it claims to be a function pass, yet all of the work is done in initialisation. This means that attempts by backends to schedule it after a given module pass don't actually work as expected.
* The heuristics used can impact codegen unexpectedly, so I worry that tweaking it to get the behaviour desired for promoted constants may lead to other issues. This may be completely tractable though.

Now that llvm#159352 has landed, the impact in terms of dynamically executed instructions is slightly smaller (as we are starting from a better baseline), but still worthwhile in lbm and nab from SPEC.
Results below are for rva22u64:

```
Benchmark            Baseline      This PR       Diff (%)
============================================================
500.perlbench_r      180667466583  180667466661   0.00%
502.gcc_r            221281439537  221277561043  -0.00%
505.mcf_r            134656203905  134656204017   0.00%
508.namd_r           217646645213  217616374477  -0.01%
510.parest_r         291730242760  291917069933   0.06%
511.povray_r          30982459833   31101871667   0.39%
519.lbm_r             91217999812   89029313608  -2.40%
520.omnetpp_r        137705551722  138044390554   0.25%
523.xalancbmk_r      284733326286  284728940808  -0.00%
525.x264_r           379107521545  379100249676  -0.00%
526.blender_r        659391437704  659446918261   0.01%
531.deepsjeng_r      350038121655  350038121654  -0.00%
538.imagick_r        238568679271  238560769465  -0.00%
541.leela_r          405654701351  405660852862   0.00%
544.nab_r            398215801713  391380811065  -1.72%
557.xz_r             129832192046  129832192047   0.00%
```
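A minimal before/after sketch of the promotion being described (illustrative IR only; the global name `@.promoted_doubles` and the function names are made up for this example, and the real pass may group constants differently):

```llvm
; Before: the FP constant is materialised per use, typically becoming a
; backend-generated constant pool entry with its own %hi/%lo access.
define double @before() {
  ret double 2.500000e+00
}

; After (conceptually): the constant lives in a named global array, so
; multiple promoted constants can be addressed from one common base.
@.promoted_doubles = internal unnamed_addr constant [1 x double] [double 2.500000e+00]

define double @after() {
  %p = getelementptr inbounds [1 x double], ptr @.promoted_doubles, i64 0, i64 0
  %v = load double, ptr %p
  ret double %v
}
```

The win comes from giving the compiler explicit control of the base address, rather than leaving each constant pool access to be lowered independently.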