
[RISCV] Be more aggressive about forming floating point constants #68433

Closed
wants to merge 4 commits

Conversation

preames (Collaborator) commented Oct 6, 2023

We were being very conservative about forming floating point constants via an integer materialization sequence and an fmv. With the default threshold of 2, we'd only do so if the bit pattern could be produced with a single instruction (LUI, ADDI, or sometimes BSETI).

This change removes the separate threshold entirely and ties the floating point expansion costing to the integer costing threshold. The effect is that the default threshold increases by 2, and that more sequences are materialized via integers, avoiding constant pool loads. This is sufficient to cover all constants of types fp16, bf16, and fp32. Many f64 constants are covered as well, but not all.

One downside of this change is that double constants for ELEN=64 configurations on rv32 can't be formed via the same integer sequences. This causes a fork in the test coverage which is more than a tad ugly. Ideas on how to reduce this, or restructure the tests to avoid it, are more than welcome.
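As a concrete illustration, taken from the bfloat-convert.ll diff below: the f32 constant 32767.0 (the i16 saturation bound, bit pattern 0x46FFFE00) changes from a constant pool load into an integer materialization plus fmv:

  # Before: address formation plus a memory access to load the constant.
  lui     a1, %hi(.LCPI1_0)
  flw     fa4, %lo(.LCPI1_0)(a1)

  # After: three ALU instructions and no load.
  lui     a1, 290816        # a1 = 0x47000000
  addi    a1, a1, -512      # a1 = 0x46FFFE00 (32767.0 as f32)
  fmv.w.x fa4, a1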

llvmbot (Collaborator) commented Oct 6, 2023

@llvm/pr-subscribers-backend-risc-v



Patch is 1.05 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/68433.diff

66 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+2-8)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+118-77)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-imm.ll (+5-4)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-half.ll (+27-18)
  • (modified) llvm/test/CodeGen/RISCV/codemodel-lowering.ll (+8-7)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+54-44)
  • (modified) llvm/test/CodeGen/RISCV/double-imm.ll (+41-11)
  • (modified) llvm/test/CodeGen/RISCV/double-intrinsics.ll (+30-24)
  • (modified) llvm/test/CodeGen/RISCV/double-round-conv.ll (+25-20)
  • (modified) llvm/test/CodeGen/RISCV/double-zfa.ll (+74-30)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+52-46)
  • (modified) llvm/test/CodeGen/RISCV/float-imm.ll (+8-6)
  • (modified) llvm/test/CodeGen/RISCV/float-round-conv-sat.ll (+50-40)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+529-447)
  • (modified) llvm/test/CodeGen/RISCV/half-imm.ll (+20-23)
  • (modified) llvm/test/CodeGen/RISCV/half-intrinsics.ll (+30-24)
  • (modified) llvm/test/CodeGen/RISCV/half-round-conv-sat.ll (+200-170)
  • (modified) llvm/test/CodeGen/RISCV/half-round-conv.ll (+195-180)
  • (modified) llvm/test/CodeGen/RISCV/half-zfa-fli.ll (+36-24)
  • (modified) llvm/test/CodeGen/RISCV/repeated-fp-divisors.ll (+3-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll (+578-284)
  • (modified) llvm/test/CodeGen/RISCV/rvv/double-round-conv.ll (+96-64)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fceil-constrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fceil-sdnode.ll (+140-70)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ffloor-constrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ffloor-sdnode.ll (+140-70)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll (+572-279)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fceil-constrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ffloor-constrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-floor-vp.ll (+1124-279)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fnearbyint-constrained-sdnode.ll (+15-58)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll (+75-37)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll (+630-112)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i-sat.ll (+58-52)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround-costrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround.ll (+266-72)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven-constrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven.ll (+266-72)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll (+18-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-nearbyint-vp.ll (+24-267)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-rint-vp.ll (+512-249)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-round-vp.ll (+1124-279)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundeven-vp.ll (+24-275)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundtozero-vp.ll (+24-275)
  • (modified) llvm/test/CodeGen/RISCV/rvv/floor-vp.ll (+578-284)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fnearbyint-constrained-sdnode.ll (+18-76)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fnearbyint-sdnode.ll (+18-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fptosi-sat.ll (-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fptoui-sat.ll (+12-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/frint-sdnode.ll (+124-62)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fround-costrained-sdnode.ll (+156-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fround-sdnode.ll (+140-70)
  • (modified) llvm/test/CodeGen/RISCV/rvv/froundeven-constrained-sdnode.ll (+18-76)
  • (modified) llvm/test/CodeGen/RISCV/rvv/froundeven-sdnode.ll (+18-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ftrunc-constrained-sdnode.ll (+18-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ftrunc-sdnode.ll (+18-60)
  • (modified) llvm/test/CodeGen/RISCV/rvv/half-round-conv.ll (+77-322)
  • (modified) llvm/test/CodeGen/RISCV/rvv/nearbyint-vp.ll (+36-293)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rint-vp.ll (+1020-258)
  • (modified) llvm/test/CodeGen/RISCV/rvv/round-vp.ll (+36-282)
  • (modified) llvm/test/CodeGen/RISCV/rvv/roundeven-vp.ll (+36-282)
  • (modified) llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll (+36-282)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfma-vp-combine.ll (-25)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/zfbfmin.ll (+3-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index bd4150c87eabbd0..4d71798a967a144 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -68,12 +68,6 @@ static cl::opt<unsigned> NumRepeatedDivisors(
              "transformation to multiplications by the reciprocal"),
     cl::init(2));
 
-static cl::opt<int>
-    FPImmCost(DEBUG_TYPE "-fpimm-cost", cl::Hidden,
-              cl::desc("Give the maximum number of instructions that we will "
-                       "use for creating a floating-point immediate value"),
-              cl::init(2));
-
 RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
                                          const RISCVSubtarget &STI)
     : TargetLowering(TM), Subtarget(STI) {
@@ -2026,10 +2020,10 @@ bool RISCVTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
 
   // Building an integer and then converting requires a fmv at the end of
   // the integer sequence.
-  const int Cost =
+  const unsigned Cost =
     1 + RISCVMatInt::getIntMatCost(Imm.bitcastToAPInt(), Subtarget.getXLen(),
                                    Subtarget.getFeatureBits());
-  return Cost <= FPImmCost;
+  return Cost <= Subtarget.getMaxBuildIntsCost();
 }
 
 // TODO: This is very conservative.
diff --git a/llvm/test/CodeGen/RISCV/bfloat-convert.ll b/llvm/test/CodeGen/RISCV/bfloat-convert.ll
index 8a0c4240d161bfb..5debb81e6d47447 100644
--- a/llvm/test/CodeGen/RISCV/bfloat-convert.ll
+++ b/llvm/test/CodeGen/RISCV/bfloat-convert.ll
@@ -55,11 +55,12 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
 ; CHECK32ZFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
 ; CHECK32ZFBFMIN-NEXT:    feq.s a0, fa5, fa5
 ; CHECK32ZFBFMIN-NEXT:    neg a0, a0
-; CHECK32ZFBFMIN-NEXT:    lui a1, %hi(.LCPI1_0)
-; CHECK32ZFBFMIN-NEXT:    flw fa4, %lo(.LCPI1_0)(a1)
 ; CHECK32ZFBFMIN-NEXT:    lui a1, 815104
-; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa3, a1
-; CHECK32ZFBFMIN-NEXT:    fmax.s fa5, fa5, fa3
+; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa4, a1
+; CHECK32ZFBFMIN-NEXT:    fmax.s fa5, fa5, fa4
+; CHECK32ZFBFMIN-NEXT:    lui a1, 290816
+; CHECK32ZFBFMIN-NEXT:    addi a1, a1, -512
+; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa4, a1
 ; CHECK32ZFBFMIN-NEXT:    fmin.s fa5, fa5, fa4
 ; CHECK32ZFBFMIN-NEXT:    fcvt.w.s a1, fa5, rtz
 ; CHECK32ZFBFMIN-NEXT:    and a0, a0, a1
@@ -71,12 +72,13 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
 ; RV32ID-NEXT:    slli a0, a0, 16
 ; RV32ID-NEXT:    fmv.w.x fa5, a0
 ; RV32ID-NEXT:    feq.s a0, fa5, fa5
-; RV32ID-NEXT:    lui a1, %hi(.LCPI1_0)
-; RV32ID-NEXT:    flw fa4, %lo(.LCPI1_0)(a1)
-; RV32ID-NEXT:    lui a1, 815104
-; RV32ID-NEXT:    fmv.w.x fa3, a1
-; RV32ID-NEXT:    fmax.s fa5, fa5, fa3
 ; RV32ID-NEXT:    neg a0, a0
+; RV32ID-NEXT:    lui a1, 815104
+; RV32ID-NEXT:    fmv.w.x fa4, a1
+; RV32ID-NEXT:    fmax.s fa5, fa5, fa4
+; RV32ID-NEXT:    lui a1, 290816
+; RV32ID-NEXT:    addi a1, a1, -512
+; RV32ID-NEXT:    fmv.w.x fa4, a1
 ; RV32ID-NEXT:    fmin.s fa5, fa5, fa4
 ; RV32ID-NEXT:    fcvt.w.s a1, fa5, rtz
 ; RV32ID-NEXT:    and a0, a0, a1
@@ -86,12 +88,13 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
 ; CHECK64ZFBFMIN:       # %bb.0: # %start
 ; CHECK64ZFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
 ; CHECK64ZFBFMIN-NEXT:    feq.s a0, fa5, fa5
-; CHECK64ZFBFMIN-NEXT:    lui a1, %hi(.LCPI1_0)
-; CHECK64ZFBFMIN-NEXT:    flw fa4, %lo(.LCPI1_0)(a1)
-; CHECK64ZFBFMIN-NEXT:    lui a1, 815104
-; CHECK64ZFBFMIN-NEXT:    fmv.w.x fa3, a1
-; CHECK64ZFBFMIN-NEXT:    fmax.s fa5, fa5, fa3
 ; CHECK64ZFBFMIN-NEXT:    neg a0, a0
+; CHECK64ZFBFMIN-NEXT:    lui a1, 815104
+; CHECK64ZFBFMIN-NEXT:    fmv.w.x fa4, a1
+; CHECK64ZFBFMIN-NEXT:    fmax.s fa5, fa5, fa4
+; CHECK64ZFBFMIN-NEXT:    lui a1, 290816
+; CHECK64ZFBFMIN-NEXT:    addi a1, a1, -512
+; CHECK64ZFBFMIN-NEXT:    fmv.w.x fa4, a1
 ; CHECK64ZFBFMIN-NEXT:    fmin.s fa5, fa5, fa4
 ; CHECK64ZFBFMIN-NEXT:    fcvt.l.s a1, fa5, rtz
 ; CHECK64ZFBFMIN-NEXT:    and a0, a0, a1
@@ -105,12 +108,13 @@ define i16 @fcvt_si_bf16_sat(bfloat %a) nounwind {
 ; RV64ID-NEXT:    slli a0, a0, 16
 ; RV64ID-NEXT:    fmv.w.x fa5, a0
 ; RV64ID-NEXT:    feq.s a0, fa5, fa5
-; RV64ID-NEXT:    lui a1, %hi(.LCPI1_0)
-; RV64ID-NEXT:    flw fa4, %lo(.LCPI1_0)(a1)
-; RV64ID-NEXT:    lui a1, 815104
-; RV64ID-NEXT:    fmv.w.x fa3, a1
-; RV64ID-NEXT:    fmax.s fa5, fa5, fa3
 ; RV64ID-NEXT:    neg a0, a0
+; RV64ID-NEXT:    lui a1, 815104
+; RV64ID-NEXT:    fmv.w.x fa4, a1
+; RV64ID-NEXT:    fmax.s fa5, fa5, fa4
+; RV64ID-NEXT:    lui a1, 290816
+; RV64ID-NEXT:    addi a1, a1, -512
+; RV64ID-NEXT:    fmv.w.x fa4, a1
 ; RV64ID-NEXT:    fmin.s fa5, fa5, fa4
 ; RV64ID-NEXT:    fcvt.l.s a1, fa5, rtz
 ; RV64ID-NEXT:    and a0, a0, a1
@@ -158,12 +162,13 @@ define i16 @fcvt_ui_bf16(bfloat %a) nounwind {
 define i16 @fcvt_ui_bf16_sat(bfloat %a) nounwind {
 ; CHECK32ZFBFMIN-LABEL: fcvt_ui_bf16_sat:
 ; CHECK32ZFBFMIN:       # %bb.0: # %start
-; CHECK32ZFBFMIN-NEXT:    lui a0, %hi(.LCPI3_0)
-; CHECK32ZFBFMIN-NEXT:    flw fa5, %lo(.LCPI3_0)(a0)
-; CHECK32ZFBFMIN-NEXT:    fcvt.s.bf16 fa4, fa0
-; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa3, zero
-; CHECK32ZFBFMIN-NEXT:    fmax.s fa4, fa4, fa3
-; CHECK32ZFBFMIN-NEXT:    fmin.s fa5, fa4, fa5
+; CHECK32ZFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
+; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa4, zero
+; CHECK32ZFBFMIN-NEXT:    fmax.s fa5, fa5, fa4
+; CHECK32ZFBFMIN-NEXT:    lui a0, 292864
+; CHECK32ZFBFMIN-NEXT:    addi a0, a0, -256
+; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa4, a0
+; CHECK32ZFBFMIN-NEXT:    fmin.s fa5, fa5, fa4
 ; CHECK32ZFBFMIN-NEXT:    fcvt.wu.s a0, fa5, rtz
 ; CHECK32ZFBFMIN-NEXT:    ret
 ;
@@ -171,38 +176,41 @@ define i16 @fcvt_ui_bf16_sat(bfloat %a) nounwind {
 ; RV32ID:       # %bb.0: # %start
 ; RV32ID-NEXT:    fmv.x.w a0, fa0
 ; RV32ID-NEXT:    slli a0, a0, 16
-; RV32ID-NEXT:    lui a1, %hi(.LCPI3_0)
-; RV32ID-NEXT:    flw fa5, %lo(.LCPI3_0)(a1)
+; RV32ID-NEXT:    fmv.w.x fa5, a0
+; RV32ID-NEXT:    fmv.w.x fa4, zero
+; RV32ID-NEXT:    fmax.s fa5, fa5, fa4
+; RV32ID-NEXT:    lui a0, 292864
+; RV32ID-NEXT:    addi a0, a0, -256
 ; RV32ID-NEXT:    fmv.w.x fa4, a0
-; RV32ID-NEXT:    fmv.w.x fa3, zero
-; RV32ID-NEXT:    fmax.s fa4, fa4, fa3
-; RV32ID-NEXT:    fmin.s fa5, fa4, fa5
+; RV32ID-NEXT:    fmin.s fa5, fa5, fa4
 ; RV32ID-NEXT:    fcvt.wu.s a0, fa5, rtz
 ; RV32ID-NEXT:    ret
 ;
 ; CHECK64ZFBFMIN-LABEL: fcvt_ui_bf16_sat:
 ; CHECK64ZFBFMIN:       # %bb.0: # %start
-; CHECK64ZFBFMIN-NEXT:    lui a0, %hi(.LCPI3_0)
-; CHECK64ZFBFMIN-NEXT:    flw fa5, %lo(.LCPI3_0)(a0)
-; CHECK64ZFBFMIN-NEXT:    fcvt.s.bf16 fa4, fa0
-; CHECK64ZFBFMIN-NEXT:    fmv.w.x fa3, zero
-; CHECK64ZFBFMIN-NEXT:    fmax.s fa4, fa4, fa3
-; CHECK64ZFBFMIN-NEXT:    fmin.s fa5, fa4, fa5
+; CHECK64ZFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
+; CHECK64ZFBFMIN-NEXT:    fmv.w.x fa4, zero
+; CHECK64ZFBFMIN-NEXT:    fmax.s fa5, fa5, fa4
+; CHECK64ZFBFMIN-NEXT:    lui a0, 292864
+; CHECK64ZFBFMIN-NEXT:    addi a0, a0, -256
+; CHECK64ZFBFMIN-NEXT:    fmv.w.x fa4, a0
+; CHECK64ZFBFMIN-NEXT:    fmin.s fa5, fa5, fa4
 ; CHECK64ZFBFMIN-NEXT:    fcvt.lu.s a0, fa5, rtz
 ; CHECK64ZFBFMIN-NEXT:    ret
 ;
 ; RV64ID-LABEL: fcvt_ui_bf16_sat:
 ; RV64ID:       # %bb.0: # %start
-; RV64ID-NEXT:    lui a0, %hi(.LCPI3_0)
-; RV64ID-NEXT:    flw fa5, %lo(.LCPI3_0)(a0)
 ; RV64ID-NEXT:    fmv.x.w a0, fa0
 ; RV64ID-NEXT:    slli a0, a0, 48
 ; RV64ID-NEXT:    srli a0, a0, 48
 ; RV64ID-NEXT:    slli a0, a0, 16
+; RV64ID-NEXT:    fmv.w.x fa5, a0
+; RV64ID-NEXT:    fmv.w.x fa4, zero
+; RV64ID-NEXT:    fmax.s fa5, fa5, fa4
+; RV64ID-NEXT:    lui a0, 292864
+; RV64ID-NEXT:    addi a0, a0, -256
 ; RV64ID-NEXT:    fmv.w.x fa4, a0
-; RV64ID-NEXT:    fmv.w.x fa3, zero
-; RV64ID-NEXT:    fmax.s fa4, fa4, fa3
-; RV64ID-NEXT:    fmin.s fa5, fa4, fa5
+; RV64ID-NEXT:    fmin.s fa5, fa5, fa4
 ; RV64ID-NEXT:    fcvt.lu.s a0, fa5, rtz
 ; RV64ID-NEXT:    ret
 start:
@@ -492,8 +500,9 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
 ; RV32IZFBFMIN-NEXT:  # %bb.1: # %start
 ; RV32IZFBFMIN-NEXT:    mv a2, a1
 ; RV32IZFBFMIN-NEXT:  .LBB10_2: # %start
-; RV32IZFBFMIN-NEXT:    lui a1, %hi(.LCPI10_0)
-; RV32IZFBFMIN-NEXT:    flw fa5, %lo(.LCPI10_0)(a1)
+; RV32IZFBFMIN-NEXT:    lui a1, 389120
+; RV32IZFBFMIN-NEXT:    addi a1, a1, -1
+; RV32IZFBFMIN-NEXT:    fmv.w.x fa5, a1
 ; RV32IZFBFMIN-NEXT:    flt.s a3, fa5, fs0
 ; RV32IZFBFMIN-NEXT:    beqz a3, .LBB10_4
 ; RV32IZFBFMIN-NEXT:  # %bb.3:
@@ -502,9 +511,9 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
 ; RV32IZFBFMIN-NEXT:    feq.s a1, fs0, fs0
 ; RV32IZFBFMIN-NEXT:    neg a4, a1
 ; RV32IZFBFMIN-NEXT:    and a1, a4, a2
+; RV32IZFBFMIN-NEXT:    neg a2, s0
+; RV32IZFBFMIN-NEXT:    and a0, a2, a0
 ; RV32IZFBFMIN-NEXT:    neg a2, a3
-; RV32IZFBFMIN-NEXT:    neg a3, s0
-; RV32IZFBFMIN-NEXT:    and a0, a3, a0
 ; RV32IZFBFMIN-NEXT:    or a0, a2, a0
 ; RV32IZFBFMIN-NEXT:    and a0, a4, a0
 ; RV32IZFBFMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -531,8 +540,9 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
 ; R32IDZFBFMIN-NEXT:  # %bb.1: # %start
 ; R32IDZFBFMIN-NEXT:    mv a2, a1
 ; R32IDZFBFMIN-NEXT:  .LBB10_2: # %start
-; R32IDZFBFMIN-NEXT:    lui a1, %hi(.LCPI10_0)
-; R32IDZFBFMIN-NEXT:    flw fa5, %lo(.LCPI10_0)(a1)
+; R32IDZFBFMIN-NEXT:    lui a1, 389120
+; R32IDZFBFMIN-NEXT:    addi a1, a1, -1
+; R32IDZFBFMIN-NEXT:    fmv.w.x fa5, a1
 ; R32IDZFBFMIN-NEXT:    flt.s a3, fa5, fs0
 ; R32IDZFBFMIN-NEXT:    beqz a3, .LBB10_4
 ; R32IDZFBFMIN-NEXT:  # %bb.3:
@@ -541,9 +551,9 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
 ; R32IDZFBFMIN-NEXT:    feq.s a1, fs0, fs0
 ; R32IDZFBFMIN-NEXT:    neg a4, a1
 ; R32IDZFBFMIN-NEXT:    and a1, a4, a2
+; R32IDZFBFMIN-NEXT:    neg a2, s0
+; R32IDZFBFMIN-NEXT:    and a0, a2, a0
 ; R32IDZFBFMIN-NEXT:    neg a2, a3
-; R32IDZFBFMIN-NEXT:    neg a3, s0
-; R32IDZFBFMIN-NEXT:    and a0, a3, a0
 ; R32IDZFBFMIN-NEXT:    or a0, a2, a0
 ; R32IDZFBFMIN-NEXT:    and a0, a4, a0
 ; R32IDZFBFMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -572,8 +582,9 @@ define i64 @fcvt_l_bf16_sat(bfloat %a) nounwind {
 ; RV32ID-NEXT:  # %bb.1: # %start
 ; RV32ID-NEXT:    mv a2, a1
 ; RV32ID-NEXT:  .LBB10_2: # %start
-; RV32ID-NEXT:    lui a1, %hi(.LCPI10_0)
-; RV32ID-NEXT:    flw fa5, %lo(.LCPI10_0)(a1)
+; RV32ID-NEXT:    lui a1, 389120
+; RV32ID-NEXT:    addi a1, a1, -1
+; RV32ID-NEXT:    fmv.w.x fa5, a1
 ; RV32ID-NEXT:    flt.s a3, fa5, fs0
 ; RV32ID-NEXT:    beqz a3, .LBB10_4
 ; RV32ID-NEXT:  # %bb.3:
@@ -665,30 +676,59 @@ define i64 @fcvt_lu_bf16(bfloat %a) nounwind {
 }
 
 define i64 @fcvt_lu_bf16_sat(bfloat %a) nounwind {
-; CHECK32ZFBFMIN-LABEL: fcvt_lu_bf16_sat:
-; CHECK32ZFBFMIN:       # %bb.0: # %start
-; CHECK32ZFBFMIN-NEXT:    addi sp, sp, -16
-; CHECK32ZFBFMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32ZFBFMIN-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; CHECK32ZFBFMIN-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
-; CHECK32ZFBFMIN-NEXT:    lui a0, %hi(.LCPI12_0)
-; CHECK32ZFBFMIN-NEXT:    flw fa5, %lo(.LCPI12_0)(a0)
-; CHECK32ZFBFMIN-NEXT:    fcvt.s.bf16 fa0, fa0
-; CHECK32ZFBFMIN-NEXT:    flt.s a0, fa5, fa0
-; CHECK32ZFBFMIN-NEXT:    neg s0, a0
-; CHECK32ZFBFMIN-NEXT:    fmv.w.x fa5, zero
-; CHECK32ZFBFMIN-NEXT:    fle.s a0, fa5, fa0
-; CHECK32ZFBFMIN-NEXT:    neg s1, a0
-; CHECK32ZFBFMIN-NEXT:    call __fixunssfdi@plt
-; CHECK32ZFBFMIN-NEXT:    and a0, s1, a0
-; CHECK32ZFBFMIN-NEXT:    or a0, s0, a0
-; CHECK32ZFBFMIN-NEXT:    and a1, s1, a1
-; CHECK32ZFBFMIN-NEXT:    or a1, s0, a1
-; CHECK32ZFBFMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; CHECK32ZFBFMIN-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; CHECK32ZFBFMIN-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
-; CHECK32ZFBFMIN-NEXT:    addi sp, sp, 16
-; CHECK32ZFBFMIN-NEXT:    ret
+; RV32IZFBFMIN-LABEL: fcvt_lu_bf16_sat:
+; RV32IZFBFMIN:       # %bb.0: # %start
+; RV32IZFBFMIN-NEXT:    addi sp, sp, -16
+; RV32IZFBFMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IZFBFMIN-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; RV32IZFBFMIN-NEXT:    fsw fs0, 4(sp) # 4-byte Folded Spill
+; RV32IZFBFMIN-NEXT:    fcvt.s.bf16 fs0, fa0
+; RV32IZFBFMIN-NEXT:    fmv.w.x fa5, zero
+; RV32IZFBFMIN-NEXT:    fle.s a0, fa5, fs0
+; RV32IZFBFMIN-NEXT:    neg s0, a0
+; RV32IZFBFMIN-NEXT:    fmv.s fa0, fs0
+; RV32IZFBFMIN-NEXT:    call __fixunssfdi@plt
+; RV32IZFBFMIN-NEXT:    and a0, s0, a0
+; RV32IZFBFMIN-NEXT:    lui a2, 391168
+; RV32IZFBFMIN-NEXT:    addi a2, a2, -1
+; RV32IZFBFMIN-NEXT:    fmv.w.x fa5, a2
+; RV32IZFBFMIN-NEXT:    flt.s a2, fa5, fs0
+; RV32IZFBFMIN-NEXT:    neg a2, a2
+; RV32IZFBFMIN-NEXT:    or a0, a2, a0
+; RV32IZFBFMIN-NEXT:    and a1, s0, a1
+; RV32IZFBFMIN-NEXT:    or a1, a2, a1
+; RV32IZFBFMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IZFBFMIN-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; RV32IZFBFMIN-NEXT:    flw fs0, 4(sp) # 4-byte Folded Reload
+; RV32IZFBFMIN-NEXT:    addi sp, sp, 16
+; RV32IZFBFMIN-NEXT:    ret
+;
+; R32IDZFBFMIN-LABEL: fcvt_lu_bf16_sat:
+; R32IDZFBFMIN:       # %bb.0: # %start
+; R32IDZFBFMIN-NEXT:    addi sp, sp, -16
+; R32IDZFBFMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; R32IDZFBFMIN-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; R32IDZFBFMIN-NEXT:    fsd fs0, 0(sp) # 8-byte Folded Spill
+; R32IDZFBFMIN-NEXT:    fcvt.s.bf16 fs0, fa0
+; R32IDZFBFMIN-NEXT:    fmv.w.x fa5, zero
+; R32IDZFBFMIN-NEXT:    fle.s a0, fa5, fs0
+; R32IDZFBFMIN-NEXT:    neg s0, a0
+; R32IDZFBFMIN-NEXT:    fmv.s fa0, fs0
+; R32IDZFBFMIN-NEXT:    call __fixunssfdi@plt
+; R32IDZFBFMIN-NEXT:    and a0, s0, a0
+; R32IDZFBFMIN-NEXT:    lui a2, 391168
+; R32IDZFBFMIN-NEXT:    addi a2, a2, -1
+; R32IDZFBFMIN-NEXT:    fmv.w.x fa5, a2
+; R32IDZFBFMIN-NEXT:    flt.s a2, fa5, fs0
+; R32IDZFBFMIN-NEXT:    neg a2, a2
+; R32IDZFBFMIN-NEXT:    or a0, a2, a0
+; R32IDZFBFMIN-NEXT:    and a1, s0, a1
+; R32IDZFBFMIN-NEXT:    or a1, a2, a1
+; R32IDZFBFMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; R32IDZFBFMIN-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; R32IDZFBFMIN-NEXT:    fld fs0, 0(sp) # 8-byte Folded Reload
+; R32IDZFBFMIN-NEXT:    addi sp, sp, 16
+; R32IDZFBFMIN-NEXT:    ret
 ;
 ; RV32ID-LABEL: fcvt_lu_bf16_sat:
 ; RV32ID:       # %bb.0: # %start
@@ -696,11 +736,12 @@ define i64 @fcvt_lu_bf16_sat(bfloat %a) nounwind {
 ; RV32ID-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32ID-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
 ; RV32ID-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
-; RV32ID-NEXT:    lui a0, %hi(.LCPI12_0)
-; RV32ID-NEXT:    flw fa5, %lo(.LCPI12_0)(a0)
 ; RV32ID-NEXT:    fmv.x.w a0, fa0
 ; RV32ID-NEXT:    slli a0, a0, 16
 ; RV32ID-NEXT:    fmv.w.x fa0, a0
+; RV32ID-NEXT:    lui a0, 391168
+; RV32ID-NEXT:    addi a0, a0, -1
+; RV32ID-NEXT:    fmv.w.x fa5, a0
 ; RV32ID-NEXT:    flt.s a0, fa5, fa0
 ; RV32ID-NEXT:    neg s0, a0
 ; RV32ID-NEXT:    fmv.w.x fa5, zero
diff --git a/llvm/test/CodeGen/RISCV/bfloat-imm.ll b/llvm/test/CodeGen/RISCV/bfloat-imm.ll
index cd4e960b5a062a0..ed2c6f59f8fd576 100644
--- a/llvm/test/CodeGen/RISCV/bfloat-imm.ll
+++ b/llvm/test/CodeGen/RISCV/bfloat-imm.ll
@@ -1,14 +1,15 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
 ; RUN: llc -mtriple=riscv32 -mattr=+experimental-zfbfmin -verify-machineinstrs \
-; RUN:   -target-abi ilp32f < %s | FileCheck %s
+; RUN:   -target-abi ilp32f < %s | FileCheck --check-prefixes=CHECK %s
 ; RUN: llc -mtriple=riscv64 -mattr=+experimental-zfbfmin -verify-machineinstrs \
-; RUN:   -target-abi lp64f < %s | FileCheck %s
+; RUN:   -target-abi lp64f < %s | FileCheck --check-prefixes=CHECK %s
 
 define bfloat @bfloat_imm() nounwind {
 ; CHECK-LABEL: bfloat_imm:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, %hi(.LCPI0_0)
-; CHECK-NEXT:    flh fa0, %lo(.LCPI0_0)(a0)
+; CHECK-NEXT:    lui a0, 4
+; CHECK-NEXT:    addi a0, a0, 64
+; CHECK-NEXT:    fmv.h.x fa0, a0
 ; CHECK-NEXT:    ret
   ret bfloat 3.0
 }
diff --git a/llvm/test/CodeGen/RISCV/calling-conv-half.ll b/llvm/test/CodeGen/RISCV/calling-conv-half.ll
index 6587f0c8c5af7bf..ac0aa7b620c616f 100644
--- a/llvm/test/CodeGen/RISCV/calling-conv-half.ll
+++ b/llvm/test/CodeGen/RISCV/calling-conv-half.ll
@@ -415,8 +415,9 @@ define i32 @caller_half_on_stack() nounwind {
 ; RV32-ILP32F:       # %bb.0:
 ; RV32-ILP32F-NEXT:    addi sp, sp, -16
 ; RV32-ILP32F-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-ILP32F-NEXT:    lui a0, %hi(.LCPI3_0)
-; RV32-ILP32F-NEXT:    flw fa0, %lo(.LCPI3_0)(a0)
+; RV32-ILP32F-NEXT:    lui a0, 1048565
+; RV32-ILP32F-NEXT:    addi a0, a0, -1792
+; RV32-ILP32F-NEXT:    fmv.w.x fa0, a0
 ; RV32-ILP32F-NEXT:    li a0, 1
 ; RV32-ILP32F-NEXT:    li a1, 2
 ; RV32-ILP32F-NEXT:    li a2, 3
@@ -434,8 +435,9 @@ define i32 @caller_half_on_stack() nounwind {
 ; RV64-LP64F:       # %bb.0:
 ; RV64-LP64F-NEXT:    addi sp, sp, -16
 ; RV64-LP64F-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64-LP64F-NEXT:    lui a0, %hi(.LCPI3_0)
-; RV64-LP64F-NEXT:    flw fa0, %lo(.LCPI3_0)(a0)
+; RV64-LP64F-NEXT:    lui a0, 1048565
+; RV64-LP64F-NEXT:    addi a0, a0, -1792
+; RV64-LP64F-NEXT:    fmv.w.x fa0, a0
 ; RV64-LP64F-NEXT:    li a0, 1
 ; RV64-LP64F-NEXT:    li a1, 2
 ; RV64-LP64F-NEXT:    li a2, 3
@@ -453,8 +455,9 @@ define i32 @caller_half_on_stack() nounwind {
 ; RV32-ILP32ZFHMIN:       # %bb.0:
 ; RV32-ILP32ZFHMIN-NEXT:    addi sp, sp, -16
 ; RV32-ILP32ZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-ILP32ZFHMIN-NEXT:    lui a0, %hi(.LCPI3_0)
-; RV32-ILP32ZFHMIN-NEXT:    flh fa0, %lo(.LCPI3_0)(a0)
+; RV32-ILP32ZFHMIN-NEXT:    lui a0, 5
+; RV32-ILP32ZFHMIN-NEXT:    addi a0, a0, -1792
+; RV32-ILP32ZFHMIN-NEXT:    fmv.h.x fa0, a0
 ; RV32-ILP32ZFHMIN-NEXT:    li a0, 1
 ; RV32-ILP32ZFHMIN-NEXT:    li a1, 2
 ; RV32-ILP32ZFHMIN-NEXT:    li a2, 3
@@ -472,8 +475,9 @@ define i32 @caller_half_on_stack() nounwind {
 ; RV64-LP64ZFHMIN:       # %bb.0:
 ; RV64-LP64ZFHMIN-NEXT:    addi sp, sp, -16
 ; RV64-LP64ZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64-LP64ZFHMIN-NEXT:    lui a0, %hi(.LCPI3_0)
-; RV64-LP64ZFHMIN-NEXT:    flh fa0, %lo(.LCPI3_0)(a0)
+; RV64-LP64ZFHMIN-NEXT:    lui a0, 5
+; RV64-LP64ZFHMIN-NEXT:    addi a0, a0, -1792
+; RV64-LP64ZFHMIN-NEXT:    fmv.h.x fa0, a0
 ; RV64-LP64ZFHMIN-NEXT:    li a0, 1
 ; RV64-LP64ZFHMIN-NEXT:    li a1, 2
 ; RV64-LP64ZFHMIN-NEXT:    li a2, 3
@@ -511,33 +515,38 @@ define half @callee_half_ret() nounwind {
 ;
 ; RV64IF-LABEL: callee_half_ret:
 ; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    lui a0, %hi(.LCPI4_0)
-; RV64IF-NEXT:    flw fa5, %lo(.LCPI4_0)(a0)
+; RV64IF-NEXT:    lui a0, 1048564
+; RV64IF-NEXT:    addi a0, a0, -1024
+; RV64IF-NEXT:    fmv.w.x fa5, a0
 ; RV64IF-NEXT:    fmv.x.w a0, fa5
 ; RV64IF-NEXT:    ret
 ;
 ; RV32-ILP32F-LABEL: callee_half_ret:
 ; RV32-ILP32F:       # %bb.0:
-; RV32-ILP32F-NEXT:    lui a0, %hi(.LCPI4_0)
-; RV32-ILP32F-NEXT:    flw fa0, %lo(.LCPI4_0)(a0)
+; RV32-ILP32F-NEXT:    lui a0, 1048564
+; RV32-ILP32F-NEXT:    addi a0, a0, -1024
+; RV32-ILP32F-NEXT:    fmv.w.x fa0, a0
 ; RV32-ILP32F-NEXT:    ret
 ;
 ; RV64-LP64F-LABEL: callee_half_ret:
 ; RV64-LP64F:       # %bb.0:
-; RV64-LP64F-NEXT:    lui a0, %hi(.LCPI4_0)
-; RV64-LP64F-NEXT:    flw fa0, %lo(.LCPI4_0)(a0)
+; RV64-LP64F-NEXT:    lui a0, 1048564
+; RV64-LP64F-NEXT:    addi a0, a0, -1024
+; RV64-LP64F-NEXT:    fmv.w.x fa0, a0
 ; RV64-LP64F-NEXT:    ret
 ;
 ; RV32-ILP32ZFHMIN-LABEL: callee_half_ret:
 ; RV32-ILP32ZFHMIN:       # %bb.0:
-; RV32-ILP32ZFHMIN-NEXT:    lui a0, %hi(.LCPI4_0)
-; RV32-ILP32ZFHMIN-NEXT:    flh fa0, %lo(.LCPI4_0)(a0)
+; RV32-ILP32ZFHMIN-NEXT:    li a0, 15
+; RV32-ILP32ZFHMIN-NEXT:    slli a0, a0, 10
+; RV32-ILP32ZFHMIN-NEXT:    fmv.h.x fa0, a0
 ; RV32-ILP32ZFHMIN-NEXT:    ret
 ;
 ; RV64-LP64ZFHMIN-LABEL: callee_half_ret:
 ; RV64-LP64ZFHMIN:       # %bb.0:
-; RV64-LP64ZFHMIN-NEXT:    lui a0, %hi(.LCPI4_0)
-; RV64-LP64ZFHMIN-NEXT:    flh fa0, %lo(.LCPI4_0)(a0)
+; RV64-LP64ZFHMIN-NEXT:    li a0, 15
+; RV64-LP64ZFHMIN-NEXT:    slli a0, a0, 10
+; RV64-LP64ZFHMIN-NEXT:    fmv.h.x fa0, a0
 ; RV64-LP64ZFHMIN-NEXT:    ret
   ret half 1.0
 }
diff --git a/llvm/test/CodeGen/RISCV/codemodel-lowering.ll b/llvm/test/CodeGen/RISCV/codemodel-lowering.ll
index 617155b31976187..562f2fd0c270cf7 100644
--- a/llvm/test/CodeGen/RISCV/codemodel-lowering.ll
+++ b/llvm/test/CodeGen/RISCV/codemodel-lowering.ll
@@ -124,16 +124,17 @@ indirectgoto:
 define float @lower_constantpool(float %a) nounwind {
 ; RV32I-SMALL-LABEL: lower_constantpool:
 ; RV32I-SMALL:       # %bb.0:
-; RV32I-SMALL-NEXT:    lui a0, %hi(.LCPI3_0)
-; RV32I-SMA...
[truncated]

github-actions bot commented Oct 6, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

asb (Contributor) commented Oct 6, 2023

A load directly to an FPR (especially if you can assume it's likely cached) versus a GPR-to-FPR move isn't trivial to reason about, and is of course very microarchitecture dependent, but this is probably a sensible default. @topperc - any insight on how this would impact the SiFive microarchitectures?

preames (Collaborator, Author) commented Oct 6, 2023

Throwing this out there for consideration.

The vast majority of the vector test diff duplication is for a single double constant: 4503599627370496 (2^52). This number happens to be exactly representable as a single precision value as well, and thus we could use the sequence LUI + FMV.S.X + FCVT.D.S to form it. If we preferred this form on both rv32 and rv64, it'd make the test diffs collapse.

Not sure that's a good idea though, as the cost of the FMV + FCVT may be higher than the slli + fmv we'd use in this change.
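For reference, a sketch of the two sequences being compared for 4503599627370496 = 2^52 (register choices here are illustrative, not taken from actual output):

  # Proposed alternative: form 2^52 as f32, then widen to f64
  # (usable on both rv32 and rv64).
  lui      a0, 366592       # a0 = 0x59800000 (2^52 as f32)
  fmv.w.x  fa5, a0
  fcvt.d.s fa5, fa5

  # What this change produces on rv64: build the f64 bit pattern directly.
  li       a0, 1075         # 0x433, the exponent bits of 2^52 as f64
  slli     a0, a0, 52       # a0 = 0x4330000000000000
  fmv.d.x  fa5, a0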

preames force-pushed the pr-riscv-fp-constant-mat-threshold branch from 3e13514 to 1a175b1 on October 24, 2023 at 21:02
preames (Collaborator, Author) commented Oct 26, 2023

ping

@@ -1,14 +1,15 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
 ; RUN: llc -mtriple=riscv32 -mattr=+experimental-zfbfmin -verify-machineinstrs \
-; RUN:   -target-abi ilp32f < %s | FileCheck %s
+; RUN:   -target-abi ilp32f < %s | FileCheck --check-prefixes=CHECK %s
Collaborator:

Why do we need to add --check-prefixes here?

preames (Collaborator, Author):

We don't. I think this was a bad rebase. New version pending shortly.

int64_t Val = Imm.bitcastToAPInt().getSExtValue();
RISCVMatInt::InstSeq Seq =
    RISCVMatInt::generateInstSeq(Val, Subtarget.getFeatureBits());
return Seq.size() + 1 <= Subtarget.getMaxBuildIntsCost();
Collaborator:

Should we drop the +1 for Z*inx?

preames (Collaborator, Author):

Yep, but let's do that in a separate commit to keep diffs understandable.
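For context on the +1: it models the final move from GPR to FPR, which Z*inx configurations don't need because their FP instructions read x-registers directly. A hypothetical sketch (register choices are illustrative, not from the patch):

  # With F: materialize 1.0f (bit pattern 0x3F800000), then move to an FPR.
  lui     a0, 260096        # a0 = 0x3F800000 (1.0f)
  fmv.w.x fa5, a0           # the move that the +1 accounts for

  # With Zfinx: FP instructions consume GPRs directly, so no fmv is needed.
  lui     a0, 260096        # a0 = 0x3F800000 (1.0f)
  fadd.s  a1, a1, a0        # operands are x-registers under Zfinx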

(Commit message) Since CompressionCost was not set, the code is equivalent.
preames force-pushed the pr-riscv-fp-constant-mat-threshold branch from 1a175b1 to de38ca2 on October 26, 2023 at 20:40
@@ -8920,6 +8914,17 @@ SDValue RISCVTargetLowering::lowerINSERT_SUBVECTOR(SDValue Op,
return DAG.getBitcast(Op.getValueType(), SubVec);
}

// Shrink down Vec so we're performing the slideup on a smaller LMUL.
Collaborator:

I think you got an extra commit in here?

preames (Collaborator, Author) commented Feb 2, 2024

Abandoning due to lack of progress in review and low priority.

preames closed this on Feb 2, 2024