
Conversation

@4vtomat (Member) commented Nov 11, 2025

Following #112393, this aims to promote VP intrinsics for Zvfbfmin without Zvfbfa.

@llvmbot (Member) commented Nov 11, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Brandon Wu (4vtomat)

Changes

Following #112393, this aims to promote VP intrinsics for Zvfbfmin without Zvfbfa.
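For illustration, a minimal hand-written sketch (not taken from the patch) of the kind of VP intrinsic this change promotes. With +zvfbfmin but without zvfbfa there are no native bf16 vector arithmetic instructions (Zvfbfmin provides only conversions), so such ops are now legalized by promoting them to f32:

declare <2 x bfloat> @llvm.vp.fadd.v2bf16(<2 x bfloat>, <2 x bfloat>, <2 x i1>, i32)

define <2 x bfloat> @vfadd_vp_sketch(<2 x bfloat> %a, <2 x bfloat> %b, <2 x i1> %m, i32 zeroext %evl) {
  ; Legalized by widening to <2 x float>, adding at f32, and narrowing back to bf16.
  %r = call <2 x bfloat> @llvm.vp.fadd.v2bf16(<2 x bfloat> %a, <2 x bfloat> %b, <2 x i1> %m, i32 %evl)
  ret <2 x bfloat> %r
}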


Patch is 298.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/167541.diff

19 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll (+276-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-floor-vp.ll (+276-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fmaximum-vp.ll (+184-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fminimum-vp.ll (+184-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-nearbyint-vp.ll (+242-34)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-rint-vp.ll (+226-34)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-round-vp.ll (+276-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundeven-vp.ll (+276-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundtozero-vp.ll (+276-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-fp-vp.ll (+65-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfadd-vp.ll (+299-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfdiv-vp.ll (+299-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmax-vp.ll (+303-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmin-vp.ll (+303-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmul-vp.ll (+299-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmuladd-vp.ll (+174-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfsqrt-vp.ll (+120-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfsub-vp.ll (+299-4)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 637f1943b8511..4a7d903af6c0c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1621,7 +1621,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
           if (!isTypeLegal(F32VecVT))
             continue;
           setOperationPromotedToType(ZvfhminZvfbfminPromoteOps, VT, F32VecVT);
-          // TODO: Promote VP ops to fp32.
+          setOperationPromotedToType(ZvfhminZvfbfminPromoteVPOps, VT, F32VecVT);
           continue;
         }
 
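setOperationPromotedToType registers a Promote legalization action for each opcode in ZvfhminZvfbfminPromoteVPOps on the bf16 vector type VT, targeting the matching f32 vector type. Conceptually (a hand-written sketch of the equivalent IR, not the legalizer's actual DAG output), a masked bf16 VP op becomes widen, operate in f32, narrow:

; Assumed illustration: vp.fpext / vp.fptrunc stand in for the DAG-level
; extend and round nodes the promotion actually emits.
define <2 x bfloat> @promoted_sketch(<2 x bfloat> %a, <2 x bfloat> %b, <2 x i1> %m, i32 zeroext %evl) {
  %wa = call <2 x float> @llvm.vp.fpext.v2f32.v2bf16(<2 x bfloat> %a, <2 x i1> %m, i32 %evl)
  %wb = call <2 x float> @llvm.vp.fpext.v2f32.v2bf16(<2 x bfloat> %b, <2 x i1> %m, i32 %evl)
  %wr = call <2 x float> @llvm.vp.fadd.v2f32(<2 x float> %wa, <2 x float> %wb, <2 x i1> %m, i32 %evl)
  %r  = call <2 x bfloat> @llvm.vp.fptrunc.v2bf16.v2f32(<2 x float> %wr, <2 x i1> %m, i32 %evl)
  ret <2 x bfloat> %r
}

The CHECK lines in the test updates below show exactly this shape: vfwcvtbf16.f.f.v widens to f32, the operation runs at SEW=32, and vfncvtbf16.f.f.w narrows the result back to bf16.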
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll
index 4b42c517379ad..602fe5d8a634b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll
@@ -1,13 +1,221 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+v -target-abi=ilp32d \
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+zvfbfmin,+v -target-abi=ilp32d \
 ; RUN:   -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFH,RV32ZVFH
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+v -target-abi=lp64d \
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+zvfbfmin,+v -target-abi=lp64d \
 ; RUN:   -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFH,RV64ZVFH
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+v -target-abi=ilp32d \
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+zvfbfmin,+v -target-abi=ilp32d \
 ; RUN:   -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFHMIN,RV32ZVFHMIN
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+v -target-abi=lp64d \
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+zvfbfmin,+v -target-abi=lp64d \
 ; RUN:   -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFHMIN,RV64ZVFHMIN
 
+
+define <2 x bfloat> @vceil_vv_v2bf16(<2 x bfloat> %va, <2 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v2bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    vmv1r.v v9, v0
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v10, v8, v0.t
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vmv1r.v v8, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v11, v10, v0.t
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT:    vmflt.vf v8, v11, fa5, v0.t
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vmv1r.v v0, v8
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfcvt.x.f.v v11, v10, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v11, v11, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v10, v11, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v0, v9
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v10, v0.t
+; CHECK-NEXT:    ret
+  %v = call <2 x bfloat> @llvm.vp.ceil.v2bf16(<2 x bfloat> %va, <2 x i1> %m, i32 %evl)
+  ret <2 x bfloat> %v
+}
+
+define <2 x bfloat> @vceil_vv_v2bf16_unmasked(<2 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v2bf16_unmasked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v9, v8
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v8, v9
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vmflt.vf v0, v8, fa5
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vfcvt.x.f.v v8, v9, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v9, v8, v9, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v9
+; CHECK-NEXT:    ret
+  %v = call <2 x bfloat> @llvm.vp.ceil.v2bf16(<2 x bfloat> %va, <2 x i1> splat (i1 true), i32 %evl)
+  ret <2 x bfloat> %v
+}
+
+define <4 x bfloat> @vceil_vv_v4bf16(<4 x bfloat> %va, <4 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    vmv1r.v v9, v0
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v10, v8, v0.t
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vmv1r.v v8, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v11, v10, v0.t
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
+; CHECK-NEXT:    vmflt.vf v8, v11, fa5, v0.t
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vmv.v.v v0, v8
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfcvt.x.f.v v11, v10, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v11, v11, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v10, v11, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v0, v9
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v10, v0.t
+; CHECK-NEXT:    ret
+  %v = call <4 x bfloat> @llvm.vp.ceil.v4bf16(<4 x bfloat> %va, <4 x i1> %m, i32 %evl)
+  ret <4 x bfloat> %v
+}
+
+define <4 x bfloat> @vceil_vv_v4bf16_unmasked(<4 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v4bf16_unmasked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v9, v8
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v8, v9
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vmflt.vf v0, v8, fa5
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vfcvt.x.f.v v8, v9, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v9, v8, v9, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v9
+; CHECK-NEXT:    ret
+  %v = call <4 x bfloat> @llvm.vp.ceil.v4bf16(<4 x bfloat> %va, <4 x i1> splat (i1 true), i32 %evl)
+  ret <4 x bfloat> %v
+}
+
+define <8 x bfloat> @vceil_vv_v8bf16(<8 x bfloat> %va, <8 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    vmv1r.v v9, v0
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v10, v8, v0.t
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vmv1r.v v8, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v10, v0.t
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
+; CHECK-NEXT:    vmflt.vf v8, v12, fa5, v0.t
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vmv1r.v v0, v8
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfcvt.x.f.v v12, v10, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v10, v12, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v0, v9
+; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v10, v0.t
+; CHECK-NEXT:    ret
+  %v = call <8 x bfloat> @llvm.vp.ceil.v8bf16(<8 x bfloat> %va, <8 x i1> %m, i32 %evl)
+  ret <8 x bfloat> %v
+}
+
+define <8 x bfloat> @vceil_vv_v8bf16_unmasked(<8 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v8bf16_unmasked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v10, v8
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v8, v10
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vmflt.vf v0, v8, fa5
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vfcvt.x.f.v v8, v10, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v10, v8, v10, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v10
+; CHECK-NEXT:    ret
+  %v = call <8 x bfloat> @llvm.vp.ceil.v8bf16(<8 x bfloat> %va, <8 x i1> splat (i1 true), i32 %evl)
+  ret <8 x bfloat> %v
+}
+
+define <16 x bfloat> @vceil_vv_v16bf16(<16 x bfloat> %va, <16 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v16bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    vmv1r.v v10, v0
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v12, v8, v0.t
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vmv1r.v v8, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v12, v0.t
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
+; CHECK-NEXT:    vmflt.vf v8, v16, fa5, v0.t
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vmv1r.v v0, v8
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfcvt.x.f.v v16, v12, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v12, v16, v12, v0.t
+; CHECK-NEXT:    vmv1r.v v0, v10
+; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v12, v0.t
+; CHECK-NEXT:    ret
+  %v = call <16 x bfloat> @llvm.vp.ceil.v16bf16(<16 x bfloat> %va, <16 x i1> %m, i32 %evl)
+  ret <16 x bfloat> %v
+}
+
+define <16 x bfloat> @vceil_vv_v16bf16_unmasked(<16 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v16bf16_unmasked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    vfwcvtbf16.f.f.v v12, v8
+; CHECK-NEXT:    lui a0, 307200
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v8, v12
+; CHECK-NEXT:    fmv.w.x fa5, a0
+; CHECK-NEXT:    vmflt.vf v0, v8, fa5
+; CHECK-NEXT:    fsrmi a0, 3
+; CHECK-NEXT:    vfcvt.x.f.v v8, v12, v0.t
+; CHECK-NEXT:    fsrm a0
+; CHECK-NEXT:    vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
+; CHECK-NEXT:    vfsgnj.vv v12, v8, v12, v0.t
+; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfncvtbf16.f.f.w v8, v12
+; CHECK-NEXT:    ret
+  %v = call <16 x bfloat> @llvm.vp.ceil.v16bf16(<16 x bfloat> %va, <16 x i1> splat (i1 true), i32 %evl)
+  ret <16 x bfloat> %v
+}
 declare <2 x half> @llvm.vp.ceil.v2f16(<2 x half>, <2 x i1>, i32)
 
 define <2 x half> @vp_ceil_v2f16(<2 x half> %va, <2 x i1> %m, i32 zeroext %evl) {
@@ -539,8 +747,8 @@ declare <2 x double> @llvm.vp.ceil.v2f64(<2 x double>, <2 x i1>, i32)
 define <2 x double> @vp_ceil_v2f64(<2 x double> %va, <2 x i1> %m, i32 zeroext %evl) {
 ; RV32ZVFH-LABEL: vp_ceil_v2f64:
 ; RV32ZVFH:       # %bb.0:
-; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI16_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI16_0)(a1)
+; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI24_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI24_0)(a1)
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
 ; RV32ZVFH-NEXT:    vfabs.v v9, v8, v0.t
 ; RV32ZVFH-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -574,8 +782,8 @@ define <2 x double> @vp_ceil_v2f64(<2 x double> %va, <2 x i1> %m, i32 zeroext %e
 ;
 ; RV32ZVFHMIN-LABEL: vp_ceil_v2f64:
 ; RV32ZVFHMIN:       # %bb.0:
-; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI16_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI16_0)(a1)
+; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI24_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI24_0)(a1)
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
 ; RV32ZVFHMIN-NEXT:    vfabs.v v9, v8, v0.t
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -613,8 +821,8 @@ define <2 x double> @vp_ceil_v2f64(<2 x double> %va, <2 x i1> %m, i32 zeroext %e
 define <2 x double> @vp_ceil_v2f64_unmasked(<2 x double> %va, i32 zeroext %evl) {
 ; RV32ZVFH-LABEL: vp_ceil_v2f64_unmasked:
 ; RV32ZVFH:       # %bb.0:
-; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI17_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI17_0)(a1)
+; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI25_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI25_0)(a1)
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
 ; RV32ZVFH-NEXT:    vfabs.v v9, v8
 ; RV32ZVFH-NEXT:    vmflt.vf v0, v9, fa5
@@ -644,8 +852,8 @@ define <2 x double> @vp_ceil_v2f64_unmasked(<2 x double> %va, i32 zeroext %evl)
 ;
 ; RV32ZVFHMIN-LABEL: vp_ceil_v2f64_unmasked:
 ; RV32ZVFHMIN:       # %bb.0:
-; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI17_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI17_0)(a1)
+; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI25_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI25_0)(a1)
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
 ; RV32ZVFHMIN-NEXT:    vfabs.v v9, v8
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v0, v9, fa5
@@ -683,8 +891,8 @@ define <4 x double> @vp_ceil_v4f64(<4 x double> %va, <4 x i1> %m, i32 zeroext %e
 ; RV32ZVFH:       # %bb.0:
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
 ; RV32ZVFH-NEXT:    vmv1r.v v10, v0
-; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI18_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI18_0)(a0)
+; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI26_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI26_0)(a0)
 ; RV32ZVFH-NEXT:    vfabs.v v12, v8, v0.t
 ; RV32ZVFH-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
 ; RV32ZVFH-NEXT:    vmflt.vf v10, v12, fa5, v0.t
@@ -722,8 +930,8 @@ define <4 x double> @vp_ceil_v4f64(<4 x double> %va, <4 x i1> %m, i32 zeroext %e
 ; RV32ZVFHMIN:       # %bb.0:
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
 ; RV32ZVFHMIN-NEXT:    vmv1r.v v10, v0
-; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI18_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI18_0)(a0)
+; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI26_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI26_0)(a0)
 ; RV32ZVFHMIN-NEXT:    vfabs.v v12, v8, v0.t
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v10, v12, fa5, v0.t
@@ -763,8 +971,8 @@ define <4 x double> @vp_ceil_v4f64(<4 x double> %va, <4 x i1> %m, i32 zeroext %e
 define <4 x double> @vp_ceil_v4f64_unmasked(<4 x double> %va, i32 zeroext %evl) {
 ; RV32ZVFH-LABEL: vp_ceil_v4f64_unmasked:
 ; RV32ZVFH:       # %bb.0:
-; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI19_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI19_0)(a1)
+; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI27_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI27_0)(a1)
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
 ; RV32ZVFH-NEXT:    vfabs.v v10, v8
 ; RV32ZVFH-NEXT:    vmflt.vf v0, v10, fa5
@@ -794,8 +1002,8 @@ define <4 x double> @vp_ceil_v4f64_unmasked(<4 x double> %va, i32 zeroext %evl)
 ;
 ; RV32ZVFHMIN-LABEL: vp_ceil_v4f64_unmasked:
 ; RV32ZVFHMIN:       # %bb.0:
-; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI19_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI19_0)(a1)
+; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI27_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI27_0)(a1)
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
 ; RV32ZVFHMIN-NEXT:    vfabs.v v10, v8
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v0, v10, fa5
@@ -833,8 +1041,8 @@ define <8 x double> @vp_ceil_v8f64(<8 x double> %va, <8 x i1> %m, i32 zeroext %e
 ; RV32ZVFH:       # %bb.0:
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
 ; RV32ZVFH-NEXT:    vmv1r.v v12, v0
-; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI20_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI20_0)(a0)
+; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI28_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI28_0)(a0)
 ; RV32ZVFH-NEXT:    vfabs.v v16, v8, v0.t
 ; RV32ZVFH-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
 ; RV32ZVFH-NEXT:    vmflt.vf v12, v16, fa5, v0.t
@@ -872,8 +1080,8 @@ define <8 x double> @vp_ceil_v8f64(<8 x double> %va, <8 x i1> %m, i32 zeroext %e
 ; RV32ZVFHMIN:       # %bb.0:
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
 ; RV32ZVFHMIN-NEXT:    vmv1r.v v12, v0
-; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI20_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI20_0)(a0)
+; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI28_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI28_0)(a0)
 ; RV32ZVFHMIN-NEXT:    vfabs.v v16, v8, v0.t
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v12, v16, fa5, v0.t
@@ -913,8 +1121,8 @@ define <8 x double> @vp_ceil_v8f64(<8 x double> %va, <8 x i1> %m, i32 zeroext %e
 define <8 x double> @vp_ceil_v8f64_unmasked(<8 x double> %va, i32 zeroext %evl) {
 ; RV32ZVFH-LABEL: vp_ceil_v8f64_unmasked:
 ; RV32ZVFH:       # %bb.0:
-; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI21_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI21_0)(a1)
+; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI29_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI29_0)(a1)
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
 ; RV32ZVFH-NEXT:    vfabs.v v12, v8
 ; RV32ZVFH-NEXT:    vmflt.vf v0, v12, fa5
@@ -944,8 +1152,8 @@ define <8 x double> @vp_ceil_v8f64_unmasked(<8 x double> %va, i32 zeroext %evl)
 ;
 ; RV32ZVFHMIN-LABEL: vp_ceil_v8f64_unmasked:
 ; RV32ZVFHMIN:       # %bb.0:
-; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI21_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI21_0)(a1)
+; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI29_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI29_0)(a1)
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
 ; RV32ZVFHMIN-NEXT:    vfabs.v v12, v8
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v0, v12, fa5
@@ -983,8 +1191,8 @@ define <15 x double> @vp_ceil_v15f64(<15 x double> %va, <15 x i1> %m, i32 zeroex
 ; RV32ZVFH:       # %bb.0:
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32ZVFH-NEXT:    vmv1r.v v16, v0
-; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI22_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI22_0)(a0)
+; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI30_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI30_0)(a0)
 ; RV32ZVFH-NEXT:    vfabs.v v24, v8, v0.t
 ; RV32ZVFH-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
 ; RV32ZVFH-NEXT:    vmflt.vf v16, v24, fa5, v0.t
@@ -1022,8 +1230,8 @@ define <15 x double> @vp_ceil_v15f64(<15 x double> %va, <15 x i1> %m, i32 zeroex
 ; RV32ZVFHMIN:       # %bb.0:
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32ZVFHMIN-NEXT:    vmv1r.v v16, v0
-; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI22_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI22_0)(a0)
+; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI30_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI30_0)(a0)
 ; RV32ZVFHMIN-NEXT:    vfabs.v v24, v8, v0.t
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v16, v24, fa5, v0.t
@@ -1063,8 +1271,8 @@ define <15 x double> @vp_ceil_v15f64(<15 x double> %va, <15 x i1> %m, i32 zeroex
 define <15 x double> @vp_ceil_v15f64_unmasked(<15 x double> %va, i32 zeroext %evl) {
 ; RV32ZVFH-LABEL: vp_ceil_v15f64_unmasked:
 ; RV32ZVFH:       # %bb.0:
-; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI23_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI23_0)(a1)
+; RV32ZVFH-NEXT:    lui a1, %hi(.LCPI31_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI31_0)(a1)
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32ZVFH-NEXT:    vfabs.v v16, v8
 ; RV32ZVFH-NEXT:    vmflt.vf v0, v16, fa5
@@ -1094,8 +1302,8 @@ define <15 x double> @vp_ceil_v15f64_unmasked(<15 x double> %va, i32 zeroext %ev
 ;
 ; RV32ZVFHMIN-LABEL: vp_ceil_v15f64_unmasked:
 ; RV32ZVFHMIN:       # %bb.0:
-; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI23_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI23_0)(a1)
+; RV32ZVFHMIN-NEXT:    lui a1, %hi(.LCPI31_0)
+; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI31_0)(a1)
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32ZVFHMIN-NEXT:    vfabs.v v16, v8
 ; RV32ZVFHMIN-NEXT:    vmflt.vf v0, v16, fa5
@@ -1133,8 +1341,8 @@ define <16 x double> @vp_ceil_v16f64(<16 x double> %va, <16 x i1> %m, i32 zeroex
 ; RV32ZVFH:       # %bb.0:
 ; RV32ZVFH-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32ZVFH-NEXT:    vmv1r.v v16, v0
-; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI24_0)
-; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI24_0)(a0)
+; RV32ZVFH-NEXT:    lui a0, %hi(.LCPI32_0)
+; RV32ZVFH-NEXT:    fld fa5, %lo(.LCPI32_0)(a0)
 ; RV32ZVFH-NEXT:    vfabs.v v24, v8, v0.t
 ; RV32ZVFH-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
 ; RV32ZVFH-NEXT:    vmflt.vf v16, v24, fa5, v0.t
@@ -1172,8 +1380,8 @@ define <16 x double> @vp_ceil_v16f64(<16 x double> %va, <16 x i1> %m, i32 zeroex
 ; RV32ZVFHMIN:       # %bb.0:
 ; RV32ZVFHMIN-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32ZVFHMIN-NEXT:    vmv1r.v v16, v0
-; RV32ZVFHMIN-NEXT:    lui a0, %hi(.LCPI24_0)
-; RV32ZVFHMIN-NEXT:    fld fa5, %lo(.LCPI24_0)(a0)
+; RV32ZV...
[truncated]

@topperc (Collaborator) left a comment

LGTM

@lukel97 (Contributor) left a comment

LGTM, but FYI: I think the long-term plan is to eventually remove support for these VP intrinsics in the RISC-V backend: https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

Is this something you were carrying downstream? Happy to have this patch upstream if it makes merging easier later on.

In llvm/lib/Target/RISCV/RISCVISelLowering.cpp:

             continue;
           setOperationPromotedToType(ZvfhminZvfbfminPromoteOps, VT, F32VecVT);
-          // TODO: Promote VP ops to fp32.
+          setOperationPromotedToType(ZvfhminZvfbfminPromoteVPOps, VT, F32VecVT);
@topperc (Collaborator) commented Nov 12, 2025

Are we missing tests for VP_REDUCE_FMINIMUM and VP_REDUCE_FMAXIMUM?

@4vtomat (Member, Author) commented Nov 12, 2025

> LGTM, but FYI: I think the long-term plan is to eventually remove support for these VP intrinsics in the RISC-V backend: https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999
>
> Is this something you were carrying downstream? Happy to have this patch upstream if it makes merging easier later on.

Umm, basically I'm doing Zvfbfa codegen, and it includes both normal instructions and VP intrinsics, but yeah, I can just ignore the VP intrinsics for now if they're going to be removed shortly.

@topperc (Collaborator) left a comment

Removing approval until the question about VP reductions is answered. VP reductions are not being removed, so we do need to support them.

@4vtomat (Member, Author) commented Nov 13, 2025

> Removing approval until the question about VP reductions is answered. VP reductions are not being removed, so we do need to support them.

I think we can just add ZvfhminZvfbfminPromoteVPOps but add tests only for the reduction ops, so that the future removal of VP operations will take less effort.
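For reference, a hand-written sketch (assumed shape, for illustration only) of the kind of VP reduction test being discussed:

declare bfloat @llvm.vp.reduce.fmaximum.v4bf16(bfloat, <4 x bfloat>, <4 x i1>, i32)

define bfloat @vpreduce_fmaximum_v4bf16(bfloat %start, <4 x bfloat> %v, <4 x i1> %m, i32 zeroext %evl) {
  ; Presumably legalized like the element-wise ops above (an assumption based
  ; on the promotion pattern in this patch): widen the source vector to f32,
  ; reduce at SEW=32, and round the scalar result back to bf16.
  %r = call bfloat @llvm.vp.reduce.fmaximum.v4bf16(bfloat %start, <4 x bfloat> %v, <4 x i1> %m, i32 %evl)
  ret bfloat %r
}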

Following llvm#112393, this aims to promote VP intrinsics for Zvfbfmin without Zvfbfa.
This patch enables all of the VP ops in ZvfhminZvfbfminPromoteVPOps but only
adds tests for reduce min/max/minimum/maximum, since the others might be
removed soon:
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999#p-350673-intrinsics-considered-trivial-3
@4vtomat force-pushed the bf16_fixed_promote_vp_op branch from c1f7897 to 9340c4d on November 13, 2025 09:04
@topperc (Collaborator) left a comment

LGTM

@4vtomat merged commit 5e4cdd6 into llvm:main on Nov 17, 2025
10 checks passed