[VP][RISCV] Enable promotion on fixed-length vp intrinsics with zvfbfmin #167541
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Brandon Wu (4vtomat)

Changes: Following #112393, this aims to promote VP intrinsics for Zvfbfmin without Zvfbfa.

Patch is 298.69 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/167541.diff

19 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 637f1943b8511..4a7d903af6c0c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1621,7 +1621,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
if (!isTypeLegal(F32VecVT))
continue;
setOperationPromotedToType(ZvfhminZvfbfminPromoteOps, VT, F32VecVT);
- // TODO: Promote VP ops to fp32.
+ setOperationPromotedToType(ZvfhminZvfbfminPromoteVPOps, VT, F32VecVT);
continue;
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll
index 4b42c517379ad..602fe5d8a634b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll
@@ -1,13 +1,221 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+v -target-abi=ilp32d \
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+zvfbfmin,+v -target-abi=ilp32d \
; RUN: -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFH,RV32ZVFH
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+v -target-abi=lp64d \
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+zvfbfmin,+v -target-abi=lp64d \
; RUN: -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFH,RV64ZVFH
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+v -target-abi=ilp32d \
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+zvfbfmin,+v -target-abi=ilp32d \
; RUN: -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFHMIN,RV32ZVFHMIN
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+v -target-abi=lp64d \
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+zvfbfmin,+v -target-abi=lp64d \
; RUN: -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,ZVFHMIN,RV64ZVFHMIN
+
+define <2 x bfloat> @vceil_vv_v2bf16(<2 x bfloat> %va, <2 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v2bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vmv1r.v v9, v0
+; CHECK-NEXT: vfwcvtbf16.f.f.v v10, v8, v0.t
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vmv1r.v v8, v0
+; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT: vfabs.v v11, v10, v0.t
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT: vmflt.vf v8, v11, fa5, v0.t
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vmv1r.v v0, v8
+; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT: vfcvt.x.f.v v11, v10, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v11, v11, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT: vfsgnj.vv v10, v11, v10, v0.t
+; CHECK-NEXT: vmv1r.v v0, v9
+; CHECK-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v10, v0.t
+; CHECK-NEXT: ret
+ %v = call <2 x bfloat> @llvm.vp.ceil.v2bf16(<2 x bfloat> %va, <2 x i1> %m, i32 %evl)
+ ret <2 x bfloat> %v
+}
+
+define <2 x bfloat> @vceil_vv_v2bf16_unmasked(<2 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v2bf16_unmasked:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT: vfabs.v v8, v9
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vmflt.vf v0, v8, fa5
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vfcvt.x.f.v v8, v9, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT: vfsgnj.vv v9, v8, v9, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v9
+; CHECK-NEXT: ret
+ %v = call <2 x bfloat> @llvm.vp.ceil.v2bf16(<2 x bfloat> %va, <2 x i1> splat (i1 true), i32 %evl)
+ ret <2 x bfloat> %v
+}
+
+define <4 x bfloat> @vceil_vv_v4bf16(<4 x bfloat> %va, <4 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT: vmv1r.v v9, v0
+; CHECK-NEXT: vfwcvtbf16.f.f.v v10, v8, v0.t
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vmv1r.v v8, v0
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT: vfabs.v v11, v10, v0.t
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, mu
+; CHECK-NEXT: vmflt.vf v8, v11, fa5, v0.t
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vmv.v.v v0, v8
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT: vfcvt.x.f.v v11, v10, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v11, v11, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, mu
+; CHECK-NEXT: vfsgnj.vv v10, v11, v10, v0.t
+; CHECK-NEXT: vmv1r.v v0, v9
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v10, v0.t
+; CHECK-NEXT: ret
+ %v = call <4 x bfloat> @llvm.vp.ceil.v4bf16(<4 x bfloat> %va, <4 x i1> %m, i32 %evl)
+ ret <4 x bfloat> %v
+}
+
+define <4 x bfloat> @vceil_vv_v4bf16_unmasked(<4 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v4bf16_unmasked:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT: vfabs.v v8, v9
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vmflt.vf v0, v8, fa5
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vfcvt.x.f.v v8, v9, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, mu
+; CHECK-NEXT: vfsgnj.vv v9, v8, v9, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v9
+; CHECK-NEXT: ret
+ %v = call <4 x bfloat> @llvm.vp.ceil.v4bf16(<4 x bfloat> %va, <4 x i1> splat (i1 true), i32 %evl)
+ ret <4 x bfloat> %v
+}
+
+define <8 x bfloat> @vceil_vv_v8bf16(<8 x bfloat> %va, <8 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v8bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT: vmv1r.v v9, v0
+; CHECK-NEXT: vfwcvtbf16.f.f.v v10, v8, v0.t
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vmv1r.v v8, v0
+; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT: vfabs.v v12, v10, v0.t
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, mu
+; CHECK-NEXT: vmflt.vf v8, v12, fa5, v0.t
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vmv1r.v v0, v8
+; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT: vfcvt.x.f.v v12, v10, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v12, v12, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, mu
+; CHECK-NEXT: vfsgnj.vv v10, v12, v10, v0.t
+; CHECK-NEXT: vmv1r.v v0, v9
+; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v10, v0.t
+; CHECK-NEXT: ret
+ %v = call <8 x bfloat> @llvm.vp.ceil.v8bf16(<8 x bfloat> %va, <8 x i1> %m, i32 %evl)
+ ret <8 x bfloat> %v
+}
+
+define <8 x bfloat> @vceil_vv_v8bf16_unmasked(<8 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v8bf16_unmasked:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT: vfwcvtbf16.f.f.v v10, v8
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT: vfabs.v v8, v10
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vmflt.vf v0, v8, fa5
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vfcvt.x.f.v v8, v10, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, mu
+; CHECK-NEXT: vfsgnj.vv v10, v8, v10, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v10
+; CHECK-NEXT: ret
+ %v = call <8 x bfloat> @llvm.vp.ceil.v8bf16(<8 x bfloat> %va, <8 x i1> splat (i1 true), i32 %evl)
+ ret <8 x bfloat> %v
+}
+
+define <16 x bfloat> @vceil_vv_v16bf16(<16 x bfloat> %va, <16 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v16bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT: vmv1r.v v10, v0
+; CHECK-NEXT: vfwcvtbf16.f.f.v v12, v8, v0.t
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vmv1r.v v8, v0
+; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT: vfabs.v v16, v12, v0.t
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, mu
+; CHECK-NEXT: vmflt.vf v8, v16, fa5, v0.t
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vmv1r.v v0, v8
+; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT: vfcvt.x.f.v v16, v12, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v16, v16, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, mu
+; CHECK-NEXT: vfsgnj.vv v12, v16, v12, v0.t
+; CHECK-NEXT: vmv1r.v v0, v10
+; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v12, v0.t
+; CHECK-NEXT: ret
+ %v = call <16 x bfloat> @llvm.vp.ceil.v16bf16(<16 x bfloat> %va, <16 x i1> %m, i32 %evl)
+ ret <16 x bfloat> %v
+}
+
+define <16 x bfloat> @vceil_vv_v16bf16_unmasked(<16 x bfloat> %va, i32 zeroext %evl) {
+; CHECK-LABEL: vceil_vv_v16bf16_unmasked:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT: vfwcvtbf16.f.f.v v12, v8
+; CHECK-NEXT: lui a0, 307200
+; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT: vfabs.v v8, v12
+; CHECK-NEXT: fmv.w.x fa5, a0
+; CHECK-NEXT: vmflt.vf v0, v8, fa5
+; CHECK-NEXT: fsrmi a0, 3
+; CHECK-NEXT: vfcvt.x.f.v v8, v12, v0.t
+; CHECK-NEXT: fsrm a0
+; CHECK-NEXT: vfcvt.f.x.v v8, v8, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, mu
+; CHECK-NEXT: vfsgnj.vv v12, v8, v12, v0.t
+; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT: vfncvtbf16.f.f.w v8, v12
+; CHECK-NEXT: ret
+ %v = call <16 x bfloat> @llvm.vp.ceil.v16bf16(<16 x bfloat> %va, <16 x i1> splat (i1 true), i32 %evl)
+ ret <16 x bfloat> %v
+}
declare <2 x half> @llvm.vp.ceil.v2f16(<2 x half>, <2 x i1>, i32)
define <2 x half> @vp_ceil_v2f16(<2 x half> %va, <2 x i1> %m, i32 zeroext %evl) {
@@ -539,8 +747,8 @@ declare <2 x double> @llvm.vp.ceil.v2f64(<2 x double>, <2 x i1>, i32)
define <2 x double> @vp_ceil_v2f64(<2 x double> %va, <2 x i1> %m, i32 zeroext %evl) {
; RV32ZVFH-LABEL: vp_ceil_v2f64:
; RV32ZVFH: # %bb.0:
-; RV32ZVFH-NEXT: lui a1, %hi(.LCPI16_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI16_0)(a1)
+; RV32ZVFH-NEXT: lui a1, %hi(.LCPI24_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI24_0)(a1)
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; RV32ZVFH-NEXT: vfabs.v v9, v8, v0.t
; RV32ZVFH-NEXT: vsetvli zero, zero, e64, m1, ta, mu
@@ -574,8 +782,8 @@ define <2 x double> @vp_ceil_v2f64(<2 x double> %va, <2 x i1> %m, i32 zeroext %e
;
; RV32ZVFHMIN-LABEL: vp_ceil_v2f64:
; RV32ZVFHMIN: # %bb.0:
-; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI16_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI16_0)(a1)
+; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI24_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI24_0)(a1)
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; RV32ZVFHMIN-NEXT: vfabs.v v9, v8, v0.t
; RV32ZVFHMIN-NEXT: vsetvli zero, zero, e64, m1, ta, mu
@@ -613,8 +821,8 @@ define <2 x double> @vp_ceil_v2f64(<2 x double> %va, <2 x i1> %m, i32 zeroext %e
define <2 x double> @vp_ceil_v2f64_unmasked(<2 x double> %va, i32 zeroext %evl) {
; RV32ZVFH-LABEL: vp_ceil_v2f64_unmasked:
; RV32ZVFH: # %bb.0:
-; RV32ZVFH-NEXT: lui a1, %hi(.LCPI17_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI17_0)(a1)
+; RV32ZVFH-NEXT: lui a1, %hi(.LCPI25_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI25_0)(a1)
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; RV32ZVFH-NEXT: vfabs.v v9, v8
; RV32ZVFH-NEXT: vmflt.vf v0, v9, fa5
@@ -644,8 +852,8 @@ define <2 x double> @vp_ceil_v2f64_unmasked(<2 x double> %va, i32 zeroext %evl)
;
; RV32ZVFHMIN-LABEL: vp_ceil_v2f64_unmasked:
; RV32ZVFHMIN: # %bb.0:
-; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI17_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI17_0)(a1)
+; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI25_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI25_0)(a1)
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; RV32ZVFHMIN-NEXT: vfabs.v v9, v8
; RV32ZVFHMIN-NEXT: vmflt.vf v0, v9, fa5
@@ -683,8 +891,8 @@ define <4 x double> @vp_ceil_v4f64(<4 x double> %va, <4 x i1> %m, i32 zeroext %e
; RV32ZVFH: # %bb.0:
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m2, ta, ma
; RV32ZVFH-NEXT: vmv1r.v v10, v0
-; RV32ZVFH-NEXT: lui a0, %hi(.LCPI18_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI18_0)(a0)
+; RV32ZVFH-NEXT: lui a0, %hi(.LCPI26_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI26_0)(a0)
; RV32ZVFH-NEXT: vfabs.v v12, v8, v0.t
; RV32ZVFH-NEXT: vsetvli zero, zero, e64, m2, ta, mu
; RV32ZVFH-NEXT: vmflt.vf v10, v12, fa5, v0.t
@@ -722,8 +930,8 @@ define <4 x double> @vp_ceil_v4f64(<4 x double> %va, <4 x i1> %m, i32 zeroext %e
; RV32ZVFHMIN: # %bb.0:
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m2, ta, ma
; RV32ZVFHMIN-NEXT: vmv1r.v v10, v0
-; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI18_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI18_0)(a0)
+; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI26_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI26_0)(a0)
; RV32ZVFHMIN-NEXT: vfabs.v v12, v8, v0.t
; RV32ZVFHMIN-NEXT: vsetvli zero, zero, e64, m2, ta, mu
; RV32ZVFHMIN-NEXT: vmflt.vf v10, v12, fa5, v0.t
@@ -763,8 +971,8 @@ define <4 x double> @vp_ceil_v4f64(<4 x double> %va, <4 x i1> %m, i32 zeroext %e
define <4 x double> @vp_ceil_v4f64_unmasked(<4 x double> %va, i32 zeroext %evl) {
; RV32ZVFH-LABEL: vp_ceil_v4f64_unmasked:
; RV32ZVFH: # %bb.0:
-; RV32ZVFH-NEXT: lui a1, %hi(.LCPI19_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI19_0)(a1)
+; RV32ZVFH-NEXT: lui a1, %hi(.LCPI27_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI27_0)(a1)
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m2, ta, ma
; RV32ZVFH-NEXT: vfabs.v v10, v8
; RV32ZVFH-NEXT: vmflt.vf v0, v10, fa5
@@ -794,8 +1002,8 @@ define <4 x double> @vp_ceil_v4f64_unmasked(<4 x double> %va, i32 zeroext %evl)
;
; RV32ZVFHMIN-LABEL: vp_ceil_v4f64_unmasked:
; RV32ZVFHMIN: # %bb.0:
-; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI19_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI19_0)(a1)
+; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI27_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI27_0)(a1)
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m2, ta, ma
; RV32ZVFHMIN-NEXT: vfabs.v v10, v8
; RV32ZVFHMIN-NEXT: vmflt.vf v0, v10, fa5
@@ -833,8 +1041,8 @@ define <8 x double> @vp_ceil_v8f64(<8 x double> %va, <8 x i1> %m, i32 zeroext %e
; RV32ZVFH: # %bb.0:
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m4, ta, ma
; RV32ZVFH-NEXT: vmv1r.v v12, v0
-; RV32ZVFH-NEXT: lui a0, %hi(.LCPI20_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI20_0)(a0)
+; RV32ZVFH-NEXT: lui a0, %hi(.LCPI28_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI28_0)(a0)
; RV32ZVFH-NEXT: vfabs.v v16, v8, v0.t
; RV32ZVFH-NEXT: vsetvli zero, zero, e64, m4, ta, mu
; RV32ZVFH-NEXT: vmflt.vf v12, v16, fa5, v0.t
@@ -872,8 +1080,8 @@ define <8 x double> @vp_ceil_v8f64(<8 x double> %va, <8 x i1> %m, i32 zeroext %e
; RV32ZVFHMIN: # %bb.0:
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m4, ta, ma
; RV32ZVFHMIN-NEXT: vmv1r.v v12, v0
-; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI20_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI20_0)(a0)
+; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI28_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI28_0)(a0)
; RV32ZVFHMIN-NEXT: vfabs.v v16, v8, v0.t
; RV32ZVFHMIN-NEXT: vsetvli zero, zero, e64, m4, ta, mu
; RV32ZVFHMIN-NEXT: vmflt.vf v12, v16, fa5, v0.t
@@ -913,8 +1121,8 @@ define <8 x double> @vp_ceil_v8f64(<8 x double> %va, <8 x i1> %m, i32 zeroext %e
define <8 x double> @vp_ceil_v8f64_unmasked(<8 x double> %va, i32 zeroext %evl) {
; RV32ZVFH-LABEL: vp_ceil_v8f64_unmasked:
; RV32ZVFH: # %bb.0:
-; RV32ZVFH-NEXT: lui a1, %hi(.LCPI21_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI21_0)(a1)
+; RV32ZVFH-NEXT: lui a1, %hi(.LCPI29_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI29_0)(a1)
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m4, ta, ma
; RV32ZVFH-NEXT: vfabs.v v12, v8
; RV32ZVFH-NEXT: vmflt.vf v0, v12, fa5
@@ -944,8 +1152,8 @@ define <8 x double> @vp_ceil_v8f64_unmasked(<8 x double> %va, i32 zeroext %evl)
;
; RV32ZVFHMIN-LABEL: vp_ceil_v8f64_unmasked:
; RV32ZVFHMIN: # %bb.0:
-; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI21_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI21_0)(a1)
+; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI29_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI29_0)(a1)
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m4, ta, ma
; RV32ZVFHMIN-NEXT: vfabs.v v12, v8
; RV32ZVFHMIN-NEXT: vmflt.vf v0, v12, fa5
@@ -983,8 +1191,8 @@ define <15 x double> @vp_ceil_v15f64(<15 x double> %va, <15 x i1> %m, i32 zeroex
; RV32ZVFH: # %bb.0:
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32ZVFH-NEXT: vmv1r.v v16, v0
-; RV32ZVFH-NEXT: lui a0, %hi(.LCPI22_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI22_0)(a0)
+; RV32ZVFH-NEXT: lui a0, %hi(.LCPI30_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI30_0)(a0)
; RV32ZVFH-NEXT: vfabs.v v24, v8, v0.t
; RV32ZVFH-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; RV32ZVFH-NEXT: vmflt.vf v16, v24, fa5, v0.t
@@ -1022,8 +1230,8 @@ define <15 x double> @vp_ceil_v15f64(<15 x double> %va, <15 x i1> %m, i32 zeroex
; RV32ZVFHMIN: # %bb.0:
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32ZVFHMIN-NEXT: vmv1r.v v16, v0
-; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI22_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI22_0)(a0)
+; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI30_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI30_0)(a0)
; RV32ZVFHMIN-NEXT: vfabs.v v24, v8, v0.t
; RV32ZVFHMIN-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; RV32ZVFHMIN-NEXT: vmflt.vf v16, v24, fa5, v0.t
@@ -1063,8 +1271,8 @@ define <15 x double> @vp_ceil_v15f64(<15 x double> %va, <15 x i1> %m, i32 zeroex
define <15 x double> @vp_ceil_v15f64_unmasked(<15 x double> %va, i32 zeroext %evl) {
; RV32ZVFH-LABEL: vp_ceil_v15f64_unmasked:
; RV32ZVFH: # %bb.0:
-; RV32ZVFH-NEXT: lui a1, %hi(.LCPI23_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI23_0)(a1)
+; RV32ZVFH-NEXT: lui a1, %hi(.LCPI31_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI31_0)(a1)
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32ZVFH-NEXT: vfabs.v v16, v8
; RV32ZVFH-NEXT: vmflt.vf v0, v16, fa5
@@ -1094,8 +1302,8 @@ define <15 x double> @vp_ceil_v15f64_unmasked(<15 x double> %va, i32 zeroext %ev
;
; RV32ZVFHMIN-LABEL: vp_ceil_v15f64_unmasked:
; RV32ZVFHMIN: # %bb.0:
-; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI23_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI23_0)(a1)
+; RV32ZVFHMIN-NEXT: lui a1, %hi(.LCPI31_0)
+; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI31_0)(a1)
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32ZVFHMIN-NEXT: vfabs.v v16, v8
; RV32ZVFHMIN-NEXT: vmflt.vf v0, v16, fa5
@@ -1133,8 +1341,8 @@ define <16 x double> @vp_ceil_v16f64(<16 x double> %va, <16 x i1> %m, i32 zeroex
; RV32ZVFH: # %bb.0:
; RV32ZVFH-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32ZVFH-NEXT: vmv1r.v v16, v0
-; RV32ZVFH-NEXT: lui a0, %hi(.LCPI24_0)
-; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI24_0)(a0)
+; RV32ZVFH-NEXT: lui a0, %hi(.LCPI32_0)
+; RV32ZVFH-NEXT: fld fa5, %lo(.LCPI32_0)(a0)
; RV32ZVFH-NEXT: vfabs.v v24, v8, v0.t
; RV32ZVFH-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; RV32ZVFH-NEXT: vmflt.vf v16, v24, fa5, v0.t
@@ -1172,8 +1380,8 @@ define <16 x double> @vp_ceil_v16f64(<16 x double> %va, <16 x i1> %m, i32 zeroex
; RV32ZVFHMIN: # %bb.0:
; RV32ZVFHMIN-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32ZVFHMIN-NEXT: vmv1r.v v16, v0
-; RV32ZVFHMIN-NEXT: lui a0, %hi(.LCPI24_0)
-; RV32ZVFHMIN-NEXT: fld fa5, %lo(.LCPI24_0)(a0)
+; RV32ZV...
[truncated]
topperc
left a comment
LGTM
lukel97
left a comment
LGTM but FYI I think long term the plan is that we will eventually remove support for these VP intrinsics in the RISC-V backend: https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999
Is this something that you were carrying downstream? Happy to have this patch upstream if it makes merging easier later on
    continue;
  setOperationPromotedToType(ZvfhminZvfbfminPromoteOps, VT, F32VecVT);
  // TODO: Promote VP ops to fp32.
  setOperationPromotedToType(ZvfhminZvfbfminPromoteVPOps, VT, F32VecVT);
Are we missing tests for VP_REDUCE_FMINIMUM and VP_REDUCE_FMAXIMUM?
Umm basically I'm doing Zvfbfa codegen and it includes normal instructions and vp intrinsics, but yeah I can just ignore vp intrinsics for now if they're gonna be removed shortly
topperc
left a comment
Removing approval until question about VP reductions is answered. VP reductions are not being removed so we do need to support them.
I think we can just add …
Following llvm#112393, this aims to promote VP intrinsics for Zvfbfmin without Zvfbfa. This patch enables all VP ops in ZvfhminZvfbfminPromoteVPOps but only adds tests for reduce min/max/minimum/maximum, since the others might be removed soon: https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999#p-350673-intrinsics-considered-trivial-3
c1f7897 to 9340c4d
topperc
left a comment
LGTM