[AArch64][SelectionDAG] Expand v1f64-typed sin,cos,pow,log,exp intrinsics #83745
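@llvm/pr-subscribers-backend-aarch64
Author: Takuya Shimizu (hazohelet)
Changes
This patch makes the NEON-enabled AArch64 backend expand the sin, cos, pow, log, log2, log10, exp, exp2, exp10 intrinsics for the v1f64 data type, all of which caused selection failure before this patch.
Full diff: https://github.com/llvm/llvm-project/pull/83745.diff
5 Files Affected: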
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7f80e877cb2406..193386e70808cc 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1084,6 +1084,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
ISD::FMUL, ISD::FDIV, ISD::FMA,
ISD::FNEG, ISD::FABS, ISD::FCEIL,
ISD::FSQRT, ISD::FFLOOR, ISD::FNEARBYINT,
+ ISD::FSIN, ISD::FCOS, ISD::FPOW,
+ ISD::FLOG, ISD::FLOG2, ISD::FLOG10,
+ ISD::FEXP, ISD::FEXP2, ISD::FEXP10,
ISD::FRINT, ISD::FROUND, ISD::FROUNDEVEN,
ISD::FTRUNC, ISD::FMINNUM, ISD::FMAXNUM,
ISD::FMINIMUM, ISD::FMAXIMUM, ISD::STRICT_FADD,
diff --git a/llvm/test/CodeGen/AArch64/fexplog.ll b/llvm/test/CodeGen/AArch64/fexplog.ll
index e3c0ced79f07a6..79f980723c1d4e 100644
--- a/llvm/test/CodeGen/AArch64/fexplog.ll
+++ b/llvm/test/CodeGen/AArch64/fexplog.ll
@@ -36,6 +36,19 @@ entry:
ret half %c
}
+define <1 x double> @exp_v1f64(<1 x double> %x) {
+; CHECK-LABEL: exp_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl exp
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.exp.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @exp_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: exp_v2f64:
; CHECK-SD: // %bb.0: // %entry
@@ -1295,6 +1308,19 @@ entry:
ret half %c
}
+define <1 x double> @exp2_v1f64(<1 x double> %x) {
+; CHECK-LABEL: exp2_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl exp2
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.exp2.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @exp2_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: exp2_v2f64:
; CHECK-SD: // %bb.0: // %entry
@@ -2554,6 +2580,19 @@ entry:
ret half %c
}
+define <1 x double> @log_v1f64(<1 x double> %x) {
+; CHECK-LABEL: log_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl log
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.log.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @log_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: log_v2f64:
; CHECK-SD: // %bb.0: // %entry
@@ -3813,6 +3852,19 @@ entry:
ret half %c
}
+define <1 x double> @log2_v1f64(<1 x double> %x) {
+; CHECK-LABEL: log2_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl log2
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.log2.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @log2_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: log2_v2f64:
; CHECK-SD: // %bb.0: // %entry
@@ -5072,6 +5124,19 @@ entry:
ret half %c
}
+define <1 x double> @log10_v1f64(<1 x double> %x) {
+; CHECK-LABEL: log10_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl log10
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.log10.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @log10_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: log10_v2f64:
; CHECK-SD: // %bb.0: // %entry
diff --git a/llvm/test/CodeGen/AArch64/fpow.ll b/llvm/test/CodeGen/AArch64/fpow.ll
index 1dd5450c271cbe..65d7c203f0807c 100644
--- a/llvm/test/CodeGen/AArch64/fpow.ll
+++ b/llvm/test/CodeGen/AArch64/fpow.ll
@@ -37,6 +37,21 @@ entry:
ret half %c
}
+define <1 x double> @pow_v1f64(<1 x double> %x) {
+; CHECK-LABEL: pow_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: adrp x8, .LCPI3_0
+; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI3_0]
+; CHECK-NEXT: bl pow
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.pow.v1f64(<1 x double> %x, <1 x double> <double 3.140000e+00>)
+ ret <1 x double> %c
+}
+
define <2 x double> @pow_v2f64(<2 x double> %a, <2 x double> %b) {
; CHECK-SD-LABEL: pow_v2f64:
; CHECK-SD: // %bb.0: // %entry
diff --git a/llvm/test/CodeGen/AArch64/fsincos.ll b/llvm/test/CodeGen/AArch64/fsincos.ll
index 2c76d969d6efe1..704ec9a5b66255 100644
--- a/llvm/test/CodeGen/AArch64/fsincos.ll
+++ b/llvm/test/CodeGen/AArch64/fsincos.ll
@@ -36,6 +36,19 @@ entry:
ret half %c
}
+define <1 x double> @sin_v1f64(<1 x double> %x) {
+; CHECK-LABEL: sin_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl sin
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.sin.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @sin_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: sin_v2f64:
; CHECK-SD: // %bb.0: // %entry
@@ -1295,6 +1308,19 @@ entry:
ret half %c
}
+define <1 x double> @cos_v1f64(<1 x double> %x) {
+; CHECK-LABEL: cos_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl cos
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %c = call <1 x double> @llvm.cos.v1f64(<1 x double> %x)
+ ret <1 x double> %c
+}
+
define <2 x double> @cos_v2f64(<2 x double> %a) {
; CHECK-SD-LABEL: cos_v2f64:
; CHECK-SD: // %bb.0: // %entry
diff --git a/llvm/test/CodeGen/AArch64/llvm.exp10.ll b/llvm/test/CodeGen/AArch64/llvm.exp10.ll
index 70df88ba9f8985..0ff260000b17c7 100644
--- a/llvm/test/CodeGen/AArch64/llvm.exp10.ll
+++ b/llvm/test/CodeGen/AArch64/llvm.exp10.ll
@@ -537,11 +537,18 @@ define double @exp10_f64(double %x) {
ret double %r
}
-; FIXME: Broken
-; define <1 x double> @exp10_v1f64(<1 x double> %x) {
-; %r = call <1 x double> @llvm.exp10.v1f64(<1 x double> %x)
-; ret <1 x double> %r
-; }
+define <1 x double> @exp10_v1f64(<1 x double> %x) {
+; CHECK-LABEL: exp10_v1f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl exp10
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %r = call <1 x double> @llvm.exp10.v1f64(<1 x double> %x)
+ ret <1 x double> %r
+}
define <2 x double> @exp10_v2f64(<2 x double> %x) {
; SDAG-LABEL: exp10_v2f64:
|
You can test this locally with the following command:
git-clang-format --diff 0ef61ed54dca2e974928c55b2144b57d4c4ff621 bc3d9da74bf435f1c6ded6f90d80c46cf89f9f4c -- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2cb70a876c..b030c3e152 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1125,24 +1125,54 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
if (Subtarget->hasNEON()) {
// FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
// silliness like this:
- for (auto Op :
- {ISD::SELECT, ISD::SELECT_CC,
- ISD::BR_CC, ISD::FADD, ISD::FSUB,
- ISD::FMUL, ISD::FDIV, ISD::FMA,
- ISD::FNEG, ISD::FABS, ISD::FCEIL,
- ISD::FSQRT, ISD::FFLOOR, ISD::FNEARBYINT,
- ISD::FSIN, ISD::FCOS, ISD::FPOW,
- ISD::FLOG, ISD::FLOG2, ISD::FLOG10,
- ISD::FEXP, ISD::FEXP2, ISD::FEXP10,
- ISD::FRINT, ISD::FROUND, ISD::FROUNDEVEN,
- ISD::FTRUNC, ISD::FMINNUM, ISD::FMAXNUM,
- ISD::FMINIMUM, ISD::FMAXIMUM, ISD::STRICT_FADD,
- ISD::STRICT_FSUB, ISD::STRICT_FMUL, ISD::STRICT_FDIV,
- ISD::STRICT_FMA, ISD::STRICT_FCEIL, ISD::STRICT_FFLOOR,
- ISD::STRICT_FSQRT, ISD::STRICT_FRINT, ISD::STRICT_FNEARBYINT,
- ISD::STRICT_FROUND, ISD::STRICT_FTRUNC, ISD::STRICT_FROUNDEVEN,
- ISD::STRICT_FMINNUM, ISD::STRICT_FMAXNUM, ISD::STRICT_FMINIMUM,
- ISD::STRICT_FMAXIMUM})
+ for (auto Op : {ISD::SELECT,
+ ISD::SELECT_CC,
+ ISD::BR_CC,
+ ISD::FADD,
+ ISD::FSUB,
+ ISD::FMUL,
+ ISD::FDIV,
+ ISD::FMA,
+ ISD::FNEG,
+ ISD::FABS,
+ ISD::FCEIL,
+ ISD::FSQRT,
+ ISD::FFLOOR,
+ ISD::FNEARBYINT,
+ ISD::FSIN,
+ ISD::FCOS,
+ ISD::FPOW,
+ ISD::FLOG,
+ ISD::FLOG2,
+ ISD::FLOG10,
+ ISD::FEXP,
+ ISD::FEXP2,
+ ISD::FEXP10,
+ ISD::FRINT,
+ ISD::FROUND,
+ ISD::FROUNDEVEN,
+ ISD::FTRUNC,
+ ISD::FMINNUM,
+ ISD::FMAXNUM,
+ ISD::FMINIMUM,
+ ISD::FMAXIMUM,
+ ISD::STRICT_FADD,
+ ISD::STRICT_FSUB,
+ ISD::STRICT_FMUL,
+ ISD::STRICT_FDIV,
+ ISD::STRICT_FMA,
+ ISD::STRICT_FCEIL,
+ ISD::STRICT_FFLOOR,
+ ISD::STRICT_FSQRT,
+ ISD::STRICT_FRINT,
+ ISD::STRICT_FNEARBYINT,
+ ISD::STRICT_FROUND,
+ ISD::STRICT_FTRUNC,
+ ISD::STRICT_FROUNDEVEN,
+ ISD::STRICT_FMINNUM,
+ ISD::STRICT_FMAXNUM,
+ ISD::STRICT_FMINIMUM,
+ ISD::STRICT_FMAXIMUM})
setOperationAction(Op, MVT::v1f64, Expand);
for (auto Op :
|
I'm surprised we got this far without needing these already. Can we add strict versions of the nodes at the same time? And maybe nodes like fcbrt, even if they can't be easily tested.

I think the current additions suffice. The strict versions are already expanded elsewhere:
As for fcbrt, judging by the code in (https://github.com/llvm/llvm-project/blob/d3ec8c2a25f43225efe997569925aa57324db0dd/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L17273-L17280) and the tests, it appears to be generated only with f32 and f64 types as of now, so vector operands are never generated for fcbrt nodes.

Hi - It was mostly for symmetry and testing that I was hoping they could be added. Can you add the test functions from the godbolt link to one of the existing strictfp tests? Then this LGTM, thanks.

I see. Thanks for the quick response.
06d4a35 to bc3d9da
Thanks. LGTM
This patch makes the NEON-enabled AArch64 backend expand the sin, cos, pow, log, log2, log10, exp, exp2, exp10 intrinsics for the v1f64 data type, all of which caused selection failure before this patch.

Fixes #83729
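
For readers unfamiliar with how an "Expand" entry changes instruction selection, here is a minimal toy model of a per-(opcode, type) action table. It is only a sketch: `ToyLowering`, `Op`, `VT`, `Action`, `makeBefore` and `makeAfter` are hypothetical names for illustration and are not LLVM's `TargetLowering` API. The one assumption carried over from LLVM is that an entry nobody sets is treated as Legal, which is my understanding of why the unlisted v1f64 nodes reached selection without a matching pattern.

```cpp
// Toy model of a legalization action table. NOT LLVM code; every name here
// is hypothetical and exists only to illustrate the idea behind the patch.
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

enum class Op { FADD, FSQRT, FSIN, FCOS, FPOW, FLOG, FEXP };
enum class VT { f64, v1f64, v2f64 };
enum class Action { Legal, Expand };

class ToyLowering {
  std::map<std::pair<Op, VT>, Action> Table;

public:
  // Same shape as the loop in the patch: one action for a list of opcodes
  // on a single value type.
  void setOperationAction(std::initializer_list<Op> Ops, VT Ty, Action A) {
    for (Op O : Ops)
      Table[{O, Ty}] = A;
  }

  // Assumption: entries that were never set default to Legal.
  Action getOperationAction(Op O, VT Ty) const {
    auto It = Table.find({O, Ty});
    return It == Table.end() ? Action::Legal : It->second;
  }
};

// State before the patch (abridged): the math-library opcodes are missing
// from the v1f64 list, so they fall through to the Legal default.
ToyLowering makeBefore() {
  ToyLowering TL;
  TL.setOperationAction({Op::FADD, Op::FSQRT}, VT::v1f64, Action::Expand);
  return TL;
}

// State after the patch (abridged): the same list with the new opcodes added.
ToyLowering makeAfter() {
  ToyLowering TL;
  TL.setOperationAction({Op::FADD, Op::FSQRT, Op::FSIN, Op::FCOS, Op::FPOW,
                         Op::FLOG, Op::FEXP},
                        VT::v1f64, Action::Expand);
  return TL;
}
```

In this model, `makeBefore().getOperationAction(Op::FSIN, VT::v1f64)` yields `Action::Legal`, while the same query on `makeAfter()` yields `Action::Expand`. In the real backend the expanded node ends up as a call to the scalar libm routine, which is what the `bl sin`, `bl exp` and similar CHECK lines in the new tests show.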