-
Notifications
You must be signed in to change notification settings - Fork 11.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Remove redundant FDIV Combine. #91924
[AArch64] Remove redundant FDIV Combine. #91924
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) ChangesNEON fixed-point SCVTF instruction is not available in Streaming-SVE Full diff: https://github.com/llvm/llvm-project/pull/91924.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7344387ffe552..a76dc7b140b07 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -17912,7 +17912,7 @@ static SDValue performFpToIntCombine(SDNode *N, SelectionDAG &DAG,
static SDValue performFDivCombine(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,
const AArch64Subtarget *Subtarget) {
- if (!Subtarget->hasNEON())
+ if (!Subtarget->isNeonAvailable())
return SDValue();
SDValue Op = N->getOperand(0);
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
index c2d6ed4e9ccf9..c20e8521375a9 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
@@ -214,6 +214,24 @@ define <2 x float> @fdiv_v2f32(<2 x float> %op1, <2 x float> %op2) {
ret <2 x float> %res
}
+; Test that we don't optimise this using a NEON instruction, when
+; NEON is not available.
+define <2 x float> @fdiv_v232_pow2(<2 x i32> %in) {
+; CHECK-LABEL: fdiv_v232_pow2:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.s, vl2
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
+; CHECK-NEXT: fmov z1.s, #16.00000000
+; CHECK-NEXT: scvtf z0.s, p0/m, z0.s
+; CHECK-NEXT: fdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-NEXT: ret
+entry:
+ %vcvt.i = sitofp <2 x i32> %in to <2 x float>
+ %div.i = fdiv <2 x float> %vcvt.i, <float 16.0, float 16.0>
+ ret <2 x float> %div.i
+}
+
define <4 x float> @fdiv_v4f32(<4 x float> %op1, <4 x float> %op2) {
; CHECK-LABEL: fdiv_v4f32:
; CHECK: // %bb.0:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just delete this transform? It's specifically checking for a divide by a power of two... but instcombine always transforms that to a multiply, so this will basically never trigger.
The target combine is no longer required because InstCombine will transform the DIV by a power of 2 into a multiply, so in practice this case will never trigger. Additionally, the generated code would have been incorrect for streaming(-compatible) functions, because it assumed NEON was available.
9d4307a
to
f744db3
Compare
Good point! I've repurposed the PR to instead remove the combine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see aarch64_neon_vcvtfxs2fp
used anywhere else so I guess this means we're missing the equivalent fmul
combine, which seems unfortunate but not your fault.
The target combine is no longer required because InstCombine will
transform the DIV by a power of 2 into a multiply, so in practice
this case will never trigger.
Additionally, the generated code would have been incorrect for
streaming(-compatible) functions, because it assumed NEON was available.