-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[PowerPC] Replace vspltisw+vadduwm instructions with xxleqv+vsubuwm for adding the vector {1, 1, 1, 1} #160882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-powerpc Author: None (Himadhith) ChangesThis patch leverages generation of vector of -1s to be cheaper than vector of 1s to optimize the current implementation for In this optimized version we replace Full diff: https://github.com/llvm/llvm-project/pull/160882.diff 1 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index 4e5165bfcda55..dc850d2470cfd 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -3627,6 +3627,10 @@ def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
immSExt5NonZero:$A, immSExt5NonZero:$A)),
(v4i32 (VSPLTISW imm:$A))>;
+// Optimise for vector of 1s addition operation
+def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
+ (VSUBUWM $A, (v4i32 (COPY_TO_REGCLASS (XXLEQVOnes), VSRC)))>;
+
// Splat loads.
def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
(v8i16 (VSPLTHs 3, (MTVSRWZ (LHZX ForceXForm:$A))))>;
|
6018f73
to
40edcce
Compare
8079d5c
to
da6de91
Compare
I'm guessing this is not ready to be reviewed as it need https://github.com/llvm/llvm-project/pull/160476/files to be in first enable to show the difference. |
Yes as soon as the NFC patch gets merged I will rebase and the file should reflect the changes. Should I keep this as a draft till then? |
da6de91
to
5de66e2
Compare
…ector of -1s is cheaper than vector of 1s
(v4i32 (VSPLTISW imm:$A))>; | ||
|
||
// Optimize for vector of 1s addition operation | ||
def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work only for v4i32
vector types? Why not v2i64, v8i16 and v16i8
types?
; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s | ||
; followed by subtraction operation. | ||
; Optimized version of vector addition with {1,1,1,1} by replacing `vspltisw + vadduwm` with 'xxleqv + vsubuwm' | ||
define dso_local noundef <4 x i32> @test1(<4 x i32> %a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above comment. Support v2i64, v8i16 and v16i8
types as well ?
This patch leverages generation of vector of -1s to be cheaper than vector of 1s to optimize the current implementation for
A + vector {1, 1, 1, 1}
.In this optimized version we replace
vspltisw (4 cycles)
withxxleqv (2 cycles)
using the following identity:A - (-1) = A + 1
.