Himadhith
Contributor

This patch exploits the fact that generating a vector of -1s is cheaper than generating a vector of 1s, and uses it to optimize the current lowering of A + vector {1, 1, 1, 1}.

In this optimized version we replace vspltisw (4 cycles) with xxleqv (2 cycles) using the following identity:
A - (-1) = A + 1.
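
The identity can be checked lane by lane with a small model. The sketch below is illustrative only, not the backend code: it treats each v4i32 lane as an unsigned 32-bit integer with wrapping arithmetic, and the helper names (`xxleqv_ones`, `add_one_splat`, `sub_minus_one`) are hypothetical. It assumes that `xxleqv` of a register with itself computes `~(x ^ x)`, which is all ones regardless of the input.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane of a v4i32 vector


def xxleqv_ones(x):
    """Model xxleqv with both operands equal: ~(x ^ x) is all ones for any x."""
    return ~(x ^ x) & MASK32


def add_one_splat(a):
    """Original form: add a splat of 1 to every lane (modulo 2**32)."""
    return [(lane + 1) & MASK32 for lane in a]


def sub_minus_one(a):
    """Rewritten form: subtract an all-ones lane, i.e. -1, from every lane."""
    ones = xxleqv_ones(0)
    return [(lane - ones) & MASK32 for lane in a]


# Includes the wrap-around cases at the signed and unsigned boundaries.
a = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF]
assert add_one_splat(a) == sub_minus_one(a)
```

Both forms agree on every lane, including the overflow cases, because modulo subtraction of 0xFFFFFFFF is the same operation as modulo addition of 1.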

@llvmbot
Member

llvmbot commented Sep 26, 2025

@llvm/pr-subscribers-backend-powerpc

Author: None (Himadhith)

Full diff: https://github.com/llvm/llvm-project/pull/160882.diff

1 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCInstrVSX.td (+4)
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index 4e5165bfcda55..dc850d2470cfd 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -3627,6 +3627,10 @@ def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
                                immSExt5NonZero:$A, immSExt5NonZero:$A)),
           (v4i32 (VSPLTISW imm:$A))>;
 
+// Optimise for vector of 1s addition operation
+def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
+          (VSUBUWM $A, (v4i32 (COPY_TO_REGCLASS (XXLEQVOnes), VSRC)))>;
+
 // Splat loads.
 def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
           (v8i16 (VSPLTHs 3, (MTVSRWZ (LHZX ForceXForm:$A))))>;

@lei137
Contributor

lei137 commented Sep 26, 2025

I'm guessing this is not ready to be reviewed, as it needs https://github.com/llvm/llvm-project/pull/160476/files to land first in order to show the difference.

@Himadhith
Contributor Author

Himadhith commented Sep 26, 2025

I'm guessing this is not ready to be reviewed, as it needs https://github.com/llvm/llvm-project/pull/160476/files to land first in order to show the difference.

Yes, as soon as the NFC patch gets merged I will rebase, and the file should then reflect the changes. Should I keep this as a draft until then?

(v4i32 (VSPLTISW imm:$A))>;

// Optimize for vector of 1s addition operation
def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
Contributor

Does this work only for v4i32 vector types? Why not v2i64, v8i16 and v16i8 types?
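
The arithmetic behind this question can be sketched as follows. An all-ones 128-bit register reads as -1 in every lane whatever the lane width, so the same constant could in principle feed a modulo subtract at 8, 16, 32 or 64 bits. This is only a model of the arithmetic: the helper names are hypothetical, and whether the patch would pair the constant with `vsububm`, `vsubuhm` or `vsubudm` for the other types is an assumption, not something the diff shows.

```python
ALL_ONES_128 = (1 << 128) - 1  # the 128-bit all-ones register value


def split_lanes(value, width):
    """Split a 128-bit value into unsigned lanes of the given bit width."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, 128, width)]


def as_signed(lane, width):
    """Reinterpret an unsigned lane as a two's-complement signed value."""
    return lane - (1 << width) if lane >> (width - 1) else lane


def add_one_via_sub(lanes, width):
    """Compute lanes + 1 by subtracting the all-ones constant, modulo 2**width."""
    mask = (1 << width) - 1
    ones = split_lanes(ALL_ONES_128, width)
    return [(a - m) & mask for a, m in zip(lanes, ones)]


# The same register is a splat of -1 for v16i8, v8i16, v4i32 and v2i64.
for width in (8, 16, 32, 64):
    lanes = split_lanes(ALL_ONES_128, width)
    assert all(as_signed(lane, width) == -1 for lane in lanes)
```

In this model only the width of the subtract changes between element types; the constant itself is identical.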

; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s
; followed by subtraction operation.
; Optimized version of vector addition with {1,1,1,1} by replacing `vspltisw + vadduwm` with 'xxleqv + vsubuwm'
define dso_local noundef <4 x i32> @test1(<4 x i32> %a) {
Contributor

Same comment as above: could this support the v2i64, v8i16 and v16i8 types as well?
