Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions llvm/lib/Target/PowerPC/PPCInstrVSX.td
Original file line number Diff line number Diff line change
Expand Up @@ -3613,6 +3613,10 @@ def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
immSExt5NonZero:$A, immSExt5NonZero:$A)),
(v4i32 (VSPLTISW imm:$A))>;

// Optimize for vector of 1s addition operation
def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work only for v4i32 vector types? Why not v2i64, v8i16 and v16i8 types?

(VSUBUWM $A, (v4i32 (COPY_TO_REGCLASS (XXLEQVOnes), VSRC)))>;

// Splat loads.
def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
(v8i16 (VSPLTHs 3, (MTVSRWZ (LHZX ForceXForm:$A))))>;
Expand Down
8 changes: 3 additions & 5 deletions llvm/test/CodeGen/PowerPC/vector-all-ones.ll
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,12 @@
; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr9 -mtriple=powerpc-ibm-aix \
; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s

; Currently the generated code uses `vspltisw` to generate vector of 1s followed by add operation.
; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s
; followed by subtraction operation.
; Optimized version of vector addition with {1,1,1,1} by replacing `vspltisw + vadduwm` with 'xxleqv + vsubuwm'
define dso_local noundef <4 x i32> @test1(<4 x i32> %a) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment. Support v2i64, v8i16 and v16i8 types as well ?

; CHECK-LABEL: test1:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vspltisw v3, 1
; CHECK-NEXT: vadduwm v2, v2, v3
; CHECK-NEXT: xxleqv v3, v3, v3
; CHECK-NEXT: vsubuwm v2, v2, v3
; CHECK-NEXT: blr
entry:
%add = add <4 x i32> %a, splat (i32 1)
Expand Down