[NFC] Lockdown instructions of vspltisw for addition of vector of 1s #160476
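@llvm/pr-subscribers-backend-powerpc
Author: None (Himadhith)
Changes
This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}` in which the current code emits `vspltisw`. It can be made better with the use of a `2 cycle` instruction `xxleqv` over the current `4 cycle vspltisw`.
Full diff: https://github.com/llvm/llvm-project/pull/160476.diff
1 Files Affected: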
diff --git a/llvm/test/CodeGen/PowerPC/vector-all-ones.ll b/llvm/test/CodeGen/PowerPC/vector-all-ones.ll
new file mode 100644
index 0000000000000..7ad41482ffe81
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/vector-all-ones.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64LE
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64-ibm-aix \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc-ibm-aix \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_32
+
+; Currently the generated code uses `vspltisw` to generate vector of 1s followed by add operation.
+; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s
+; followed by subtraction operation.
+define dso_local <4 x i32> @test1(<4 x i32> %a) {
+; POWERPC_64LE-LABEL: test1:
+; POWERPC_64LE: # %bb.0: # %entry
+; POWERPC_64LE-NEXT: vspltisw v3, 1
+; POWERPC_64LE-NEXT: stxv v2, -16(r1)
+; POWERPC_64LE-NEXT: vadduwm v2, v2, v3
+; POWERPC_64LE-NEXT: stxv v3, -32(r1)
+; POWERPC_64LE-NEXT: blr
+;
+; POWERPC_64-LABEL: test1:
+; POWERPC_64: # %bb.0: # %entry
+; POWERPC_64-NEXT: vspltisw v3, 1
+; POWERPC_64-NEXT: stxv v2, -16(r1)
+; POWERPC_64-NEXT: vadduwm v2, v2, v3
+; POWERPC_64-NEXT: stxv v3, -32(r1)
+; POWERPC_64-NEXT: blr
+;
+; POWERPC_32-LABEL: test1:
+; POWERPC_32: # %bb.0: # %entry
+; POWERPC_32-NEXT: vspltisw v3, 1
+; POWERPC_32-NEXT: stxv v2, -16(r1)
+; POWERPC_32-NEXT: vadduwm v2, v2, v3
+; POWERPC_32-NEXT: stxv v3, -32(r1)
+; POWERPC_32-NEXT: blr
+entry:
+ %a.addr = alloca <4 x i32>, align 16
+ %b = alloca <4 x i32>, align 16
+ store <4 x i32> %a, ptr %a.addr, align 16
+ store <4 x i32> splat (i32 1), ptr %b, align 16
+ %0 = load <4 x i32>, ptr %a.addr, align 16
+ %1 = load <4 x i32>, ptr %b, align 16
+ %add = add <4 x i32> %0, %1
+ ret <4 x i32> %add
+}
LGTM
Is there an existing PR that will optimize this code gen?
75f2131 to f154ac2
[PowerPC] Replace vspltisw+vadduwm instructions with xxleqv+vsubuwm for adding the vector {1, 1, 1, 1}
; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64

; RUN: llc -verify-machineinstrs -O3 -mtriple=powerpc-ibm-aix \
; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_32
All 3 sets of checks are the same AFAICT... is there a reason we need all 3 to be explicit vs. just using the default CHECK?
Oh yea good point, thanks. No explicit reason for needing all 3 checks, I will change it to use the default CHECK.
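The reviewer's observation can be checked mechanically. The sketch below is only an illustration and not part of the patch: it strips the prefix from each check line in the original test and confirms that all three prefixes assert the same sequence. The names `SAMPLE`, `DIRECTIVE`, `checks_by_prefix`, and `prefixes_are_redundant` are hypothetical, and the regular expression assumes check lines of the form `; PREFIX[-SUFFIX]: body`.

```python
import re
from collections import defaultdict

# Check lines copied from the test in this PR (diff "+" markers dropped).
SAMPLE = """\
; POWERPC_64LE-LABEL: test1:
; POWERPC_64LE: # %bb.0: # %entry
; POWERPC_64LE-NEXT: vspltisw v3, 1
; POWERPC_64LE-NEXT: stxv v2, -16(r1)
; POWERPC_64LE-NEXT: vadduwm v2, v2, v3
; POWERPC_64LE-NEXT: stxv v3, -32(r1)
; POWERPC_64LE-NEXT: blr
;
; POWERPC_64-LABEL: test1:
; POWERPC_64: # %bb.0: # %entry
; POWERPC_64-NEXT: vspltisw v3, 1
; POWERPC_64-NEXT: stxv v2, -16(r1)
; POWERPC_64-NEXT: vadduwm v2, v2, v3
; POWERPC_64-NEXT: stxv v3, -32(r1)
; POWERPC_64-NEXT: blr
;
; POWERPC_32-LABEL: test1:
; POWERPC_32: # %bb.0: # %entry
; POWERPC_32-NEXT: vspltisw v3, 1
; POWERPC_32-NEXT: stxv v2, -16(r1)
; POWERPC_32-NEXT: vadduwm v2, v2, v3
; POWERPC_32-NEXT: stxv v3, -32(r1)
; POWERPC_32-NEXT: blr
"""

# Assumed shape of a check line: "; PREFIX[-SUFFIX]: body".
DIRECTIVE = re.compile(r"^;\s*([A-Z0-9_]+)(-[A-Z]+)?:\s*(.*)$")


def checks_by_prefix(text):
    """Group (suffix, body) pairs by check prefix, preserving order."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            prefix, suffix, body = match.groups()
            groups[prefix].append((suffix or "", body.strip()))
    return dict(groups)


def prefixes_are_redundant(text):
    """True when more than one prefix exists and all assert the same lines."""
    bodies = list(checks_by_prefix(text).values())
    return len(bodies) > 1 and all(b == bodies[0] for b in bodies[1:])


print(prefixes_are_redundant(SAMPLE))
```

When every prefix carries the same body, as here, a single default `CHECK` prefix shared by all RUN lines expresses the same expectations with a third of the check lines.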
f154ac2 to 908c6b8
I updated your PR title since you repeated NFC in it.
Thank you!
LGTM
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/15238 Here is the relevant piece of the build log for the reference
…lvm#160476) This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}` in which the current code emits `vspltisw`. It can be made better with the use of a `2 cycle` instruction `xxleqv` over the current `4 cycle vspltisw`. --------- Co-authored-by: himadhith <himadhith.v@ibm.com>
This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}` in which the current code emits `vspltisw`. It can be made better with the use of a `2 cycle` instruction `xxleqv` over the current `4 cycle vspltisw`.
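As a rough illustration of why an `xxleqv` plus subtract sequence can stand in for the current `vspltisw` plus `vadduwm` one, the Python sketch below models four 32-bit lanes with wrap-around arithmetic. It is not taken from the patch. The helper names (`eqv`, `add_splat_one`, `sub_all_ones`) are hypothetical, and the model assumes `xxleqv` computes the bitwise equivalence NOT(a XOR b), so applying it to a register and itself yields all-ones lanes, which is -1 in each word.

```python
MASK32 = 0xFFFFFFFF  # each lane is a 32-bit word that wraps on overflow


def eqv(x, y):
    """Bitwise equivalence (XNOR) on one 32-bit lane: NOT (x XOR y)."""
    return ~(x ^ y) & MASK32


def add_splat_one(vec):
    """Model of the current sequence: add a vector of 1s, modulo 2**32."""
    return [(lane + 1) & MASK32 for lane in vec]


def sub_all_ones(vec):
    """Model of the planned sequence: eqv(x, x) gives all-ones (-1) lanes,
    and subtracting -1 is the same as adding 1 modulo 2**32."""
    return [(lane - eqv(lane, lane)) & MASK32 for lane in vec]


sample = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF]
assert eqv(0x1234, 0x1234) == MASK32          # any value XNOR itself is all ones
assert add_splat_one(sample) == sub_all_ones(sample)
print(add_splat_one(sample))  # prints [1, 2, 2147483648, 0]
```

The equality holds for every lane value because `A - (-1)` and `A + 1` are the same operation modulo 2**32, so the choice between the two sequences comes down to instruction cost rather than results.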