[PPC] generate stxvw4x/lxvw4x on P7 #87049
base: main
Conversation
@llvm/pr-subscribers-backend-powerpc Author: Chen Zheng (chenzheng1030)

Changes: My understanding is that we should also use stxvw4x/lxvw4x on P7, even for unaligned loads/stores. The P8 ISA and P7 ISA indicate that these two instructions handle unaligned addresses well.

Patch is 23.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87049.diff

10 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 7436b202fba0d9..289e0bc29c4a55 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -15965,8 +15965,8 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
Align ABIAlignment = DAG.getDataLayout().getABITypeAlign(Ty);
if (LD->isUnindexed() && VT.isVector() &&
((Subtarget.hasAltivec() && ISD::isNON_EXTLoad(N) &&
- // P8 and later hardware should just use LOAD.
- !Subtarget.hasP8Vector() &&
+ // Hardware that has VSX should just use LOAD.
+ !Subtarget.hasVSX() &&
(VT == MVT::v16i8 || VT == MVT::v8i16 || VT == MVT::v4i32 ||
VT == MVT::v4f32))) &&
LD->getAlign() < ABIAlignment) {
@@ -17250,8 +17250,7 @@ bool PPCTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
EVT PPCTargetLowering::getOptimalMemOpType(
const MemOp &Op, const AttributeList &FuncAttributes) const {
if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None) {
- // We should use Altivec/VSX loads and stores when available. For unaligned
- // addresses, unaligned VSX loads are only fast starting with the P8.
+ // We should use Altivec/VSX loads and stores when available.
if (Subtarget.hasAltivec() && Op.size() >= 16) {
if (Op.isMemset() && Subtarget.hasVSX()) {
uint64_t TailSize = Op.size() % 16;
@@ -17263,7 +17262,7 @@ EVT PPCTargetLowering::getOptimalMemOpType(
}
return MVT::v4i32;
}
- if (Op.isAligned(Align(16)) || Subtarget.hasP8Vector())
+ if (Op.isAligned(Align(16)) || Subtarget.hasVSX())
return MVT::v4i32;
}
}
diff --git a/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills-mir.ll b/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills-mir.ll
index 7c45958a1c2ff9..d927b9edb74d1d 100644
--- a/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills-mir.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills-mir.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
; RUN: llc -verify-machineinstrs -mcpu=pwr7 -mattr=+altivec -vec-extabi \
; RUN: -stop-after=machine-cp -mtriple powerpc-ibm-aix-xcoff < %s | \
; RUN: FileCheck %s --check-prefix=MIR32
@@ -12,21 +12,16 @@
@__const.caller.t = private unnamed_addr constant %struct.Test { double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00 }, align 8
define double @caller() {
-
; MIR32-LABEL: name: caller
; MIR32: bb.0.entry:
- ; MIR32-NEXT: renamable $r3 = LI 0
- ; MIR32-NEXT: renamable $r4 = LIS 16392
- ; MIR32-NEXT: STW killed renamable $r4, 180, $r1 :: (store (s32) into unknown-address + 24)
- ; MIR32-NEXT: renamable $r4 = LIS 16384
- ; MIR32-NEXT: STW renamable $r3, 184, $r1 :: (store (s32) into unknown-address + 28)
- ; MIR32-NEXT: STW renamable $r3, 176, $r1 :: (store (s32) into unknown-address + 20)
- ; MIR32-NEXT: STW killed renamable $r4, 172, $r1 :: (store (s32) into unknown-address + 16)
- ; MIR32-NEXT: STW renamable $r3, 168, $r1 :: (store (s32) into unknown-address + 12)
- ; MIR32-NEXT: renamable $r4 = LIS 16368
- ; MIR32-NEXT: STW killed renamable $r4, 164, $r1 :: (store (s32) into unknown-address + 8)
- ; MIR32-NEXT: STW renamable $r3, 160, $r1 :: (store (s32) into unknown-address + 4)
- ; MIR32-NEXT: STW killed renamable $r3, 156, $r1 :: (store (s32))
+ ; MIR32-NEXT: renamable $r3 = LWZtoc @__const.caller.t, $r2 :: (load (s32) from got)
+ ; MIR32-NEXT: renamable $r4 = LI 16
+ ; MIR32-NEXT: renamable $vsl0 = LXVW4X renamable $r3, killed renamable $r4 :: (load (s128) from unknown-address + 16, align 8)
+ ; MIR32-NEXT: renamable $r4 = LI 172
+ ; MIR32-NEXT: STXVW4X killed renamable $vsl0, $r1, killed renamable $r4 :: (store (s128) into unknown-address + 16, align 4)
+ ; MIR32-NEXT: renamable $vsl0 = LXVW4X $zero, killed renamable $r3 :: (load (s128), align 8)
+ ; MIR32-NEXT: renamable $r3 = LI 156
+ ; MIR32-NEXT: STXVW4X killed renamable $vsl0, $r1, killed renamable $r3 :: (store (s128), align 4)
; MIR32-NEXT: ADJCALLSTACKDOWN 188, 0, implicit-def dead $r1, implicit $r1
; MIR32-NEXT: renamable $vsl0 = XXLXORz
; MIR32-NEXT: renamable $r3 = LI 136
@@ -78,32 +73,30 @@ define double @caller() {
;
; MIR64-LABEL: name: caller
; MIR64: bb.0.entry:
- ; MIR64-NEXT: renamable $x3 = LI8 2049
- ; MIR64-NEXT: renamable $x4 = LI8 1
- ; MIR64-NEXT: renamable $x3 = RLDIC killed renamable $x3, 51, 1
- ; MIR64-NEXT: STD killed renamable $x3, 216, $x1 :: (store (s64) into unknown-address + 24, align 4)
- ; MIR64-NEXT: renamable $x3 = LI8 1023
- ; MIR64-NEXT: renamable $x4 = RLDIC killed renamable $x4, 62, 1
- ; MIR64-NEXT: STD killed renamable $x4, 208, $x1 :: (store (s64) into unknown-address + 16, align 4)
- ; MIR64-NEXT: renamable $x4 = LI8 0
- ; MIR64-NEXT: STD renamable $x4, 192, $x1 :: (store (s64), align 4)
- ; MIR64-NEXT: renamable $x3 = RLDIC killed renamable $x3, 52, 2
- ; MIR64-NEXT: STD killed renamable $x3, 200, $x1 :: (store (s64) into unknown-address + 8, align 4)
+ ; MIR64-NEXT: renamable $x3 = LDtoc @__const.caller.t, $x2 :: (load (s64) from got)
+ ; MIR64-NEXT: renamable $x4 = LI8 16
+ ; MIR64-NEXT: renamable $vsl0 = LXVW4X renamable $x3, killed renamable $x4 :: (load (s128) from unknown-address + 16, align 8)
+ ; MIR64-NEXT: renamable $x4 = LI8 208
+ ; MIR64-NEXT: STXVW4X killed renamable $vsl0, $x1, killed renamable $x4 :: (store (s128) into unknown-address + 16, align 4)
+ ; MIR64-NEXT: renamable $vsl0 = LXVW4X $zero8, killed renamable $x3 :: (load (s128), align 8)
+ ; MIR64-NEXT: renamable $x3 = LI8 192
+ ; MIR64-NEXT: STXVW4X killed renamable $vsl0, $x1, killed renamable $x3 :: (store (s128), align 4)
; MIR64-NEXT: ADJCALLSTACKDOWN 224, 0, implicit-def dead $r1, implicit $r1
; MIR64-NEXT: renamable $vsl0 = XXLXORz
; MIR64-NEXT: renamable $x3 = LI8 160
+ ; MIR64-NEXT: renamable $x4 = LI8 144
; MIR64-NEXT: STXVW4X renamable $vsl0, $x1, killed renamable $x3 :: (store (s128), align 8)
- ; MIR64-NEXT: renamable $x3 = LI8 144
- ; MIR64-NEXT: STXVW4X renamable $vsl0, $x1, killed renamable $x3 :: (store (s128), align 8)
+ ; MIR64-NEXT: STXVW4X renamable $vsl0, $x1, killed renamable $x4 :: (store (s128), align 8)
; MIR64-NEXT: renamable $x3 = LI8 128
+ ; MIR64-NEXT: renamable $x4 = LDtocCPT %const.0, $x2 :: (load (s64) from got)
; MIR64-NEXT: STXVW4X killed renamable $vsl0, $x1, killed renamable $x3 :: (store (s128), align 8)
- ; MIR64-NEXT: renamable $x3 = LDtocCPT %const.0, $x2 :: (load (s64) from got)
- ; MIR64-NEXT: renamable $vsl0 = LXVD2X $zero8, killed renamable $x3 :: (load (s128) from constant-pool)
; MIR64-NEXT: renamable $x3 = LI8 80
+ ; MIR64-NEXT: renamable $vsl0 = LXVD2X $zero8, killed renamable $x4 :: (load (s128) from constant-pool)
+ ; MIR64-NEXT: renamable $x4 = LI8 512
; MIR64-NEXT: STXVD2X killed renamable $vsl0, $x1, killed renamable $x3 :: (store (s128))
- ; MIR64-NEXT: renamable $x3 = LI8 512
- ; MIR64-NEXT: STD killed renamable $x3, 184, $x1 :: (store (s64))
- ; MIR64-NEXT: STD killed renamable $x4, 176, $x1 :: (store (s64))
+ ; MIR64-NEXT: renamable $x3 = LI8 0
+ ; MIR64-NEXT: STD killed renamable $x4, 184, $x1 :: (store (s64))
+ ; MIR64-NEXT: STD killed renamable $x3, 176, $x1 :: (store (s64))
; MIR64-NEXT: $f1 = XXLXORdpz
; MIR64-NEXT: $f2 = XXLXORdpz
; MIR64-NEXT: $v2 = XXLXORz
@@ -112,8 +105,8 @@ define double @caller() {
; MIR64-NEXT: $v5 = XXLXORz
; MIR64-NEXT: $v6 = XXLXORz
; MIR64-NEXT: $x3 = LI8 128
- ; MIR64-NEXT: $x4 = LI8 256
; MIR64-NEXT: $v7 = XXLXORz
+ ; MIR64-NEXT: $x4 = LI8 256
; MIR64-NEXT: $v8 = XXLXORz
; MIR64-NEXT: $v9 = XXLXORz
; MIR64-NEXT: $v10 = XXLXORz
@@ -136,7 +129,7 @@ define double @caller() {
; MIR64-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $f1
entry:
%call = tail call double @callee(i32 signext 128, i32 signext 256, double 0.000000e+00, double 0.000000e+00, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 0.000000e+00, double 0.000000e+00>, <2 x double> <double 2.400000e+01, double 2.500000e+01>, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, i32 signext 512, ptr nonnull byval(%struct.Test) align 4 @__const.caller.t)
- ret double %call
+ ret double %call
}
declare double @callee(i32 signext, i32 signext, double, double, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, <2 x double>, double, double, double, double, double, double, double, double, double, double, double, i32 signext, ptr byval(%struct.Test) align 8)
diff --git a/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills.ll b/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills.ll
index 66f88b4e3d5ab3..91e7d4094fc344 100644
--- a/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-vec-arg-spills.ll
@@ -15,56 +15,52 @@ define double @caller() {
; 32BIT: # %bb.0: # %entry
; 32BIT-NEXT: mflr 0
; 32BIT-NEXT: stwu 1, -192(1)
-; 32BIT-NEXT: lis 4, 16392
+; 32BIT-NEXT: lwz 3, L..C0(2) # @__const.caller.t
+; 32BIT-NEXT: li 4, 16
; 32BIT-NEXT: stw 0, 200(1)
-; 32BIT-NEXT: li 3, 0
-; 32BIT-NEXT: xxlxor 0, 0, 0
; 32BIT-NEXT: xxlxor 1, 1, 1
-; 32BIT-NEXT: stw 4, 180(1)
-; 32BIT-NEXT: lis 4, 16384
-; 32BIT-NEXT: stw 3, 184(1)
-; 32BIT-NEXT: stw 3, 176(1)
-; 32BIT-NEXT: stw 4, 172(1)
-; 32BIT-NEXT: lis 4, 16368
-; 32BIT-NEXT: stw 3, 168(1)
-; 32BIT-NEXT: stw 3, 160(1)
-; 32BIT-NEXT: stw 4, 164(1)
-; 32BIT-NEXT: stw 3, 156(1)
-; 32BIT-NEXT: li 3, 136
-; 32BIT-NEXT: li 4, 120
; 32BIT-NEXT: xxlxor 2, 2, 2
-; 32BIT-NEXT: stxvw4x 0, 1, 3
-; 32BIT-NEXT: li 3, 104
-; 32BIT-NEXT: stxvw4x 0, 1, 4
-; 32BIT-NEXT: li 4, 88
-; 32BIT-NEXT: stxvw4x 0, 1, 3
-; 32BIT-NEXT: stxvw4x 0, 1, 4
-; 32BIT-NEXT: lwz 4, L..C0(2) # %const.0
-; 32BIT-NEXT: li 3, 72
-; 32BIT-NEXT: stxvw4x 0, 1, 3
-; 32BIT-NEXT: li 3, 48
+; 32BIT-NEXT: lxvw4x 0, 3, 4
+; 32BIT-NEXT: li 4, 172
; 32BIT-NEXT: xxlxor 34, 34, 34
; 32BIT-NEXT: xxlxor 35, 35, 35
-; 32BIT-NEXT: lxvd2x 0, 0, 4
-; 32BIT-NEXT: li 4, 512
+; 32BIT-NEXT: stxvw4x 0, 1, 4
+; 32BIT-NEXT: li 4, 120
; 32BIT-NEXT: xxlxor 36, 36, 36
; 32BIT-NEXT: xxlxor 37, 37, 37
; 32BIT-NEXT: xxlxor 38, 38, 38
+; 32BIT-NEXT: lxvw4x 0, 0, 3
+; 32BIT-NEXT: li 3, 156
; 32BIT-NEXT: xxlxor 39, 39, 39
+; 32BIT-NEXT: stxvw4x 0, 1, 3
+; 32BIT-NEXT: xxlxor 0, 0, 0
+; 32BIT-NEXT: li 3, 136
; 32BIT-NEXT: xxlxor 40, 40, 40
; 32BIT-NEXT: xxlxor 41, 41, 41
+; 32BIT-NEXT: stxvw4x 0, 1, 3
+; 32BIT-NEXT: li 3, 104
+; 32BIT-NEXT: stxvw4x 0, 1, 4
+; 32BIT-NEXT: li 4, 88
; 32BIT-NEXT: xxlxor 42, 42, 42
-; 32BIT-NEXT: stxvd2x 0, 1, 3
-; 32BIT-NEXT: stw 4, 152(1)
-; 32BIT-NEXT: li 3, 128
-; 32BIT-NEXT: li 4, 256
; 32BIT-NEXT: xxlxor 43, 43, 43
; 32BIT-NEXT: xxlxor 44, 44, 44
+; 32BIT-NEXT: stxvw4x 0, 1, 3
+; 32BIT-NEXT: stxvw4x 0, 1, 4
+; 32BIT-NEXT: lwz 4, L..C1(2) # %const.0
+; 32BIT-NEXT: li 3, 72
; 32BIT-NEXT: xxlxor 45, 45, 45
+; 32BIT-NEXT: stxvw4x 0, 1, 3
+; 32BIT-NEXT: li 3, 48
; 32BIT-NEXT: xxlxor 3, 3, 3
; 32BIT-NEXT: xxlxor 4, 4, 4
+; 32BIT-NEXT: lxvd2x 0, 0, 4
+; 32BIT-NEXT: li 4, 512
; 32BIT-NEXT: xxlxor 5, 5, 5
; 32BIT-NEXT: xxlxor 6, 6, 6
+; 32BIT-NEXT: stxvd2x 0, 1, 3
+; 32BIT-NEXT: stw 4, 152(1)
+; 32BIT-NEXT: li 3, 128
+; 32BIT-NEXT: li 4, 256
; 32BIT-NEXT: xxlxor 7, 7, 7
; 32BIT-NEXT: xxlxor 8, 8, 8
; 32BIT-NEXT: xxlxor 9, 9, 9
@@ -83,54 +79,52 @@ define double @caller() {
; 64BIT: # %bb.0: # %entry
; 64BIT-NEXT: mflr 0
; 64BIT-NEXT: stdu 1, -224(1)
-; 64BIT-NEXT: li 3, 2049
+; 64BIT-NEXT: ld 3, L..C0(2) # @__const.caller.t
+; 64BIT-NEXT: li 4, 16
; 64BIT-NEXT: std 0, 240(1)
-; 64BIT-NEXT: li 4, 1
-; 64BIT-NEXT: xxlxor 0, 0, 0
; 64BIT-NEXT: xxlxor 1, 1, 1
-; 64BIT-NEXT: rldic 3, 3, 51, 1
-; 64BIT-NEXT: rldic 4, 4, 62, 1
; 64BIT-NEXT: xxlxor 2, 2, 2
+; 64BIT-NEXT: lxvw4x 0, 3, 4
+; 64BIT-NEXT: li 4, 208
; 64BIT-NEXT: xxlxor 34, 34, 34
-; 64BIT-NEXT: std 3, 216(1)
-; 64BIT-NEXT: li 3, 1023
-; 64BIT-NEXT: std 4, 208(1)
-; 64BIT-NEXT: li 4, 0
; 64BIT-NEXT: xxlxor 35, 35, 35
+; 64BIT-NEXT: stxvw4x 0, 1, 4
+; 64BIT-NEXT: li 4, 144
; 64BIT-NEXT: xxlxor 36, 36, 36
-; 64BIT-NEXT: rldic 3, 3, 52, 2
-; 64BIT-NEXT: std 4, 192(1)
; 64BIT-NEXT: xxlxor 37, 37, 37
; 64BIT-NEXT: xxlxor 38, 38, 38
+; 64BIT-NEXT: lxvw4x 0, 0, 3
+; 64BIT-NEXT: li 3, 192
; 64BIT-NEXT: xxlxor 39, 39, 39
-; 64BIT-NEXT: std 3, 200(1)
+; 64BIT-NEXT: stxvw4x 0, 1, 3
+; 64BIT-NEXT: xxlxor 0, 0, 0
; 64BIT-NEXT: li 3, 160
; 64BIT-NEXT: xxlxor 40, 40, 40
-; 64BIT-NEXT: stxvw4x 0, 1, 3
-; 64BIT-NEXT: li 3, 144
; 64BIT-NEXT: xxlxor 41, 41, 41
-; 64BIT-NEXT: xxlxor 42, 42, 42
; 64BIT-NEXT: stxvw4x 0, 1, 3
+; 64BIT-NEXT: stxvw4x 0, 1, 4
+; 64BIT-NEXT: ld 4, L..C1(2) # %const.0
; 64BIT-NEXT: li 3, 128
+; 64BIT-NEXT: xxlxor 42, 42, 42
; 64BIT-NEXT: xxlxor 43, 43, 43
; 64BIT-NEXT: stxvw4x 0, 1, 3
-; 64BIT-NEXT: ld 3, L..C0(2) # %const.0
+; 64BIT-NEXT: li 3, 80
; 64BIT-NEXT: xxlxor 44, 44, 44
; 64BIT-NEXT: xxlxor 45, 45, 45
-; 64BIT-NEXT: lxvd2x 0, 0, 3
-; 64BIT-NEXT: li 3, 80
+; 64BIT-NEXT: lxvd2x 0, 0, 4
+; 64BIT-NEXT: li 4, 512
; 64BIT-NEXT: xxlxor 3, 3, 3
-; 64BIT-NEXT: xxlxor 4, 4, 4
-; 64BIT-NEXT: xxlxor 5, 5, 5
; 64BIT-NEXT: stxvd2x 0, 1, 3
-; 64BIT-NEXT: li 3, 512
-; 64BIT-NEXT: std 4, 176(1)
+; 64BIT-NEXT: li 3, 0
+; 64BIT-NEXT: std 4, 184(1)
; 64BIT-NEXT: li 4, 256
+; 64BIT-NEXT: xxlxor 4, 4, 4
+; 64BIT-NEXT: xxlxor 5, 5, 5
; 64BIT-NEXT: xxlxor 6, 6, 6
+; 64BIT-NEXT: std 3, 176(1)
+; 64BIT-NEXT: li 3, 128
; 64BIT-NEXT: xxlxor 7, 7, 7
; 64BIT-NEXT: xxlxor 8, 8, 8
-; 64BIT-NEXT: std 3, 184(1)
-; 64BIT-NEXT: li 3, 128
; 64BIT-NEXT: xxlxor 9, 9, 9
; 64BIT-NEXT: xxlxor 10, 10, 10
; 64BIT-NEXT: xxlxor 11, 11, 11
diff --git a/llvm/test/CodeGen/PowerPC/memcpy-vec.ll b/llvm/test/CodeGen/PowerPC/memcpy-vec.ll
index d636921eea3e51..34a7af4bc45916 100644
--- a/llvm/test/CodeGen/PowerPC/memcpy-vec.ll
+++ b/llvm/test/CodeGen/PowerPC/memcpy-vec.ll
@@ -10,12 +10,8 @@ entry:
ret void
; PWR7-LABEL: @foo1
-; PWR7-NOT: bl memcpy
-; PWR7-DAG: li [[OFFSET:[0-9]+]], 16
-; PWR7-DAG: lxvd2x [[TMP0:[0-9]+]], 4, [[OFFSET]]
-; PWR7-DAG: stxvd2x [[TMP0]], 3, [[OFFSET]]
-; PWR7-DAG: lxvd2x [[TMP1:[0-9]+]], 0, 4
-; PWR7-DAG: stxvd2x [[TMP1]], 0, 3
+; PWR7: lxvw4x
+; PWR7: stxvw4x
; PWR7: blr
; PWR8-LABEL: @foo1
@@ -34,7 +30,8 @@ entry:
ret void
; PWR7-LABEL: @foo2
-; PWR7: bl memcpy
+; PWR7: lxvw4x
+; PWR7: stxvw4x
; PWR7: blr
; PWR8-LABEL: @foo2
diff --git a/llvm/test/CodeGen/PowerPC/unal-altivec-wint.ll b/llvm/test/CodeGen/PowerPC/unal-altivec-wint.ll
index d6244cd828e5a6..a590f54b5a6765 100644
--- a/llvm/test/CodeGen/PowerPC/unal-altivec-wint.ll
+++ b/llvm/test/CodeGen/PowerPC/unal-altivec-wint.ll
@@ -17,7 +17,7 @@ entry:
; CHECK-LABEL: @test1
; CHECK: li [[REG:[0-9]+]], 16
; CHECK-NOT: li {{[0-9]+}}, 15
-; CHECK-DAG: lvx {{[0-9]+}}, 0, 3
+; CHECK-DAG: lxvw4x {{[0-9]+}}, 0, 3
; CHECK-DAG: lvx {{[0-9]+}}, 3, [[REG]]
; CHECK: blr
}
@@ -36,8 +36,8 @@ entry:
; CHECK-LABEL: @test2
; CHECK: li [[REG:[0-9]+]], 16
; CHECK-NOT: li {{[0-9]+}}, 15
-; CHECK-DAG: lvx {{[0-9]+}}, 0, 3
-; CHECK-DAG: lvx {{[0-9]+}}, 3, [[REG]]
+; CHECK-DAG: stvx 2, 3, [[REG]]
+; CHECK-DAG: lxvw4x {{[0-9]+}}, 0, 3
; CHECK: blr
}
diff --git a/llvm/test/CodeGen/PowerPC/unal-altivec2.ll b/llvm/test/CodeGen/PowerPC/unal-altivec2.ll
index fafcab8468eb4d..39a82fe0a0977c 100644
--- a/llvm/test/CodeGen/PowerPC/unal-altivec2.ll
+++ b/llvm/test/CodeGen/PowerPC/unal-altivec2.ll
@@ -1,4 +1,4 @@
-; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr6 < %s | FileCheck %s
target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
target triple = "powerpc64-unknown-linux-gnu"
diff --git a/llvm/test/CodeGen/PowerPC/unal-vec-ldst.ll b/llvm/test/CodeGen/PowerPC/unal-vec-ldst.ll
index b0ed395fc3a190..c2e20149cb9fee 100644
--- a/llvm/test/CodeGen/PowerPC/unal-vec-ldst.ll
+++ b/llvm/test/CodeGen/PowerPC/unal-vec-ldst.ll
@@ -5,11 +5,7 @@
define <16 x i8> @test_l_v16i8(ptr %p) #0 {
; CHECK-LABEL: test_l_v16i8:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: li 4, 15
-; CHECK-NEXT: lvsl 3, 0, 3
-; CHECK-NEXT: lvx 2, 3, 4
-; CHECK-NEXT: lvx 4, 0, 3
-; CHECK-NEXT: vperm 2, 4, 2, 3
+; CHECK-NEXT: lxvw4x 34, 0, 3
; CHECK-NEXT: blr
entry:
%r = load <16 x i8>, ptr %p, align 1
@@ -20,14 +16,9 @@ entry:
define <32 x i8> @test_l_v32i8(ptr %p) #0 {
; CHECK-LABEL: test_l_v32i8:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: li 4, 31
-; CHECK-NEXT: lvsl 4, 0, 3
-; CHECK-NEXT: lvx 2, 3, 4
; CHECK-NEXT: li 4, 16
-; CHECK-NEXT: lvx 5, 3, 4
-; CHECK-NEXT: vperm 3, 5, 2, 4
-; CHECK-NEXT: lvx 2, 0, 3
-; CHECK-NEXT: vperm 2, 2, 5, 4
+; CHECK-NEXT: lxvw4x 34, 0, 3
+; CHECK-NEXT: lxvw4x 35, 3, 4
; CHECK-NEXT: blr
entry:
%r = load <32 x i8>, ptr %p, align 1
@@ -38,11 +29,7 @@ entry:
define <8 x i16> @test_l_v8i16(ptr %p) #0 {
; CHECK-LABEL: test_l_v8i16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: li 4, 15
-; CHECK-NEXT: lvsl 3, 0, 3
-; CHECK-NEXT: lvx 2, 3, 4
-; CHECK-NEXT: lvx 4, 0, 3
-; CHECK-NEXT: vperm 2, 4, 2, 3
+; CHECK-NEXT: lxvw4x 34, 0, 3
; CHECK-NEXT: blr
entry:
%r = load <8 x i16>, ptr %p, align 2
@@ -53,14 +40,9 @@ entry:
define <16 x i16> @test_l_v16i16(ptr %p) #0 {
; CHECK-LABEL: test_l_v16i16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: li 4, 31
-; CHECK-NEXT: lvsl 4, 0, 3
-; CHECK-NEXT: lvx 2, 3, 4
; CHECK-NEXT: li 4, 16
-; CHECK-NEXT: lvx 5, 3, 4
-; CHECK-NEXT: vperm 3, 5, 2, 4
-; CHECK-NEXT: lvx 2, 0, 3
-; CHECK-NEXT: vperm 2, 2, 5, 4
+; CHECK-NEXT: lxvw4x 34, 0, 3
+; CHECK-NEXT: lxvw4x 35, 3, 4
; CHECK-NEXT: blr
entry:
%r = load <16 x i16>, ptr %p, align 2
@@ -71,11 +53,7 @@ entry:
define <4 x i32> @test_l_v4i32(ptr %p) #0 {
; CHECK-LABEL: test_l_v4i32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: li 4, 15
-; CHECK-NEXT: lvsl 3, 0, 3
-; CHECK-NEXT: lvx 2, 3, 4
-; CHECK-NEXT: lvx 4, 0, 3
-; CHECK-NEXT: vperm 2, 4, 2, 3
+; CHECK-NEXT: lxvw4x 34, 0, 3
; CHECK-NEXT: blr
entry:
%r = load <4 x i32>, ptr %p, align 4
@@ -86,14 +64,9 @@ entry:
define <8 x i32> @test_l_v8i32(ptr %p) #0 {
; CHECK-LABEL: test_l_v8i32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: li 4, 31
-; CHECK-NEXT: lvsl 4, 0, 3
-; CHECK-NEXT: lvx 2, 3, 4
; CHECK-NEXT: li 4, 16
-; CHECK-NEXT: lvx 5, 3, 4
-; CHECK-NEXT: vperm 3, 5, 2, 4
-; CHECK-NEXT: lvx 2, 0, 3
-; CHECK-NEXT: vperm 2, 2, 5, 4
+; CHECK-NEXT: lxvw4x 34, 0, 3
+; CHECK-NEXT: lxvw4x 35, 3, 4
; CHECK-NEXT: blr
entry:
%r = load <8 x i32>, ptr %p, align 4
@@ -128,11 +101,7 @@ entry:
def...
[truncated]
@@ -17250,8 +17250,7 @@ bool PPCTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
EVT PPCTargetLowering::getOptimalMemOpType(
    const MemOp &Op, const AttributeList &FuncAttributes) const {
  if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None) {
    // We should use Altivec/VSX loads and stores when available. For unaligned
    // addresses, unaligned VSX loads are only fast starting with the P8.
Have you compared the performance with and without unaligned vector-scalar loads/stores on pwr7? The comment seems to indicate why this is not done prior to pwr8.
On pwr7, an lxvw4x with less than 8-byte alignment will be flushed and micro-coded, and with less than 4-byte alignment it will take an alignment interrupt.
I checked on a PWR7 AIX server: with alignment of 1 or 2, the AIX OS observed a misaligned load/store operation, but the lxvw4x still executed successfully.
The performance of lxvw4x is the same as the current lxvd2x (see llvm/test/CodeGen/PowerPC/memcpy-vec.ll), and lxvw4x is better than scalar fixed-point ld on AIX P7.
Tested on AIX servers; it works well in both functionality and performance.
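The decision the patch relaxes can be summarized with a simplified, hypothetical model (this is not LLVM's actual API; the enum and function below are invented to mirror the logic in PPCTargetLowering::getOptimalMemOpType): before, an unaligned 16-byte-or-larger memop required P8 vector support to use a vector type; after, plain VSX, which P7 already provides, suffices.

```cpp
#include <cstdint>

// Simplified model of the memop-type choice this patch changes.
// Not LLVM's real API -- names and signature are illustrative.
enum class MemOpTy { V4I32, Scalar };

MemOpTy optimalMemOpType(uint64_t size, unsigned alignBytes,
                         bool hasAltivec, bool hasVSX) {
  if (hasAltivec && size >= 16) {
    // Aligned 16-byte accesses were always allowed; with this patch the
    // unaligned case only needs VSX (P7 and later), not P8 vector support.
    if (alignBytes >= 16 || hasVSX)
      return MemOpTy::V4I32;
  }
  return MemOpTy::Scalar;
}
```

Under this model, a pwr7 target (Altivec + VSX, no P8 vector) now picks the vector type for unaligned memops, matching the new lxvw4x/stxvw4x output in the updated tests.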
Force-pushed from a775f49 to 78f09cf
My understanding is that we should also use stxvw4x/lxvw4x on P7, even for unaligned loads/stores. The P8 ISA and P7 ISA indicate that these two instructions are able to handle unaligned addresses well.