[AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN,SMAX,UMIN,UMAX} for odd-sized vectors #82740
i8 vectors do not have their sizes changed, as I noticed regressions in some tests when that was done.
This patch also adds support for most G_VECREDUCE_* operations to moreElementsVector in LegalizerHelper.cpp.
The code for getting the "neutral" element is taken almost exactly as it is in SelectionDAG, with the exception that support for G_VECREDUCE_{FMAXIMUM,FMINIMUM} was not added.
The SelectionDAG code is located at SelectionDAG::getNeutralElement().
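To illustrate why padding with a "neutral" element is safe, here is a minimal standalone sketch. It does not use LLVM: the `Reduce` enum, `neutralElement`, `combine`, `reduce`, and `padWithNeutral` are hypothetical names invented for this illustration, and the lanes are fixed at 16 bits. The constants follow the integer cases of the switch in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the integer G_VECREDUCE_* opcodes (not LLVM's enum).
enum class Reduce { Add, Mul, And, Or, Xor, SMax, SMin, UMax, UMin };

// Neutral (identity) element for each reduction over 16-bit lanes, as raw bits.
uint16_t neutralElement(Reduce Op) {
  switch (Op) {
  case Reduce::Add:
  case Reduce::Or:
  case Reduce::Xor:
  case Reduce::UMax:
    return 0;
  case Reduce::Mul:
    return 1;
  case Reduce::And:
  case Reduce::UMin:
    return 0xFFFF; // all ones
  case Reduce::SMax:
    return 0x8000; // bit pattern of the signed minimum
  case Reduce::SMin:
    return 0x7FFF; // signed maximum
  }
  assert(false && "unhandled reduction");
  return 0;
}

// Combine two lanes according to the reduction kind.
uint16_t combine(Reduce Op, uint16_t A, uint16_t B) {
  const auto SA = static_cast<int16_t>(A);
  const auto SB = static_cast<int16_t>(B);
  switch (Op) {
  case Reduce::Add:  return static_cast<uint16_t>(A + B);
  // Widen to 32-bit unsigned first to avoid signed overflow after promotion.
  case Reduce::Mul:  return static_cast<uint16_t>(static_cast<uint32_t>(A) * B);
  case Reduce::And:  return static_cast<uint16_t>(A & B);
  case Reduce::Or:   return static_cast<uint16_t>(A | B);
  case Reduce::Xor:  return static_cast<uint16_t>(A ^ B);
  case Reduce::SMax: return SA > SB ? A : B;
  case Reduce::SMin: return SA < SB ? A : B;
  case Reduce::UMax: return A > B ? A : B;
  case Reduce::UMin: return A < B ? A : B;
  }
  assert(false && "unhandled reduction");
  return 0;
}

// Reduce all lanes of a non-empty vector to a single value.
uint16_t reduce(Reduce Op, const std::vector<uint16_t> &V) {
  assert(!V.empty() && "cannot reduce an empty vector");
  uint16_t Acc = V[0];
  for (std::size_t I = 1; I < V.size(); ++I)
    Acc = combine(Op, Acc, V[I]);
  return Acc;
}

// Widen V to NewSize lanes, filling the new lanes with the neutral element.
std::vector<uint16_t> padWithNeutral(Reduce Op, std::vector<uint16_t> V,
                                     std::size_t NewSize) {
  assert(NewSize >= V.size() && "padding cannot shrink the vector");
  V.resize(NewSize, neutralElement(Op));
  return V;
}
```

For a three-lane input, `padWithNeutral(Reduce::SMin, V, 4)` appends `0x7FFF`, which corresponds to the `mov w8, #32767` in the updated `sminv_v3i16` check lines; the reduction over the padded vector gives the same result as over the original.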
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel
Author: Dhruv Chawla (work) (dc03-work)
Full diff: https://github.com/llvm/llvm-project/pull/82740.diff
5 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index 2beb9919418fc9..5bb3692f0a46b4 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -281,6 +281,10 @@ class LegalizerHelper {
MachineInstr &MI,
LostDebugLocObserver &LocObserver);
+ MachineInstrBuilder
+ getNeutralElementForVecReduce(unsigned Opcode, MachineIRBuilder &MIRBuilder,
+ LLT Ty);
+
public:
/// Return the alignment to use for a stack temporary object with the given
/// type.
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 30f12bf5cca586..1472fdf8129b56 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -5216,6 +5216,42 @@ LegalizerHelper::moreElementsVectorPhi(MachineInstr &MI, unsigned TypeIdx,
return Legalized;
}
+MachineInstrBuilder LegalizerHelper::getNeutralElementForVecReduce(
+ unsigned Opcode, MachineIRBuilder &MIRBuilder, LLT Ty) {
+ assert(Ty.isScalar() && "Expected scalar type to make neutral element for");
+
+ switch (Opcode) {
+ default:
+ return MIRBuilder.buildUndef(Ty);
+ case TargetOpcode::G_VECREDUCE_ADD:
+ case TargetOpcode::G_VECREDUCE_OR:
+ case TargetOpcode::G_VECREDUCE_XOR:
+ case TargetOpcode::G_VECREDUCE_UMAX:
+ return MIRBuilder.buildConstant(Ty, 0);
+ case TargetOpcode::G_VECREDUCE_MUL:
+ return MIRBuilder.buildConstant(Ty, 1);
+ case TargetOpcode::G_VECREDUCE_AND:
+ case TargetOpcode::G_VECREDUCE_UMIN:
+ return MIRBuilder.buildConstant(
+ Ty, APInt::getAllOnes(Ty.getScalarSizeInBits()));
+ case TargetOpcode::G_VECREDUCE_SMAX:
+ return MIRBuilder.buildConstant(
+ Ty, APInt::getSignedMinValue(Ty.getSizeInBits()));
+ case TargetOpcode::G_VECREDUCE_SMIN:
+ return MIRBuilder.buildConstant(
+ Ty, APInt::getSignedMaxValue(Ty.getSizeInBits()));
+ case TargetOpcode::G_VECREDUCE_FADD:
+ return MIRBuilder.buildFConstant(Ty, -0.0);
+ case TargetOpcode::G_VECREDUCE_FMUL:
+ return MIRBuilder.buildFConstant(Ty, 1.0);
+ case TargetOpcode::G_VECREDUCE_FMINIMUM:
+ case TargetOpcode::G_VECREDUCE_FMAXIMUM:
+ assert(false && "getNeutralElementForVecReduce unimplemented for "
+ "G_VECREDUCE_FMINIMUM and G_VECREDUCE_FMAXIMUM!");
+ }
+ llvm_unreachable("switch expected to return!");
+}
+
LegalizerHelper::LegalizeResult
LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
LLT MoreTy) {
@@ -5420,6 +5456,35 @@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
Observer.changedInstr(MI);
return Legalized;
}
+ case TargetOpcode::G_VECREDUCE_FADD:
+ case TargetOpcode::G_VECREDUCE_FMUL:
+ case TargetOpcode::G_VECREDUCE_ADD:
+ case TargetOpcode::G_VECREDUCE_MUL:
+ case TargetOpcode::G_VECREDUCE_AND:
+ case TargetOpcode::G_VECREDUCE_OR:
+ case TargetOpcode::G_VECREDUCE_XOR:
+ case TargetOpcode::G_VECREDUCE_SMAX:
+ case TargetOpcode::G_VECREDUCE_SMIN:
+ case TargetOpcode::G_VECREDUCE_UMAX:
+ case TargetOpcode::G_VECREDUCE_UMIN: {
+ LLT OrigTy = MRI.getType(MI.getOperand(1).getReg());
+ MachineOperand &MO = MI.getOperand(1);
+ auto NewVec = MIRBuilder.buildPadVectorWithUndefElements(MoreTy, MO);
+ auto NeutralElement = getNeutralElementForVecReduce(
+ MI.getOpcode(), MIRBuilder, MoreTy.getElementType());
+ for (size_t i = OrigTy.getNumElements(), e = MoreTy.getNumElements();
+ i != e; i++) {
+ auto Idx = MIRBuilder.buildConstant(LLT::scalar(32), i);
+ NewVec = MIRBuilder.buildInsertVectorElement(MoreTy, NewVec,
+ NeutralElement, Idx);
+ }
+
+ Observer.changingInstr(MI);
+ MO.setReg(NewVec.getReg(0));
+ Observer.changedInstr(MI);
+ return Legalized;
+ }
+
default:
return UnableToLegalize;
}
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 60e046bc6cf407..a2e805e8cb56dc 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1074,6 +1074,13 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
{s16, v8s16},
{s32, v2s32},
{s32, v4s32}})
+ .moreElementsIf(
+ [=](const LegalityQuery &Query) {
+ return Query.Types[1].isVector() &&
+ Query.Types[1].getElementType() != s8 &&
+ Query.Types[1].getNumElements() & 1;
+ },
+ LegalizeMutations::moreElementsToNextPow2(1))
.clampMaxNumElements(1, s64, 2)
.clampMaxNumElements(1, s32, 4)
.clampMaxNumElements(1, s16, 8)
diff --git a/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll b/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
index 194fe5be40c2bd..76790d128d066c 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
@@ -595,30 +595,14 @@ entry:
}
define i16 @sminv_v3i16(<3 x i16> %a) {
-; CHECK-SD-LABEL: sminv_v3i16:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-SD-NEXT: mov w8, #32767 // =0x7fff
-; CHECK-SD-NEXT: mov v0.h[3], w8
-; CHECK-SD-NEXT: sminv h0, v0.4h
-; CHECK-SD-NEXT: fmov w0, s0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: sminv_v3i16:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: mov h1, v0.h[1]
-; CHECK-GI-NEXT: smov w8, v0.h[0]
-; CHECK-GI-NEXT: umov w9, v0.h[0]
-; CHECK-GI-NEXT: umov w10, v0.h[1]
-; CHECK-GI-NEXT: smov w11, v0.h[2]
-; CHECK-GI-NEXT: umov w13, v0.h[2]
-; CHECK-GI-NEXT: fmov w12, s1
-; CHECK-GI-NEXT: cmp w8, w12, sxth
-; CHECK-GI-NEXT: csel w8, w9, w10, lt
-; CHECK-GI-NEXT: cmp w11, w8, sxth
-; CHECK-GI-NEXT: csel w0, w8, w13, gt
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: sminv_v3i16:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT: mov w8, #32767 // =0x7fff
+; CHECK-NEXT: mov v0.h[3], w8
+; CHECK-NEXT: sminv h0, v0.4h
+; CHECK-NEXT: fmov w0, s0
+; CHECK-NEXT: ret
entry:
%arg1 = call i16 @llvm.vector.reduce.smin.v3i16(<3 x i16> %a)
ret i16 %arg1
@@ -670,28 +654,13 @@ entry:
}
define i32 @sminv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: sminv_v3i32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: mov w8, #2147483647 // =0x7fffffff
-; CHECK-SD-NEXT: mov v0.s[3], w8
-; CHECK-SD-NEXT: sminv s0, v0.4s
-; CHECK-SD-NEXT: fmov w0, s0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: sminv_v3i32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: mov s1, v0.s[1]
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: mov s2, v0.s[2]
-; CHECK-GI-NEXT: fmov w9, s1
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fmov w9, s2
-; CHECK-GI-NEXT: fcsel s0, s0, s1, lt
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fcsel s0, s0, s2, lt
-; CHECK-GI-NEXT: fmov w0, s0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: sminv_v3i32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov w8, #2147483647 // =0x7fffffff
+; CHECK-NEXT: mov v0.s[3], w8
+; CHECK-NEXT: sminv s0, v0.4s
+; CHECK-NEXT: fmov w0, s0
+; CHECK-NEXT: ret
entry:
%arg1 = call i32 @llvm.vector.reduce.smin.v3i32(<3 x i32> %a)
ret i32 %arg1
@@ -972,17 +941,10 @@ define i16 @smaxv_v3i16(<3 x i16> %a) {
; CHECK-GI-LABEL: smaxv_v3i16:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: mov h1, v0.h[1]
-; CHECK-GI-NEXT: smov w8, v0.h[0]
-; CHECK-GI-NEXT: umov w9, v0.h[0]
-; CHECK-GI-NEXT: umov w10, v0.h[1]
-; CHECK-GI-NEXT: smov w11, v0.h[2]
-; CHECK-GI-NEXT: umov w13, v0.h[2]
-; CHECK-GI-NEXT: fmov w12, s1
-; CHECK-GI-NEXT: cmp w8, w12, sxth
-; CHECK-GI-NEXT: csel w8, w9, w10, gt
-; CHECK-GI-NEXT: cmp w11, w8, sxth
-; CHECK-GI-NEXT: csel w0, w8, w13, lt
+; CHECK-GI-NEXT: mov w8, #32768 // =0x8000
+; CHECK-GI-NEXT: mov v0.h[3], w8
+; CHECK-GI-NEXT: smaxv h0, v0.4h
+; CHECK-GI-NEXT: fmov w0, s0
; CHECK-GI-NEXT: ret
entry:
%arg1 = call i16 @llvm.vector.reduce.smax.v3i16(<3 x i16> %a)
@@ -1035,28 +997,13 @@ entry:
}
define i32 @smaxv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: smaxv_v3i32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: mov w8, #-2147483648 // =0x80000000
-; CHECK-SD-NEXT: mov v0.s[3], w8
-; CHECK-SD-NEXT: smaxv s0, v0.4s
-; CHECK-SD-NEXT: fmov w0, s0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: smaxv_v3i32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: mov s1, v0.s[1]
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: mov s2, v0.s[2]
-; CHECK-GI-NEXT: fmov w9, s1
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fmov w9, s2
-; CHECK-GI-NEXT: fcsel s0, s0, s1, gt
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fcsel s0, s0, s2, gt
-; CHECK-GI-NEXT: fmov w0, s0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: smaxv_v3i32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov w8, #-2147483648 // =0x80000000
+; CHECK-NEXT: mov v0.s[3], w8
+; CHECK-NEXT: smaxv s0, v0.4s
+; CHECK-NEXT: fmov w0, s0
+; CHECK-NEXT: ret
entry:
%arg1 = call i32 @llvm.vector.reduce.smax.v3i32(<3 x i32> %a)
ret i32 %arg1
@@ -1335,17 +1282,10 @@ define i16 @uminv_v3i16(<3 x i16> %a) {
; CHECK-GI-LABEL: uminv_v3i16:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: mov h1, v0.h[1]
-; CHECK-GI-NEXT: umov w8, v0.h[0]
-; CHECK-GI-NEXT: umov w9, v0.h[0]
-; CHECK-GI-NEXT: umov w10, v0.h[1]
-; CHECK-GI-NEXT: umov w11, v0.h[2]
-; CHECK-GI-NEXT: umov w13, v0.h[2]
-; CHECK-GI-NEXT: fmov w12, s1
-; CHECK-GI-NEXT: cmp w8, w12, uxth
-; CHECK-GI-NEXT: csel w8, w9, w10, lo
-; CHECK-GI-NEXT: cmp w11, w8, uxth
-; CHECK-GI-NEXT: csel w0, w8, w13, hi
+; CHECK-GI-NEXT: mov w8, #65535 // =0xffff
+; CHECK-GI-NEXT: mov v0.h[3], w8
+; CHECK-GI-NEXT: uminv h0, v0.4h
+; CHECK-GI-NEXT: fmov w0, s0
; CHECK-GI-NEXT: ret
entry:
%arg1 = call i16 @llvm.vector.reduce.umin.v3i16(<3 x i16> %a)
@@ -1398,28 +1338,13 @@ entry:
}
define i32 @uminv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: uminv_v3i32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: mov w8, #-1 // =0xffffffff
-; CHECK-SD-NEXT: mov v0.s[3], w8
-; CHECK-SD-NEXT: uminv s0, v0.4s
-; CHECK-SD-NEXT: fmov w0, s0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: uminv_v3i32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: mov s1, v0.s[1]
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: mov s2, v0.s[2]
-; CHECK-GI-NEXT: fmov w9, s1
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fmov w9, s2
-; CHECK-GI-NEXT: fcsel s0, s0, s1, lo
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fcsel s0, s0, s2, lo
-; CHECK-GI-NEXT: fmov w0, s0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: uminv_v3i32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov w8, #-1 // =0xffffffff
+; CHECK-NEXT: mov v0.s[3], w8
+; CHECK-NEXT: uminv s0, v0.4s
+; CHECK-NEXT: fmov w0, s0
+; CHECK-NEXT: ret
entry:
%arg1 = call i32 @llvm.vector.reduce.umin.v3i32(<3 x i32> %a)
ret i32 %arg1
@@ -1697,17 +1622,10 @@ define i16 @umaxv_v3i16(<3 x i16> %a) {
; CHECK-GI-LABEL: umaxv_v3i16:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: mov h1, v0.h[1]
-; CHECK-GI-NEXT: umov w8, v0.h[0]
-; CHECK-GI-NEXT: umov w9, v0.h[0]
-; CHECK-GI-NEXT: umov w10, v0.h[1]
-; CHECK-GI-NEXT: umov w11, v0.h[2]
-; CHECK-GI-NEXT: umov w13, v0.h[2]
-; CHECK-GI-NEXT: fmov w12, s1
-; CHECK-GI-NEXT: cmp w8, w12, uxth
-; CHECK-GI-NEXT: csel w8, w9, w10, hi
-; CHECK-GI-NEXT: cmp w11, w8, uxth
-; CHECK-GI-NEXT: csel w0, w8, w13, lo
+; CHECK-GI-NEXT: mov w8, #0 // =0x0
+; CHECK-GI-NEXT: mov v0.h[3], w8
+; CHECK-GI-NEXT: umaxv h0, v0.4h
+; CHECK-GI-NEXT: fmov w0, s0
; CHECK-GI-NEXT: ret
entry:
%arg1 = call i16 @llvm.vector.reduce.umax.v3i16(<3 x i16> %a)
@@ -1760,27 +1678,12 @@ entry:
}
define i32 @umaxv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: umaxv_v3i32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: mov v0.s[3], wzr
-; CHECK-SD-NEXT: umaxv s0, v0.4s
-; CHECK-SD-NEXT: fmov w0, s0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: umaxv_v3i32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: mov s1, v0.s[1]
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: mov s2, v0.s[2]
-; CHECK-GI-NEXT: fmov w9, s1
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fmov w9, s2
-; CHECK-GI-NEXT: fcsel s0, s0, s1, hi
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fcsel s0, s0, s2, hi
-; CHECK-GI-NEXT: fmov w0, s0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: umaxv_v3i32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov v0.s[3], wzr
+; CHECK-NEXT: umaxv s0, v0.4s
+; CHECK-NEXT: fmov w0, s0
+; CHECK-NEXT: ret
entry:
%arg1 = call i32 @llvm.vector.reduce.umax.v3i32(<3 x i32> %a)
ret i32 %arg1
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll b/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
index 8988481708cfb6..d71aed2d17506b 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
@@ -187,27 +187,12 @@ define i8 @test_v9i8(<9 x i8> %a) nounwind {
}
define i32 @test_v3i32(<3 x i32> %a) nounwind {
-; CHECK-SD-LABEL: test_v3i32:
-; CHECK-SD: // %bb.0:
-; CHECK-SD-NEXT: mov v0.s[3], wzr
-; CHECK-SD-NEXT: umaxv s0, v0.4s
-; CHECK-SD-NEXT: fmov w0, s0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_v3i32:
-; CHECK-GI: // %bb.0:
-; CHECK-GI-NEXT: mov s1, v0.s[1]
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: mov s2, v0.s[2]
-; CHECK-GI-NEXT: fmov w9, s1
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fmov w9, s2
-; CHECK-GI-NEXT: fcsel s0, s0, s1, hi
-; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: cmp w8, w9
-; CHECK-GI-NEXT: fcsel s0, s0, s2, hi
-; CHECK-GI-NEXT: fmov w0, s0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_v3i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov v0.s[3], wzr
+; CHECK-NEXT: umaxv s0, v0.4s
+; CHECK-NEXT: fmov w0, s0
+; CHECK-NEXT: ret
%b = call i32 @llvm.vector.reduce.umax.v3i32(<3 x i32> %a)
ret i32 %b
}
MI.getOpcode(), MIRBuilder, MoreTy.getElementType());
for (size_t i = OrigTy.getNumElements(), e = MoreTy.getNumElements();
i != e; i++) {
auto Idx = MIRBuilder.buildConstant(LLT::scalar(32), i);
Can this use TLI.getVectorIdxTy() for the index type?
case TargetOpcode::G_VECREDUCE_UMIN: {
LLT OrigTy = MRI.getType(MI.getOperand(1).getReg());
MachineOperand &MO = MI.getOperand(1);
auto NewVec = MIRBuilder.buildPadVectorWithUndefElements(MoreTy, MO);
This builds a vector with undef, then inserts identity elements into it? Could it instead just build directly, with the new identity element?
Using G_BUILD_VECTOR seems to cause quite a few regressions: https://gist.github.com/dc03-work/66c179f4d380cfac3ed7b36525a583ef.
It sounds like there is a combine to clear up the undef version, but not the direct one.
default:
return MIRBuilder.buildUndef(Ty);
Can the default be the llvm_unreachable?
.moreElementsIf(
[=](const LegalityQuery &Query) {
return Query.Types[1].isVector() &&
Query.Types[1].getElementType() != s8 &&
The i8 issues look like they might be coming from the calling convention of v3i8 types. They might look better if the inputs were vectors. They might not be quite correct at the moment if they were enabled, though. I wouldn't be against just using moreElementsToNextPow2 in the long run, if it doesn't need a more precise "cost model" than that.
In that case this LGTM. Thanks for the patch!
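As a rough illustration of the legalization rule discussed in this thread, the sketch below models the predicate from the patch (widen only vectors with an odd number of lanes whose element type is not s8) together with rounding the lane count up to the next power of two. It is standalone C++, not LLVM code: `VecTy`, `shouldWiden`, `nextPow2`, and `legalize` are hypothetical names, and the rounding is an assumption about what `moreElementsToNextPow2` does with its default arguments.

```cpp
#include <cassert>

// Hypothetical stand-in for an LLT vector type: lane count and lane width.
struct VecTy {
  unsigned NumElements;
  unsigned ElementBits;
};

// Models the predicate in the patch: only vectors with an odd number of lanes
// and a non-8-bit element type are widened.
bool shouldWiden(VecTy Ty) {
  return Ty.ElementBits != 8 && (Ty.NumElements & 1) != 0;
}

// Smallest power of two that is >= N (assumes 0 < N <= 2^31).
unsigned nextPow2(unsigned N) {
  assert(N != 0 && N <= (1u << 31) && "lane count out of range");
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Apply the rule: widen matching types, leave everything else unchanged.
VecTy legalize(VecTy Ty) {
  if (shouldWiden(Ty))
    Ty.NumElements = nextPow2(Ty.NumElements);
  return Ty;
}
```

Under these assumptions a `<3 x i16>` or `<3 x i32>` becomes a four-lane vector, while `<3 x i8>` is left alone, and so is any vector with an even lane count such as `<6 x i16>`, since the predicate only tests the low bit of the element count.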