[AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN,SMAX,UMIN,UMAX} for odd-sized vectors #82740

dc03-work · 2024-02-23T08:14:06Z

i8 vectors do not have their sizes changed as I noticed regressions in some tests when that was done.

This patch also adds support for most G_VECREDUCE_* operations to moreElementsVector in LegalizerHelper.cpp.

The code for getting the "neutral" element is taken almost exactly as it is in SelectionDAG, with the exception that support for G_VECREDUCE_{FMAXIMUM,FMINIMUM} was not added.

The code for SelectionDAG is located at SelectionDAG::getNeutralElement().
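
For readers unfamiliar with the "neutral" element, here is a minimal standalone sketch of the idea, independent of LLVM. The `Reduce` enum and the helpers `neutralBits16` and `sminV3ViaV4` are hypothetical names for illustration only; the constants mirror the integer cases in this patch for a 16-bit lane.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical stand-ins for the integer G_VECREDUCE_* opcodes in the patch.
enum class Reduce { Add, Mul, And, Or, Xor, SMax, SMin, UMax, UMin };

// Neutral element for a 16-bit lane, as a raw bit pattern.
constexpr uint16_t neutralBits16(Reduce Op) {
  switch (Op) {
  case Reduce::Add:
  case Reduce::Or:
  case Reduce::Xor:
  case Reduce::UMax:
    return 0x0000;
  case Reduce::Mul:
    return 0x0001;
  case Reduce::And:
  case Reduce::UMin:
    return 0xFFFF; // all ones
  case Reduce::SMax:
    return 0x8000; // signed minimum
  case Reduce::SMin:
    return 0x7FFF; // signed maximum
  }
  return 0; // not reached for valid enumerators
}

// Signed-min over <3 x i16>, done as a 4-lane reduction with lane 3 set to Fill.
inline int16_t sminV3ViaV4(int16_t A, int16_t B, int16_t C, int16_t Fill) {
  const std::array<int16_t, 4> Wide{{A, B, C, Fill}};
  return *std::min_element(Wide.begin(), Wide.end());
}
```

With `Fill = 0x7FFF`, the lanes {5, 3, 9} reduce to 3 as expected. With `Fill = 0` the result would be 0, which is why the extra lane cannot hold an arbitrary value.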

llvmbot · 2024-02-23T08:14:53Z

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Dhruv Chawla (work) (dc03-work)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/82740.diff

5 Files Affected:

(modified) llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h (+4)
(modified) llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (+65)
(modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+7)
(modified) llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll (+47-144)
(modified) llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll (+6-21)

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index 2beb9919418fc9..5bb3692f0a46b4 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -281,6 +281,10 @@ class LegalizerHelper {
                                          MachineInstr &MI,
                                          LostDebugLocObserver &LocObserver);
 
+  MachineInstrBuilder
+  getNeutralElementForVecReduce(unsigned Opcode, MachineIRBuilder &MIRBuilder,
+                                LLT Ty);
+
 public:
   /// Return the alignment to use for a stack temporary object with the given
   /// type.
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 30f12bf5cca586..1472fdf8129b56 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -5216,6 +5216,42 @@ LegalizerHelper::moreElementsVectorPhi(MachineInstr &MI, unsigned TypeIdx,
   return Legalized;
 }
 
+MachineInstrBuilder LegalizerHelper::getNeutralElementForVecReduce(
+    unsigned Opcode, MachineIRBuilder &MIRBuilder, LLT Ty) {
+  assert(Ty.isScalar() && "Expected scalar type to make neutral element for");
+
+  switch (Opcode) {
+  default:
+    return MIRBuilder.buildUndef(Ty);
+  case TargetOpcode::G_VECREDUCE_ADD:
+  case TargetOpcode::G_VECREDUCE_OR:
+  case TargetOpcode::G_VECREDUCE_XOR:
+  case TargetOpcode::G_VECREDUCE_UMAX:
+    return MIRBuilder.buildConstant(Ty, 0);
+  case TargetOpcode::G_VECREDUCE_MUL:
+    return MIRBuilder.buildConstant(Ty, 1);
+  case TargetOpcode::G_VECREDUCE_AND:
+  case TargetOpcode::G_VECREDUCE_UMIN:
+    return MIRBuilder.buildConstant(
+        Ty, APInt::getAllOnes(Ty.getScalarSizeInBits()));
+  case TargetOpcode::G_VECREDUCE_SMAX:
+    return MIRBuilder.buildConstant(
+        Ty, APInt::getSignedMinValue(Ty.getSizeInBits()));
+  case TargetOpcode::G_VECREDUCE_SMIN:
+    return MIRBuilder.buildConstant(
+        Ty, APInt::getSignedMaxValue(Ty.getSizeInBits()));
+  case TargetOpcode::G_VECREDUCE_FADD:
+    return MIRBuilder.buildFConstant(Ty, -0.0);
+  case TargetOpcode::G_VECREDUCE_FMUL:
+    return MIRBuilder.buildFConstant(Ty, 1.0);
+  case TargetOpcode::G_VECREDUCE_FMINIMUM:
+  case TargetOpcode::G_VECREDUCE_FMAXIMUM:
+    assert(false && "getNeutralElementForVecReduce unimplemented for "
+                    "G_VECREDUCE_FMINIMUM and G_VECREDUCE_FMAXIMUM!");
+  }
+  llvm_unreachable("switch expected to return!");
+}
+
 LegalizerHelper::LegalizeResult
 LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
                                     LLT MoreTy) {
@@ -5420,6 +5456,35 @@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
     Observer.changedInstr(MI);
     return Legalized;
   }
+  case TargetOpcode::G_VECREDUCE_FADD:
+  case TargetOpcode::G_VECREDUCE_FMUL:
+  case TargetOpcode::G_VECREDUCE_ADD:
+  case TargetOpcode::G_VECREDUCE_MUL:
+  case TargetOpcode::G_VECREDUCE_AND:
+  case TargetOpcode::G_VECREDUCE_OR:
+  case TargetOpcode::G_VECREDUCE_XOR:
+  case TargetOpcode::G_VECREDUCE_SMAX:
+  case TargetOpcode::G_VECREDUCE_SMIN:
+  case TargetOpcode::G_VECREDUCE_UMAX:
+  case TargetOpcode::G_VECREDUCE_UMIN: {
+    LLT OrigTy = MRI.getType(MI.getOperand(1).getReg());
+    MachineOperand &MO = MI.getOperand(1);
+    auto NewVec = MIRBuilder.buildPadVectorWithUndefElements(MoreTy, MO);
+    auto NeutralElement = getNeutralElementForVecReduce(
+        MI.getOpcode(), MIRBuilder, MoreTy.getElementType());
+    for (size_t i = OrigTy.getNumElements(), e = MoreTy.getNumElements();
+         i != e; i++) {
+      auto Idx = MIRBuilder.buildConstant(LLT::scalar(32), i);
+      NewVec = MIRBuilder.buildInsertVectorElement(MoreTy, NewVec,
+                                                   NeutralElement, Idx);
+    }
+
+    Observer.changingInstr(MI);
+    MO.setReg(NewVec.getReg(0));
+    Observer.changedInstr(MI);
+    return Legalized;
+  }
+
   default:
     return UnableToLegalize;
   }
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 60e046bc6cf407..a2e805e8cb56dc 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1074,6 +1074,13 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
                  {s16, v8s16},
                  {s32, v2s32},
                  {s32, v4s32}})
+      .moreElementsIf(
+          [=](const LegalityQuery &Query) {
+            return Query.Types[1].isVector() &&
+                   Query.Types[1].getElementType() != s8 &&
+                   Query.Types[1].getNumElements() & 1;
+          },
+          LegalizeMutations::moreElementsToNextPow2(1))
       .clampMaxNumElements(1, s64, 2)
       .clampMaxNumElements(1, s32, 4)
       .clampMaxNumElements(1, s16, 8)
diff --git a/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll b/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
index 194fe5be40c2bd..76790d128d066c 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
@@ -595,30 +595,14 @@ entry:
 }
 
 define i16 @sminv_v3i16(<3 x i16> %a) {
-; CHECK-SD-LABEL: sminv_v3i16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 def $q0
-; CHECK-SD-NEXT:    mov w8, #32767 // =0x7fff
-; CHECK-SD-NEXT:    mov v0.h[3], w8
-; CHECK-SD-NEXT:    sminv h0, v0.4h
-; CHECK-SD-NEXT:    fmov w0, s0
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: sminv_v3i16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT:    mov h1, v0.h[1]
-; CHECK-GI-NEXT:    smov w8, v0.h[0]
-; CHECK-GI-NEXT:    umov w9, v0.h[0]
-; CHECK-GI-NEXT:    umov w10, v0.h[1]
-; CHECK-GI-NEXT:    smov w11, v0.h[2]
-; CHECK-GI-NEXT:    umov w13, v0.h[2]
-; CHECK-GI-NEXT:    fmov w12, s1
-; CHECK-GI-NEXT:    cmp w8, w12, sxth
-; CHECK-GI-NEXT:    csel w8, w9, w10, lt
-; CHECK-GI-NEXT:    cmp w11, w8, sxth
-; CHECK-GI-NEXT:    csel w0, w8, w13, gt
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: sminv_v3i16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT:    mov w8, #32767 // =0x7fff
+; CHECK-NEXT:    mov v0.h[3], w8
+; CHECK-NEXT:    sminv h0, v0.4h
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
 entry:
   %arg1 = call i16 @llvm.vector.reduce.smin.v3i16(<3 x i16> %a)
   ret i16 %arg1
@@ -670,28 +654,13 @@ entry:
 }
 
 define i32 @sminv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: sminv_v3i32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    mov w8, #2147483647 // =0x7fffffff
-; CHECK-SD-NEXT:    mov v0.s[3], w8
-; CHECK-SD-NEXT:    sminv s0, v0.4s
-; CHECK-SD-NEXT:    fmov w0, s0
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: sminv_v3i32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    mov s1, v0.s[1]
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    mov s2, v0.s[2]
-; CHECK-GI-NEXT:    fmov w9, s1
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fmov w9, s2
-; CHECK-GI-NEXT:    fcsel s0, s0, s1, lt
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fcsel s0, s0, s2, lt
-; CHECK-GI-NEXT:    fmov w0, s0
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: sminv_v3i32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov w8, #2147483647 // =0x7fffffff
+; CHECK-NEXT:    mov v0.s[3], w8
+; CHECK-NEXT:    sminv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
 entry:
   %arg1 = call i32 @llvm.vector.reduce.smin.v3i32(<3 x i32> %a)
   ret i32 %arg1
@@ -972,17 +941,10 @@ define i16 @smaxv_v3i16(<3 x i16> %a) {
 ; CHECK-GI-LABEL: smaxv_v3i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT:    mov h1, v0.h[1]
-; CHECK-GI-NEXT:    smov w8, v0.h[0]
-; CHECK-GI-NEXT:    umov w9, v0.h[0]
-; CHECK-GI-NEXT:    umov w10, v0.h[1]
-; CHECK-GI-NEXT:    smov w11, v0.h[2]
-; CHECK-GI-NEXT:    umov w13, v0.h[2]
-; CHECK-GI-NEXT:    fmov w12, s1
-; CHECK-GI-NEXT:    cmp w8, w12, sxth
-; CHECK-GI-NEXT:    csel w8, w9, w10, gt
-; CHECK-GI-NEXT:    cmp w11, w8, sxth
-; CHECK-GI-NEXT:    csel w0, w8, w13, lt
+; CHECK-GI-NEXT:    mov w8, #32768 // =0x8000
+; CHECK-GI-NEXT:    mov v0.h[3], w8
+; CHECK-GI-NEXT:    smaxv h0, v0.4h
+; CHECK-GI-NEXT:    fmov w0, s0
 ; CHECK-GI-NEXT:    ret
 entry:
   %arg1 = call i16 @llvm.vector.reduce.smax.v3i16(<3 x i16> %a)
@@ -1035,28 +997,13 @@ entry:
 }
 
 define i32 @smaxv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: smaxv_v3i32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    mov w8, #-2147483648 // =0x80000000
-; CHECK-SD-NEXT:    mov v0.s[3], w8
-; CHECK-SD-NEXT:    smaxv s0, v0.4s
-; CHECK-SD-NEXT:    fmov w0, s0
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: smaxv_v3i32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    mov s1, v0.s[1]
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    mov s2, v0.s[2]
-; CHECK-GI-NEXT:    fmov w9, s1
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fmov w9, s2
-; CHECK-GI-NEXT:    fcsel s0, s0, s1, gt
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fcsel s0, s0, s2, gt
-; CHECK-GI-NEXT:    fmov w0, s0
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: smaxv_v3i32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-NEXT:    mov v0.s[3], w8
+; CHECK-NEXT:    smaxv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
 entry:
   %arg1 = call i32 @llvm.vector.reduce.smax.v3i32(<3 x i32> %a)
   ret i32 %arg1
@@ -1335,17 +1282,10 @@ define i16 @uminv_v3i16(<3 x i16> %a) {
 ; CHECK-GI-LABEL: uminv_v3i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT:    mov h1, v0.h[1]
-; CHECK-GI-NEXT:    umov w8, v0.h[0]
-; CHECK-GI-NEXT:    umov w9, v0.h[0]
-; CHECK-GI-NEXT:    umov w10, v0.h[1]
-; CHECK-GI-NEXT:    umov w11, v0.h[2]
-; CHECK-GI-NEXT:    umov w13, v0.h[2]
-; CHECK-GI-NEXT:    fmov w12, s1
-; CHECK-GI-NEXT:    cmp w8, w12, uxth
-; CHECK-GI-NEXT:    csel w8, w9, w10, lo
-; CHECK-GI-NEXT:    cmp w11, w8, uxth
-; CHECK-GI-NEXT:    csel w0, w8, w13, hi
+; CHECK-GI-NEXT:    mov w8, #65535 // =0xffff
+; CHECK-GI-NEXT:    mov v0.h[3], w8
+; CHECK-GI-NEXT:    uminv h0, v0.4h
+; CHECK-GI-NEXT:    fmov w0, s0
 ; CHECK-GI-NEXT:    ret
 entry:
   %arg1 = call i16 @llvm.vector.reduce.umin.v3i16(<3 x i16> %a)
@@ -1398,28 +1338,13 @@ entry:
 }
 
 define i32 @uminv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: uminv_v3i32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-SD-NEXT:    mov v0.s[3], w8
-; CHECK-SD-NEXT:    uminv s0, v0.4s
-; CHECK-SD-NEXT:    fmov w0, s0
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: uminv_v3i32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    mov s1, v0.s[1]
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    mov s2, v0.s[2]
-; CHECK-GI-NEXT:    fmov w9, s1
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fmov w9, s2
-; CHECK-GI-NEXT:    fcsel s0, s0, s1, lo
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fcsel s0, s0, s2, lo
-; CHECK-GI-NEXT:    fmov w0, s0
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: uminv_v3i32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
+; CHECK-NEXT:    mov v0.s[3], w8
+; CHECK-NEXT:    uminv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
 entry:
   %arg1 = call i32 @llvm.vector.reduce.umin.v3i32(<3 x i32> %a)
   ret i32 %arg1
@@ -1697,17 +1622,10 @@ define i16 @umaxv_v3i16(<3 x i16> %a) {
 ; CHECK-GI-LABEL: umaxv_v3i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT:    mov h1, v0.h[1]
-; CHECK-GI-NEXT:    umov w8, v0.h[0]
-; CHECK-GI-NEXT:    umov w9, v0.h[0]
-; CHECK-GI-NEXT:    umov w10, v0.h[1]
-; CHECK-GI-NEXT:    umov w11, v0.h[2]
-; CHECK-GI-NEXT:    umov w13, v0.h[2]
-; CHECK-GI-NEXT:    fmov w12, s1
-; CHECK-GI-NEXT:    cmp w8, w12, uxth
-; CHECK-GI-NEXT:    csel w8, w9, w10, hi
-; CHECK-GI-NEXT:    cmp w11, w8, uxth
-; CHECK-GI-NEXT:    csel w0, w8, w13, lo
+; CHECK-GI-NEXT:    mov w8, #0 // =0x0
+; CHECK-GI-NEXT:    mov v0.h[3], w8
+; CHECK-GI-NEXT:    umaxv h0, v0.4h
+; CHECK-GI-NEXT:    fmov w0, s0
 ; CHECK-GI-NEXT:    ret
 entry:
   %arg1 = call i16 @llvm.vector.reduce.umax.v3i16(<3 x i16> %a)
@@ -1760,27 +1678,12 @@ entry:
 }
 
 define i32 @umaxv_v3i32(<3 x i32> %a) {
-; CHECK-SD-LABEL: umaxv_v3i32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    mov v0.s[3], wzr
-; CHECK-SD-NEXT:    umaxv s0, v0.4s
-; CHECK-SD-NEXT:    fmov w0, s0
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: umaxv_v3i32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    mov s1, v0.s[1]
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    mov s2, v0.s[2]
-; CHECK-GI-NEXT:    fmov w9, s1
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fmov w9, s2
-; CHECK-GI-NEXT:    fcsel s0, s0, s1, hi
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fcsel s0, s0, s2, hi
-; CHECK-GI-NEXT:    fmov w0, s0
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: umaxv_v3i32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov v0.s[3], wzr
+; CHECK-NEXT:    umaxv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
 entry:
   %arg1 = call i32 @llvm.vector.reduce.umax.v3i32(<3 x i32> %a)
   ret i32 %arg1
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll b/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
index 8988481708cfb6..d71aed2d17506b 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
@@ -187,27 +187,12 @@ define i8 @test_v9i8(<9 x i8> %a) nounwind {
 }
 
 define i32 @test_v3i32(<3 x i32> %a) nounwind {
-; CHECK-SD-LABEL: test_v3i32:
-; CHECK-SD:       // %bb.0:
-; CHECK-SD-NEXT:    mov v0.s[3], wzr
-; CHECK-SD-NEXT:    umaxv s0, v0.4s
-; CHECK-SD-NEXT:    fmov w0, s0
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_v3i32:
-; CHECK-GI:       // %bb.0:
-; CHECK-GI-NEXT:    mov s1, v0.s[1]
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    mov s2, v0.s[2]
-; CHECK-GI-NEXT:    fmov w9, s1
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fmov w9, s2
-; CHECK-GI-NEXT:    fcsel s0, s0, s1, hi
-; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    cmp w8, w9
-; CHECK-GI-NEXT:    fcsel s0, s0, s2, hi
-; CHECK-GI-NEXT:    fmov w0, s0
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_v3i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov v0.s[3], wzr
+; CHECK-NEXT:    umaxv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %b = call i32 @llvm.vector.reduce.umax.v3i32(<3 x i32> %a)
   ret i32 %b
 }

davemgreen · 2024-02-25T15:36:01Z

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

+        MI.getOpcode(), MIRBuilder, MoreTy.getElementType());
+    for (size_t i = OrigTy.getNumElements(), e = MoreTy.getNumElements();
+         i != e; i++) {
+      auto Idx = MIRBuilder.buildConstant(LLT::scalar(32), i);


Can this use TLI.getVectorIdxTy() for the index type?

davemgreen · 2024-02-25T15:37:59Z

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

+  case TargetOpcode::G_VECREDUCE_UMIN: {
+    LLT OrigTy = MRI.getType(MI.getOperand(1).getReg());
+    MachineOperand &MO = MI.getOperand(1);
+    auto NewVec = MIRBuilder.buildPadVectorWithUndefElements(MoreTy, MO);


This builds a vector with undef, then inserts identity elements into it? Could it instead just build directly, with the new identity element?

dc03-work (Feb 26, 2024): Using G_BUILD_VECTOR seems to cause quite a few regressions: https://gist.github.com/dc03-work/66c179f4d380cfac3ed7b36525a583ef.

davemgreen (Feb 27, 2024): It sounds like there is a combine to clear up the undef version, but not the direct one.
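
As a rough model of the two-step approach discussed in this thread (pad with undef, then overwrite the new lanes), the sketch below uses `std::optional` to stand in for an undef lane. It does not use any LLVM API; `Lane`, `padWithUndef`, and `fillTail` are hypothetical names, and the loop bounds follow the `OrigTy`/`MoreTy` loop in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// std::nullopt models an undef lane.
using Lane = std::optional<int32_t>;

// Step 1: grow the vector to MoreElts lanes; the new lanes are undef.
inline std::vector<Lane> padWithUndef(std::vector<Lane> V,
                                      std::size_t MoreElts) {
  V.resize(MoreElts, std::nullopt);
  return V;
}

// Step 2: overwrite lanes [OrigElts, size) with the neutral element,
// one insert per lane, as the loop in the patch does.
inline std::vector<Lane> fillTail(std::vector<Lane> V, std::size_t OrigElts,
                                  int32_t Neutral) {
  for (std::size_t I = OrigElts, E = V.size(); I != E; ++I)
    V[I] = Neutral;
  return V;
}
```

For a `<3 x i32>` G_VECREDUCE_UMAX, `fillTail(padWithUndef({1, 2, 3}, 4), 3, 0)` yields four defined lanes with lane 3 equal to 0.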

davemgreen · 2024-02-25T15:40:47Z

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

+  default:
+    return MIRBuilder.buildUndef(Ty);


Can the default be the llvm_unreachable?

davemgreen · 2024-02-25T16:19:25Z

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

+      .moreElementsIf(
+          [=](const LegalityQuery &Query) {
+            return Query.Types[1].isVector() &&
+                   Query.Types[1].getElementType() != s8 &&


The i8 issues look like they might be coming from the calling convention of v3i8 types. They might look better if the inputs were vectors, though they might not be quite correct at the moment if they were enabled. I wouldn't be against just using moreElementsToNextPow2 in the long run, if it doesn't need a more precise "cost model" than that.
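
To make the rule under discussion concrete, here is a small standalone sketch of the predicate plus the widening step. `VecTy`, `shouldWiden`, `ceilPow2`, and `widen` are hypothetical names, not LLVM API, and the sketch assumes that moreElementsToNextPow2 rounds the element count up to a power of two, as its name suggests.

```cpp
// Hypothetical, simplified stand-in for a vector LLT.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Mirrors the predicate in the diff: widen only odd-length, non-i8 vectors.
constexpr bool shouldWiden(VecTy Ty) {
  return Ty.EltBits != 8 && (Ty.NumElts & 1u) != 0;
}

// Smallest power of two that is >= N (assumed rounding behaviour).
constexpr unsigned ceilPow2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Apply the rule: widen when the predicate holds, otherwise leave unchanged.
constexpr VecTy widen(VecTy Ty) {
  return shouldWiden(Ty) ? VecTy{ceilPow2(Ty.NumElts), Ty.EltBits} : Ty;
}
```

Under these assumptions `<3 x i16>` and `<3 x i32>` become 4-lane vectors, matching the `v0.4h` and `v0.4s` reductions in the updated tests, while `<3 x i8>` is left alone.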

davemgreen

In that case this LGTM. Thanks for the patch!

davemgreen · 2024-02-27T10:15:06Z

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

+  case TargetOpcode::G_VECREDUCE_UMIN: {
+    LLT OrigTy = MRI.getType(MI.getOperand(1).getReg());
+    MachineOperand &MO = MI.getOperand(1);
+    auto NewVec = MIRBuilder.buildPadVectorWithUndefElements(MoreTy, MO);


It sounds like there is a combine to clear up the undef version, but not the direct one.

github-actions · 2024-02-27T10:28:04Z

@dc03-work Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

dc03-work requested review from madhur13490 and davemgreen February 23, 2024 08:14

llvmbot added backend:AArch64 llvm:globalisel labels Feb 23, 2024

This was referenced Feb 23, 2024

[GlobalISel] Add support for most G_VECREDUCE_* operations to moreElementsVector #81830

Closed

[AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN,SMAX,UMIN,UMAX} for odd-sized vectors #81831

Closed

madhur13490 requested review from aemerson and arsenm February 23, 2024 08:57

davemgreen reviewed Feb 25, 2024

Address reviewer comments

e76f05c

davemgreen approved these changes Feb 27, 2024

dc03-work merged commit 2c9b6c1 into llvm:main Feb 27, 2024
4 checks passed

dc03-work mentioned this pull request Mar 1, 2024

[AArch64][GlobalISel] Expand abs.v4i8 to v4i16 and abs.v2s16 to v2s32 #81231

Merged
