[InstCombine] Transform `vector.reduce.add` and `splat` into multiplication #161020

spaits · 2025-09-27T20:30:03Z

Whenever a vector whose elements are all the same (a splat created with `insertelement` and `shufflevector`) is summed, the result is a multiplication of that element by the number of elements.

…32 %0, 2` Fixes llvm#160066 Whenever we have a vector with all the same elements, created with `insertelement` and `shufflevector`, and the result type's element count is a power of two, summing the vector is a multiplication by a power of two, which can be replaced with a left shift.

nikic · 2025-09-27T20:36:33Z

This should not be limited to powers of two. You can just emit a multiply and it will get folded to a shift in the power of two case.
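The arithmetic behind this suggestion can be checked outside LLVM. The sketch below is a plain C++ model, not the InstCombine code: `reduceAddSplat`, `mulForm` and `shlForm` are hypothetical helper names, and `uint32_t` stands in for `i32` with wrap-around addition. Under those assumptions, summing N copies of x matches `x * N` for any N, and the shift is only the N = 2^k special case of that multiply.

```cpp
#include <cstdint>

// Hypothetical model of vector.reduce.add over a splat of `x` with `n` lanes:
// add the same 32-bit value n times, wrapping modulo 2^32.
constexpr std::uint32_t reduceAddSplat(std::uint32_t x, unsigned n) {
  std::uint32_t sum = 0;
  for (unsigned i = 0; i < n; ++i)
    sum += x;
  return sum;
}

// The general replacement: a single multiply by the lane count.
constexpr std::uint32_t mulForm(std::uint32_t x, unsigned n) {
  return x * static_cast<std::uint32_t>(n);
}

// The power-of-two special case: x * 2^k == x << k.
constexpr std::uint32_t shlForm(std::uint32_t x, unsigned log2n) {
  return x << log2n;
}

// Compile-time checks of the model.
static_assert(reduceAddSplat(7u, 6) == mulForm(7u, 6), "non power of two");
static_assert(reduceAddSplat(7u, 4) == shlForm(7u, 2), "power of two");
static_assert(reduceAddSplat(0xFFFFFFFFu, 6) == mulForm(0xFFFFFFFFu, 6),
              "both forms wrap the same way");
```

Because the multiply already covers the power-of-two case, a separate shift branch in the transform adds nothing as long as a later fold turns `mul` by 2^k into `shl`.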

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

spaits · 2025-09-27T21:34:25Z

Thank you very much for your review, @nikic. I am really happy that you suggested optimizing the non-power-of-two cases. It was fun implementing those too. :)

I am now using getSplatValue instead of my custom match.
I am properly constructing APInts now.
Added tests for i64 types, so the tests no longer cover only i32.

I am also open to any further ideas for improving this patch.

spaits · 2025-09-27T22:05:03Z

There is one lldb failure on Linux. I think that is just a flaky test case, which isn't caused by this PR. I will retrigger the CI.

llvmbot · 2025-09-27T22:06:55Z

@llvm/pr-subscribers-llvm-transforms

Author: Gábor Spaits (spaits)

Changes

Fixes #160066

Whenever a vector whose elements are all the same (a splat created with `insertelement` and `shufflevector`) is summed, the result is a multiplication of that element by the number of elements.

Full diff: https://github.com/llvm/llvm-project/pull/161020.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (+36)
(modified) llvm/test/Transforms/InstCombine/vector-reductions.ll (+90)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 6ad493772d170..74c263e86f4a4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -64,6 +64,7 @@
 #include "llvm/Support/KnownBits.h"
 #include "llvm/Support/KnownFPClass.h"
 #include "llvm/Support/MathExtras.h"
+#include "llvm/Support/TypeSize.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/InstCombine/InstCombiner.h"
 #include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
@@ -3761,6 +3762,41 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
             return replaceInstUsesWith(CI, Res);
           }
       }
+
+      // Handle the case where a value is multiplied by a power of two.
+      // For example:
+      // %2 = insertelement <4 x i32> poison, i32 %0, i64 0
+      // %3 = shufflevector <4 x i32> %2, poison, <4 x i32> zeroinitializer
+      // %4 = tail call i32 @llvm.vector.reduce.add.v4i32(%3)
+      // =>
+      // %2 = shl i32 %0, 2
+      assert(Arg->getType()->isVectorTy() &&
+             "The vector.reduce.add intrinsic's argument must be a vector!");
+
+      if (Value *Splat = getSplatValue(Arg)) {
+        // It is only a multiplication if we add the same element over and over.
+        ElementCount ReducedVectorElementCount =
+            static_cast<VectorType *>(Arg->getType())->getElementCount();
+        if (ReducedVectorElementCount.isFixed()) {
+          unsigned VectorSize = ReducedVectorElementCount.getFixedValue();
+          Type *SplatType = Splat->getType();
+          unsigned SplatTypeWidth = SplatType->getIntegerBitWidth();
+          Value *Res;
+          // Power of two is a special case. We can just use a left shift here.
+          if (isPowerOf2_32(VectorSize)) {
+            unsigned Pow2 = Log2_32(VectorSize);
+            Res = Builder.CreateShl(
+                Splat, Constant::getIntegerValue(SplatType,
+                                                 APInt(SplatTypeWidth, Pow2)));
+            return replaceInstUsesWith(CI, Res);
+          }
+          // Otherwise just multiply.
+          Res = Builder.CreateMul(
+              Splat, Constant::getIntegerValue(
+                         SplatType, APInt(SplatTypeWidth, VectorSize)));
+          return replaceInstUsesWith(CI, Res);
+        }
+      }
     }
     [[fallthrough]];
   }
diff --git a/llvm/test/Transforms/InstCombine/vector-reductions.ll b/llvm/test/Transforms/InstCombine/vector-reductions.ll
index 10f4aca72dbc7..e071415d2d6c1 100644
--- a/llvm/test/Transforms/InstCombine/vector-reductions.ll
+++ b/llvm/test/Transforms/InstCombine/vector-reductions.ll
@@ -308,3 +308,93 @@ define i32 @diff_of_sums_type_mismatch2(<8 x i32> %v0, <4 x i32> %v1) {
   %r = sub i32 %r0, %r1
   ret i32 %r
 }
+
+define i32 @constant_multiplied_at_0(i32 %0) {
+; CHECK-LABEL: @constant_multiplied_at_0(
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i32 [[TMP0:%.*]], 2
+; CHECK-NEXT:    ret i32 [[TMP2]]
+;
+  %2 = insertelement <4 x i32> poison, i32 %0, i64 0
+  %3 = shufflevector <4 x i32> %2, <4 x i32> poison, <4 x i32> zeroinitializer
+  %4 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %3)
+  ret i32 %4
+}
+
+define i64 @constant_multiplied_at_0_64bits(i64 %0) {
+; CHECK-LABEL: @constant_multiplied_at_0_64bits(
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i64 [[TMP0:%.*]], 2
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %2 = insertelement <4 x i64> poison, i64 %0, i64 0
+  %3 = shufflevector <4 x i64> %2, <4 x i64> poison, <4 x i32> zeroinitializer
+  %4 = tail call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %3)
+  ret i64 %4
+}
+
+define i32 @constant_multiplied_at_0_two_pow8(i32 %0) {
+; CHECK-LABEL: @constant_multiplied_at_0_two_pow8(
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i32 [[TMP0:%.*]], 3
+; CHECK-NEXT:    ret i32 [[TMP2]]
+;
+  %2 = insertelement <4 x i32> poison, i32 %0, i64 0
+  %3 = shufflevector <4 x i32> %2, <4 x i32> poison, <8 x i32> zeroinitializer
+  %4 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %3)
+  ret i32 %4
+}
+
+
+define i32 @constant_multiplied_at_0_two_pow16(i32 %0) {
+; CHECK-LABEL: @constant_multiplied_at_0_two_pow16(
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i32 [[TMP0:%.*]], 4
+; CHECK-NEXT:    ret i32 [[TMP2]]
+;
+  %2 = insertelement <4 x i32> poison, i32 %0, i64 0
+  %3 = shufflevector <4 x i32> %2, <4 x i32> poison, <16 x i32> zeroinitializer
+  %4 = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %3)
+  ret i32 %4
+}
+
+
+define i32 @constant_multiplied_at_1(i32 %0) {
+; CHECK-LABEL: @constant_multiplied_at_1(
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i32 [[TMP0:%.*]], 2
+; CHECK-NEXT:    ret i32 [[TMP2]]
+;
+  %2 = insertelement <4 x i32> poison, i32 %0, i64 1
+  %3 = shufflevector <4 x i32> %2, <4 x i32> poison,
+  <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+  %4 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %3)
+  ret i32 %4
+}
+
+define i32 @negative_constant_multiplied_at_1(i32 %0) {
+; CHECK-LABEL: @negative_constant_multiplied_at_1(
+; CHECK-NEXT:    ret i32 poison
+;
+  %2 = insertelement <4 x i32> poison, i32 %0, i64 1
+  %3 = shufflevector <4 x i32> %2, <4 x i32> poison, <4 x i32> zeroinitializer
+  %4 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %3)
+  ret i32 %4
+}
+
+define i32 @constant_multiplied_non_power_of_2(i32 %0) {
+; CHECK-LABEL: @constant_multiplied_non_power_of_2(
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i32 [[TMP0:%.*]], 6
+; CHECK-NEXT:    ret i32 [[TMP2]]
+;
+  %2 = insertelement <4 x i32> poison, i32 %0, i64 0
+  %3 = shufflevector <4 x i32> %2, <4 x i32> poison, <6 x i32> zeroinitializer
+  %4 = tail call i32 @llvm.vector.reduce.add.v6i32(<6 x i32> %3)
+  ret i32 %4
+}
+
+define i64 @constant_multiplied_non_power_of_2_i64(i64 %0) {
+; CHECK-LABEL: @constant_multiplied_non_power_of_2_i64(
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP0:%.*]], 6
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %2 = insertelement <4 x i64> poison, i64 %0, i64 0
+  %3 = shufflevector <4 x i64> %2, <4 x i64> poison, <6 x i32> zeroinitializer
+  %4 = tail call i64 @llvm.vector.reduce.add.v6i64(<6 x i64> %3)
+  ret i64 %4
+}

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

spaits · 2025-09-28T09:57:34Z

Thank you very much for your review, @XChy. I have addressed your comments.

llvm/test/Transforms/InstCombine/vector-reductions.ll

XChy · 2025-09-28T13:30:00Z

@zyw-bot mfuzz

nikic · 2025-09-28T14:38:31Z

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

+        assert(Arg->getType()->isVectorTy() &&
+               "The vector.reduce.add intrinsic's argument must be a vector!");
+        ElementCount ReducedVectorElementCount =
+            static_cast<VectorType *>(Arg->getType())->getElementCount();


Suggested change

-            static_cast<VectorType *>(Arg->getType())->getElementCount();
+            cast<VectorType>(Arg->getType())->getElementCount();

And remove the assert.

nikic · 2025-09-28T14:40:22Z

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

+          Value *Res =
+              Builder.CreateMul(Splat, ConstantInt::get(SplatType, VectorSize));
+          return replaceInstUsesWith(CI, Res);


Suggested change

-          Value *Res =
-              Builder.CreateMul(Splat, ConstantInt::get(SplatType, VectorSize));
-          return replaceInstUsesWith(CI, Res);
+          return BinaryOperator::CreateMul(Splat, ConstantInt::get(SplatType, VectorSize));

llvm/test/Transforms/InstCombine/vector-reductions.ll

nikic · 2025-09-28T14:42:05Z

llvm/test/Transforms/InstCombine/vector-reductions.ll

+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <2 x i1> [[TMP2]], <2 x i1> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast <2 x i1> [[TMP3]] to i2
+; CHECK-NEXT:    [[TMP5:%.*]] = call range(i2 0, -1) i2 @llvm.ctpop.i2(i2 [[TMP4]])
+; CHECK-NEXT:    [[TMP6:%.*]] = trunc i2 [[TMP5]] to i1


No need for so many i1 tests that don't hit this code path anyway. I'd suggest adding additional i2 tests instead, which make it a bit clearer what is going on (e.g. v5i2 and v6i2).
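One reason narrow element types make the behaviour easier to see is that in `i2` all arithmetic is modulo 4, so the lane count itself wraps. The following is a plain C++ model with hypothetical helper names (`toI2`, `reduceAddSplatI2`, `mulFormI2`) that masks to two bits to imitate `i2`. It is not the LLVM implementation and makes no claim about what InstCombine finally emits. In this model a 5-lane splat sums to x, a 6-lane splat to 2x, and a 7-lane splat to 3x (which is -x in `i2`).

```cpp
#include <cstdint>

// Keep only the low two bits, imitating a 2-bit integer type.
constexpr std::uint8_t toI2(unsigned v) {
  return static_cast<std::uint8_t>(v & 0x3u);
}

// Hypothetical model of vector.reduce.add on an n-lane i2 splat of x.
constexpr std::uint8_t reduceAddSplatI2(unsigned x, unsigned n) {
  unsigned sum = 0;
  for (unsigned i = 0; i < n; ++i)
    sum = toI2(sum + toI2(x));
  return toI2(sum);
}

// Multiply form, with the lane count reduced modulo 4 before multiplying.
constexpr std::uint8_t mulFormI2(unsigned x, unsigned n) {
  return toI2(toI2(x) * toI2(n));
}

// Compile-time checks of the model for x = 3.
static_assert(reduceAddSplatI2(3, 5) == mulFormI2(3, 5), "5 lanes");
static_assert(reduceAddSplatI2(3, 6) == mulFormI2(3, 6), "6 lanes");
static_assert(reduceAddSplatI2(3, 7) == mulFormI2(3, 7), "7 lanes");
```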


spaits · 2025-09-28T15:25:37Z

llvm/test/Transforms/InstCombine/vector-reductions.ll

+  ret i2 %4
+}
+
+define i2 @constant_multiplied_5xi2(i2 %0) {


https://alive2.llvm.org/ce/z/PPf5c7

spaits · 2025-09-28T15:27:34Z

llvm/test/Transforms/InstCombine/vector-reductions.ll

+  ret i2 %4
+}
+
+define i2 @constant_multiplied_7xi2(i2 %0) {


https://alive2.llvm.org/ce/z/ELzRN7

spaits · 2025-09-28T15:29:12Z

llvm/test/Transforms/InstCombine/vector-reductions.ll

+  ret i2 %4
+}
+
+define i2 @constant_multiplied_6xi2(i2 %0) {


https://alive2.llvm.org/ce/z/atkDsk

XChy

LGTM except a nit, cheers.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

github-actions · 2025-09-28T19:15:45Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm-ci · 2025-09-29T01:51:41Z

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-multistage running on ppc64le-clang-multistage-test while building llvm at step 11 "ninja check 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/13019

Here is the relevant piece of the build log for the reference

Step 11 (ninja check 2) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
...
PASS: ThreadSanitizer-powerpc64le :: restore_stack.cpp (102123 of 102137)
PASS: mlgo-utils :: corpus/extract_ir_script.test (102124 of 102137)
PASS: LLVM :: CodeGen/AMDGPU/elf-header-flags-mach.ll (102125 of 102137)
PASS: LLVM :: CodeGen/X86/cpus-intel.ll (102126 of 102137)
PASS: LLVM :: CodeGen/AMDGPU/directive-amdgcn-target.ll (102127 of 102137)
PASS: SanitizerCommon-lsan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (102128 of 102137)
PASS: SanitizerCommon-asan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (102129 of 102137)
PASS: SanitizerCommon-msan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (102130 of 102137)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (102131 of 102137)
PASS: SanitizerCommon-ubsan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (102132 of 102137)
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=6144.902318

…cation (llvm#161020) Fixes llvm#160066 Whenever we have a vector with all the same elements, created with `insertelement` and `shufflevector`, and we sum the vector, we have a multiplication.

nikic reviewed Sep 27, 2025


spaits added 2 commits September 27, 2025 23:17

Use getSplatValue and correctly construct APInt and add i64 test

fb492be

Address non power of 2 cases

e9cc989

spaits changed the title ~~[InstCombine] Transform vector.reduce.add (splat %0, 4) into shl i32 %0, 2~~ [InstCombine] Transform vector.reduce.add and splat into multiplication Sep 27, 2025

spaits marked this pull request as ready for review September 27, 2025 22:06

llvmbot added the llvm:instcombine (covers the InstCombine, InstSimplify and AggressiveInstCombine passes) and llvm:transforms labels Sep 27, 2025

spaits requested a review from dtcxzyw September 27, 2025 22:06

spaits requested a review from nikic September 27, 2025 22:06

Update comments and move assertion to a more fitting place

a8b32af

XChy reviewed Sep 28, 2025


spaits added 2 commits September 28, 2025 11:46

Remove redundant power of 2 case

d11a108

Use ConstantInt::get instead of Constant::getIntegerValue

01eb571

spaits requested a review from XChy September 28, 2025 09:57

XChy reviewed Sep 28, 2025


spaits added 2 commits September 28, 2025 13:33

Add i1 test

0adef1d

More small type tests

d2f235e

nikic reviewed Sep 28, 2025


dtcxzyw mentioned this pull request Sep 28, 2025

Fuzz PR161020 dtcxzyw/llvm-mutation-based-fuzz-service#108

Closed

spaits added 4 commits September 28, 2025 17:01

Throw out redundant i1 tests

045f0ef

More consistent test naming

027efe7

Use cast instead of static_cast

8e2c2e5

Use BinaryOperator::CreateMul instead of using Builder and replaceInstUsesWith

ff6491b

Extend testing

38ca5ce

spaits commented Sep 28, 2025


XChy approved these changes Sep 28, 2025


Remove assertion

78a00dd

nikic approved these changes Sep 28, 2025


spaits added 2 commits September 28, 2025 20:59

Remove redundant variable and comment

b86f985

Add a vscale test case

2dd6052

spaits added 2 commits September 28, 2025 21:21

Formatting

91e53d4

Rename a variable

8c31848

spaits merged commit d297987 into llvm:main Sep 28, 2025
9 checks passed

spaits mentioned this pull request Sep 28, 2025

[InstCombine] Extend vector.reduce.add and splat transform to scalable vectors #161101

Draft

