[VectorCombine] foldShuffleOfCastops - handle unary shuffles #160009

ckoparkar · 2025-09-21T20:19:32Z

ckoparkar

This patch is also causing some tests to time out; I'm looking into it:

LLVM :: Transforms/PhaseOrdering/X86/blendv-select.ll
LLVM :: Transforms/PhaseOrdering/X86/pr67803.ll
LLVM :: Transforms/PhaseOrdering/X86/shuffle-inseltpoison.ll
LLVM :: Transforms/PhaseOrdering/X86/shuffle.ll
LLVM :: Transforms/PhaseOrdering/X86/vec-load-combine.ll
LLVM :: Transforms/VectorCombine/AArch64/combine-shuffle-ext.ll
LLVM :: Transforms/VectorCombine/X86/shuffle-inseltpoison.ll
LLVM :: Transforms/VectorCombine/X86/shuffle.ll

llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll

llvmbot · 2025-09-23T13:14:21Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Chaitanya Koparkar (ckoparkar)

Changes

Fixes #156853.

Full diff: https://github.com/llvm/llvm-project/pull/160009.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+44-23)
(modified) llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll (+8-9)

diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 0ef933f596604..526a4add2a89a 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -2487,21 +2487,28 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
   if (!match(&I, m_Shuffle(m_Value(V0), m_Value(V1), m_Mask(OldMask))))
     return false;
 
+  // Check whether this is a unary shuffle.
+  // TODO: check if this can be extended to match undef or unused values,
+  // perhaps using ShuffleVectorInst::isSingleSource.
+  bool IsBinaryShuffle = !isa<PoisonValue>(V1);
+
   auto *C0 = dyn_cast<CastInst>(V0);
   auto *C1 = dyn_cast<CastInst>(V1);
-  if (!C0 || !C1)
+  if (!C0 || (IsBinaryShuffle && !C1))
     return false;
 
   Instruction::CastOps Opcode = C0->getOpcode();
-  if (C0->getSrcTy() != C1->getSrcTy())
-    return false;
 
-  // Handle shuffle(zext_nneg(x), sext(y)) -> sext(shuffle(x,y)) folds.
-  if (Opcode != C1->getOpcode()) {
-    if (match(C0, m_SExtLike(m_Value())) && match(C1, m_SExtLike(m_Value())))
-      Opcode = Instruction::SExt;
-    else
+  if (IsBinaryShuffle) {
+    if (C0->getSrcTy() != C1->getSrcTy())
       return false;
+    // Handle shuffle(zext_nneg(x), sext(y)) -> sext(shuffle(x,y)) folds.
+    if (Opcode != C1->getOpcode()) {
+      if (match(C0, m_SExtLike(m_Value())) && match(C1, m_SExtLike(m_Value())))
+        Opcode = Instruction::SExt;
+      else
+        return false;
+    }
   }
 
   auto *ShuffleDstTy = dyn_cast<FixedVectorType>(I.getType());
@@ -2544,23 +2551,31 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
   InstructionCost CostC0 =
       TTI.getCastInstrCost(C0->getOpcode(), CastDstTy, CastSrcTy,
                            TTI::CastContextHint::None, CostKind);
-  InstructionCost CostC1 =
-      TTI.getCastInstrCost(C1->getOpcode(), CastDstTy, CastSrcTy,
-                           TTI::CastContextHint::None, CostKind);
-  InstructionCost OldCost = CostC0 + CostC1;
-  OldCost +=
-      TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, ShuffleDstTy,
-                         CastDstTy, OldMask, CostKind, 0, nullptr, {}, &I);
 
-  InstructionCost NewCost =
-      TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, NewShuffleDstTy,
-                         CastSrcTy, NewMask, CostKind);
+  TargetTransformInfo::ShuffleKind ShuffleKind;
+  if (IsBinaryShuffle)
+    ShuffleKind = TargetTransformInfo::SK_PermuteTwoSrc;
+  else
+    ShuffleKind = TargetTransformInfo::SK_PermuteSingleSrc;
+
+  InstructionCost OldCost = CostC0;
+  OldCost += TTI.getShuffleCost(ShuffleKind, ShuffleDstTy, CastDstTy, OldMask,
+                                CostKind, 0, nullptr, {}, &I);
+
+  InstructionCost NewCost = TTI.getShuffleCost(ShuffleKind, NewShuffleDstTy,
+                                               CastSrcTy, NewMask, CostKind);
   NewCost += TTI.getCastInstrCost(Opcode, ShuffleDstTy, NewShuffleDstTy,
                                   TTI::CastContextHint::None, CostKind);
   if (!C0->hasOneUse())
     NewCost += CostC0;
-  if (!C1->hasOneUse())
-    NewCost += CostC1;
+  if (IsBinaryShuffle) {
+    InstructionCost CostC1 =
+        TTI.getCastInstrCost(C1->getOpcode(), CastDstTy, CastSrcTy,
+                             TTI::CastContextHint::None, CostKind);
+    OldCost += CostC1;
+    if (!C1->hasOneUse())
+      NewCost += CostC1;
+  }
 
   LLVM_DEBUG(dbgs() << "Found a shuffle feeding two casts: " << I
                     << "\n  OldCost: " << OldCost << " vs NewCost: " << NewCost
@@ -2568,14 +2583,20 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
   if (NewCost > OldCost)
     return false;
 
-  Value *Shuf = Builder.CreateShuffleVector(C0->getOperand(0),
-                                            C1->getOperand(0), NewMask);
+  Value *Shuf;
+  if (IsBinaryShuffle)
+    Shuf = Builder.CreateShuffleVector(C0->getOperand(0), C1->getOperand(0),
+                                       NewMask);
+  else
+    Shuf = Builder.CreateShuffleVector(C0->getOperand(0), NewMask);
+
   Value *Cast = Builder.CreateCast(Opcode, Shuf, ShuffleDstTy);
 
   // Intersect flags from the old casts.
   if (auto *NewInst = dyn_cast<Instruction>(Cast)) {
     NewInst->copyIRFlags(C0);
-    NewInst->andIRFlags(C1);
+    if (IsBinaryShuffle)
+      NewInst->andIRFlags(C1);
   }
 
   Worklist.pushValue(Shuf);
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
index acbc836ffcab0..ed29719d49493 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
@@ -205,8 +205,8 @@ define <8 x i8> @abs_different(<8 x i8> %a) {
 define <4 x i32> @poison_intrinsic(<2 x i16> %l256) {
 ; CHECK-LABEL: @poison_intrinsic(
 ; CHECK-NEXT:    [[L266:%.*]] = call <2 x i16> @llvm.abs.v2i16(<2 x i16> [[L256:%.*]], i1 false)
-; CHECK-NEXT:    [[L267:%.*]] = zext <2 x i16> [[L266]] to <2 x i32>
-; CHECK-NEXT:    [[L271:%.*]] = shufflevector <2 x i32> [[L267]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT:    [[L267:%.*]] = shufflevector <2 x i16> [[L266]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT:    [[L271:%.*]] = zext <4 x i16> [[L267]] to <4 x i32>
 ; CHECK-NEXT:    ret <4 x i32> [[L271]]
 ;
   %l266 = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %l256, i1 false)
@@ -534,9 +534,9 @@ define <4 x i64> @single_zext(<4 x i32> %x) {
 
 define <4 x i64> @not_zext(<4 x i32> %x) {
 ; CHECK-LABEL: @not_zext(
-; CHECK-NEXT:    [[ZEXT:%.*]] = zext <4 x i32> [[X:%.*]] to <4 x i64>
-; CHECK-NEXT:    [[REVSHUF:%.*]] = shufflevector <4 x i64> [[ZEXT]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT:    ret <4 x i64> [[REVSHUF]]
+; CHECK-NEXT:    [[REVSHUF:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext <4 x i32> [[REVSHUF]] to <4 x i64>
+; CHECK-NEXT:    ret <4 x i64> [[ZEXT]]
 ;
   %zext = zext <4 x i32> %x to <4 x i64>
   %revshuf = shufflevector <4 x i64> %zext, <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
@@ -922,10 +922,9 @@ define <4 x i8> @singleop(<4 x i8> %a, <4 x i8> %b) {
 
 define <4 x i64> @cast_mismatched_types(<4 x i32> %x) {
 ; CHECK-LABEL: @cast_mismatched_types(
-; CHECK-NEXT:    [[SHUF:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 2>
-; CHECK-NEXT:    [[ZEXT:%.*]] = zext <2 x i32> [[SHUF]] to <2 x i64>
-; CHECK-NEXT:    [[EXTSHUF:%.*]] = shufflevector <2 x i64> [[ZEXT]], <2 x i64> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
-; CHECK-NEXT:    ret <4 x i64> [[EXTSHUF]]
+; CHECK-SAME: <4 x i32> [[X:%.*]]) {
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext <4 x i32> [[X]] to <4 x i64>
+; CHECK-NEXT:    ret <4 x i64> [[ZEXT]]
 ;
   %shuf = shufflevector <4 x i32> %x, <4 x i32> poison, <2 x i32> <i32 0, i32 2>
   %zext = zext <2 x i32> %shuf to <2 x i64>
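
The updated CHECK lines in not_zext and poison_intrinsic rely on an element-wise extension commuting with a unary shuffle: shuffling the wide values selects the same lanes as extending the already shuffled narrow values. The following is a toy model of that equivalence in plain C++, not LLVM code. The names shuffleUnary, zext, castThenShuffle, shuffleThenCast and kPoisonLane are hypothetical, and poison lanes are modelled as 0 purely for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption): a mask index of -1 marks a poison lane, which is
// represented here as the value 0. Real poison semantics are richer.
constexpr int kPoisonLane = -1;

// Unary shuffle: out[i] = src[mask[i]], or 0 for a poison lane.
template <typename T>
std::vector<T> shuffleUnary(const std::vector<T> &src,
                            const std::vector<int> &mask) {
  std::vector<T> out;
  out.reserve(mask.size());
  for (int idx : mask) {
    if (idx == kPoisonLane) {
      out.push_back(T{0});
      continue;
    }
    assert(idx >= 0 && static_cast<std::size_t>(idx) < src.size());
    out.push_back(src[static_cast<std::size_t>(idx)]);
  }
  return out;
}

// Element-wise zero-extension from 32 to 64 bits.
std::vector<std::uint64_t> zext(const std::vector<std::uint32_t> &src) {
  return std::vector<std::uint64_t>(src.begin(), src.end());
}

// shuffle(zext(x), mask): the shape before the fold.
std::vector<std::uint64_t> castThenShuffle(const std::vector<std::uint32_t> &x,
                                           const std::vector<int> &mask) {
  return shuffleUnary(zext(x), mask);
}

// zext(shuffle(x, mask)): the shape after the fold.
std::vector<std::uint64_t> shuffleThenCast(const std::vector<std::uint32_t> &x,
                                           const std::vector<int> &mask) {
  return zext(shuffleUnary(x, mask));
}
```

Both functions return the same lanes for any in-range mask. In the patch, the cost comparison decides whether performing the shuffle on the narrower type is worthwhile.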

ckoparkar · 2025-09-23T13:14:22Z

/cc @RKSimon

RKSimon

You might be stuck in a loop with foldBitcastShuffle?

ckoparkar · 2025-09-24T11:22:13Z

> You might be stuck in a loop with foldBitcastShuffle?

You're absolutely right: foldBitcastShuffle and foldShuffleOfCastops keep rewriting the same two instructions back and forth, so the pass is stuck in an infinite loop.

What would you suggest to fix this? I don't have a feel for which of these folds is preferable. I could add a check in foldShuffleOfCastops to skip the fold if the instruction is a unary shuffle followed by a bitcast: shuffle X, poison, mask ; bitcast.
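
The ping-pong described here, and the effect of a guard on one of the two folds, can be illustrated with a small standalone model. This is only a sketch under stated assumptions: Form, ruleA, ruleB, runToFixedPoint and the skipUnaryBitcast flag are hypothetical names, the two rules merely stand in for the two folds, and the rewrite budget exists only so the unguarded case terminates.

```cpp
#include <cassert>

// Two shapes the same pair of instructions can take (toy model).
enum class Form { BitcastOfShuffle, ShuffleOfBitcast };

// Stand-in for the first fold: bitcast(shuffle(x)) -> shuffle(bitcast(x)).
bool ruleA(Form &f) {
  if (f != Form::BitcastOfShuffle)
    return false;
  f = Form::ShuffleOfBitcast;
  return true;
}

// Stand-in for the second fold on a unary shuffle:
// shuffle(bitcast(x), poison) -> bitcast(shuffle(x)).
// With skipUnaryBitcast set, the rule declines to fire on this pattern.
bool ruleB(Form &f, bool skipUnaryBitcast) {
  if (f != Form::ShuffleOfBitcast)
    return false;
  if (skipUnaryBitcast)
    return false;
  f = Form::BitcastOfShuffle;
  return true;
}

// Applies the rules until neither fires. Returns the number of rewrites,
// or -1 if the budget is exhausted (the rules keep undoing each other).
int runToFixedPoint(Form start, bool skipUnaryBitcast, int budget) {
  assert(budget > 0);
  Form f = start;
  int rewrites = 0;
  while (ruleA(f) || ruleB(f, skipUnaryBitcast)) {
    if (++rewrites >= budget)
      return -1;
  }
  return rewrites;
}
```

Without the guard the two rules undo each other until the budget runs out. With it, the first rule fires at most once and the model reaches a fixed point.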

ckoparkar · 2025-09-24T13:47:52Z

The CI is green now. Things to do on this:

Whether bool IsBinaryShuffle = !isa<PoisonValue>(V1); is sufficient, or whether we should also check if the second argument is undef/unused. I'm trying out isUndefValue and ShuffleVectorInst::isSingleSource.
Whether the loop-breaking condition (!IsBinaryShuffle && Opcode == Instruction::BitCast) is okay.
There are already a few tests that show the fold working, but I can add some more to the correct file (shuffle-of-casts.ll). I'm trying out different examples to see how this works.
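
On the first open question: a poison check on V1 only recognises shuffles whose second operand is literally poison. A mask-based check would also treat a shuffle as unary when the second operand is present but never selected. The sketch below shows that idea on a plain index mask. usesOnlyFirstSource is a hypothetical helper, not an LLVM API, and it does not claim to match the behaviour of ShuffleVectorInst::isSingleSource.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper. Mask convention assumed here: indices in
// [0, numSrcElts) select from the first operand, indices in
// [numSrcElts, 2 * numSrcElts) select from the second, and -1 is a
// poison lane that selects from neither.
bool usesOnlyFirstSource(const std::vector<int> &mask, int numSrcElts) {
  assert(numSrcElts > 0);
  for (int idx : mask) {
    if (idx == -1)
      continue; // Poison lanes do not reference either operand.
    assert(idx >= 0 && idx < 2 * numSrcElts);
    if (idx >= numSrcElts)
      return false; // This lane reads from the second operand.
  }
  return true;
}
```

For example, the reversal mask <3, 2, 1, 0> over four-element sources only reads the first operand, whereas <0, 5, 2, 7> reads both.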

github-actions · 2025-09-25T11:18:04Z

✅ With the latest revision this PR passed the undef deprecator.

ckoparkar · 2025-09-25T11:22:56Z

⚠️ undef deprecator found issues in your code. ⚠️
You can test this locally with the following command:
...
In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

Oops, okay I'll remove the test.

ckoparkar · 2025-09-25T12:24:13Z

The test llvm/test/Transforms/LoopVectorize/AArch64/epilogue-vectorization-fix-scalar-resume-values.ll is failing in CI. I merged main but the test fails for me locally as well, so I expect this run to fail.

I do think the failure is unrelated to this patch; the test only runs -passes=loop-vectorize and not -passes=vector-combine and I don't see anything relevant in the debug output.

Edit: I spoke too soon, the test didn't fail!

RKSimon

LGTM - cheers

ckoparkar · 2025-09-29T13:18:23Z

Thanks for your help on this. Could you please merge it too? I don't have commit access yet; I just opened an issue for it.

ckoparkar added 3 commits September 21, 2025 16:18

[VectorCombine] foldShuffleOfCastops - handle unary shuffles

ed4c1df

Fix shuffletoidentity test

8ae9816

ckoparkar commented Sep 23, 2025

ckoparkar marked this pull request as ready for review September 23, 2025 13:13

llvmbot added vectorizers llvm:transforms llvm:vectorcombine labels Sep 23, 2025

RKSimon self-requested a review September 23, 2025 13:54

RKSimon reviewed Sep 23, 2025

Break loop with foldBitcastOfShuffle

8c27278

Add tests, remove comment

e4b329f

ckoparkar force-pushed the ckoparkar/156853 branch from 157b8f3 to e4b329f Compare September 25, 2025 11:15

ckoparkar added 2 commits September 25, 2025 07:23

Remove test that uses undef

4f5f5da

ckoparkar requested a review from RKSimon September 25, 2025 12:24

RKSimon approved these changes Sep 29, 2025

RKSimon merged commit 766c90f into llvm:main Sep 29, 2025
9 checks passed

ckoparkar deleted the ckoparkar/156853 branch September 29, 2025 13:46

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025

[VectorCombine] foldShuffleOfCastops - handle unary shuffles (llvm#16…

0caec4b

…0009) Fixes llvm#156853.
