Skip to content

Conversation

ckoparkar
Copy link
Contributor

Fixes #156853.

* main: (1562 commits)
  Document Policy on supporting newer C++ standard in LLVM codebase (llvm#156823)
  [MLIR][Transform][SMT] Introduce transform.smt.constrain_params (llvm#159450)
  Reapply "[compiler-rt] Remove %T from shared object substitutions (llvm#155302)"
  [NFC] [IndVarSimplify] Add non-overflowing usub test (llvm#159683)
  [Github] Remove separate tools checkout from pr-code workflows (llvm#159967)
  [clang] fix using enum redecl in template regression (llvm#159996)
  [DAG] Skip `mstore` combine for `<1 x ty>` vectors (llvm#159915)
  [mlir] Expose optional `PatternBenefit` to `func` populate functions (NFC) (llvm#159986)
  [LV] Set correct costs for interleave group members.
  [clang] ast-dump: use template pattern for `instantiated_from` (llvm#159952)
  [ARM] ha-alignstack-call.ll - regenerate test checks (llvm#159988)
  [LLD][MachO] Silence warning when building with MSVC
  [llvm][Analysis] Silence warning when building with MSVC
  [LV] Skip select cost for invariant divisors in legacy cost model.
  [Clang] Fix an error-recovery crash after d1a80de (llvm#159976)
  [VPlanPatternMatch] Introduce m_ConstantInt (llvm#159558)
  [GlobalISel] Add G_ABS computeKnownBits (llvm#154413)
  [gn build] Port 4cabd1e
  Reland "[clangd] Add feature modules registry" (llvm#154836)
  [LV] Also handle non-uniform scalarized loads when processing AddrDefs.
  ...
Copy link
Contributor Author

@ckoparkar ckoparkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch is also causing some tests to timeout, looking into it:

LLVM :: Transforms/PhaseOrdering/X86/blendv-select.ll
LLVM :: Transforms/PhaseOrdering/X86/pr67803.ll
LLVM :: Transforms/PhaseOrdering/X86/shuffle-inseltpoison.ll
LLVM :: Transforms/PhaseOrdering/X86/shuffle.ll
LLVM :: Transforms/PhaseOrdering/X86/vec-load-combine.ll
LLVM :: Transforms/VectorCombine/AArch64/combine-shuffle-ext.ll
LLVM :: Transforms/VectorCombine/X86/shuffle-inseltpoison.ll
LLVM :: Transforms/VectorCombine/X86/shuffle.ll

@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Chaitanya Koparkar (ckoparkar)

Changes

Fixes #156853.


Full diff: https://github.com/llvm/llvm-project/pull/160009.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+44-23)
  • (modified) llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll (+8-9)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 0ef933f596604..526a4add2a89a 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -2487,21 +2487,28 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
   if (!match(&I, m_Shuffle(m_Value(V0), m_Value(V1), m_Mask(OldMask))))
     return false;
 
+  // Check whether this is a unary shuffle.
+  // TODO: check if this can be extended to match undef or unused values,
+  // perhaps using ShuffleVectorInst::isSingleSource.
+  bool IsBinaryShuffle = !isa<PoisonValue>(V1);
+
   auto *C0 = dyn_cast<CastInst>(V0);
   auto *C1 = dyn_cast<CastInst>(V1);
-  if (!C0 || !C1)
+  if (!C0 || (IsBinaryShuffle && !C1))
     return false;
 
   Instruction::CastOps Opcode = C0->getOpcode();
-  if (C0->getSrcTy() != C1->getSrcTy())
-    return false;
 
-  // Handle shuffle(zext_nneg(x), sext(y)) -> sext(shuffle(x,y)) folds.
-  if (Opcode != C1->getOpcode()) {
-    if (match(C0, m_SExtLike(m_Value())) && match(C1, m_SExtLike(m_Value())))
-      Opcode = Instruction::SExt;
-    else
+  if (IsBinaryShuffle) {
+    if (C0->getSrcTy() != C1->getSrcTy())
       return false;
+    // Handle shuffle(zext_nneg(x), sext(y)) -> sext(shuffle(x,y)) folds.
+    if (Opcode != C1->getOpcode()) {
+      if (match(C0, m_SExtLike(m_Value())) && match(C1, m_SExtLike(m_Value())))
+        Opcode = Instruction::SExt;
+      else
+        return false;
+    }
   }
 
   auto *ShuffleDstTy = dyn_cast<FixedVectorType>(I.getType());
@@ -2544,23 +2551,31 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
   InstructionCost CostC0 =
       TTI.getCastInstrCost(C0->getOpcode(), CastDstTy, CastSrcTy,
                            TTI::CastContextHint::None, CostKind);
-  InstructionCost CostC1 =
-      TTI.getCastInstrCost(C1->getOpcode(), CastDstTy, CastSrcTy,
-                           TTI::CastContextHint::None, CostKind);
-  InstructionCost OldCost = CostC0 + CostC1;
-  OldCost +=
-      TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, ShuffleDstTy,
-                         CastDstTy, OldMask, CostKind, 0, nullptr, {}, &I);
 
-  InstructionCost NewCost =
-      TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, NewShuffleDstTy,
-                         CastSrcTy, NewMask, CostKind);
+  TargetTransformInfo::ShuffleKind ShuffleKind;
+  if (IsBinaryShuffle)
+    ShuffleKind = TargetTransformInfo::SK_PermuteTwoSrc;
+  else
+    ShuffleKind = TargetTransformInfo::SK_PermuteSingleSrc;
+
+  InstructionCost OldCost = CostC0;
+  OldCost += TTI.getShuffleCost(ShuffleKind, ShuffleDstTy, CastDstTy, OldMask,
+                                CostKind, 0, nullptr, {}, &I);
+
+  InstructionCost NewCost = TTI.getShuffleCost(ShuffleKind, NewShuffleDstTy,
+                                               CastSrcTy, NewMask, CostKind);
   NewCost += TTI.getCastInstrCost(Opcode, ShuffleDstTy, NewShuffleDstTy,
                                   TTI::CastContextHint::None, CostKind);
   if (!C0->hasOneUse())
     NewCost += CostC0;
-  if (!C1->hasOneUse())
-    NewCost += CostC1;
+  if (IsBinaryShuffle) {
+    InstructionCost CostC1 =
+        TTI.getCastInstrCost(C1->getOpcode(), CastDstTy, CastSrcTy,
+                             TTI::CastContextHint::None, CostKind);
+    OldCost += CostC1;
+    if (!C1->hasOneUse())
+      NewCost += CostC1;
+  }
 
   LLVM_DEBUG(dbgs() << "Found a shuffle feeding two casts: " << I
                     << "\n  OldCost: " << OldCost << " vs NewCost: " << NewCost
@@ -2568,14 +2583,20 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
   if (NewCost > OldCost)
     return false;
 
-  Value *Shuf = Builder.CreateShuffleVector(C0->getOperand(0),
-                                            C1->getOperand(0), NewMask);
+  Value *Shuf;
+  if (IsBinaryShuffle)
+    Shuf = Builder.CreateShuffleVector(C0->getOperand(0), C1->getOperand(0),
+                                       NewMask);
+  else
+    Shuf = Builder.CreateShuffleVector(C0->getOperand(0), NewMask);
+
   Value *Cast = Builder.CreateCast(Opcode, Shuf, ShuffleDstTy);
 
   // Intersect flags from the old casts.
   if (auto *NewInst = dyn_cast<Instruction>(Cast)) {
     NewInst->copyIRFlags(C0);
-    NewInst->andIRFlags(C1);
+    if (IsBinaryShuffle)
+      NewInst->andIRFlags(C1);
   }
 
   Worklist.pushValue(Shuf);
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
index acbc836ffcab0..ed29719d49493 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
@@ -205,8 +205,8 @@ define <8 x i8> @abs_different(<8 x i8> %a) {
 define <4 x i32> @poison_intrinsic(<2 x i16> %l256) {
 ; CHECK-LABEL: @poison_intrinsic(
 ; CHECK-NEXT:    [[L266:%.*]] = call <2 x i16> @llvm.abs.v2i16(<2 x i16> [[L256:%.*]], i1 false)
-; CHECK-NEXT:    [[L267:%.*]] = zext <2 x i16> [[L266]] to <2 x i32>
-; CHECK-NEXT:    [[L271:%.*]] = shufflevector <2 x i32> [[L267]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT:    [[L267:%.*]] = shufflevector <2 x i16> [[L266]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT:    [[L271:%.*]] = zext <4 x i16> [[L267]] to <4 x i32>
 ; CHECK-NEXT:    ret <4 x i32> [[L271]]
 ;
   %l266 = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %l256, i1 false)
@@ -534,9 +534,9 @@ define <4 x i64> @single_zext(<4 x i32> %x) {
 
 define <4 x i64> @not_zext(<4 x i32> %x) {
 ; CHECK-LABEL: @not_zext(
-; CHECK-NEXT:    [[ZEXT:%.*]] = zext <4 x i32> [[X:%.*]] to <4 x i64>
-; CHECK-NEXT:    [[REVSHUF:%.*]] = shufflevector <4 x i64> [[ZEXT]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT:    ret <4 x i64> [[REVSHUF]]
+; CHECK-NEXT:    [[REVSHUF:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext <4 x i32> [[REVSHUF:%.*]] to <4 x i64>
+; CHECK-NEXT:    ret <4 x i64> [[ZEXT]]
 ;
   %zext = zext <4 x i32> %x to <4 x i64>
   %revshuf = shufflevector <4 x i64> %zext, <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
@@ -922,10 +922,9 @@ define <4 x i8> @singleop(<4 x i8> %a, <4 x i8> %b) {
 
 define <4 x i64> @cast_mismatched_types(<4 x i32> %x) {
 ; CHECK-LABEL: @cast_mismatched_types(
-; CHECK-NEXT:    [[SHUF:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 2>
-; CHECK-NEXT:    [[ZEXT:%.*]] = zext <2 x i32> [[SHUF]] to <2 x i64>
-; CHECK-NEXT:    [[EXTSHUF:%.*]] = shufflevector <2 x i64> [[ZEXT]], <2 x i64> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
-; CHECK-NEXT:    ret <4 x i64> [[EXTSHUF]]
+; CHECK-SAME: <4 x i32> [[X:%.*]]) {
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext <4 x i32> [[X]] to <4 x i64>
+; CHECK-NEXT:    ret <4 x i64> [[ZEXT]]
 ;
   %shuf = shufflevector <4 x i32> %x, <4 x i32> poison, <2 x i32> <i32 0, i32 2>
   %zext = zext <2 x i32> %shuf to <2 x i64>

@ckoparkar
Copy link
Contributor Author

/cc @RKSimon

@RKSimon RKSimon self-requested a review September 23, 2025 13:54
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be stuck in a loop with foldBitcastShuffle?

@ckoparkar
Copy link
Contributor Author

You might be stuck in a loop with foldBitcastShuffle?

You're absolutely right, foldBitcastOfShuffle and foldShuffleOfCastops keep changing the same two instructions back and forth and are stuck in an infinite loop.

What would your suggestion be to fix this? I don't have a feel for which of these folds is preferable. I could add a check in foldShuffleOfCastops to skip the fold if the instruction is a unary shuffle followed by a bitcast: shuffle X, poison, mask ; bitcast.

@ckoparkar
Copy link
Contributor Author

The CI is green now. Things to do on this:

  1. Whether bool IsBinaryShuffle = !isa<PoisonValue>(V1); is okay or should we also check if the second argument is undef/unused. Trying out isUndefValue and ShuffleVectorInst::isSingleSource.
  2. Whether the loop breaking condition is okay (!IsBinaryShuffle && Opcode == Instruction::BitCast).
  3. There are already a few tests that show the fold working, but I can add some more to the correct file (shuffle-of-casts.ll). Trying out different examples to see how this works.

Copy link

github-actions bot commented Sep 25, 2025

✅ With the latest revision this PR passed the undef deprecator.

@ckoparkar
Copy link
Contributor Author

⚠️ undef deprecator found issues in your code. ⚠️
You can test this locally with the following command:
...
In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

Oops, okay I'll remove the test.

* main: (502 commits)
  GlobalISel: Adjust insert point when expanding G_[SU]DIVREM  (llvm#160683)
  [LV] Add coverage for fixing-up scalar resume values (llvm#160492)
  AMDGPU: Convert wave_any test to use update_mc_test_checks
  [LV] Add partial reduction tests multiplying extend with constants.
  Revert "[MLIR] Implement remark emitting policies in MLIR" (llvm#160681)
  [NFC][InstSimplify] Refactor fminmax-folds.ll test (llvm#160504)
  [LoongArch] Pre-commit tests for [x]vldi instructions with special constant splats (llvm#159228)
  [BOLT] Fix dwarf5-dwoid-no-dwoname.s (llvm#160676)
  [lldb][test] Refactor and expand TestMemoryRegionDirtyPages.py (llvm#156035)
  [gn build] Port 833d5f0
  AMDGPU: Ensure both wavesize features are not set (llvm#159234)
  [LoopInterchange] Bail out when finding a dependency with all `*` elements (llvm#149049)
  [libc++] Avoid constructing additional objects when using map::at (llvm#157866)
  [lldb][test] Make hex prefix optional in DWARF union types test
  [X86] Add missing prefixes to trunc-sat tests (llvm#160662)
  [AMDGPU] Fix vector legalization for bf16 valu ops (llvm#158439)
  [LoongArch][NFC] Pre-commit tests for `[x]vadda.{b/h/w/d}`
  [mlir][tosa] Relax constraint on matmul verifier requiring equal operand types (llvm#155799)
  [clang][Sema] Accept gnu format attributes (llvm#160255)
  [LoongArch][NFC] Add tests for element extraction from binary add operation (llvm#159725)
  ...
@ckoparkar
Copy link
Contributor Author

ckoparkar commented Sep 25, 2025

The test llvm/test/Transforms/LoopVectorize/AArch64/epilogue-vectorization-fix-scalar-resume-values.ll is failing in CI. I merged main but the test fails for me locally as well, so I expect this run to fail.

I do think the failure is unrelated to this patch; the test only runs -passes=loop-vectorize and not -passes=vector-combine and I don't see anything relevant in the debug output.

Edit: I spoke too soon, the test didn't fail!

@ckoparkar ckoparkar requested a review from RKSimon September 25, 2025 12:24
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - cheers

@ckoparkar
Copy link
Contributor Author

Thanks for your help on this. Could you please merge too? I don't have commit access yet, I just opened an issue for it.

@RKSimon RKSimon merged commit 766c90f into llvm:main Sep 29, 2025
9 checks passed
@ckoparkar ckoparkar deleted the ckoparkar/156853 branch September 29, 2025 13:46
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VectorCombine] foldShuffleOfCastops - failure to handle unary shuffles to fold shuffle(cast(x), poison) -> cast(shuffle(x,poison))
3 participants