-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[VectorCombine] foldShuffleOfCastops - handle unary shuffles #160009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* main: (1562 commits) Document Policy on supporting newer C++ standard in LLVM codebase (llvm#156823) [MLIR][Transform][SMT] Introduce transform.smt.constrain_params (llvm#159450) Reapply "[compiler-rt] Remove %T from shared object substitutions (llvm#155302)" [NFC] [IndVarSimplify] Add non-overflowing usub test (llvm#159683) [Github] Remove separate tools checkout from pr-code workflows (llvm#159967) [clang] fix using enum redecl in template regression (llvm#159996) [DAG] Skip `mstore` combine for `<1 x ty>` vectors (llvm#159915) [mlir] Expose optional `PatternBenefit` to `func` populate functions (NFC) (llvm#159986) [LV] Set correct costs for interleave group members. [clang] ast-dump: use template pattern for `instantiated_from` (llvm#159952) [ARM] ha-alignstack-call.ll - regenerate test checks (llvm#159988) [LLD][MachO] Silence warning when building with MSVC [llvm][Analysis] Silence warning when building with MSVC [LV] Skip select cost for invariant divisors in legacy cost model. [Clang] Fix an error-recovery crash after d1a80de (llvm#159976) [VPlanPatternMatch] Introduce m_ConstantInt (llvm#159558) [GlobalISel] Add G_ABS computeKnownBits (llvm#154413) [gn build] Port 4cabd1e Reland "[clangd] Add feature modules registry" (llvm#154836) [LV] Also handle non-uniform scalarized loads when processing AddrDefs. ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is also causing some tests to timeout, looking into it:
LLVM :: Transforms/PhaseOrdering/X86/blendv-select.ll
LLVM :: Transforms/PhaseOrdering/X86/pr67803.ll
LLVM :: Transforms/PhaseOrdering/X86/shuffle-inseltpoison.ll
LLVM :: Transforms/PhaseOrdering/X86/shuffle.ll
LLVM :: Transforms/PhaseOrdering/X86/vec-load-combine.ll
LLVM :: Transforms/VectorCombine/AArch64/combine-shuffle-ext.ll
LLVM :: Transforms/VectorCombine/X86/shuffle-inseltpoison.ll
LLVM :: Transforms/VectorCombine/X86/shuffle.ll
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers Author: Chaitanya Koparkar (ckoparkar) ChangesFixes #156853. Full diff: https://github.com/llvm/llvm-project/pull/160009.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 0ef933f596604..526a4add2a89a 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -2487,21 +2487,28 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
if (!match(&I, m_Shuffle(m_Value(V0), m_Value(V1), m_Mask(OldMask))))
return false;
+ // Check whether this is a unary shuffle.
+ // TODO: check if this can be extended to match undef or unused values,
+ // perhaps using ShuffleVectorInst::isSingleSource.
+ bool IsBinaryShuffle = !isa<PoisonValue>(V1);
+
auto *C0 = dyn_cast<CastInst>(V0);
auto *C1 = dyn_cast<CastInst>(V1);
- if (!C0 || !C1)
+ if (!C0 || (IsBinaryShuffle && !C1))
return false;
Instruction::CastOps Opcode = C0->getOpcode();
- if (C0->getSrcTy() != C1->getSrcTy())
- return false;
- // Handle shuffle(zext_nneg(x), sext(y)) -> sext(shuffle(x,y)) folds.
- if (Opcode != C1->getOpcode()) {
- if (match(C0, m_SExtLike(m_Value())) && match(C1, m_SExtLike(m_Value())))
- Opcode = Instruction::SExt;
- else
+ if (IsBinaryShuffle) {
+ if (C0->getSrcTy() != C1->getSrcTy())
return false;
+ // Handle shuffle(zext_nneg(x), sext(y)) -> sext(shuffle(x,y)) folds.
+ if (Opcode != C1->getOpcode()) {
+ if (match(C0, m_SExtLike(m_Value())) && match(C1, m_SExtLike(m_Value())))
+ Opcode = Instruction::SExt;
+ else
+ return false;
+ }
}
auto *ShuffleDstTy = dyn_cast<FixedVectorType>(I.getType());
@@ -2544,23 +2551,31 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
InstructionCost CostC0 =
TTI.getCastInstrCost(C0->getOpcode(), CastDstTy, CastSrcTy,
TTI::CastContextHint::None, CostKind);
- InstructionCost CostC1 =
- TTI.getCastInstrCost(C1->getOpcode(), CastDstTy, CastSrcTy,
- TTI::CastContextHint::None, CostKind);
- InstructionCost OldCost = CostC0 + CostC1;
- OldCost +=
- TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, ShuffleDstTy,
- CastDstTy, OldMask, CostKind, 0, nullptr, {}, &I);
- InstructionCost NewCost =
- TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, NewShuffleDstTy,
- CastSrcTy, NewMask, CostKind);
+ TargetTransformInfo::ShuffleKind ShuffleKind;
+ if (IsBinaryShuffle)
+ ShuffleKind = TargetTransformInfo::SK_PermuteTwoSrc;
+ else
+ ShuffleKind = TargetTransformInfo::SK_PermuteSingleSrc;
+
+ InstructionCost OldCost = CostC0;
+ OldCost += TTI.getShuffleCost(ShuffleKind, ShuffleDstTy, CastDstTy, OldMask,
+ CostKind, 0, nullptr, {}, &I);
+
+ InstructionCost NewCost = TTI.getShuffleCost(ShuffleKind, NewShuffleDstTy,
+ CastSrcTy, NewMask, CostKind);
NewCost += TTI.getCastInstrCost(Opcode, ShuffleDstTy, NewShuffleDstTy,
TTI::CastContextHint::None, CostKind);
if (!C0->hasOneUse())
NewCost += CostC0;
- if (!C1->hasOneUse())
- NewCost += CostC1;
+ if (IsBinaryShuffle) {
+ InstructionCost CostC1 =
+ TTI.getCastInstrCost(C1->getOpcode(), CastDstTy, CastSrcTy,
+ TTI::CastContextHint::None, CostKind);
+ OldCost += CostC1;
+ if (!C1->hasOneUse())
+ NewCost += CostC1;
+ }
LLVM_DEBUG(dbgs() << "Found a shuffle feeding two casts: " << I
<< "\n OldCost: " << OldCost << " vs NewCost: " << NewCost
@@ -2568,14 +2583,20 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
if (NewCost > OldCost)
return false;
- Value *Shuf = Builder.CreateShuffleVector(C0->getOperand(0),
- C1->getOperand(0), NewMask);
+ Value *Shuf;
+ if (IsBinaryShuffle)
+ Shuf = Builder.CreateShuffleVector(C0->getOperand(0), C1->getOperand(0),
+ NewMask);
+ else
+ Shuf = Builder.CreateShuffleVector(C0->getOperand(0), NewMask);
+
Value *Cast = Builder.CreateCast(Opcode, Shuf, ShuffleDstTy);
// Intersect flags from the old casts.
if (auto *NewInst = dyn_cast<Instruction>(Cast)) {
NewInst->copyIRFlags(C0);
- NewInst->andIRFlags(C1);
+ if (IsBinaryShuffle)
+ NewInst->andIRFlags(C1);
}
Worklist.pushValue(Shuf);
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
index acbc836ffcab0..ed29719d49493 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
@@ -205,8 +205,8 @@ define <8 x i8> @abs_different(<8 x i8> %a) {
define <4 x i32> @poison_intrinsic(<2 x i16> %l256) {
; CHECK-LABEL: @poison_intrinsic(
; CHECK-NEXT: [[L266:%.*]] = call <2 x i16> @llvm.abs.v2i16(<2 x i16> [[L256:%.*]], i1 false)
-; CHECK-NEXT: [[L267:%.*]] = zext <2 x i16> [[L266]] to <2 x i32>
-; CHECK-NEXT: [[L271:%.*]] = shufflevector <2 x i32> [[L267]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT: [[L267:%.*]] = shufflevector <2 x i16> [[L266]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
+; CHECK-NEXT: [[L271:%.*]] = zext <4 x i16> [[L267]] to <4 x i32>
; CHECK-NEXT: ret <4 x i32> [[L271]]
;
%l266 = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %l256, i1 false)
@@ -534,9 +534,9 @@ define <4 x i64> @single_zext(<4 x i32> %x) {
define <4 x i64> @not_zext(<4 x i32> %x) {
; CHECK-LABEL: @not_zext(
-; CHECK-NEXT: [[ZEXT:%.*]] = zext <4 x i32> [[X:%.*]] to <4 x i64>
-; CHECK-NEXT: [[REVSHUF:%.*]] = shufflevector <4 x i64> [[ZEXT]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: ret <4 x i64> [[REVSHUF]]
+; CHECK-NEXT: [[REVSHUF:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: [[ZEXT:%.*]] = zext <4 x i32> [[REVSHUF:%.*]] to <4 x i64>
+; CHECK-NEXT: ret <4 x i64> [[ZEXT]]
;
%zext = zext <4 x i32> %x to <4 x i64>
%revshuf = shufflevector <4 x i64> %zext, <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
@@ -922,10 +922,9 @@ define <4 x i8> @singleop(<4 x i8> %a, <4 x i8> %b) {
define <4 x i64> @cast_mismatched_types(<4 x i32> %x) {
; CHECK-LABEL: @cast_mismatched_types(
-; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 2>
-; CHECK-NEXT: [[ZEXT:%.*]] = zext <2 x i32> [[SHUF]] to <2 x i64>
-; CHECK-NEXT: [[EXTSHUF:%.*]] = shufflevector <2 x i64> [[ZEXT]], <2 x i64> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
-; CHECK-NEXT: ret <4 x i64> [[EXTSHUF]]
+; CHECK-SAME: <4 x i32> [[X:%.*]]) {
+; CHECK-NEXT: [[ZEXT:%.*]] = zext <4 x i32> [[X]] to <4 x i64>
+; CHECK-NEXT: ret <4 x i64> [[ZEXT]]
;
%shuf = shufflevector <4 x i32> %x, <4 x i32> poison, <2 x i32> <i32 0, i32 2>
%zext = zext <2 x i32> %shuf to <2 x i64>
|
/cc @RKSimon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be stuck in a loop with foldBitcastShuffle?
You're absolutely right, foldBitcastOfShuffle and foldShuffleOfCastops keep changing the same two instructions back and forth and are stuck in an infinite loop. What would your suggestion be to fix this? I don't have a feel for which of these folds is preferable. I could add a check in foldShuffleOfCastops to skip the fold if the instruction is a unary shuffle followed by a bitcast: |
The CI is green now. Things to do on this:
|
157b8f3
to
e4b329f
Compare
✅ With the latest revision this PR passed the undef deprecator. |
Oops, okay I'll remove the test. |
* main: (502 commits) GlobalISel: Adjust insert point when expanding G_[SU]DIVREM (llvm#160683) [LV] Add coverage for fixing-up scalar resume values (llvm#160492) AMDGPU: Convert wave_any test to use update_mc_test_checks [LV] Add partial reduction tests multiplying extend with constants. Revert "[MLIR] Implement remark emitting policies in MLIR" (llvm#160681) [NFC][InstSimplify] Refactor fminmax-folds.ll test (llvm#160504) [LoongArch] Pre-commit tests for [x]vldi instructions with special constant splats (llvm#159228) [BOLT] Fix dwarf5-dwoid-no-dwoname.s (llvm#160676) [lldb][test] Refactor and expand TestMemoryRegionDirtyPages.py (llvm#156035) [gn build] Port 833d5f0 AMDGPU: Ensure both wavesize features are not set (llvm#159234) [LoopInterchange] Bail out when finding a dependency with all `*` elements (llvm#149049) [libc++] Avoid constructing additional objects when using map::at (llvm#157866) [lldb][test] Make hex prefix optional in DWARF union types test [X86] Add missing prefixes to trunc-sat tests (llvm#160662) [AMDGPU] Fix vector legalization for bf16 valu ops (llvm#158439) [LoongArch][NFC] Pre-commit tests for `[x]vadda.{b/h/w/d}` [mlir][tosa] Relax constraint on matmul verifier requiring equal operand types (llvm#155799) [clang][Sema] Accept gnu format attributes (llvm#160255) [LoongArch][NFC] Add tests for element extraction from binary add operation (llvm#159725) ...
The test I do think the failure is unrelated to this patch; the test only runs Edit: I spoke too soon, the test didn't fail! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - cheers
Thanks for your help on this. Could you please merge too? I don't have commit access yet, I just opened an issue for it. |
Fixes #156853.