Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SLP] Delete vectorized users when tree contains an invalid cost #86344

Merged
merged 1 commit into from
Mar 22, 2024

Conversation

patrick-rivos
Copy link
Contributor

If a tree is partially vectorized and contains an unvectorizable op, the users of the partially vectorized subtree will not be marked for deletion/cleaned up. With this patch the cleanup code is triggered even when an invalid cost is encountered.

If a tree is partially vectorized and contains an unvectorizable op, the
users of the partially vectorized subtree will not be marked for
deletion/cleaned up. With this patch the cleanup code is triggered
even when an invalid cost is encountered.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 22, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Patrick O'Neill (patrick-rivos)

Changes

If a tree is partially vectorized and contains an unvectorizable op, the users of the partially vectorized subtree will not be marked for deletion/cleaned up. With this patch the cleanup code is triggered even when an invalid cost is encountered.


Full diff: https://github.com/llvm/llvm-project/pull/86344.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+1-1)
  • (added) llvm/test/Transforms/SLPVectorizer/RISCV/partial-vec-invalid-cost.ll (+57)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 0f7afa2fc25c29..f98d15c285a693 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -15959,7 +15959,7 @@ class HorizontalReduction {
         LLVM_DEBUG(dbgs() << "SLP: Found cost = " << Cost
                           << " for reduction\n");
         if (!Cost.isValid())
-          return nullptr;
+          break;
         if (Cost >= -SLPCostThreshold) {
           V.getORE()->emit([&]() {
             return OptimizationRemarkMissed(
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/partial-vec-invalid-cost.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/partial-vec-invalid-cost.ll
new file mode 100644
index 00000000000000..31f16801b7a64e
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/partial-vec-invalid-cost.ll
@@ -0,0 +1,57 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes=slp-vectorizer -S | FileCheck %s
+
+target triple = "riscv64-unknown-linux-gnu"
+
+define void @partial_vec_invalid_cost() #0 {
+; CHECK-LABEL: define void @partial_vec_invalid_cost(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[LSHR_1:%.*]] = lshr i96 0, 0
+; CHECK-NEXT:    [[LSHR_2:%.*]] = lshr i96 0, 0
+; CHECK-NEXT:    [[TRUNC_I96_1:%.*]] = trunc i96 [[LSHR_1]] to i32
+; CHECK-NEXT:    [[TRUNC_I96_2:%.*]] = trunc i96 [[LSHR_2]] to i32
+; CHECK-NEXT:    [[TRUNC_I96_3:%.*]] = trunc i96 0 to i32
+; CHECK-NEXT:    [[TRUNC_I96_4:%.*]] = trunc i96 0 to i32
+; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> zeroinitializer)
+; CHECK-NEXT:    [[OP_RDX:%.*]] = or i32 [[TMP0]], [[TRUNC_I96_2]]
+; CHECK-NEXT:    [[OP_RDX1:%.*]] = or i32 [[TRUNC_I96_1]], [[TRUNC_I96_3]]
+; CHECK-NEXT:    [[OP_RDX2:%.*]] = or i32 [[OP_RDX]], [[OP_RDX1]]
+; CHECK-NEXT:    [[OP_RDX3:%.*]] = or i32 [[OP_RDX2]], [[TRUNC_I96_4]]
+; CHECK-NEXT:    [[STORE_THIS:%.*]] = zext i32 [[OP_RDX3]] to i96
+; CHECK-NEXT:    store i96 [[STORE_THIS]], ptr null, align 16
+; CHECK-NEXT:    ret void
+;
+entry:
+
+  %lshr.1 = lshr i96 0, 0 ; These ops
+  %lshr.2 = lshr i96 0, 0 ; return an
+  %add.0 = add i96 0, 0   ; invalid
+  %add.1 = add i96 0, 0   ; vector cost.
+
+  %trunc.i96.1 = trunc i96 %lshr.1 to i32 ; These ops
+  %trunc.i96.2 = trunc i96 %lshr.2 to i32 ; return an
+  %trunc.i96.3 = trunc i96 %add.0 to i32  ; invalid
+  %trunc.i96.4 = trunc i96 %add.1 to i32  ; vector cost.
+
+  %or.0 = or i32 %trunc.i96.1, %trunc.i96.2
+  %or.1 = or i32 %or.0, %trunc.i96.3
+  %or.2 = or i32 %or.1, %trunc.i96.4
+
+  %zext.0 = zext i1 0 to i32 ; These
+  %zext.1 = zext i1 0 to i32 ; ops
+  %zext.2 = zext i1 0 to i32 ; are
+  %zext.3 = zext i1 0 to i32 ; vectorized
+
+  %or.3 = or i32 %or.2, %zext.0 ; users
+  %or.4 = or i32 %or.3, %zext.1 ; of
+  %or.5 = or i32 %or.4, %zext.2 ; vectorized
+  %or.6 = or i32 %or.5, %zext.3 ; ops
+
+  %store.this = zext i32 %or.6 to i96
+
+  store i96 %store.this, ptr null, align 16
+  ret void
+}
+
+attributes #0 = { "target-features"="+v" }

Copy link

✅ With the latest revision this PR passed the Python code formatter.

Copy link

✅ With the latest revision this PR passed the C/C++ code formatter.

@patrick-rivos
Copy link
Contributor Author

patrick-rivos commented Mar 22, 2024

ICE Backtrace (added debug trace to show all the offending ops):

Deleting   %zext.1 = zext to i32 with users:  %or.4 = or i32 %or.3, %zext.1.
Deleting   %zext.2 = zext to i32 with users:  %or.5 = or i32 %or.4, %zext.2.
Deleting   %zext.3 = zext to i32 with users:  %or.6 = or i32 %or.5, %zext.3.
Deleting   %zext.0 = zext to i32 with users:  %or.3 = or i32 %or.2, %zext.0.
opt: /scratch/tc-testing/tc-mar-5-llvm-slp-ice-2/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3788: llvm::slpvectorizer::BoUpSLP::~BoUpSLP(): Assertion `I->use_empty() && "trying to erase instruction with users."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.

Explanation:

(Store.this)
     |
   (or.3-6)
   ////   \
zext[1]  (or.0-2)
         |||\
        trunc i96 [2]
           // \\
	 lshr  add [3]

[1] This operation gets vectorized. or.3-6 still exist and are not cleaned up.

Begin evaluating tree of depth 2 ([2] and [3]):
[2] The truncations are considered and return an invalid cost. This is due to
risc-v not having a vector op with SEW > 64.
[3] Also considered and rejected (because they are i96 ops).

! At this point we return a null ptr without running the cleanup code.

All other vectorization attempts either bail or are unprofitable which causes
the cleanup code to never run for or.3-6

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks!

@patrick-rivos
Copy link
Contributor Author

Thanks for the quick review! I don't have write permission to llvm - could you land this for me?

@alexey-bataev alexey-bataev merged commit 4652ec0 into llvm:main Mar 22, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants