-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[slp][profcheck] Mark select
s as having unknown profile
#162960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
e2d5f62
to
1236326
Compare
select
s as having unknown profile
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-ir Author: Mircea Trofin (mtrofin) ChangesFull diff: https://github.com/llvm/llvm-project/pull/162960.diff 4 Files Affected:
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index 041a4ce112275..dacda0afc7f03 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -2548,6 +2548,11 @@ class IRBuilderBase {
std::optional<RoundingMode> Rounding = std::nullopt,
std::optional<fp::ExceptionBehavior> Except = std::nullopt);
+ LLVM_ABI Value *CreateSelectWithUnknownProfile(Value *C, Value *True,
+ Value *False,
+ StringRef PassName,
+ const Twine &Name = "");
+
LLVM_ABI Value *CreateSelect(Value *C, Value *True, Value *False,
const Twine &Name = "",
Instruction *MDFrom = nullptr);
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 614c3a9abb8d0..347bb66cb1145 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -25,6 +25,7 @@
#include "llvm/IR/Module.h"
#include "llvm/IR/NoFolder.h"
#include "llvm/IR/Operator.h"
+#include "llvm/IR/ProfDataUtils.h"
#include "llvm/IR/Statepoint.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
@@ -1002,6 +1003,18 @@ CallInst *IRBuilderBase::CreateConstrainedFPCall(
return C;
}
+Value *IRBuilderBase::CreateSelectWithUnknownProfile(Value *C, Value *True,
+ Value *False,
+ StringRef PassName,
+ const Twine &Name) {
+ Value *Ret = CreateSelectFMF(C, True, False, {}, Name);
+ if (auto *SI = dyn_cast<SelectInst>(Ret)) {
+ setExplicitlyUnknownBranchWeightsIfProfiled(
+ *SI, *SI->getParent()->getParent(), PassName);
+ }
+ return Ret;
+}
+
Value *IRBuilderBase::CreateSelect(Value *C, Value *True, Value *False,
const Twine &Name, Instruction *MDFrom) {
return CreateSelectFMF(C, True, False, {}, Name, MDFrom);
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index cfa8d2703592e..08df8faad487c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -19414,7 +19414,8 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
}
assert(getNumElements(Cond->getType()) == TrueNumElements &&
"Cannot vectorize Instruction::Select");
- Value *V = Builder.CreateSelect(Cond, True, False);
+ Value *V =
+ Builder.CreateSelectWithUnknownProfile(Cond, True, False, DEBUG_TYPE);
V = FinalShuffle(V, E);
E->VectorizedValue = V;
@@ -23532,18 +23533,19 @@ class HorizontalReduction {
switch (Kind) {
case RecurKind::Or: {
if (UseSelect && OpTy == CmpInst::makeCmpResultType(OpTy))
- return Builder.CreateSelect(
+ return Builder.CreateSelectWithUnknownProfile(
LHS, ConstantInt::getAllOnesValue(CmpInst::makeCmpResultType(OpTy)),
- RHS, Name);
+ RHS, DEBUG_TYPE, Name);
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);
return Builder.CreateBinOp((Instruction::BinaryOps)RdxOpcode, LHS, RHS,
Name);
}
case RecurKind::And: {
if (UseSelect && OpTy == CmpInst::makeCmpResultType(OpTy))
- return Builder.CreateSelect(
+ return Builder.CreateSelectWithUnknownProfile(
LHS, RHS,
- ConstantInt::getNullValue(CmpInst::makeCmpResultType(OpTy)), Name);
+ ConstantInt::getNullValue(CmpInst::makeCmpResultType(OpTy)),
+ DEBUG_TYPE, Name);
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);
return Builder.CreateBinOp((Instruction::BinaryOps)RdxOpcode, LHS, RHS,
Name);
@@ -23564,7 +23566,8 @@ class HorizontalReduction {
if (UseSelect) {
CmpInst::Predicate Pred = llvm::getMinMaxReductionPredicate(Kind);
Value *Cmp = Builder.CreateICmp(Pred, LHS, RHS, Name);
- return Builder.CreateSelect(Cmp, LHS, RHS, Name);
+ return Builder.CreateSelectWithUnknownProfile(Cmp, LHS, RHS, DEBUG_TYPE,
+ Name);
}
[[fallthrough]];
case RecurKind::FMax:
diff --git a/llvm/utils/profcheck-xfail.txt b/llvm/utils/profcheck-xfail.txt
index 092d63d2d40df..42b1293b08aec 100644
--- a/llvm/utils/profcheck-xfail.txt
+++ b/llvm/utils/profcheck-xfail.txt
@@ -1310,82 +1310,6 @@ Transforms/SimpleLoopUnswitch/pr60736.ll
Transforms/SimpleLoopUnswitch/trivial-unswitch-freeze-individual-conditions.ll
Transforms/SimpleLoopUnswitch/trivial-unswitch.ll
Transforms/SimpleLoopUnswitch/trivial-unswitch-logical-and-or.ll
-Transforms/SLPVectorizer/AArch64/gather-root.ll
-Transforms/SLPVectorizer/AArch64/horizontal.ll
-Transforms/SLPVectorizer/AArch64/loadi8.ll
-Transforms/SLPVectorizer/AArch64/phi-node-bitwidt-op-not.ll
-Transforms/SLPVectorizer/AArch64/uselistorder.ll
-Transforms/SLPVectorizer/AArch64/vec3-reorder-reshuffle.ll
-Transforms/SLPVectorizer/AArch64/vectorizable-selects-min-max.ll
-Transforms/SLPVectorizer/AArch64/vectorizable-selects-uniform-cmps.ll
-Transforms/SLPVectorizer/AMDGPU/horizontal-store.ll
-Transforms/SLPVectorizer/bool-logical-op-reduction-with-poison.ll
-Transforms/SLPVectorizer/call-arg-reduced-by-minbitwidth.ll
-Transforms/SLPVectorizer/const-bool-logical-or-reduction.ll
-Transforms/SLPVectorizer/extracts-with-undefs.ll
-Transforms/SLPVectorizer/freeze-signedness-missed.ll
-Transforms/SLPVectorizer/gathered-consecutive-loads-different-types.ll
-Transforms/SLPVectorizer/gather_extract_from_vectorbuild.ll
-Transforms/SLPVectorizer/insert-element-build-vector-const.ll
-Transforms/SLPVectorizer/insert-element-build-vector-inseltpoison.ll
-Transforms/SLPVectorizer/insert-element-build-vector.ll
-Transforms/SLPVectorizer/logical-ops-poisonous-repeated.ll
-Transforms/SLPVectorizer/minbitwidth-node-with-multi-users.ll
-Transforms/SLPVectorizer/minbitwidth-user-not-min.ll
-Transforms/SLPVectorizer/partial-register-extract.ll
-Transforms/SLPVectorizer/reduction-gather-non-scheduled-extracts.ll
-Transforms/SLPVectorizer/reorder-node.ll
-Transforms/SLPVectorizer/reused-buildvector-matching-vectorized-node.ll
-Transforms/SLPVectorizer/revec.ll
-Transforms/SLPVectorizer/RISCV/remarks_cmp_sel_min_max.ll
-Transforms/SLPVectorizer/RISCV/remarks-insert-into-small-vector.ll
-Transforms/SLPVectorizer/RISCV/reordered-interleaved-loads.ll
-Transforms/SLPVectorizer/RISCV/revec.ll
-Transforms/SLPVectorizer/RISCV/select-profitability.ll
-Transforms/SLPVectorizer/RISCV/shuffled-gather-casted.ll
-Transforms/SLPVectorizer/RISCV/unsigned-node-trunc-with-signed-users.ll
-Transforms/SLPVectorizer/slp-deleted-inst.ll
-Transforms/SLPVectorizer/SystemZ/cmp-ptr-minmax.ll
-Transforms/SLPVectorizer/SystemZ/ext-not-resized-op-resized.ll
-Transforms/SLPVectorizer/SystemZ/minbitwidth-trunc.ll
-Transforms/SLPVectorizer/X86/bool-mask.ll
-Transforms/SLPVectorizer/X86/bv-root-part-of-graph.ll
-Transforms/SLPVectorizer/X86/cmp-after-intrinsic-call-minbitwidth.ll
-Transforms/SLPVectorizer/X86/cmp-as-alternate-ops.ll
-Transforms/SLPVectorizer/X86/cmp_sel.ll
-Transforms/SLPVectorizer/X86/crash_7zip.ll
-Transforms/SLPVectorizer/X86/crash_clear_undefs.ll
-Transforms/SLPVectorizer/X86/crash_cmpop.ll
-Transforms/SLPVectorizer/X86/debug-counter.ll
-Transforms/SLPVectorizer/X86/debug-info-salvage.ll
-Transforms/SLPVectorizer/X86/extractelement-single-use-many-nodes.ll
-Transforms/SLPVectorizer/X86/extracts-non-extendable.ll
-Transforms/SLPVectorizer/X86/ext-used-scalar-different-bitwidth.ll
-Transforms/SLPVectorizer/X86/gather-node-same-as-vect-but-order.ll
-Transforms/SLPVectorizer/X86/horizontal-minmax.ll
-Transforms/SLPVectorizer/X86/insert-after-bundle.ll
-Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
-Transforms/SLPVectorizer/X86/minbitwidth-icmp-to-trunc.ll
-Transforms/SLPVectorizer/X86/minbw-user-non-sizable.ll
-Transforms/SLPVectorizer/X86/non-load-reduced-as-part-of-bv.ll
-Transforms/SLPVectorizer/X86/ordering-bug.ll
-Transforms/SLPVectorizer/X86/phi-node-bitwidt-op-not.ll
-Transforms/SLPVectorizer/X86/phi-node-reshuffled-part.ll
-Transforms/SLPVectorizer/X86/pr46983.ll
-Transforms/SLPVectorizer/X86/pr49933.ll
-Transforms/SLPVectorizer/X86/propagate_ir_flags.ll
-Transforms/SLPVectorizer/X86/reduction-bool-logic-op-inside.ll
-Transforms/SLPVectorizer/X86/reduction-logical.ll
-Transforms/SLPVectorizer/X86/resized-bv-values-non-power-of2-node.ll
-Transforms/SLPVectorizer/X86/reused-reductions-with-minbitwidth.ll
-Transforms/SLPVectorizer/X86/select-reduction-op.ll
-Transforms/SLPVectorizer/X86/shrink_after_reorder.ll
-Transforms/SLPVectorizer/X86/subvector-minbitwidth-unsigned-value.ll
-Transforms/SLPVectorizer/X86/undef_vect.ll
-Transforms/SLPVectorizer/X86/used-reduced-op.ll
-Transforms/SLPVectorizer/X86/vec3-reorder-reshuffle.ll
-Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll
-Transforms/SLPVectorizer/X86/whole-registers-compare.ll
Transforms/SROA/addrspacecast.ll
Transforms/SROA/phi-and-select.ll
Transforms/SROA/phi-gep.ll
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
1236326
to
025aef2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally LGTM, though I do wonder whether vector selects shouldn't just be excluded wholesale from profcheck? Vector selects are not going to be converted back to branches.
setExplicitlyUnknownBranchWeightsIfProfiled( | ||
*SI, *SI->getParent()->getParent(), PassName); | ||
} | ||
return Ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use this in 01e19e8 as well -- the tradeoff being having to pass the pass name, but getting to keep IRBuilder usage. The pass name requirement is annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
FWIW, I'm actively thinking how to avoid the explicit name and still get the property we want - namely, identification of origin of "unknowns", which we can use to farm out opportunities for better profiling. If there aren't too many cases, maybe it's not too bad, but I want to have an alternative ready.
I was thinking about that too, my worry is that a wholesale exclusion would allow anything to pass in the future (like, a branch). I realize, on one hand, that it should probably not happen in this case, but on the other hand, we're replacing a one-liner with another one-liner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG for SLP part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although the xfail list looks a bit wonky.
llvm/utils/profcheck-xfail.txt
Outdated
tools/UpdateTestChecks/update_test_checks/various_ir_values_dbgrecords.test | ||
Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll | ||
Transforms/AggressiveInstCombine/lower-table-based-cttz-dereferencing-pointer.ll | ||
Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be quite a few removals in here that come from other patches? Might be fixed by a rebase/merge though.
There are 2 cases:
select
condition is a vector of bools, case in which we don't currently have a way to represent the per-element branch probabilities anyway;llvm.vector.reduce
. We could potentially try and do more here - if the reduced vector contained conditions from other selects, for instanceIn either case, IIUC, chances are the
select
doesn't get lowered to a branch, at least I'm not seeing any evidence of that in an internal complex application (CSFDO + ThinLTO). Seems sufficient to mark the selects are unknown (for profiled functions); since that metadata carries with it the pass name (DEBUG_TYPE
) that marked it as such, we can revisit this if we detect later lowerings of these selects that would have required an actual profile.Issue #147390