Changed default value of slp-max-vf to 192 #70479
@llvm/pr-subscribers-llvm-transforms
Author: Dmitriy Smirnov (d-smirnov)
Changes
The PR fixes a sharp compilation-time increase noted in 527.cam4_r when patch https://reviews.llvm.org/D155689 is applied.
Full diff: https://github.com/llvm/llvm-project/pull/70479.diff
1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index bb4e743c1544a98..de7952b1839f5b3 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -138,7 +138,7 @@ MaxVectorRegSizeOption("slp-max-reg-size", cl::init(128), cl::Hidden,
cl::desc("Attempt to vectorize for this register size in bits"));
static cl::opt<unsigned>
-MaxVFOption("slp-max-vf", cl::init(0), cl::Hidden,
+MaxVFOption("slp-max-vf", cl::init(192), cl::Hidden,
cl::desc("Maximum SLP vectorization factor (0=unlimited)"));
/// Limits the size of scheduling regions in a block.
@@ -4135,7 +4135,7 @@ static bool areTwoInsertFromSameBuildVector(
// Go through the vector operand of insertelement instructions trying to find
// either VU as the original vector for IE2 or V as the original vector for
// IE1.
- SmallSet<int, 8> ReusedIdx;
+ SmallDenseSet<int, 8> ReusedIdx;
bool IsReusedIdx = false;
do {
if (IE2 == VU && !IE1)
✅ With the latest revision this PR passed the C/C++ code formatter.
1. Changed default value of slp-max-vf to 192
2. Minor performance fix: SmallSet -> SmallDenseSet
6252e1e to 8d0ea93
MaxVFOption("slp-max-vf", cl::init(192), cl::Hidden,
cl::desc("Maximum SLP vectorization factor (0=unlimited)"));
I don't like this change; I need to take a closer look at the problematic case. This is a hack that does not fix the issue, it just hides it.
Can the usage of SmallDenseSet instead of SmallSet (below) go in without further info or investigation?
Would you have suggestions on how to share a reproducer, since the issue manifests only at LTO on a large benchmark, 527.cam4_r (SPEC2017)?
@alexey-bataev, @vporpo Looks like I have narrowed down the "search space" a bit. The demonstrator IR is attached.
I have a fix for this problem, hope to commit it later today after thorough testing. There are 2 problems, which can be easily fixed:
1. Better to use SmallBitVector instead of SmallSet.
2. Some particular trees, consisting only of phis and buildvector/gather nodes, can be dropped early as non-vectorizable in many cases.
Fixed opt results
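For readers following the first point: tracking which insertelement indices were already seen only needs one bit per lane, so a bit vector gives constant-time test-and-set, whereas llvm::SmallSet scans linearly while small and falls back to a std::set once it grows. The block below is a minimal sketch of that idea in plain C++, not the code from the actual fix: `IndexTracker` and `hasReusedIndex` are hypothetical names, and `std::vector<bool>` stands in for llvm::SmallBitVector so the example compiles without LLVM headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::SmallBitVector: one bit per lane index.
class IndexTracker {
public:
  explicit IndexTracker(std::size_t NumLanes) : Seen(NumLanes, false) {}

  // Marks Idx as used. Returns true if Idx was already marked (a reuse).
  bool testAndSet(std::size_t Idx) {
    assert(Idx < Seen.size() && "lane index out of range");
    bool WasSet = Seen[Idx];
    Seen[Idx] = true;
    return WasSet;
  }

private:
  std::vector<bool> Seen;
};

// Returns true if any index occurs more than once in Indices.
// Each lookup is O(1), independent of how many indices were seen so far.
bool hasReusedIndex(const std::vector<std::size_t> &Indices,
                    std::size_t NumLanes) {
  IndexTracker Tracker(NumLanes);
  for (std::size_t Idx : Indices)
    if (Tracker.testAndSet(Idx))
      return true;
  return false;
}
```

The trade-off is that the number of lanes must be known up front, which is the case when the indices address elements of a fixed-width vector.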
Fixed in d4cec1c
The PR fixes a sharp compilation-time increase noted in 527.cam4_r when patch https://reviews.llvm.org/D155689 is applied.
The issue was observed at the LTO phase (link time increased from ~2 mins to ~62 mins) and is caused by an increased number of lengthy instruction chains that the SLP vectorizer tries to assess.
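To make the `0=unlimited` convention in the slp-max-vf description concrete, here is a minimal sketch of how such a cap could bound the width the vectorizer considers. It is an illustration under assumptions, not the logic in SLPVectorizer.cpp: `clampVF` is a hypothetical helper, and taking the minimum of the candidate and the cap is assumed for the example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not taken from SLPVectorizer.cpp.
// MaxVF == 0 means "no limit", mirroring the option description;
// any other value caps the candidate vectorization factor.
unsigned clampVF(unsigned CandidateVF, unsigned MaxVF) {
  if (MaxVF == 0)
    return CandidateVF; // unlimited: keep the candidate as is
  return std::min(CandidateVF, MaxVF);
}
```

With a cap of 192, a candidate factor of 1024 would be reduced to 192 while small factors pass through unchanged, which limits the cost of very long chains but, as noted in the review, does not address why they are expensive to assess.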
@vporpo