Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed default value of slp-max-vf to 192 #70479

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

d-smirnov
Copy link
Contributor

@d-smirnov d-smirnov commented Oct 27, 2023

This PR

  1. Changed default value of slp-max-vf to 192
  2. Minor performance fix: SmallSet -> SmallDenseSet

The PR fixes sharp compilation time increase noted in 527.cam4_r when patch https://reviews.llvm.org/D155689 applied.
Issue observed at LTO phase (link time increased from ~2 mins to ~62 mins) and caused by increased amount of lengthy instruction chains that SLP vectorizer tries to asses.

@vporpo

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 27, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Dmitriy Smirnov (d-smirnov)

Changes

This PR

  1. Changed default value of slp-max-vf to 192
  2. Minor performance fix: SmallSet -> SmallDenseSet

The PR fixes sharp compilation time increase noted in 527.cam4_r when patch https://reviews.llvm.org/D155689 applied.
Issue observed at LTO phase and caused by increased amount of lengthy instruction chains that SLP vectorizer tries to asses.
@vporpo


Full diff: https://github.com/llvm/llvm-project/pull/70479.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index bb4e743c1544a98..de7952b1839f5b3 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -138,7 +138,7 @@ MaxVectorRegSizeOption("slp-max-reg-size", cl::init(128), cl::Hidden,
     cl::desc("Attempt to vectorize for this register size in bits"));
 
 static cl::opt<unsigned>
-MaxVFOption("slp-max-vf", cl::init(0), cl::Hidden,
+MaxVFOption("slp-max-vf", cl::init(192), cl::Hidden,
     cl::desc("Maximum SLP vectorization factor (0=unlimited)"));
 
 /// Limits the size of scheduling regions in a block.
@@ -4135,7 +4135,7 @@ static bool areTwoInsertFromSameBuildVector(
   // Go through the vector operand of insertelement instructions trying to find
   // either VU as the original vector for IE2 or V as the original vector for
   // IE1.
-  SmallSet<int, 8> ReusedIdx;
+  SmallDenseSet<int, 8> ReusedIdx;
   bool IsReusedIdx = false;
   do {
     if (IE2 == VU && !IE1)

@github-actions
Copy link

github-actions bot commented Oct 27, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

1. Changed default value of slp-max-vf to 192
2. Minor performance fix: SmallSet -> SmallDenseSet
Comment on lines +141 to +142
MaxVFOption("slp-max-vf", cl::init(192), cl::Hidden,
cl::desc("Maximum SLP vectorization factor (0=unlimited)"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this change, need to take a closer look at the problematic case. This is a hacak that does not fix the issue, just hides it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the usage of SmallDenseSet instead of SmallSet (below) go in without further info or investigation?

Would you have suggestions on how to share a reproducer, since the issue manifests only at LTO on a large benchmark 528.cam4_r (spec2017)?

@d-smirnov
Copy link
Contributor Author

@alexey-bataev, @vporpo Looks like I narrowed down "search space" a bit. The demonstrator IR is attached.
full.ll.gz
/usr/bin/time opt --passes="loop-vectorize,sroa,simplifycfg,slp-vectorizer" full.ll --disable-output --pass-remarks=slp-vectorize
Run takes about 30 minutes on Graviton 3 machine with unlimited chains and 13 sec with --slp-max-vf=192
Most of the time spent slp-vectorizing compute_uwshcu subroutine

@alexey-bataev
Copy link
Member

I have a fix for this problem, hope to commit it later today after thorough testing. There are 2 problems, which can be easily fixed: 1. Better to use SmallBitVector instead of SmallSet. 2. Some particular trees, consisting only of phis and buildvector/gather nodes, can be dropped early as non-vectorizable in many cases.

Fixed opt results
time opt -passes=slp-vectorizer repro.ll -disable-output opt -passes=slp-vectorizer repro.ll 11.06s user 0.17s system 99% cpu 11.276 total

@alexey-bataev
Copy link
Member

Fixed in d4cec1c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants