Changed default value of slp-max-vf to 192 #70479

d-smirnov · 2023-10-27T16:59:39Z

This PR:

1. Changed default value of slp-max-vf to 192
2. Minor performance fix: SmallSet -> SmallDenseSet

The PR fixes a sharp compilation time increase observed in 527.cam4_r when patch https://reviews.llvm.org/D155689 is applied.
The issue is observed at the LTO phase (link time increased from ~2 mins to ~62 mins) and is caused by an increased number of lengthy instruction chains that the SLP vectorizer tries to assess.

@vporpo
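
To make the effect of a vectorization-factor cap concrete, here is a minimal sketch in plain C++. It is a toy model, not the SLPVectorizer implementation: `effectiveMaxVF` and `sliceChain` are hypothetical helpers introduced only for illustration, and the only thing taken from the PR is the option semantics "0=unlimited". The point it tries to show is that a nonzero cap bounds the size of each candidate group, so a single very long chain is never assessed as one unit.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical helper (not LLVM code): interpret the cap the way the option
// description states it, where 0 means "unlimited".
unsigned effectiveMaxVF(unsigned MaxVFOption) {
  return MaxVFOption == 0 ? std::numeric_limits<unsigned>::max() : MaxVFOption;
}

// Hypothetical helper (not LLVM code): split a chain of ChainLen candidate
// instructions into consecutive groups of at most the cap and return the
// group sizes.
std::vector<unsigned> sliceChain(unsigned ChainLen, unsigned MaxVFOption) {
  std::vector<unsigned> Slices;
  const unsigned Cap = effectiveMaxVF(MaxVFOption);
  for (unsigned Remaining = ChainLen; Remaining > 0;) {
    const unsigned Take = std::min(Remaining, Cap);
    assert(Take > 0 && "each slice must make progress");
    Slices.push_back(Take);
    Remaining -= Take;
  }
  return Slices;
}
```

Under this model, a 500-element chain with a cap of 192 is handled as three groups, while a cap of 0 leaves it as one 500-element group.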

llvmbot · 2023-10-27T17:00:40Z

@llvm/pr-subscribers-llvm-transforms

Author: Dmitriy Smirnov (d-smirnov)

Full diff: https://github.com/llvm/llvm-project/pull/70479.diff

1 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+2-2)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index bb4e743c1544a98..de7952b1839f5b3 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -138,7 +138,7 @@ MaxVectorRegSizeOption("slp-max-reg-size", cl::init(128), cl::Hidden,
     cl::desc("Attempt to vectorize for this register size in bits"));
 
 static cl::opt<unsigned>
-MaxVFOption("slp-max-vf", cl::init(0), cl::Hidden,
+MaxVFOption("slp-max-vf", cl::init(192), cl::Hidden,
     cl::desc("Maximum SLP vectorization factor (0=unlimited)"));
 
 /// Limits the size of scheduling regions in a block.
@@ -4135,7 +4135,7 @@ static bool areTwoInsertFromSameBuildVector(
   // Go through the vector operand of insertelement instructions trying to find
   // either VU as the original vector for IE2 or V as the original vector for
   // IE1.
-  SmallSet<int, 8> ReusedIdx;
+  SmallDenseSet<int, 8> ReusedIdx;
   bool IsReusedIdx = false;
   do {
     if (IE2 == VU && !IE1)
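
For readers unfamiliar with the two containers in the second hunk, the sketch below models the small-size strategy that the LLVM Programmer's Manual describes for `SmallSet`: a linear search over up to N in-place elements, followed by a fallback to a more expensive representation (`std::set` for most types). `ToySmallSet` is a hypothetical stand-in written in standard C++ so that it compiles without LLVM headers; it is not the LLVM class and makes no claim about its exact internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in (not llvm::SmallSet): keeps up to N elements in a flat buffer
// searched linearly, then spills everything into a std::set.
template <typename T, std::size_t N> class ToySmallSet {
  std::vector<T> Inline; // holds at most N elements while the set is small
  std::set<T> Big;       // used only after more than N distinct inserts

public:
  ToySmallSet() { Inline.reserve(N); }

  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Big.empty())
      return Big.insert(V).second;
    if (std::find(Inline.begin(), Inline.end(), V) != Inline.end())
      return false;
    if (Inline.size() < N) {
      Inline.push_back(V);
      return true;
    }
    // The inline buffer is full: move its contents to the tree-based set.
    Big.insert(Inline.begin(), Inline.end());
    Inline.clear();
    Big.insert(V);
    return true;
  }

  bool isSmall() const { return Big.empty(); }
  std::size_t size() const { return Big.empty() ? Inline.size() : Big.size(); }
};
```

`SmallDenseSet`, by contrast, is documented as a hash-table-based set with a number of buckets stored inline, so membership tests do not walk a tree once the set outgrows its inline storage. Whether that difference is measurable for `ReusedIdx` depends on how many indices are inserted, which the PR does not quantify.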

github-actions · 2023-10-27T17:13:16Z

✅ With the latest revision this PR passed the C/C++ code formatter.


alexey-bataev · 2023-10-30T15:32:14Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

+    MaxVFOption("slp-max-vf", cl::init(192), cl::Hidden,
+                cl::desc("Maximum SLP vectorization factor (0=unlimited)"));


I don't like this change; I need to take a closer look at the problematic case. This is a hack that does not fix the issue, it just hides it.

kiranchandramohan · 2023-11-06

Can the usage of SmallDenseSet instead of SmallSet (below) go in without further info or investigation?

Would you have suggestions on how to share a reproducer, since the issue manifests only at LTO on a large benchmark, 527.cam4_r (SPEC CPU 2017)?

d-smirnov · 2023-11-10T23:17:14Z

@alexey-bataev, @vporpo Looks like I have narrowed down the "search space" a bit. The demonstrator IR is attached.
full.ll.gz
/usr/bin/time opt --passes="loop-vectorize,sroa,simplifycfg,slp-vectorizer" full.ll --disable-output --pass-remarks=slp-vectorize
The run takes about 30 minutes on a Graviton 3 machine with unlimited chains and 13 sec with --slp-max-vf=192.
Most of the time is spent SLP-vectorizing the compute_uwshcu subroutine.

alexey-bataev · 2023-11-13T15:52:23Z

I have a fix for this problem, hope to commit it later today after thorough testing. There are 2 problems, which can be easily fixed:

1. Better to use SmallBitVector instead of SmallSet.
2. Some particular trees, consisting only of phis and buildvector/gather nodes, can be dropped early as non-vectorizable in many cases.

Fixed opt results:

time opt -passes=slp-vectorizer repro.ll -disable-output
opt -passes=slp-vectorizer repro.ll 11.06s user 0.17s system 99% cpu 11.276 total
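
The first point (a bit vector instead of a set) can be illustrated with a minimal sketch. It assumes the tracked values are insertelement lane indices, which are small non-negative integers bounded by the vector width, so one bit per lane is enough to detect a repeat. `hasReusedIndex` is a hypothetical helper and `std::vector<bool>` is used as a stand-in for `llvm::SmallBitVector` so the example compiles without LLVM headers; it is not the code from the actual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not LLVM code): returns true if any lane index in
// InsertIdx occurs more than once. NumLanes is the number of vector lanes.
bool hasReusedIndex(const std::vector<unsigned> &InsertIdx,
                    std::size_t NumLanes) {
  std::vector<bool> Seen(NumLanes, false); // stand-in for llvm::SmallBitVector
  for (unsigned Idx : InsertIdx) {
    assert(Idx < NumLanes && "lane index out of range");
    if (Seen[Idx])
      return true; // this lane was already written by an earlier insert
    Seen[Idx] = true;
  }
  return false;
}
```

Each membership test and update here is a single indexed bit access, with no hashing and no tree traversal, which is the usual motivation for preferring a bit vector when the key range is small and dense.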

alexey-bataev · 2023-11-14T14:31:19Z

Fixed in d4cec1c

llvmbot added vectorization llvm:transforms labels Oct 27, 2023

kiranchandramohan requested review from vporpo and alexey-bataev October 27, 2023 17:02

Changed default value of slp-max-vf to 192

8d0ea93

1. Changed default value of slp-max-vf to 192 2. Minor performance fix: SmallSet -> SmallDenseSet

d-smirnov force-pushed the slp-combinatoric branch from 6252e1e to 8d0ea93 Compare October 30, 2023 15:24

alexey-bataev reviewed Oct 30, 2023

kiranchandramohan Nov 6, 2023
