[SimpleLoopUnswitch] Don't use BlockFrequencyInfo to skip cold loops #159522
base: main
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Stacked on #159516

In https://reviews.llvm.org/D129599, non-trivial unswitching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM moved away from BFI, and as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole purpose of preventing code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.

However, just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559

And coincidentally, sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added, which automatically annotates cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to determine whether we should optimize for code size is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour, since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

- It allows us to remove BFI from the function-to-loop pass adaptor, which was only enabled for certain stages in the LTO pipeline
- We no longer have to worry about lossy analysis results
- Which in turn means the decision to avoid non-trivial unswitching will be more accurate
- It brings the behaviour in line with other passes that respect OptSize
- It respects the -pgo-cold-func-opt flag so users can control the behaviour
- It prevents the need to run OuterAnalysisManagerProxy as often, which is probably good for compile time
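To make the ordering concrete, here is a minimal sketch of the two checks described above. This is an illustration only, not the actual SimpleLoopUnswitch source; the helper name and the exact ProfileSummaryInfo query are assumptions.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Sketch of the gating order: the OptSize check (D94559) already runs before
// the BFI-based coldness check (D129599). Once PGOForceFunctionAttrs (D149800)
// marks cold functions optsize, the first check fires and the second,
// BFI-dependent check (the one this PR removes) becomes largely redundant.
static bool skipNonTrivialUnswitch(const Function &F, ProfileSummaryInfo *PSI,
                                   BlockFrequencyInfo *BFI,
                                   const BasicBlock *LoopHeader) {
  if (F.hasOptSize())
    return true; // function-level: optsize, including PGO-forced optsize
  if (PSI && PSI->hasProfileSummary() && BFI &&
      PSI->isColdBlock(LoopHeader, BFI))
    return true; // loop-level BFI coldness check, removed by this patch
  return false;
}
```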
Patch is 31.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159522.diff 16 Files Affected:
diff --git a/llvm/include/llvm/Analysis/LoopAnalysisManager.h b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
index a825ada05df11..1755257fe6c89 100644
--- a/llvm/include/llvm/Analysis/LoopAnalysisManager.h
+++ b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
@@ -36,8 +36,6 @@ namespace llvm {
class AAResults;
class AssumptionCache;
-class BlockFrequencyInfo;
-class BranchProbabilityInfo;
class DominatorTree;
class Function;
class Loop;
@@ -59,8 +57,6 @@ struct LoopStandardAnalysisResults {
ScalarEvolution &SE;
TargetLibraryInfo &TLI;
TargetTransformInfo &TTI;
- BlockFrequencyInfo *BFI;
- BranchProbabilityInfo *BPI;
MemorySSA *MSSA;
};
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
index 79ccd63fd834c..1842d2dc5f05a 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
@@ -404,12 +404,8 @@ class FunctionToLoopPassAdaptor
explicit FunctionToLoopPassAdaptor(std::unique_ptr<PassConceptT> Pass,
bool UseMemorySSA = false,
- bool UseBlockFrequencyInfo = false,
- bool UseBranchProbabilityInfo = false,
bool LoopNestMode = false)
: Pass(std::move(Pass)), UseMemorySSA(UseMemorySSA),
- UseBlockFrequencyInfo(UseBlockFrequencyInfo),
- UseBranchProbabilityInfo(UseBranchProbabilityInfo),
LoopNestMode(LoopNestMode) {
LoopCanonicalizationFPM.addPass(LoopSimplifyPass());
LoopCanonicalizationFPM.addPass(LCSSAPass());
@@ -431,8 +427,6 @@ class FunctionToLoopPassAdaptor
FunctionPassManager LoopCanonicalizationFPM;
bool UseMemorySSA = false;
- bool UseBlockFrequencyInfo = false;
- bool UseBranchProbabilityInfo = false;
const bool LoopNestMode;
};
@@ -445,9 +439,7 @@ class FunctionToLoopPassAdaptor
/// \c LoopPassManager and the returned adaptor will be in loop-nest mode.
template <typename LoopPassT>
inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
- bool UseBlockFrequencyInfo = false,
- bool UseBranchProbabilityInfo = false) {
+createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false) {
if constexpr (is_detected<HasRunOnLoopT, LoopPassT>::value) {
using PassModelT =
detail::PassModel<Loop, LoopPassT, LoopAnalysisManager,
@@ -457,7 +449,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
return FunctionToLoopPassAdaptor(
std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
new PassModelT(std::forward<LoopPassT>(Pass))),
- UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, false);
+ UseMemorySSA, false);
} else {
LoopPassManager LPM;
LPM.addPass(std::forward<LoopPassT>(Pass));
@@ -469,7 +461,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
return FunctionToLoopPassAdaptor(
std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
new PassModelT(std::move(LPM))),
- UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, true);
+ UseMemorySSA, true);
}
}
@@ -477,9 +469,8 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
/// be in loop-nest mode if the pass manager contains only loop-nest passes.
template <>
inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor<LoopPassManager>(
- LoopPassManager &&LPM, bool UseMemorySSA, bool UseBlockFrequencyInfo,
- bool UseBranchProbabilityInfo) {
+createFunctionToLoopPassAdaptor<LoopPassManager>(LoopPassManager &&LPM,
+ bool UseMemorySSA) {
// Check if LPM contains any loop pass and if it does not, returns an adaptor
// in loop-nest mode.
using PassModelT =
@@ -491,8 +482,7 @@ createFunctionToLoopPassAdaptor<LoopPassManager>(
return FunctionToLoopPassAdaptor(
std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
new PassModelT(std::move(LPM))),
- UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo,
- LoopNestMode);
+ UseMemorySSA, LoopNestMode);
}
/// Pass for printing a loop's contents as textual IR.
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 8cf277657a54a..d5b44a013d4a7 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1931,13 +1931,13 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
#define LOOPNEST_PASS(NAME, CREATE_PASS) \
if (Name == NAME) { \
MPM.addPass(createModuleToFunctionPassAdaptor( \
- createFunctionToLoopPassAdaptor(CREATE_PASS, false, false))); \
+ createFunctionToLoopPassAdaptor(CREATE_PASS, false))); \
return Error::success(); \
}
#define LOOP_PASS(NAME, CREATE_PASS) \
if (Name == NAME) { \
MPM.addPass(createModuleToFunctionPassAdaptor( \
- createFunctionToLoopPassAdaptor(CREATE_PASS, false, false))); \
+ createFunctionToLoopPassAdaptor(CREATE_PASS, false))); \
return Error::success(); \
}
#define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS) \
@@ -1945,9 +1945,8 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
auto Params = parsePassParameters(PARSER, Name, NAME); \
if (!Params) \
return Params.takeError(); \
- MPM.addPass( \
- createModuleToFunctionPassAdaptor(createFunctionToLoopPassAdaptor( \
- CREATE_PASS(Params.get()), false, false))); \
+ MPM.addPass(createModuleToFunctionPassAdaptor( \
+ createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false))); \
return Error::success(); \
}
#include "PassRegistry.def"
@@ -2046,13 +2045,13 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
#define LOOPNEST_PASS(NAME, CREATE_PASS) \
if (Name == NAME) { \
CGPM.addPass(createCGSCCToFunctionPassAdaptor( \
- createFunctionToLoopPassAdaptor(CREATE_PASS, false, false))); \
+ createFunctionToLoopPassAdaptor(CREATE_PASS, false))); \
return Error::success(); \
}
#define LOOP_PASS(NAME, CREATE_PASS) \
if (Name == NAME) { \
CGPM.addPass(createCGSCCToFunctionPassAdaptor( \
- createFunctionToLoopPassAdaptor(CREATE_PASS, false, false))); \
+ createFunctionToLoopPassAdaptor(CREATE_PASS, false))); \
return Error::success(); \
}
#define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS) \
@@ -2060,9 +2059,8 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
auto Params = parsePassParameters(PARSER, Name, NAME); \
if (!Params) \
return Params.takeError(); \
- CGPM.addPass( \
- createCGSCCToFunctionPassAdaptor(createFunctionToLoopPassAdaptor( \
- CREATE_PASS(Params.get()), false, false))); \
+ CGPM.addPass(createCGSCCToFunctionPassAdaptor( \
+ createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false))); \
return Error::success(); \
}
#include "PassRegistry.def"
@@ -2095,14 +2093,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
return Err;
// Add the nested pass manager with the appropriate adaptor.
bool UseMemorySSA = (Name == "loop-mssa");
- bool UseBFI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
- return Pipeline.Name.contains("simple-loop-unswitch");
- });
- bool UseBPI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
- return Pipeline.Name == "loop-predication";
- });
- FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA,
- UseBFI, UseBPI));
+ FPM.addPass(
+ createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA));
return Error::success();
}
if (Name == "machine-function") {
@@ -2155,12 +2147,12 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
// The risk is that it may become obsolete if we're not careful.
#define LOOPNEST_PASS(NAME, CREATE_PASS) \
if (Name == NAME) { \
- FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)); \
+ FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false)); \
return Error::success(); \
}
#define LOOP_PASS(NAME, CREATE_PASS) \
if (Name == NAME) { \
- FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)); \
+ FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false)); \
return Error::success(); \
}
#define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS) \
@@ -2168,8 +2160,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
auto Params = parsePassParameters(PARSER, Name, NAME); \
if (!Params) \
return Params.takeError(); \
- FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), \
- false, false)); \
+ FPM.addPass( \
+ createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)); \
return Error::success(); \
}
#include "PassRegistry.def"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index c3f35f0f5e7fa..3dfe9cf51865c 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -517,16 +517,14 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
- /*UseMemorySSA=*/true,
- /*UseBlockFrequencyInfo=*/true));
+ /*UseMemorySSA=*/true));
FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());
// The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
// *All* loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
- /*UseMemorySSA=*/false,
- /*UseBlockFrequencyInfo=*/false));
+ /*UseMemorySSA=*/false));
// Delete small array after loop unroll.
FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -706,8 +704,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
- /*UseMemorySSA=*/true,
- /*UseBlockFrequencyInfo=*/true));
+ /*UseMemorySSA=*/true));
FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());
@@ -715,8 +712,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
// LoopDeletionPass and LoopFullUnrollPass) do not preserve MemorySSA.
// *All* loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
- /*UseMemorySSA=*/false,
- /*UseBlockFrequencyInfo=*/false));
+ /*UseMemorySSA=*/false));
// Delete small array after loop unroll.
FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -769,7 +765,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
FPM.addPass(createFunctionToLoopPassAdaptor(
LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/true),
- /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+ /*UseMemorySSA=*/true));
FPM.addPass(CoroElidePass());
@@ -837,8 +833,7 @@ void PassBuilder::addPostPGOLoopRotation(ModulePassManager &MPM,
createFunctionToLoopPassAdaptor(
LoopRotatePass(EnableLoopHeaderDuplication ||
Level != OptimizationLevel::Oz),
- /*UseMemorySSA=*/false,
- /*UseBlockFrequencyInfo=*/false),
+ /*UseMemorySSA=*/false),
PTO.EagerlyInvalidateAnalyses));
}
}
@@ -1354,8 +1349,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
LPM.addPass(SimpleLoopUnswitchPass(/* NonTrivial */ Level ==
OptimizationLevel::O3));
ExtraPasses.addPass(
- createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true,
- /*UseBlockFrequencyInfo=*/true));
+ createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true));
ExtraPasses.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
ExtraPasses.addPass(InstCombinePass());
@@ -1433,7 +1427,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
FPM.addPass(createFunctionToLoopPassAdaptor(
LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/true),
- /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+ /*UseMemorySSA=*/true));
// Now that we've vectorized and unrolled loops, we may have more refined
// alignment information, try to re-derive it here.
@@ -1510,7 +1504,7 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
OptimizePM.addPass(createFunctionToLoopPassAdaptor(
LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/true),
- /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+ /*USeMemorySSA=*/true));
}
OptimizePM.addPass(Float2IntPass());
@@ -1550,8 +1544,8 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (PTO.LoopInterchange)
LPM.addPass(LoopInterchangePass());
- OptimizePM.addPass(createFunctionToLoopPassAdaptor(
- std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+ OptimizePM.addPass(
+ createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
// FIXME: This may not be the right place in the pipeline.
// We need to have the data to support the right place.
@@ -2100,7 +2094,7 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
MainFPM.addPass(createFunctionToLoopPassAdaptor(
LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/true),
- /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+ /*USeMemorySSA=*/true));
if (RunNewGVN)
MainFPM.addPass(NewGVNPass());
@@ -2130,8 +2124,8 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
PTO.ForgetAllSCEVInLoopUnroll));
// The loop passes in LPM (LoopFullUnrollPass) do not preserve MemorySSA.
// *All* loop passes must preserve it, in order to be able to use it.
- MainFPM.addPass(createFunctionToLoopPassAdaptor(
- std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/true));
+ MainFPM.addPass(
+ createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
MainFPM.addPass(LoopDistributePass());
diff --git a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
index 4b26ce12a28db..47a1b95339186 100644
--- a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
@@ -8,8 +8,6 @@
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Analysis/AssumptionCache.h"
-#include "llvm/Analysis/BlockFrequencyInfo.h"
-#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
@@ -220,13 +218,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
// Get the analysis results needed by loop passes.
MemorySSA *MSSA =
UseMemorySSA ? (&AM.getResult<MemorySSAAnalysis>(F).getMSSA()) : nullptr;
- BlockFrequencyInfo *BFI = UseBlockFrequencyInfo && F.hasProfileData()
- ? (&AM.getResult<BlockFrequencyAnalysis>(F))
- : nullptr;
- BranchProbabilityInfo *BPI =
- UseBranchProbabilityInfo && F.hasProfileData()
- ? (&AM.getResult<BranchProbabilityAnalysis>(F))
- : nullptr;
LoopStandardAnalysisResults LAR = {AM.getResult<AAManager>(F),
AM.getResult<AssumptionAnalysis>(F),
AM.getResult<DominatorTreeAnalysis>(F),
@@ -234,8 +225,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
AM.getResult<ScalarEvolutionAnalysis>(F),
AM.getResult<TargetLibraryAnalysis>(F),
AM.getResult<TargetIRAnalysis>(F),
- BFI,
- BPI,
MSSA};
// Setup the loop analysis manager from its proxy. It is important that
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index e4ba70d1bce16..5af6c96c56a06 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -27,7 +27,6 @@
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/MustExecute.h"
-#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"
@@ -3611,8 +3610,7 @@ static bool unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI,
AssumptionCache &AC, AAResults &AA,
TargetTransformInfo &TTI, bool Trivial,
...
[truncated]
I've moved the existing test to PhaseOrdering to show how using PGOForceFunctionAttrs also prevents unswitching this loop.
This was the test added in #157888 to make sure we invalidated BFI in the loop pass adaptor. But if we don't use BFI anymore then it doesn't get invalidated.
This only works if the whole function is cold, not if there is a cold loop inside a hot function, right?
Yeah, I briefly touched on that in the PR description. I couldn't find anything in https://reviews.llvm.org/D129599 that explicitly mentioned cold loops in hot functions, so I hope that function-level granularity is good enough for whatever case motivated that patch. At least it seems to be good enough for the test case that was included with it, llvm/test/Transforms/PhaseOrdering/unswitch-cold-func.ll
Okay, looking through the history of what happened here, we have:
So with that history in mind, it does seem like just relying on the whole function being cold (via optsize) may be fine, as that was one of the intermediate solutions, which was only dropped due to compile-time issues. (Note that PGOForceFunctionAttrsPass uses isFunctionColdInCallGraph as well, so it should be equivalent to the second solution without the compile-time impact.)
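For reference, here is a rough sketch of what the function-level approach amounts to. It is an illustration rather than the actual PGOForceFunctionAttrsPass source; the helper name is hypothetical.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Mark every profile-cold function optsize so that size-sensitive passes,
// including SimpleLoopUnswitch, back off later without needing loop-level BFI.
static void forceOptSizeOnColdFunctions(Module &M, ProfileSummaryInfo &PSI,
                                        FunctionAnalysisManager &FAM) {
  if (!PSI.hasProfileSummary())
    return;
  for (Function &F : M) {
    if (F.isDeclaration())
      continue;
    auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(F);
    if (PSI.isFunctionColdInCallGraph(&F, BFI))
      // Roughly what PGOForceFunctionAttrs does when configured to use optsize.
      F.addFnAttr(Attribute::OptimizeForSize);
  }
}
```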
LGTM, but please leave some time for others to comment.
In https://reviews.llvm.org/D129599, non-trivial unswitching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM moved away from BFI, and as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole purpose of preventing code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.

However, just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559

And coincidentally, sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added, which will automatically annotate cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to add the OptSize is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour, since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

- It allows us to remove BFI from the function-to-loop pass adaptor, which was only enabled for certain stages in the LTO pipeline
- We no longer have to worry about lossy analysis results
- Which in turn means the decision to avoid non-trivial unswitching will be more accurate
- It brings the behaviour in line with other passes that respect OptSize
- It respects the -pgo-cold-func-opt flag so users can control the behaviour
- It prevents the need to run OuterAnalysisManagerProxy as often, which is probably good for compile time
This was added to illustrate the lossy preservation of BPI in 452714f. But it fails now because we don't run OuterAnalysisManagerProxy beforehand. I'm not sure if it's still a useful thing to test given that llvm#159516 removes BPI from the adaptor, so I've gone ahead and deleted the test.
Force-pushed from 39d102c to 754f242.
Overall checking for hot/cold loops in principle makes more sense to me; i.e. no point in unswitching a loop that's never executed in a hot function.
Not regressing SPEC is great, but still leaves a lot of room to regress other workloads. I don't really have any concrete suggestions at the moment; if we can't easily check for hotness of loops, I guess this is fine for now and we can revisit if/when we discover regressions.
Agree, this change isn't going to be equivalent. I'm a little unclear on the problem being solved by this PR. What is the main motivation? There's a list of benefits, but I'm not sure how important they are.
The main motivation is to remove the last use of BlockFrequencyInfo from within a loop pass, because I'm not sure there's much value in relying on a potentially inaccurate result given that it's only preserved lossily. I noticed this originally when some of the lossy results escaped into the loop vectorizer in #157888, causing BFI to return some bogus frequencies. Not off by one or two etc., but returning the frequencies for different blocks altogether, since the CFG had been modified. I'm a bit worried that SimpleLoopUnswitch is also affected by this, e.g. thinking a loop is cold when it's not.

There is a secondary motivation: avoiding code size increases should be handled by PGOForceFunctionAttrsPass. I think it's somewhat coincidental that the original patch checked that the loops were cold instead of the function. As Nikita mentions, fb45f3c seems to suggest that the original author was fine with checking the function.
Isn't this a larger problem, though? Downstream function passes such as BB layout will also be affected by incorrect BFI. I'm unclear why disabling use of BFI in this particular pass is the right approach to this issue.
I'm not sure I agree here - why should code size increases from cold code only be gated at the function level? (I see you filed #159595; that seems like a good approach longer term.)
These messages reference SPEC, though, which isn't always the best reflector of real world codes.
Incorrect BFI leaking through to function passes was fixed in #157888 (wasn't exactly the easiest thing to debug 😅). AFAIK SimpleLoopUnswitch is now the last user of lossy BFI.
@fhahn @teresajohnson I did some more experimenting and I think the usage of lossy BFI is a bigger problem than I had initially expected. I changed SimpleLoopUnswitch to always invalidate BlockFrequencyAnalysis and BranchProbabilityAnalysis beforehand (not including other analyses), so that the BFI data is always fresh. On arm64-apple-darwin -O3 there's a large difference in the number of "cold" loops skipped; specifically, many loops that were previously considered cold no longer are:
This translates to a significant number of loops that we are erroneously not unswitching because we mistakenly think they are cold, 4.3% geomean overall.
For this reason alone I think we should remove the use of BlockFrequencyInfo, as it's very likely that it's impeding performance. In the above results, the increase in loops unswitched all comes from hot loops.
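For anyone curious how such a comparison could be wired up, below is a hypothetical sketch that recomputes BFI from the current CFG inside the pass and compares the two coldness verdicts. This is not the patch used for the numbers above (which invalidated the analyses beforehand instead), and the helper name is made up.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Compare the coldness verdict from the (possibly lossily preserved) BFI that
// the loop pass adaptor hands us against one recomputed from the current CFG.
static void compareColdnessVerdicts(const Loop &L, const Function &F,
                                    const LoopInfo &LI, DominatorTree &DT,
                                    ProfileSummaryInfo &PSI,
                                    BlockFrequencyInfo *AdaptorBFI) {
  BranchProbabilityInfo FreshBPI(F, LI, /*TLI=*/nullptr, &DT);
  BlockFrequencyInfo FreshBFI(F, FreshBPI, LI);
  bool ColdStale = AdaptorBFI && PSI.isColdBlock(L.getHeader(), AdaptorBFI);
  bool ColdFresh = PSI.isColdBlock(L.getHeader(), &FreshBFI);
  if (ColdStale != ColdFresh)
    errs() << "coldness verdict differs for loop header "
           << L.getHeader()->getName() << "\n";
}
```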
Thanks for the data. That does look pretty compelling. We may have some binary size or compile time increase for loops that still are cold in non-cold functions - do you have any data on how many cold loops (after forcing BFI/BPA invalidation before the pass as above) will be unnecessarily unswitched after this change due to being in non-cold functions?

Also can you check a couple of instances where the loop was cold until BFI/BPA were invalidated and see why they were cold - were they new loops created by previous loop transformations? With any eventual support for loop attributes we'd need to ensure they were copied or synthesized on such new loop nests appropriately.

But generally I think this change makes sense. However, I have a question on a test change - it isn't clear to me whether the effects of this PR are being captured in a test currently.
define void @_Z11hotFunctionbiiPiS_S_(i1 %cond, i32 %M, i32 %N, ptr %A, ptr %B, ptr %C) !prof !36 {
; CHECK-LABEL: define void @_Z11hotFunctionbiiPiS_S_
; CHECK-SAME: (i1 [[COND:%.*]], i32 [[M:%.*]], i32 [[N:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) !prof [[PROF16:![0-9]+]] {
; CHECK-SAME: (i1 [[COND:%.*]], i32 [[M:%.*]], i32 [[N:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) #[[ATTR0:[0-9]+]] {{.*}}{
ATTR0 is never matched with anything later on - what is this trying to check? I'm a little unsure of what changed here; as per the comments and the function name, this is a cold loop nest in a hot function, so shouldn't we stop seeing it be unswitched with this PR?
In https://reviews.llvm.org/D129599, non-trivial unswitching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo with PGO, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156
LICM moved away from BFI, so as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole purpose of preventing code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.
However just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559
And coincidentally sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added which automatically annotates cold functions with OptSize: https://reviews.llvm.org/D149800
I think using PGOForceFunctionAttrs to determine whether we should optimize for code size is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.
This isn't 100% the same behaviour, since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

- It allows us to remove BFI from the function-to-loop pass adaptor, which was only enabled for certain stages in the LTO pipeline
- We no longer have to worry about lossy analysis results
- Which in turn means the decision to avoid non-trivial unswitching will be more accurate
- It brings the behaviour in line with other passes that respect OptSize
- It respects the -pgo-cold-func-opt flag so users can control the behaviour
- It prevents the need to run OuterAnalysisManagerProxy as often, which is probably good for compile time