Skip to content

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Sep 18, 2025

In https://reviews.llvm.org/D129599, non-trivial switching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo with PGO, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM moved away from BFI so as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole reason to prevent code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.

However just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559

And coincidentally sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added which automatically annotates cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to determine whether we should optimize for code size is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

  • It allows us to remove BFI from the function to loop pass adapter, which was only enabled for certain stages in the LTO pipeline
  • We no longer have to worry about lossy analysis results
  • Which in turn means the decision to avoid non-trivial switching will be more accurate
  • It brings the behaviour inline with other passes that respect OptSize
  • It respects the -pgo-cold-func-opt flag so users can control the behaviour
  • It prevents the need to run OuterAnalysisManagerProxy as often which is probably good for compile time

@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Sep 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes

Stacked on #159516

In https://reviews.llvm.org/D129599, non-trivial switching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM moved away from BFI and as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole reason to prevent code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.

However just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559

And coincidentally sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added which automatically annotates cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to determine whether we should optimize for code size is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

  • It allows us to remove BFI from the function to loop pass adapter, which was only enabled for certain stages in the LTO pipeline
  • We no longer have to worry about lossy analysis results
  • Which in turn means the decision to avoid non-trivial switching will be more accurate
  • It brings the behaviour inline with other passes that respect OptSize
  • It respects the -pgo-cold-func-opt flag so users can control the behaviour
  • It prevents the need to run OuterAnalysisManagerProxy as often which is probably good for compile time

Patch is 31.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159522.diff

16 Files Affected:

  • (modified) llvm/include/llvm/Analysis/LoopAnalysisManager.h (-4)
  • (modified) llvm/include/llvm/Transforms/Scalar/LoopPassManager.h (+6-16)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+14-22)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+14-20)
  • (modified) llvm/lib/Transforms/Scalar/LoopPassManager.cpp (-11)
  • (modified) llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (+2-38)
  • (modified) llvm/test/Other/loop-pm-invalidation.ll (-30)
  • (modified) llvm/test/Other/new-pm-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (-1)
  • (renamed) llvm/test/Transforms/PhaseOrdering/unswitch-cold-func.ll (+3-2)
  • (modified) llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll (-1)
diff --git a/llvm/include/llvm/Analysis/LoopAnalysisManager.h b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
index a825ada05df11..1755257fe6c89 100644
--- a/llvm/include/llvm/Analysis/LoopAnalysisManager.h
+++ b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
@@ -36,8 +36,6 @@ namespace llvm {
 
 class AAResults;
 class AssumptionCache;
-class BlockFrequencyInfo;
-class BranchProbabilityInfo;
 class DominatorTree;
 class Function;
 class Loop;
@@ -59,8 +57,6 @@ struct LoopStandardAnalysisResults {
   ScalarEvolution &SE;
   TargetLibraryInfo &TLI;
   TargetTransformInfo &TTI;
-  BlockFrequencyInfo *BFI;
-  BranchProbabilityInfo *BPI;
   MemorySSA *MSSA;
 };
 
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
index 79ccd63fd834c..1842d2dc5f05a 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
@@ -404,12 +404,8 @@ class FunctionToLoopPassAdaptor
 
   explicit FunctionToLoopPassAdaptor(std::unique_ptr<PassConceptT> Pass,
                                      bool UseMemorySSA = false,
-                                     bool UseBlockFrequencyInfo = false,
-                                     bool UseBranchProbabilityInfo = false,
                                      bool LoopNestMode = false)
       : Pass(std::move(Pass)), UseMemorySSA(UseMemorySSA),
-        UseBlockFrequencyInfo(UseBlockFrequencyInfo),
-        UseBranchProbabilityInfo(UseBranchProbabilityInfo),
         LoopNestMode(LoopNestMode) {
     LoopCanonicalizationFPM.addPass(LoopSimplifyPass());
     LoopCanonicalizationFPM.addPass(LCSSAPass());
@@ -431,8 +427,6 @@ class FunctionToLoopPassAdaptor
   FunctionPassManager LoopCanonicalizationFPM;
 
   bool UseMemorySSA = false;
-  bool UseBlockFrequencyInfo = false;
-  bool UseBranchProbabilityInfo = false;
   const bool LoopNestMode;
 };
 
@@ -445,9 +439,7 @@ class FunctionToLoopPassAdaptor
 /// \c LoopPassManager and the returned adaptor will be in loop-nest mode.
 template <typename LoopPassT>
 inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
-                                bool UseBlockFrequencyInfo = false,
-                                bool UseBranchProbabilityInfo = false) {
+createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false) {
   if constexpr (is_detected<HasRunOnLoopT, LoopPassT>::value) {
     using PassModelT =
         detail::PassModel<Loop, LoopPassT, LoopAnalysisManager,
@@ -457,7 +449,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
     return FunctionToLoopPassAdaptor(
         std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
             new PassModelT(std::forward<LoopPassT>(Pass))),
-        UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, false);
+        UseMemorySSA, false);
   } else {
     LoopPassManager LPM;
     LPM.addPass(std::forward<LoopPassT>(Pass));
@@ -469,7 +461,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
     return FunctionToLoopPassAdaptor(
         std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
             new PassModelT(std::move(LPM))),
-        UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, true);
+        UseMemorySSA, true);
   }
 }
 
@@ -477,9 +469,8 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
 /// be in loop-nest mode if the pass manager contains only loop-nest passes.
 template <>
 inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor<LoopPassManager>(
-    LoopPassManager &&LPM, bool UseMemorySSA, bool UseBlockFrequencyInfo,
-    bool UseBranchProbabilityInfo) {
+createFunctionToLoopPassAdaptor<LoopPassManager>(LoopPassManager &&LPM,
+                                                 bool UseMemorySSA) {
   // Check if LPM contains any loop pass and if it does not, returns an adaptor
   // in loop-nest mode.
   using PassModelT =
@@ -491,8 +482,7 @@ createFunctionToLoopPassAdaptor<LoopPassManager>(
   return FunctionToLoopPassAdaptor(
       std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
           new PassModelT(std::move(LPM))),
-      UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo,
-      LoopNestMode);
+      UseMemorySSA, LoopNestMode);
 }
 
 /// Pass for printing a loop's contents as textual IR.
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 8cf277657a54a..d5b44a013d4a7 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1931,13 +1931,13 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
     MPM.addPass(createModuleToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
     MPM.addPass(createModuleToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -1945,9 +1945,8 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    MPM.addPass(                                                               \
-        createModuleToFunctionPassAdaptor(createFunctionToLoopPassAdaptor(     \
-            CREATE_PASS(Params.get()), false, false)));                        \
+    MPM.addPass(createModuleToFunctionPassAdaptor(                             \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)));   \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
@@ -2046,13 +2045,13 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
     CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
     CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -2060,9 +2059,8 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    CGPM.addPass(                                                              \
-        createCGSCCToFunctionPassAdaptor(createFunctionToLoopPassAdaptor(      \
-            CREATE_PASS(Params.get()), false, false)));                        \
+    CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)));   \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
@@ -2095,14 +2093,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
         return Err;
       // Add the nested pass manager with the appropriate adaptor.
       bool UseMemorySSA = (Name == "loop-mssa");
-      bool UseBFI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
-        return Pipeline.Name.contains("simple-loop-unswitch");
-      });
-      bool UseBPI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
-        return Pipeline.Name == "loop-predication";
-      });
-      FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA,
-                                                  UseBFI, UseBPI));
+      FPM.addPass(
+          createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA));
       return Error::success();
     }
     if (Name == "machine-function") {
@@ -2155,12 +2147,12 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
 //        The risk is that it may become obsolete if we're not careful.
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false));   \
+    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false));          \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false));   \
+    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false));          \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -2168,8 +2160,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()),     \
-                                                false, false));                \
+    FPM.addPass(                                                               \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false));    \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index c3f35f0f5e7fa..3dfe9cf51865c 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -517,16 +517,14 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
-                                              /*UseMemorySSA=*/true,
-                                              /*UseBlockFrequencyInfo=*/true));
+                                              /*UseMemorySSA=*/true));
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
   // The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
-                                              /*UseMemorySSA=*/false,
-                                              /*UseBlockFrequencyInfo=*/false));
+                                              /*UseMemorySSA=*/false));
 
   // Delete small array after loop unroll.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -706,8 +704,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
-                                              /*UseMemorySSA=*/true,
-                                              /*UseBlockFrequencyInfo=*/true));
+                                              /*UseMemorySSA=*/true));
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
@@ -715,8 +712,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   // LoopDeletionPass and LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
-                                              /*UseMemorySSA=*/false,
-                                              /*UseBlockFrequencyInfo=*/false));
+                                              /*UseMemorySSA=*/false));
 
   // Delete small array after loop unroll.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -769,7 +765,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   FPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*UseMemorySSA=*/true));
 
   FPM.addPass(CoroElidePass());
 
@@ -837,8 +833,7 @@ void PassBuilder::addPostPGOLoopRotation(ModulePassManager &MPM,
         createFunctionToLoopPassAdaptor(
             LoopRotatePass(EnableLoopHeaderDuplication ||
                            Level != OptimizationLevel::Oz),
-            /*UseMemorySSA=*/false,
-            /*UseBlockFrequencyInfo=*/false),
+            /*UseMemorySSA=*/false),
         PTO.EagerlyInvalidateAnalyses));
   }
 }
@@ -1354,8 +1349,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
     LPM.addPass(SimpleLoopUnswitchPass(/* NonTrivial */ Level ==
                                        OptimizationLevel::O3));
     ExtraPasses.addPass(
-        createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true,
-                                        /*UseBlockFrequencyInfo=*/true));
+        createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true));
     ExtraPasses.addPass(
         SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
     ExtraPasses.addPass(InstCombinePass());
@@ -1433,7 +1427,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
   FPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*UseMemorySSA=*/true));
 
   // Now that we've vectorized and unrolled loops, we may have more refined
   // alignment information, try to re-derive it here.
@@ -1510,7 +1504,7 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
     OptimizePM.addPass(createFunctionToLoopPassAdaptor(
         LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                  /*AllowSpeculation=*/true),
-        /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+        /*USeMemorySSA=*/true));
   }
 
   OptimizePM.addPass(Float2IntPass());
@@ -1550,8 +1544,8 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   if (PTO.LoopInterchange)
     LPM.addPass(LoopInterchangePass());
 
-  OptimizePM.addPass(createFunctionToLoopPassAdaptor(
-      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+  OptimizePM.addPass(
+      createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
 
   // FIXME: This may not be the right place in the pipeline.
   // We need to have the data to support the right place.
@@ -2100,7 +2094,7 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MainFPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*USeMemorySSA=*/true));
 
   if (RunNewGVN)
     MainFPM.addPass(NewGVNPass());
@@ -2130,8 +2124,8 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
                                  PTO.ForgetAllSCEVInLoopUnroll));
   // The loop passes in LPM (LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
-  MainFPM.addPass(createFunctionToLoopPassAdaptor(
-      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/true));
+  MainFPM.addPass(
+      createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
 
   MainFPM.addPass(LoopDistributePass());
 
diff --git a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
index 4b26ce12a28db..47a1b95339186 100644
--- a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
@@ -8,8 +8,6 @@
 
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Analysis/AssumptionCache.h"
-#include "llvm/Analysis/BlockFrequencyInfo.h"
-#include "llvm/Analysis/BranchProbabilityInfo.h"
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
@@ -220,13 +218,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
   // Get the analysis results needed by loop passes.
   MemorySSA *MSSA =
       UseMemorySSA ? (&AM.getResult<MemorySSAAnalysis>(F).getMSSA()) : nullptr;
-  BlockFrequencyInfo *BFI = UseBlockFrequencyInfo && F.hasProfileData()
-                                ? (&AM.getResult<BlockFrequencyAnalysis>(F))
-                                : nullptr;
-  BranchProbabilityInfo *BPI =
-      UseBranchProbabilityInfo && F.hasProfileData()
-          ? (&AM.getResult<BranchProbabilityAnalysis>(F))
-          : nullptr;
   LoopStandardAnalysisResults LAR = {AM.getResult<AAManager>(F),
                                      AM.getResult<AssumptionAnalysis>(F),
                                      AM.getResult<DominatorTreeAnalysis>(F),
@@ -234,8 +225,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
                                      AM.getResult<ScalarEvolutionAnalysis>(F),
                                      AM.getResult<TargetLibraryAnalysis>(F),
                                      AM.getResult<TargetIRAnalysis>(F),
-                                     BFI,
-                                     BPI,
                                      MSSA};
 
   // Setup the loop analysis manager from its proxy. It is important that
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index e4ba70d1bce16..5af6c96c56a06 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -27,7 +27,6 @@
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/MemorySSAUpdater.h"
 #include "llvm/Analysis/MustExecute.h"
-#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -3611,8 +3610,7 @@ static bool unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI,
                          AssumptionCache &AC, AAResults &AA,
                          TargetTransformInfo &TTI, bool Trivial,
  ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Stacked on #159516

In https://reviews.llvm.org/D129599, non-trivial switching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM moved away from BFI and as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole reason to prevent code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.

However just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559

And coincidentally sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added which automatically annotates cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to determine whether we should optimize for code size is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

  • It allows us to remove BFI from the function to loop pass adapter, which was only enabled for certain stages in the LTO pipeline
  • We no longer have to worry about lossy analysis results
  • Which in turn means the decision to avoid non-trivial switching will be more accurate
  • It brings the behaviour inline with other passes that respect OptSize
  • It respects the -pgo-cold-func-opt flag so users can control the behaviour
  • It prevents the need to run OuterAnalysisManagerProxy as often which is probably good for compile time

Patch is 31.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159522.diff

16 Files Affected:

  • (modified) llvm/include/llvm/Analysis/LoopAnalysisManager.h (-4)
  • (modified) llvm/include/llvm/Transforms/Scalar/LoopPassManager.h (+6-16)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+14-22)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+14-20)
  • (modified) llvm/lib/Transforms/Scalar/LoopPassManager.cpp (-11)
  • (modified) llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (+2-38)
  • (modified) llvm/test/Other/loop-pm-invalidation.ll (-30)
  • (modified) llvm/test/Other/new-pm-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (-1)
  • (renamed) llvm/test/Transforms/PhaseOrdering/unswitch-cold-func.ll (+3-2)
  • (modified) llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll (-1)
diff --git a/llvm/include/llvm/Analysis/LoopAnalysisManager.h b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
index a825ada05df11..1755257fe6c89 100644
--- a/llvm/include/llvm/Analysis/LoopAnalysisManager.h
+++ b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
@@ -36,8 +36,6 @@ namespace llvm {
 
 class AAResults;
 class AssumptionCache;
-class BlockFrequencyInfo;
-class BranchProbabilityInfo;
 class DominatorTree;
 class Function;
 class Loop;
@@ -59,8 +57,6 @@ struct LoopStandardAnalysisResults {
   ScalarEvolution &SE;
   TargetLibraryInfo &TLI;
   TargetTransformInfo &TTI;
-  BlockFrequencyInfo *BFI;
-  BranchProbabilityInfo *BPI;
   MemorySSA *MSSA;
 };
 
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
index 79ccd63fd834c..1842d2dc5f05a 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
@@ -404,12 +404,8 @@ class FunctionToLoopPassAdaptor
 
   explicit FunctionToLoopPassAdaptor(std::unique_ptr<PassConceptT> Pass,
                                      bool UseMemorySSA = false,
-                                     bool UseBlockFrequencyInfo = false,
-                                     bool UseBranchProbabilityInfo = false,
                                      bool LoopNestMode = false)
       : Pass(std::move(Pass)), UseMemorySSA(UseMemorySSA),
-        UseBlockFrequencyInfo(UseBlockFrequencyInfo),
-        UseBranchProbabilityInfo(UseBranchProbabilityInfo),
         LoopNestMode(LoopNestMode) {
     LoopCanonicalizationFPM.addPass(LoopSimplifyPass());
     LoopCanonicalizationFPM.addPass(LCSSAPass());
@@ -431,8 +427,6 @@ class FunctionToLoopPassAdaptor
   FunctionPassManager LoopCanonicalizationFPM;
 
   bool UseMemorySSA = false;
-  bool UseBlockFrequencyInfo = false;
-  bool UseBranchProbabilityInfo = false;
   const bool LoopNestMode;
 };
 
@@ -445,9 +439,7 @@ class FunctionToLoopPassAdaptor
 /// \c LoopPassManager and the returned adaptor will be in loop-nest mode.
 template <typename LoopPassT>
 inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
-                                bool UseBlockFrequencyInfo = false,
-                                bool UseBranchProbabilityInfo = false) {
+createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false) {
   if constexpr (is_detected<HasRunOnLoopT, LoopPassT>::value) {
     using PassModelT =
         detail::PassModel<Loop, LoopPassT, LoopAnalysisManager,
@@ -457,7 +449,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
     return FunctionToLoopPassAdaptor(
         std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
             new PassModelT(std::forward<LoopPassT>(Pass))),
-        UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, false);
+        UseMemorySSA, false);
   } else {
     LoopPassManager LPM;
     LPM.addPass(std::forward<LoopPassT>(Pass));
@@ -469,7 +461,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
     return FunctionToLoopPassAdaptor(
         std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
             new PassModelT(std::move(LPM))),
-        UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, true);
+        UseMemorySSA, true);
   }
 }
 
@@ -477,9 +469,8 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
 /// be in loop-nest mode if the pass manager contains only loop-nest passes.
 template <>
 inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor<LoopPassManager>(
-    LoopPassManager &&LPM, bool UseMemorySSA, bool UseBlockFrequencyInfo,
-    bool UseBranchProbabilityInfo) {
+createFunctionToLoopPassAdaptor<LoopPassManager>(LoopPassManager &&LPM,
+                                                 bool UseMemorySSA) {
   // Check if LPM contains any loop pass and if it does not, returns an adaptor
   // in loop-nest mode.
   using PassModelT =
@@ -491,8 +482,7 @@ createFunctionToLoopPassAdaptor<LoopPassManager>(
   return FunctionToLoopPassAdaptor(
       std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
           new PassModelT(std::move(LPM))),
-      UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo,
-      LoopNestMode);
+      UseMemorySSA, LoopNestMode);
 }
 
 /// Pass for printing a loop's contents as textual IR.
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 8cf277657a54a..d5b44a013d4a7 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1931,13 +1931,13 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
     MPM.addPass(createModuleToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
     MPM.addPass(createModuleToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -1945,9 +1945,8 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    MPM.addPass(                                                               \
-        createModuleToFunctionPassAdaptor(createFunctionToLoopPassAdaptor(     \
-            CREATE_PASS(Params.get()), false, false)));                        \
+    MPM.addPass(createModuleToFunctionPassAdaptor(                             \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)));   \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
@@ -2046,13 +2045,13 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
     CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
     CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -2060,9 +2059,8 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    CGPM.addPass(                                                              \
-        createCGSCCToFunctionPassAdaptor(createFunctionToLoopPassAdaptor(      \
-            CREATE_PASS(Params.get()), false, false)));                        \
+    CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)));   \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
@@ -2095,14 +2093,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
         return Err;
       // Add the nested pass manager with the appropriate adaptor.
       bool UseMemorySSA = (Name == "loop-mssa");
-      bool UseBFI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
-        return Pipeline.Name.contains("simple-loop-unswitch");
-      });
-      bool UseBPI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
-        return Pipeline.Name == "loop-predication";
-      });
-      FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA,
-                                                  UseBFI, UseBPI));
+      FPM.addPass(
+          createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA));
       return Error::success();
     }
     if (Name == "machine-function") {
@@ -2155,12 +2147,12 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
 //        The risk is that it may become obsolete if we're not careful.
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false));   \
+    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false));          \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false));   \
+    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false));          \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -2168,8 +2160,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()),     \
-                                                false, false));                \
+    FPM.addPass(                                                               \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false));    \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index c3f35f0f5e7fa..3dfe9cf51865c 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -517,16 +517,14 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
-                                              /*UseMemorySSA=*/true,
-                                              /*UseBlockFrequencyInfo=*/true));
+                                              /*UseMemorySSA=*/true));
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
   // The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
-                                              /*UseMemorySSA=*/false,
-                                              /*UseBlockFrequencyInfo=*/false));
+                                              /*UseMemorySSA=*/false));
 
   // Delete small array after loop unroll.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -706,8 +704,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
-                                              /*UseMemorySSA=*/true,
-                                              /*UseBlockFrequencyInfo=*/true));
+                                              /*UseMemorySSA=*/true));
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
@@ -715,8 +712,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   // LoopDeletionPass and LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
-                                              /*UseMemorySSA=*/false,
-                                              /*UseBlockFrequencyInfo=*/false));
+                                              /*UseMemorySSA=*/false));
 
   // Delete small array after loop unroll.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -769,7 +765,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   FPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*UseMemorySSA=*/true));
 
   FPM.addPass(CoroElidePass());
 
@@ -837,8 +833,7 @@ void PassBuilder::addPostPGOLoopRotation(ModulePassManager &MPM,
         createFunctionToLoopPassAdaptor(
             LoopRotatePass(EnableLoopHeaderDuplication ||
                            Level != OptimizationLevel::Oz),
-            /*UseMemorySSA=*/false,
-            /*UseBlockFrequencyInfo=*/false),
+            /*UseMemorySSA=*/false),
         PTO.EagerlyInvalidateAnalyses));
   }
 }
@@ -1354,8 +1349,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
     LPM.addPass(SimpleLoopUnswitchPass(/* NonTrivial */ Level ==
                                        OptimizationLevel::O3));
     ExtraPasses.addPass(
-        createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true,
-                                        /*UseBlockFrequencyInfo=*/true));
+        createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true));
     ExtraPasses.addPass(
         SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
     ExtraPasses.addPass(InstCombinePass());
@@ -1433,7 +1427,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
   FPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*UseMemorySSA=*/true));
 
   // Now that we've vectorized and unrolled loops, we may have more refined
   // alignment information, try to re-derive it here.
@@ -1510,7 +1504,7 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
     OptimizePM.addPass(createFunctionToLoopPassAdaptor(
         LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                  /*AllowSpeculation=*/true),
-        /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+        /*USeMemorySSA=*/true));
   }
 
   OptimizePM.addPass(Float2IntPass());
@@ -1550,8 +1544,8 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   if (PTO.LoopInterchange)
     LPM.addPass(LoopInterchangePass());
 
-  OptimizePM.addPass(createFunctionToLoopPassAdaptor(
-      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+  OptimizePM.addPass(
+      createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
 
   // FIXME: This may not be the right place in the pipeline.
   // We need to have the data to support the right place.
@@ -2100,7 +2094,7 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MainFPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*USeMemorySSA=*/true));
 
   if (RunNewGVN)
     MainFPM.addPass(NewGVNPass());
@@ -2130,8 +2124,8 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
                                  PTO.ForgetAllSCEVInLoopUnroll));
   // The loop passes in LPM (LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
-  MainFPM.addPass(createFunctionToLoopPassAdaptor(
-      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/true));
+  MainFPM.addPass(
+      createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
 
   MainFPM.addPass(LoopDistributePass());
 
diff --git a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
index 4b26ce12a28db..47a1b95339186 100644
--- a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
@@ -8,8 +8,6 @@
 
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Analysis/AssumptionCache.h"
-#include "llvm/Analysis/BlockFrequencyInfo.h"
-#include "llvm/Analysis/BranchProbabilityInfo.h"
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
@@ -220,13 +218,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
   // Get the analysis results needed by loop passes.
   MemorySSA *MSSA =
       UseMemorySSA ? (&AM.getResult<MemorySSAAnalysis>(F).getMSSA()) : nullptr;
-  BlockFrequencyInfo *BFI = UseBlockFrequencyInfo && F.hasProfileData()
-                                ? (&AM.getResult<BlockFrequencyAnalysis>(F))
-                                : nullptr;
-  BranchProbabilityInfo *BPI =
-      UseBranchProbabilityInfo && F.hasProfileData()
-          ? (&AM.getResult<BranchProbabilityAnalysis>(F))
-          : nullptr;
   LoopStandardAnalysisResults LAR = {AM.getResult<AAManager>(F),
                                      AM.getResult<AssumptionAnalysis>(F),
                                      AM.getResult<DominatorTreeAnalysis>(F),
@@ -234,8 +225,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
                                      AM.getResult<ScalarEvolutionAnalysis>(F),
                                      AM.getResult<TargetLibraryAnalysis>(F),
                                      AM.getResult<TargetIRAnalysis>(F),
-                                     BFI,
-                                     BPI,
                                      MSSA};
 
   // Setup the loop analysis manager from its proxy. It is important that
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index e4ba70d1bce16..5af6c96c56a06 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -27,7 +27,6 @@
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/MemorySSAUpdater.h"
 #include "llvm/Analysis/MustExecute.h"
-#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -3611,8 +3610,7 @@ static bool unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI,
                          AssumptionCache &AC, AAResults &AA,
                          TargetTransformInfo &TTI, bool Trivial,
  ...
[truncated]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the existing test to PhaseOrdering to show how using PGOForceFunctionAttrs also prevents unswitching this loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the test added in #157888 to make sure we invalidated BFI in the loop pass adaptor. But if we don't use BFI anymore then it doesn't get invalidated

@nikic
Copy link
Contributor

nikic commented Sep 18, 2025

This only works if the whole function is cold, not if there is a cold loop inside a hot function, right?

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 18, 2025

This only works if the whole function is cold, not if there is a cold loop inside a hot function, right?

Yeah I briefly touched on that in the PR description. I couldn't find anything in https://reviews.llvm.org/D129599 that explicitly mentioned cold loops in hot functions, so I hope that function level granularity is good enough for whatever case motivated that patch. At least it seems to be good enough for the test case that was included with it, llvm/test/Transforms/PhaseOrdering/unswitch-cold-func.ll

@nikic
Copy link
Contributor

nikic commented Sep 18, 2025

Okay, looking through the history of what happened here, we have:

  • f756f06 initially checked whether the loop is cold
  • fb45f3c switched this to check whether the function is cold (using isFunctionColdInCallGraph) to prevent SPEC regressions from the previous
  • dfb40d3 then switched to checking for a cold loop nest because of compile-time issues with the previous approach.

So with that history in mind, it does seem like just relying on the whole function being cold (via optsize) may be fine, as that was one of the intermediate solutions, which was only dropped due to compile-time issues. (Note that PGOForceFunctionAttrsPass uses isFunctionColdInCallGraph as well, so it should be equivalent to the second solution without the compile-time impact.)

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please leave some time for others to comment.

In https://reviews.llvm.org/D129599, non-trivial switching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM moved away from BFI and as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, for the sole reason to prevent code size increases in PGO builds. It doesn't use BFI if there's no profile summary available.

However just before the BFI check, we also check to see if the function is marked as OptSize: https://reviews.llvm.org/D94559

And coincidentally sometime after the initial BFI patch PGOForceFunctionAttrsPass was added which will automatically annotate cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to add the OptSize is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour since the previous behaviour checked for coldness at the loop level and this is now at the function level, but I believe the benefits outweigh this:

- It allows us to remove BFI from the function to loop pass adapter, which was only enabled for certain stages in the LTO pipeline
- We no longer have to worry about lossy analysis results
- Which in turn means the decision to avoid non-trivial switching will be more accurate
- It brings the behaviour inline with other passes that respect OptSize
- It respects the -pgo-cold-func-opt flag so users can control the behaviour
- It prevents the need to run OuterAnalysisManagerProxy as often which is probably good for compile time
This was added to illustrate the lossy preservation of BPI in 452714f

But it fails now because we don't run OuterAnalysisManagerProxy beforehand. I'm not sure if it's still a useful thing to test given that llvm#159516 removes BPI from the adaptor, so I've gone ahead and deleted the test
@lukel97 lukel97 force-pushed the simpleloopunswitch-remove-bfi branch from 39d102c to 754f242 Compare September 18, 2025 12:57
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall checking for hot/cold loops in principle makes more sense to me; i.e. no point in unswitching a loop that's never executed in a hot function.

Not regressing SPEC is great, but still leaves a lot of room to regress other workloads. I don't really have any concrete suggestions at the moment; if we can't easily check for hotness of loops I guess this is fine for now and we can revisit if/when we discover regressions

@teresajohnson
Copy link
Contributor

Overall checking for hot/cold loops in principle makes more sense to me; i.e. no point in unswitching a loop that's never executed in a hot function.

Agree, this change isn't going to be equivalent. I'm a little unclear of the problem being solved by this PR. What is the main motivation? There's a list of benefits but I'm not sure how important they are.

Not regressing SPEC is great, but still leaves a lot of room to regress other workloads. I don't really have any concrete suggestions at the moment; if we can't easily check for hotness of loops I guess this is fine for now and we can revisit if/when we discover regressions

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 18, 2025

Agree, this change isn't going to be equivalent. I'm a little unclear of the problem being solved by this PR. What is the main motivation? There's a list of benefits but I'm not sure how important they are.

The main motivation is to remove the last use of BlockFrequencyInfo from within a loop pass, because I'm not sure if there's much value in relying on a potentially inaccurate result given that it's only preserved lossily.

I noticed this originally when some of the lossy results escaped into the loop vectorizer in #157888, causing BFI to return some bogus frequencies. Not off by one or two etc, but returning the frequencies for different blocks altogether since the CFG had been modified. I'm a bit worried that SimpleLoopUnswitch is also affected by this e.g. thinking a loop is cold when it's not.

There is a secondary motivation that avoiding code size increases should be handled by PGOForceFunctionAttrsPass. I think it's somewhat coincidental that the original patch checked that the loops were cold instead of the function. As Nikita mentions, fb45f3c seems to suggest that the original author was fine with checking the function.

@teresajohnson
Copy link
Contributor

Agree, this change isn't going to be equivalent. I'm a little unclear of the problem being solved by this PR. What is the main motivation? There's a list of benefits but I'm not sure how important they are.

The main motivation is to remove the last use of BlockFrequencyInfo from within a loop pass, because I'm not sure if there's much value in relying on a potentially inaccurate result given that it's only preserved lossily.

I noticed this originally when some of the lossy results escaped into the loop vectorizer in #157888, causing BFI to return some bogus frequencies. Not off by one or two etc, but returning the frequencies for different blocks altogether since the CFG had been modified. I'm a bit worried that SimpleLoopUnswitch is also affected by this e.g. thinking a loop is cold when it's not.

Isn't this a larger problem though - downstream function passes such as BB layout will also be affected by incorrect BFI. I'm unclear why disabling use of BFI in this particular pass is the right approach to this issue.

There is a secondary motivation that avoiding code size increases should be handled by PGOForceFunctionAttrsPass.

I'm not sure I agree here - why should code size increases from cold code only be gated at the function level? (I see you filed #159595, that seems like a good approach longer term.)

I think it's somewhat coincidental that the original patch checked that the loops were cold instead of the function. As Nikita mentions, fb45f3c seems to suggest that the original author was fine with checking the function.

These messages reference SPEC, though, which isn't always the best reflector of real world codes.

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 18, 2025

Isn't this a larger problem though - downstream function passes such as BB layout will also be affected by incorrect BFI. I'm unclear why disabling use of BFI in this particular pass is the right approach to this issue.

Incorrect BFI leaking through to function passes was fixed in #157888 (wasn't exactly the easiest thing to debug 😅). AFAIK SimpleLoopUnswitch is now the last user of lossy BFI.

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 23, 2025

@fhahn @teresajohnson I did some more experimenting and I think the usage of lossy BFI is a bigger problem than I had initially expected.

I changed SimpeLoopUnswitch to always invalidate BlockFrequencyAnalysis and BranchProbabilityAnalysis beforehand (not including other analyses), so that the BFI data is always fresh. On arm64-apple-darwin -O3 there's a large difference in the number of "cold" loops skipped, specifically a lot that previously were considered cold but no longer aren't:

lhs = lossy BFI
rhs = correct BFI

Program                                       simple-loop-unswitch.NumColdSkipped                
                                              lhs                                 rhs     diff
MultiSourc...ch/consumer-lame/consumer-lame    283.00                              282.00   -0.4%
MultiSourc...sumer-typeset/consumer-typeset    571.00                              564.00   -1.2%
MultiSource/Applications/siod/siod             176.00                              173.00   -1.7%
MultiSourc...e/Applications/SIBsim4/SIBsim4     56.00                               55.00   -1.8%
MultiSourc.../Applications/JM/lencod/lencod   1247.00                             1220.00   -2.2%
MultiSourc...arks/DOE-ProxyApps-C/CoMD/CoMD     75.00                               73.00   -2.7%
MultiSourc...xyApps-C/Pathfinder/PathFinder     65.00                               63.00   -3.1%
MultiSourc...OE-ProxyApps-C/XSBench/XSBench     20.00                               19.00   -5.0%
MultiSource/Applications/d/make_dparser        153.00                              143.00   -6.5%
MultiSource/Benchmarks/nbench/nbench            82.00                               76.00   -7.3%
MultiSource/Applications/oggenc/oggenc         472.00                              434.00   -8.1%
MultiSourc...hmarks/MallocBench/cfrac/cfrac     23.00                               21.00   -8.7%
MultiSourc...Benchmarks/SciMark2-C/scimark2     32.00                               29.00   -9.4%
MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk     89.00                               80.00  -10.1%
MultiSourc...rsaBench/beamformer/beamformer     19.00                               17.00  -10.5%
MultiSourc...e/Applications/obsequi/Obsequi     70.00                               62.00  -11.4%
MultiSource/Benchmarks/McCat/18-imp/imp         30.00                               26.00  -13.3%
MultiSourc...e-bitcount/automotive-bitcount      7.00                                6.00  -14.3%
MultiSource/Benchmarks/sim/sim                   7.00                                6.00  -14.3%
MultiSourc...SC_Sequoia/CrystalMk/CrystalMk     12.00                               10.00  -16.7%
MultiSource/Applications/sgefa/sgefa            48.00                               40.00  -16.7%
MultiSourc...OE-ProxyApps-C/RSBench/rsbench      6.00                                5.00  -16.7%
MultiSourc...Benchmarks/Ptrdist/yacr2/yacr2     35.00                               28.00  -20.0%
MultiSourc...marks/Prolangs-C/bison/mybison     75.00                               60.00  -20.0%
MultiSource/Benchmarks/Olden/mst/mst            15.00                               11.00  -26.7%
MultiSourc.../Rodinia/pathfinder/pathfinder     11.00                                8.00  -27.3%
MultiSource/Benchmarks/Olden/em3d/em3d          20.00                               14.00  -30.0%
MultiSourc.../Benchmarks/VersaBench/bmm/bmm     26.00                               18.00  -30.8%
MultiSourc...hmarks/FreeBench/neural/neural     28.00                               19.00  -32.1%
MultiSourc...enchmarks/VersaBench/dbms/dbms      6.00                                4.00  -33.3%
MultiSourc.../telecomm-CRC32/telecomm-CRC32      3.00                                2.00  -33.3%
MultiSourc...chmarks/VersaBench/8b10b/8b10b      5.00                                3.00  -40.0%
MultiSource/Benchmarks/McCat/09-vor/vor          2.00                                1.00  -50.0%
MultiSourc...chmarks/McCat/04-bisect/bisect      6.00                                3.00  -50.0%
MultiSource/Benchmarks/Ptrdist/ks/ks             8.00                                4.00  -50.0%
MultiSourc...Benchmarks/Olden/health/health      2.00                                1.00  -50.0%
MultiSourc...marks/Trimaran/enc-rc4/enc-rc4     16.00                                8.00  -50.0%
MultiSourc...rks/Trimaran/enc-3des/enc-3des     12.00                                3.00  -75.0%
MultiSourc...rks/BitBench/uuencode/uuencode      5.00                                1.00  -80.0%
MultiSourc...arks/FreeBench/distray/distray      2.00                                0.00 -100.0%
MultiSourc...reeBench/pcompress2/pcompress2      1.00                                0.00 -100.0%

This translates to a significant number of loops that we are erroneously not unswitching because we mistakenly think they are cold, 4.3% geomean overall.

Metric: simple-loop-unswitch.NumBranches

Program                                       simple-loop-unswitch.NumBranches              
                                              lhs                              rhs    diff  
Applications/aha/aha                            1.00                             2.00 100.0%
Benchmarks/McCat/09-vor/vor                     1.00                             2.00 100.0%
Benchmarks/VersaBench/bmm/bmm                   6.00                            11.00  83.3%
Benchmarks/ASC_Sequoia/IRSmk/IRSmk              2.00                             3.00  50.0%
Benchmarks/nbench/nbench                        4.00                             6.00  50.0%
Benchmarks/ASC_Sequoia/AMGmk/AMGmk             11.00                            16.00  45.5%
Benchmarks...sumer-typeset/consumer-typeset    10.00                            14.00  40.0%
Benchmarks...iabench/g721/g721encode/encode     3.00                             4.00  33.3%
Benchmarks.../Rodinia/pathfinder/pathfinder     3.00                             4.00  33.3%
Benchmarks/DOE-ProxyApps-C/CoMD/CoMD           13.00                            17.00  30.8%
Benchmarks/McCat/18-imp/imp                    13.00                            17.00  30.8%
Benchmarks/FreeBench/neural/neural             10.00                            13.00  30.0%
Benchmarks...nch/mpeg2/mpeg2dec/mpeg2decode     4.00                             5.00  25.0%
Benchmarks/Trimaran/enc-3des/enc-3des           4.00                             5.00  25.0%
Benchmarks.../DOE-ProxyApps-C++/HPCCG/HPCCG     4.00                             5.00  25.0%
                           Geomean difference                                           4.3%

For this reason alone I think we should remove the use of BlockFrequencyInfo as it's very likely that it's impeding performance. In the above results, the increase in loops unswitched all come from hot loops.

@teresajohnson
Copy link
Contributor

Thanks for the data. That does look pretty compelling. We may have some binary size or compile time increase for loops that still are cold in non-cold functions - do you have any data on how many cold loops (after forcing BFI/BPA invalidation before the pass as above) will be unnecessarily unswitched after this change due to being in non-cold functions?

Also can you check a couple of instances where the loop was cold until BFI/BPA were invalidated and see why they were cold - were they new loops created by previous loop transformations? With any eventual support for loop attributes we'd need to ensure they were copied or synthesized on such new loop nests appropriately.

But generally I think this change makes sense. However, I have a question on a test change - it isn't clear to me whether the effects of this PR are being captured in a test currently.

define void @_Z11hotFunctionbiiPiS_S_(i1 %cond, i32 %M, i32 %N, ptr %A, ptr %B, ptr %C) !prof !36 {
; CHECK-LABEL: define void @_Z11hotFunctionbiiPiS_S_
; CHECK-SAME: (i1 [[COND:%.*]], i32 [[M:%.*]], i32 [[N:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) !prof [[PROF16:![0-9]+]] {
; CHECK-SAME: (i1 [[COND:%.*]], i32 [[M:%.*]], i32 [[N:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) #[[ATTR0:[0-9]+]] {{.*}}{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATTR0 is never matched with anything later on - what is this trying to check? I'm a little unsure of what changed here as per the comments and the function name this is a cold loop nest in a hot function, so shouldn't we stop seeing it be unswitched with this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants