Skip to content

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Sep 10, 2025

Function analyses in LoopStandardAnalysisResults are marked as preserved by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy manually invalidates most of them.

However the proxy doesn't invalidate BFI, since it is only preserved on a "lossy" basis: see https://reviews.llvm.org/D86156 and https://reviews.llvm.org/D110438.

So any changes to the CFG will result in BFI giving incorrect results, which is fine for loop passes which deal with the lossiness.

But the loop pass adapator still marks it as preserved, which causes the lossy result to leak out into function passes.

This causes incorrect results when viewed from e.g. LoopVectorizer, where an innermost loop header may be reported to have a smaller frequency than its successors.

This fixes this by dropping the call to preserve, and adds a test with the -O1 pipeline which shows the effects whenever the CFG is changed and UseBlockFrequencyInfo is set.

I've also dropped it for BranchProbabilityAnalysis too, but I couldn't test for it since UseBranchProbabilityInfo always seems to be false? This may be dead code.

Function analyses in LoopStandardAnalysisResults are marked as preserved by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy manually invalidates most of them.

However the proxy doesn't invalidate BFI, since it is only preserved on a "lossy" basis: see https://reviews.llvm.org/D86156 and https://reviews.llvm.org/D110438.

So any changes to the CFG will result in BFI giving incorrect results, which is fine for loop passes which deal with the lossiness.

But the loop pass adapator still marks it as preserved, which causes the lossy result to leak out into function passes.

This causes incorrect results when viewed from e.g. LoopVectorizer, where an innermost loop header may be reported to have a smaller frequency than its successors.

This fixes this by dropping the call to preserve, and adds a test with the -O1 pipeline which shows the effects whenever the CFG is changed and UseBlockFrequencyInfo is set.

I've also dropped it for BranchProbabilityAnalysis too, but I couldn't test for it since UseBranchProbabilityInfo always seems to be false? This may be dead code.
@llvmbot
Copy link
Member

llvmbot commented Sep 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Function analyses in LoopStandardAnalysisResults are marked as preserved by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy manually invalidates most of them.

However the proxy doesn't invalidate BFI, since it is only preserved on a "lossy" basis: see https://reviews.llvm.org/D86156 and https://reviews.llvm.org/D110438.

So any changes to the CFG will result in BFI giving incorrect results, which is fine for loop passes which deal with the lossiness.

But the loop pass adapator still marks it as preserved, which causes the lossy result to leak out into function passes.

This causes incorrect results when viewed from e.g. LoopVectorizer, where an innermost loop header may be reported to have a smaller frequency than its successors.

This fixes this by dropping the call to preserve, and adds a test with the -O1 pipeline which shows the effects whenever the CFG is changed and UseBlockFrequencyInfo is set.

I've also dropped it for BranchProbabilityAnalysis too, but I couldn't test for it since UseBranchProbabilityInfo always seems to be false? This may be dead code.


Full diff: https://github.com/llvm/llvm-project/pull/157888.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopPassManager.cpp (-4)
  • (modified) llvm/test/Other/loop-pm-invalidation.ll (+30)
diff --git a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
index e390fdfb81e7b..4b26ce12a28db 100644
--- a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
@@ -349,10 +349,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
   PA.preserve<DominatorTreeAnalysis>();
   PA.preserve<LoopAnalysis>();
   PA.preserve<ScalarEvolutionAnalysis>();
-  if (UseBlockFrequencyInfo && F.hasProfileData())
-    PA.preserve<BlockFrequencyAnalysis>();
-  if (UseBranchProbabilityInfo && F.hasProfileData())
-    PA.preserve<BranchProbabilityAnalysis>();
   if (UseMemorySSA)
     PA.preserve<MemorySSAAnalysis>();
   return PA;
diff --git a/llvm/test/Other/loop-pm-invalidation.ll b/llvm/test/Other/loop-pm-invalidation.ll
index 25552f7f139fd..f680aa970d0c4 100644
--- a/llvm/test/Other/loop-pm-invalidation.ll
+++ b/llvm/test/Other/loop-pm-invalidation.ll
@@ -16,6 +16,11 @@
 ; RUN: opt -disable-output -disable-verify -verify-analysis-invalidation=0  -debug-pass-manager %s -aa-pipeline= 2>&1 \
 ; RUN:     -passes='loop(no-op-loop,loop-deletion),invalidate<scalar-evolution>,loop(no-op-loop)' \
 ; RUN:     | FileCheck %s --check-prefix=CHECK-SCEV-INV-AFTER-DELETE
+;
+; Test that BFI is invalidated after the loop adapter if any of the loop passes
+; invalidated it.
+; RUN: opt -disable-output -disable-verify -verify-analysis-invalidation=0  -debug-pass-manager %s -aa-pipeline= 2>&1 \
+; RUN:     -O1 | FileCheck %s --check-prefix=CHECK-BFI-INV
 
 define void @no_loops() {
 ; CHECK-LOOP-INV: Running pass: LoopSimplifyPass
@@ -242,3 +247,28 @@ l0.header:
 exit:
   ret void
 }
+
+; CHECK-BFI-INV-LABEL: Running analysis: OuterAnalysisManagerProxy<FunctionAnalysisManager, Loop, LoopStandardAnalysisResults &> on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: LoopInstSimplifyPass on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: LoopSimplifyCFGPass on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: LICMPass on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: LoopRotatePass on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: LICMPass on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: SimpleLoopUnswitchPass on loop %l0.header in function simplifiable_loop
+; CHECK-BFI-INV-NEXT: Invalidating analysis: PostDominatorTreeAnalysis on simplifiable_loop
+; CHECK-BFI-INV-NEXT: Invalidating analysis: BranchProbabilityAnalysis on simplifiable_loop
+; CHECK-BFI-INV-NEXT: Invalidating analysis: BlockFrequencyAnalysis on simplifiable_loop
+; CHECK-BFI-INV-NEXT: Running pass: SimplifyCFGPass on simplifiable_loop (5 instructions)
+
+define void @simplifiable_loop(i1 %c) !prof !0 {
+entry:
+  br label %l0.header
+
+l0.header:
+  br label %l0.latch
+
+l0.latch:
+  br i1 %c, label %l0.header, label %l0.latch
+}
+
+!0 = !{!"function_entry_count", i64 1}

Apparently the pass ID is based off the demangled type name, which I guess differs on MSVC?
@alinas
Copy link
Contributor

alinas commented Sep 11, 2025

Could you add a testcase showing the incorrect results when followed by the LoopVectorizer?

@aeubanks
Copy link
Contributor

or at least a test that runs print<block-freq> that changes with this commit

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 15, 2025

Could you add a testcase showing the incorrect results when followed by the LoopVectorizer?

This mostly causes incorrect results in an upcoming patch where I'm trying to use BFI more in the LoopVectorizer cost model. The smallest reproducer I have so far is with opt -p 'lto<O3>':

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

define void @_Z22SetupPrecalculatedDatav(ptr readonly captures(none) %KingSafetyMask, ptr writeonly captures(none) initializes((0, 8)) %Mask, i1 %0) local_unnamed_addr #0 !prof !0 {
  br label %2

2:                                                ; preds = %5, %1
  %.pre = load i64, ptr %KingSafetyMask, align 8
  br label %3

3:                                                ; preds = %3, %2
  %4 = phi i1 [ false, %2 ], [ true, %3 ]
  store i64 %.pre, ptr %Mask, align 8
  br i1 %4, label %5, label %3

5:                                                ; preds = %3
  br i1 %0, label %6, label %2

6:                                                ; preds = %5
  ret void
}

attributes #0 = { "target-features"="+v" }

!0 = !{!"function_entry_count", i64 1}

This doesn't crash today but if you add this assert that the BFI is accurate for the loops that the loop vectorizer processes, it fails:

    // Stick this somewhere in LoopVectorizePass::runImpl
    if (!L->isInvalid() && L->isInnermost() && !L->hasNoExitBlocks())
      for (BasicBlock *BB : L->blocks())
        assert(BFI->getBlockFreq(L->getHeader()) >= BFI->getBlockFreq(BB))

I'll try and see if I can get a test case with print<block-freq>. So far I've had no luck since I can't seem to trigger the assertion when manually specifying the pipeline with the output of print-pipeline-passes, i.e. it seems to need -p 'lto<O3>'

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 15, 2025

@alinas @aeubanks I gave this a try, but unfortunately I don't think it's (easily) possible to add a test which shows any difference in LoopVectorizer behaviour today. LoopVectorizer currently only uses BFI to check if it's optimizing for size, but it's very difficult to recreate the conditions to trigger this in combination with the loop simplifications needed to trigger the invalid BFI state.

Also to recreate this UseBlockFrequencyInfo needs to be set to true in createFunctionToLoopPassAdaptor, but we only do this with the LTO pipelines in e.g. buildLTODefaultPipeline, so AFAICT we can't do 'print'.

I've instead posted the motivating LoopVectorizer patch which includes a test that crashes without this fix: https://github.com/llvm/llvm-project/pull/158690/files#diff-a388af039904a990257f3235bf23eb5271c54ea5a31f916b760b36143d937d81

Specifically, it crashes because the BFI results are inaccurate: The loop header will have a lower frequency than one of the blocks it dominates, which causes a divide by zero when calculating the probability of the block. I hope this illustrates the problem a bit better.

@aeubanks aeubanks requested a review from mtrofin September 17, 2025 21:52
@aeubanks
Copy link
Contributor

paging back in new pass manager into my brain... I'm remembering the compile-time reason loop passes must preserve function analyses (search "quadratic" in https://llvm.org/docs/NewPassManager.html#using-analyses). but this is doing something different, it's not preserving after all the loop passes run in the function -> loop adaptor. I think that should be fine compile-time-wise, it's equivalent to a function pass using and not preserving BFI

I think this makes sense to me, does anyone else have any objections?

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 18, 2025

but this is doing something different, it's not preserving after all the loop passes run in the function -> loop adaptor. I think that should be fine compile-time-wise, it's equivalent to a function pass using and not preserving BFI

Thanks, that matches my understanding too.

FWIW it looks like after LoopPredication was changed to not use BFI in https://reviews.llvm.org/D111668, SimpleLoopUnswitch is the last user of BFI in LoopStandardAnalysisResults. I wonder if it's possible to do something similar to LoopPredication to remove the dependency?

BranchProbabilityInfo also seems to be unused, I'll try and open a PR to remove it.

@lukel97 lukel97 enabled auto-merge (squash) September 18, 2025 05:07
@lukel97
Copy link
Contributor Author

lukel97 commented Sep 18, 2025

FWIW it looks like after LoopPredication was changed to not use BFI in https://reviews.llvm.org/D111668, SimpleLoopUnswitch is the last user of BFI in LoopStandardAnalysisResults.

Whoops sorry it removed the use of BPI not BFI. It looks like at the time that BFI was made added to LoopStandardAnalysisResults in https://reviews.llvm.org/D86156#change-bAzAQih0v9Mp LICM was the main user? But LICM removed the use in 1a25d0b 04f3c20

@lukel97 lukel97 merged commit 4663d25 into llvm:main Sep 18, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants