Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopRotate] Set loop back edge weight to not less than exit weight #86496

Merged
merged 4 commits into from
Mar 28, 2024

Conversation

HaohaiWen
Copy link
Contributor

Branch weight from sample-based PGO may be not inaccurate due to
sampling. If the loop body must be executed, then origin loop back
edge weight must be not less than exit weight.

Branch weight from sample-based PGO may be not inaccurate due to
sampling. This test tracks such case where updateBranchWeights wraps
unsigned.
Branch weight from sample-based PGO may be not inaccurate due to
sampling. If the loop body must be executed, then origin loop back
edge weight must be not less than exit weight.
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 25, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Haohai Wen (HaohaiWen)

Changes

Branch weight from sample-based PGO may be not inaccurate due to
sampling. If the loop body must be executed, then origin loop back
edge weight must be not less than exit weight.


Full diff: https://github.com/llvm/llvm-project/pull/86496.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+10)
  • (modified) llvm/test/Transforms/LoopRotate/update-branch-weights.ll (+42)
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index bc671171137199..dd31cbd8376c4b 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -347,9 +347,19 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
         // probabilities as if there are only 0-trip and 1-trip cases.
         ExitWeight0 = OrigLoopExitWeight - OrigLoopBackedgeWeight;
       }
+    } else {
+      if (OrigLoopExitWeight > OrigLoopBackedgeWeight) {
+        LLVM_DEBUG(
+            dbgs() << "WARNING: Bad loop back edge weight. Adjust it from "
+                   << OrigLoopBackedgeWeight << " to " << OrigLoopExitWeight
+                   << "\n");
+        OrigLoopBackedgeWeight = OrigLoopExitWeight;
+      }
     }
+    assert(OrigLoopExitWeight >= ExitWeight0 && "Bad branch weight");
     ExitWeight1 = OrigLoopExitWeight - ExitWeight0;
     EnterWeight = ExitWeight1;
+    assert(OrigLoopBackedgeWeight >= EnterWeight && "Bad branch weight");
     LoopBackWeight = OrigLoopBackedgeWeight - EnterWeight;
   } else if (OrigLoopExitWeight == 0) {
     if (OrigLoopBackedgeWeight == 0) {
diff --git a/llvm/test/Transforms/LoopRotate/update-branch-weights.ll b/llvm/test/Transforms/LoopRotate/update-branch-weights.ll
index 5d742b64e0adbf..9a1f36ec5ff2be 100644
--- a/llvm/test/Transforms/LoopRotate/update-branch-weights.ll
+++ b/llvm/test/Transforms/LoopRotate/update-branch-weights.ll
@@ -232,6 +232,46 @@ loop_exit:
   ret void
 }
 
+; BFI_BEFORE-LABEL: block-frequency-info: func6_inaccurate_branch_weight
+; BFI_BEFORE: - entry: {{.*}} count = 1024
+; BFI_BEFORE: - loop_header: {{.*}} count = 2047
+; BFI_BEFORE: - loop_body: {{.*}} count = 1023
+; BFI_BEFORE: - loop_exit: {{.*}} count = 1024
+
+; BFI_AFTER-LABEL: block-frequency-info: func6_inaccurate_branch_weight
+; BFI_AFTER: - entry: {{.*}} count = 1024
+; BFI_AFTER: - loop_body: {{.*}} count = 1024
+; BFI_AFTER: - loop_exit: {{.*}} count = 1024
+
+; IR-LABEL: define void @func6_inaccurate_branch_weight(
+; IR: entry:
+; IR:   br label %loop_body
+; IR: loop_body:
+; IR:   br i1 %cmp, label %loop_body, label %loop_exit, !prof [[PROF_FUNC6_0:![0-9]+]]
+; IR: loop_exit:
+; IR:   ret void
+
+; Branch weight from sample-based PGO may be inaccurate due to sampling.
+; Count for loop_body in following case should be not less than loop_exit.
+; However this may not hold for Sample-based PGO.
+define void @func6_inaccurate_branch_weight() !prof !3 {
+entry:
+  br label %loop_header
+
+loop_header:
+  %i = phi i32 [0, %entry], [%i_inc, %loop_body]
+  %cmp = icmp slt i32 %i, 2
+  br i1 %cmp, label %loop_body, label %loop_exit, !prof !9
+
+loop_body:
+  store volatile i32 %i, ptr @g, align 4
+  %i_inc = add i32 %i, 1
+  br label %loop_header
+
+loop_exit:
+  ret void
+}
+
 !0 = !{!"function_entry_count", i64 1}
 !1 = !{!"branch_weights", i32 1000, i32 1}
 !2 = !{!"branch_weights", i32 3000, i32 1000}
@@ -241,6 +281,7 @@ loop_exit:
 !6 = !{!"branch_weights", i32 0, i32 1}
 !7 = !{!"branch_weights", i32 1, i32 0}
 !8 = !{!"branch_weights", i32 0, i32 0}
+!9 = !{!"branch_weights", i32 1023, i32 1024}
 
 ; IR: [[PROF_FUNC0_0]] = !{!"branch_weights", i32 2000, i32 1000}
 ; IR: [[PROF_FUNC0_1]] = !{!"branch_weights", i32 999, i32 1}
@@ -251,3 +292,4 @@ loop_exit:
 ; IR: [[PROF_FUNC3_0]] = !{!"branch_weights", i32 0, i32 1}
 ; IR: [[PROF_FUNC4_0]] = !{!"branch_weights", i32 1, i32 0}
 ; IR: [[PROF_FUNC5_0]] = !{!"branch_weights", i32 0, i32 0}
+; IR: [[PROF_FUNC6_0]] = !{!"branch_weights", i32 0, i32 1024}

Copy link

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

✅ With the latest revision this PR passed the Python code formatter.

@HaohaiWen
Copy link
Contributor Author

Any more comments?

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I suggest you wait for @MatzeB to take a look.

Comment on lines 352 to 355
LLVM_DEBUG(
dbgs() << "WARNING: Bad loop back edge weight. Adjust it from "
<< OrigLoopBackedgeWeight << " to " << OrigLoopExitWeight
<< "\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need debug print warning for this. There is a lot of guessing and fixing already in the heuristic above, and it'd be inconsistent to just print debug warning for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The heuristic wasn't based on guessing. I think those three expressions were bullet proofed (@MatzeB , please correct me if I am wrong). It just have a pre request the profile data is accurate. What I'm trying is to fix inaccurate profile data.

  //  -  x == x0 + x1        # counts to "exit" must stay the same.
  //  - y0 == x - x0 == x1   # how often loop was entered at all.
  //  - y1 == y - y0         # How often loop was repeated (after first iter.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The heuristic wasn't based on guessing. I think those three expressions were bullet proofed

I was referring to the existing heuristic. With loop rotate, you cannot recover ground truth counts. And you have to make assumptions like "as if 0-trip count nearly never happens"/ "as if there are only 0-trip and 1-trip cases".

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm a bit confused looking at the test case..

  1. You can have zero-trip loop, in which case loop_exit can be reached without going into loop_body, why loop back edge needs to be not less than exit weight?

  2. In principle, profile inference for sample PGO should make input counts consistent. If there's input count inconsistency, we need track down the source of inconsistency. Fixing the input count inconsistency in a transformation isn't the right approach IMO. (The existing heuristic are making guesses for loop rotation which is different and is fine)

@MatzeB
Copy link
Contributor

MatzeB commented Mar 28, 2024

I don't follow why this would be a situation that needs fixup. While it is less common, it is perfectly fine for a loop header to exit more often than entering the body and using the loop backedge (when 0-trip counts happen often enough for a given loop to push the average trip count below 1).

@MatzeB
Copy link
Contributor

MatzeB commented Mar 28, 2024

Oh I see, this is for the case where the condition can be constant folded. I guess this makes sense in that case and you want to avoid an underflow? (may be good to mention in a comment/commit message).

@MatzeB
Copy link
Contributor

MatzeB commented Mar 28, 2024

I suspect some amount of noise in the sample / mapping samples to actual blocks etc. is unavoidable and I am not sure this situation (that only becomes "bad" when you consider the fact that the particular loop condition cannot be false in the first iteration) will be considered by smoothing algos. So seems fine to me, but leaving final call to @WenleiHe about whether some inaccuracies like this are considered normal (and hence must be worked around).

@HaohaiWen
Copy link
Contributor Author

Actually I'm a bit confused looking at the test case..

  1. You can have zero-trip loop, in which case loop_exit can be reached without going into loop_body, why loop back edge needs to be not less than exit weight?

Entering line 351 means HasConditionalPreHeader is false. That means this loop body must be executed at least once. Each execution of loop body must have a back edge to latch. Then it either go back to loop body or exit.

  1. In principle, profile inference for sample PGO should make input counts consistent. If there's input count inconsistency, we need track down the source of inconsistency. Fixing the input count inconsistency in a transformation isn't the right approach IMO. (The existing heuristic are making guesses for loop rotation which is different and is fine)

Ideally, It should be fixed when loading profile in SampleProfileLoader. For this case, unless SampleProfilerLoader detecting the loop body must be executed at least (I think it's not easy for SampleProfileLoader), it have no idea whether this profile data is good.

@WenleiHe
Copy link
Member

Ok, thanks for clarification. Please add a comment in the code explaining this case. Also clean up the debug prints.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest revision LGTM, thanks.

@HaohaiWen HaohaiWen merged commit 896037c into llvm:main Mar 28, 2024
4 checks passed
@HaohaiWen HaohaiWen deleted the looprotate branch March 28, 2024 12:07
@alexfh
Copy link
Contributor

alexfh commented Apr 4, 2024

We see regressions of up to 50% (depending on microarchitecture) in llvm-test-suite microbenchmarks, when compiled with PGO+thinlto:

I don't think we consider these important, but I thought you might be interested in this observation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants