-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[Coroutines] Conditional elide coroutines based on hot/cold information #162276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Coroutines] Conditional elide coroutines based on hot/cold information #162276
Conversation
Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines is not good. For example, ``` Task bar(); Task foo(bool b) { if (b) [[unlikely]] { co_await bar(); } } ``` Assume Task is marked with [[clang::coro_await_elidable]], now we will always elide the call to bar() into the frame of foo(). But this may be a regression instead of an optimization if b is always false. This patch tries to mitigate the problem by leveraging hot/cold information. This can be optimized further in the future but at least this patch makes things better. This patch was originally written by ChuanqiXu9, but stalled during PR review because the diagnostics were not integrated with the existing optimization remarks. I rebased the original patch, integrated it with the optimization remarks, and did a couple of smaller cosmetic changes (e.g., made the test case expectations more targetted, etc.) Co-Authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a minor nit
bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine(); | ||
bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe); | ||
if (IsCallerPresplitCoroutine && HasAttr) { | ||
static BranchProbability MinBranchProbability( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a static variable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to make it a global variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the static
and kept it as a local variable.
I don't think we can make this a global variable, as we would run into initialization order issues between CoroElideBranchRatio
, MinBlockCounterExecution
(which are already global static variables today) and the new global MinBranchProbability
bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine(); | ||
bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe); | ||
if (IsCallerPresplitCoroutine && HasAttr) { | ||
static BranchProbability MinBranchProbability( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to make it a global variable?
@llvm/pr-subscribers-llvm-transforms Author: Adrian Vogelsgesang (vogelsgesang) ChangesUnconditionally eliding all For example,
Assume Task is marked with This patch tries to mitigate the problem by leveraging hot/cold This patch was originally written by ChuanqiXu9 (#145831), but stalled Co-Authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com> Full diff: https://github.com/llvm/llvm-project/pull/162276.diff 5 Files Affected:
diff --git a/llvm/include/llvm/IR/DiagnosticInfo.h b/llvm/include/llvm/IR/DiagnosticInfo.h
index 5f7225e878eb1..a426fb079ec04 100644
--- a/llvm/include/llvm/IR/DiagnosticInfo.h
+++ b/llvm/include/llvm/IR/DiagnosticInfo.h
@@ -20,6 +20,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include "llvm/IR/DebugLoc.h"
+#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/CBindingWrapping.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/ErrorHandling.h"
@@ -555,6 +556,7 @@ class LLVM_ABI DiagnosticInfoOptimizationBase
Argument(StringRef Key, bool B) : Key(Key), Val(B ? "true" : "false") {}
LLVM_ABI Argument(StringRef Key, DebugLoc dl);
LLVM_ABI Argument(StringRef Key, InstructionCost C);
+ LLVM_ABI Argument(StringRef Key, BranchProbability P);
};
/// \p PassName is the name of the pass emitting this diagnostic. \p
diff --git a/llvm/lib/IR/DiagnosticInfo.cpp b/llvm/lib/IR/DiagnosticInfo.cpp
index 4f3762427e012..8e6d654f6afb3 100644
--- a/llvm/lib/IR/DiagnosticInfo.cpp
+++ b/llvm/lib/IR/DiagnosticInfo.cpp
@@ -273,6 +273,13 @@ DiagnosticInfoOptimizationBase::Argument::Argument(StringRef Key,
C.print(OS);
}
+DiagnosticInfoOptimizationBase::Argument::Argument(StringRef Key,
+ BranchProbability P)
+ : Key(std::string(Key)) {
+ raw_string_ostream OS(Val);
+ P.print(OS);
+}
+
DiagnosticInfoOptimizationBase::Argument::Argument(StringRef Key, DebugLoc Loc)
: Key(std::string(Key)), Loc(Loc) {
if (Loc) {
diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index 9115946d205a4..f166fef74941e 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -24,6 +24,9 @@
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
+#include "llvm/Support/BranchProbability.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileSystem.h"
#include "llvm/Transforms/Utils/CallGraphUpdater.h"
#include "llvm/Transforms/Utils/Cloning.h"
@@ -33,6 +36,11 @@ using namespace llvm;
#define DEBUG_TYPE "coro-annotation-elide"
+static cl::opt<float> CoroElideBranchRatio(
+ "coro-elide-branch-ratio", cl::init(0.55), cl::Hidden,
+ cl::desc("Minimum BranchProbability to consider a elide a coroutine."));
+extern cl::opt<unsigned> MinBlockCounterExecution;
+
static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
for (Instruction &I : F->getEntryBlock())
if (!isa<AllocaInst>(&I))
@@ -145,6 +153,30 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine();
bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe);
if (IsCallerPresplitCoroutine && HasAttr) {
+ BranchProbability MinBranchProbability(
+ static_cast<int>(CoroElideBranchRatio * MinBlockCounterExecution),
+ MinBlockCounterExecution);
+
+ auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*Caller);
+
+ auto Prob = BranchProbability::getBranchProbability(
+ BFI.getBlockFreq(CB->getParent()).getFrequency(),
+ BFI.getEntryFreq().getFrequency());
+
+ if (Prob < MinBranchProbability) {
+ ORE.emit([&]() {
+ return OptimizationRemarkMissed(
+ DEBUG_TYPE, "CoroAnnotationElideUnlikely", Caller)
+ << "'" << ore::NV("callee", Callee->getName())
+ << "' not elided in '"
+ << ore::NV("caller", Caller->getName())
+ << "' because of low probability: "
+ << ore::NV("probability", Prob) << " (threshold: "
+ << ore::NV("threshold", MinBranchProbability) << ")";
+ });
+ continue;
+ }
+
auto *CallerN = CG.lookup(*Caller);
auto *CallerC = CallerN ? CG.lookupSCC(*CallerN) : nullptr;
// If CallerC is nullptr, it means LazyCallGraph hasn't visited Caller
@@ -156,7 +188,7 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
return OptimizationRemark(DEBUG_TYPE, "CoroAnnotationElide", Caller)
<< "'" << ore::NV("callee", Callee->getName())
<< "' elided in '" << ore::NV("caller", Caller->getName())
- << "'";
+ << "' (probability: " << ore::NV("probability", Prob) << ")";
});
FAM.invalidate(*Caller, PreservedAnalyses::none());
diff --git a/llvm/lib/Transforms/IPO/PartialInlining.cpp b/llvm/lib/Transforms/IPO/PartialInlining.cpp
index 2583249e65484..1a00d173d3ae0 100644
--- a/llvm/lib/Transforms/IPO/PartialInlining.cpp
+++ b/llvm/lib/Transforms/IPO/PartialInlining.cpp
@@ -109,7 +109,7 @@ static cl::opt<float> MinRegionSizeRatio(
"outline candidate and original function"));
// Used to tune the minimum number of execution counts needed in the predecessor
// block to the cold edge. ie. confidence interval.
-static cl::opt<unsigned>
+cl::opt<unsigned>
MinBlockCounterExecution("min-block-execution", cl::init(100), cl::Hidden,
cl::desc("Minimum block executions to consider "
"its BranchProbabilityInfo valid"));
diff --git a/llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll b/llvm/test/Transforms/Coroutines/coro-elide-safe.ll
similarity index 74%
rename from llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll
rename to llvm/test/Transforms/Coroutines/coro-elide-safe.ll
index 4eec7edad8b0f..722693d4caafd 100644
--- a/llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll
+++ b/llvm/test/Transforms/Coroutines/coro-elide-safe.ll
@@ -1,4 +1,8 @@
-; Testing elide performed its job for calls to coroutines marked safe.
+; Coroutine calls marked with `coro_elide_safe` should be elided.
+; Inside `caller`, we expect the `callee` coroutine to be elided.
+; Inside `caller_conditional`, `callee` is only called on an unlikely
+; path, hence we expect the `callee` coroutine NOT to be elided.
+;
; RUN: opt < %s -S -passes='cgscc(coro-annotation-elide)' | FileCheck %s
%struct.Task = type { ptr }
@@ -57,7 +61,7 @@ define ptr @callee.noalloc(i8 %arg, ptr dereferenceable(32) align(8) %frame) {
; Function Attrs: presplitcoroutine
define ptr @caller() #0 {
entry:
- %task = call ptr @callee(i8 0) #1
+ %task = call ptr @callee(i8 0) coro_elide_safe
ret ptr %task
; CHECK: %[[TASK:.+]] = alloca %struct.Task, align 8
; CHECK-NEXT: %[[FRAME:.+]] = alloca [32 x i8], align 8
@@ -69,6 +73,25 @@ entry:
; CHECK-NEXT: ret ptr %[[TASK]]
}
+; CHECK-LABEL: define ptr @caller_conditional(i1 %cond)
+; Function Attrs: presplitcoroutine
+define ptr @caller_conditional(i1 %cond) #0 {
+entry:
+ br i1 %cond, label %call, label %ret
+
+call:
+ ; CHECK-NOT: alloca
+ ; CHECK-NOT: @llvm.coro.id({{.*}}, ptr @callee, {{.*}})
+ ; CHECK: %task = call ptr @callee(i8 0)
+ ; CHECK-NEXT: br label %ret
+ %task = call ptr @callee(i8 0) coro_elide_safe
+ br label %ret
+
+ret:
+ %retval = phi ptr [ %task, %call ], [ null, %entry ]
+ ret ptr %retval
+}
+
declare token @llvm.coro.id(i32, ptr, ptr, ptr)
declare ptr @llvm.coro.begin(token, ptr)
declare ptr @llvm.coro.frame()
@@ -76,4 +99,3 @@ declare ptr @llvm.coro.subfn.addr(ptr, i8)
declare i1 @llvm.coro.alloc(token)
attributes #0 = { presplitcoroutine }
-attributes #1 = { coro_elide_safe }
|
…on (#162276) Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines is not good. For example, ``` Task bar(); Task foo(bool b) { if (b) [[unlikely]] { co_await bar(); } } ``` Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will always elide the call to `bar()` into the frame of `foo()`. But this may be a regression instead of an optimization if `b` is always false. This patch tries to mitigate the problem by leveraging hot/cold information. This can be optimized further in the future but at least this patch makes things better. This patch was originally written by ChuanqiXu9 (#145831), but stalled during PR review because the diagnostics were not integrated with the existing optimization remarks. I rebased the original patch, integrated it with the optimization remarks, and did a couple of smaller cosmetic changes (e.g., made the test case expectations more targeted, etc.) Co-Authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
Unconditionally eliding all
[[clang::coro_await_elidable]]
coroutinesis not good.
For example,
Assume Task is marked with
[[clang::coro_await_elidable]]
, now we willalways elide the call to
bar()
into the frame offoo()
. But this may bea regression instead of an optimization if
b
is always false.This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.
This patch was originally written by ChuanqiXu9 (#145831), but stalled
during PR review because the diagnostics were not integrated with
the existing optimization remarks. I rebased the original patch, integrated
it with the optimization remarks, and did a couple of smaller cosmetic
changes (e.g., made the test case expectations more targetted, etc.)
Co-Authored-by: Chuanqi Xu yedeng.yd@linux.alibaba.com