Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[coroutine] Implement llvm.coro.await.suspend intrinsic #79712

Merged
merged 19 commits into from
Mar 11, 2024

Conversation

fpasserby
Copy link
Contributor

@fpasserby fpasserby commented Jan 27, 2024

Implement llvm.coro.await.suspend intrinsics, to deal with performance regression after prohibiting .await_suspend inlining, as suggested in #64945.
Actually, there are three new intrinsics, which directly correspond to each of three forms of await_suspend:

void llvm.coro.await.suspend.void(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)

There are three different versions instead of one, because in bool case it's result is used for resuming via a branch, and in coroutine_handle case exceptions from await_suspend are handled in the coroutine, and exceptions from the subsequent .resume() are propagated to the caller.

Await-suspend block is simplified down to intrinsic calls only, for example for symmetric transfer:

%id = call token @llvm.coro.save(ptr null)
%handle = call ptr @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
call void @llvm.coro.resume(%handle)
%result = call i8 @llvm.coro.suspend(token %id, i1 false)
switch i8 %result, ...

All await-suspend logic is moved out into a wrapper function, generated for each suspension point.
The signature of the function is <type> wrapperFunction(ptr %awaiter, ptr %frame) where <type> is one of void i1 or ptr, depending on the return type of await_suspend.
Intrinsic calls are lowered during CoroSplit pass, right after the split.

Because I'm new to LLVM, I'm not sure if the helper function generation, calls to them and lowering are implemented in the right way, especially with regard to various metadata and attributes, i. e. for TBAA. All things that seemed questionable are marked with FIXME comments.

There is another detail: in case of symmetric transfer raw pointer to the frame of coroutine, that should be resumed, is returned from the helper function and a direct call to @llvm.coro.resume is generated. C++ standard demands, that .resume() method is evaluated. Not sure how important is this, because code has been generated in the same way before, sans helper function.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen coroutines C++20 coroutines llvm:ir llvm:transforms labels Jan 27, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 27, 2024

@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-coroutines

@llvm/pr-subscribers-clang

Author: None (fpasserby)

Changes

Implement llvm.coro.await.suspend intrinsics, to deal with performance regression after prohibiting .await_suspend inlining, as suggested in #64945.
Actually, there are three new intrinsics, which directly correspond to each of three forms of await_suspend:

void llvm.coro.await.suspend(ptr %awaiter, ptr %frame, ptr @<!-- -->helperFunction)
i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @<!-- -->helperFunction)
ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @<!-- -->helperFunction)

There are three different versions instead of one, because in bool case it's result is used for resuming via a branch, and in coroutine_handle case exceptions from await_suspend are handled in the coroutine, and exceptions from the subsequent .resume() are propagated to the caller.

Await-suspend block is simplified down to intrinsic calls only, for example for the symmetric transfer case:

%id = call token @<!-- -->llvm.coro.save(ptr null)
%handle = call ptr @<!-- -->llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @<!-- -->helperFunction)
call void @<!-- -->llvm.coro.resume(%handle)
%result = call i8 @<!-- -->llvm.coro.suspend(token %id, i1 false)
switch i8 %result, ...

All the await-suspend logic is moved out into a helper function, generated for each suspension point.
The signature of the function is &lt;type&gt; helperFunction(ptr %awaiter, ptr %frame) where &lt;type&gt; is one of void i1 or ptr, depending on the return type of await_suspend.
The intrinsic calls are lowered and helper functions bodies are inlined during CoroSplit pass, right after the split.

Because I'm new to LLVM, I'm not sure if the helper function generation, calls to them and lowering are implemented in the right way, especially with regard to various metadata and attributes, i. e. for TBAA. All things, that seemed questionable, are marked with FIXME comments.

There is another detail: in case of symmetric transfer raw pointer to the frame of coroutine, that should be resumed, is returned from the helper function and a direct call to @<!-- -->llvm.coro.resume is generated. C++ standard demands, that .resume() method is evaluated. Not sure how important is this, because code has been generated in the same way before, sans helper function.


Patch is 58.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79712.diff

20 Files Affected:

  • (modified) clang/include/clang/AST/ExprCXX.h (+20-8)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+132-11)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+4)
  • (modified) clang/lib/Sema/SemaCoroutine.cpp (+84-76)
  • (modified) clang/test/AST/coroutine-locals-cleanup.cpp (+2-8)
  • (modified) clang/test/CodeGenCoroutines/coro-always-inline.cpp (+1-1)
  • (modified) clang/test/CodeGenCoroutines/coro-await.cpp (+62-17)
  • (removed) clang/test/CodeGenCoroutines/coro-awaiter-noinline-suspend.cpp (-168)
  • (modified) clang/test/CodeGenCoroutines/coro-dwarf.cpp (+12)
  • (modified) clang/test/CodeGenCoroutines/coro-function-try-block.cpp (+1-1)
  • (added) clang/test/CodeGenCoroutines/coro-simplify-suspend-point.cpp (+67)
  • (modified) clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp (+2-6)
  • (modified) clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp (+4-8)
  • (modified) clang/test/CodeGenCoroutines/pr59181.cpp (+5-4)
  • (modified) clang/test/SemaCXX/co_await-ast.cpp (+4-3)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+12)
  • (modified) llvm/lib/IR/Verifier.cpp (+3)
  • (modified) llvm/lib/Transforms/Coroutines/CoroInstr.h (+30)
  • (modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+142-25)
  • (modified) llvm/lib/Transforms/Coroutines/Coroutines.cpp (+3)
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index a0e467b35778c5c..57a505036def409 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -5035,15 +5035,19 @@ class CoroutineSuspendExpr : public Expr {
   enum SubExpr { Operand, Common, Ready, Suspend, Resume, Count };
 
   Stmt *SubExprs[SubExpr::Count];
+  bool IsSuspendNoThrow = false;
   OpaqueValueExpr *OpaqueValue = nullptr;
+  OpaqueValueExpr *OpaqueFramePtr = nullptr;
 
 public:
   CoroutineSuspendExpr(StmtClass SC, SourceLocation KeywordLoc, Expr *Operand,
                        Expr *Common, Expr *Ready, Expr *Suspend, Expr *Resume,
-                       OpaqueValueExpr *OpaqueValue)
+                       bool IsSuspendNoThrow, OpaqueValueExpr *OpaqueValue,
+                       OpaqueValueExpr *OpaqueFramePtr)
       : Expr(SC, Resume->getType(), Resume->getValueKind(),
              Resume->getObjectKind()),
-        KeywordLoc(KeywordLoc), OpaqueValue(OpaqueValue) {
+        KeywordLoc(KeywordLoc), IsSuspendNoThrow(IsSuspendNoThrow),
+        OpaqueValue(OpaqueValue), OpaqueFramePtr(OpaqueFramePtr) {
     SubExprs[SubExpr::Operand] = Operand;
     SubExprs[SubExpr::Common] = Common;
     SubExprs[SubExpr::Ready] = Ready;
@@ -5080,6 +5084,9 @@ class CoroutineSuspendExpr : public Expr {
   /// getOpaqueValue - Return the opaque value placeholder.
   OpaqueValueExpr *getOpaqueValue() const { return OpaqueValue; }
 
+  /// getOpaqueFramePtr - Return coroutine frame pointer placeholder.
+  OpaqueValueExpr *getOpaqueFramePtr() const { return OpaqueFramePtr; }
+
   Expr *getReadyExpr() const {
     return static_cast<Expr*>(SubExprs[SubExpr::Ready]);
   }
@@ -5097,6 +5104,8 @@ class CoroutineSuspendExpr : public Expr {
     return static_cast<Expr *>(SubExprs[SubExpr::Operand]);
   }
 
+  bool isSuspendNoThrow() const { return IsSuspendNoThrow; }
+
   SourceLocation getKeywordLoc() const { return KeywordLoc; }
 
   SourceLocation getBeginLoc() const LLVM_READONLY { return KeywordLoc; }
@@ -5125,10 +5134,12 @@ class CoawaitExpr : public CoroutineSuspendExpr {
 
 public:
   CoawaitExpr(SourceLocation CoawaitLoc, Expr *Operand, Expr *Common,
-              Expr *Ready, Expr *Suspend, Expr *Resume,
-              OpaqueValueExpr *OpaqueValue, bool IsImplicit = false)
+              Expr *Ready, Expr *Suspend, Expr *Resume, bool IsSuspendNoThrow,
+              OpaqueValueExpr *OpaqueValue, OpaqueValueExpr *OpaqueFramePtr,
+              bool IsImplicit = false)
       : CoroutineSuspendExpr(CoawaitExprClass, CoawaitLoc, Operand, Common,
-                             Ready, Suspend, Resume, OpaqueValue) {
+                             Ready, Suspend, Resume, IsSuspendNoThrow,
+                             OpaqueValue, OpaqueFramePtr) {
     CoawaitBits.IsImplicit = IsImplicit;
   }
 
@@ -5206,10 +5217,11 @@ class CoyieldExpr : public CoroutineSuspendExpr {
 
 public:
   CoyieldExpr(SourceLocation CoyieldLoc, Expr *Operand, Expr *Common,
-              Expr *Ready, Expr *Suspend, Expr *Resume,
-              OpaqueValueExpr *OpaqueValue)
+              Expr *Ready, Expr *Suspend, Expr *Resume, bool IsSuspendNoThrow,
+              OpaqueValueExpr *OpaqueValue, OpaqueValueExpr *OpaqueFramePtr)
       : CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Operand, Common,
-                             Ready, Suspend, Resume, OpaqueValue) {}
+                             Ready, Suspend, Resume, IsSuspendNoThrow,
+                             OpaqueValue, OpaqueFramePtr) {}
   CoyieldExpr(SourceLocation CoyieldLoc, QualType Ty, Expr *Operand,
               Expr *Common)
       : CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Ty, Operand,
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 888d30bfb3e1d6a..1c768df41d43c09 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -212,9 +212,10 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
                                     bool ignoreResult, bool forLValue) {
   auto *E = S.getCommonExpr();
 
-  auto Binder =
+  auto CommonBinder =
       CodeGenFunction::OpaqueValueMappingData::bind(CGF, S.getOpaqueValue(), E);
-  auto UnbindOnExit = llvm::make_scope_exit([&] { Binder.unbind(CGF); });
+  auto UnbindCommonOnExit =
+      llvm::make_scope_exit([&] { CommonBinder.unbind(CGF); });
 
   auto Prefix = buildSuspendPrefixStr(Coro, Kind);
   BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
@@ -232,16 +233,57 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
   auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
 
-  CGF.CurCoro.InSuspendBlock = true;
-  auto *SuspendRet = CGF.EmitScalarExpr(S.getSuspendExpr());
-  CGF.CurCoro.InSuspendBlock = false;
+  auto SuspendHelper = CodeGenFunction(CGF.CGM).generateAwaitSuspendHelper(
+      CGF.CurFn->getName(), Prefix, S);
 
-  if (SuspendRet != nullptr && SuspendRet->getType()->isIntegerTy(1)) {
-    // Veto suspension if requested by bool returning await_suspend.
-    BasicBlock *RealSuspendBlock =
-        CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
-    CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
-    CGF.EmitBlock(RealSuspendBlock);
+  llvm::CallBase *SuspendRet = nullptr;
+
+  {
+    CGF.CurCoro.InSuspendBlock = true;
+
+    auto FramePtrBinder = CodeGenFunction::OpaqueValueMappingData::bind(
+        CGF, S.getOpaqueFramePtr(), S.getOpaqueFramePtr()->getSourceExpr());
+    auto UnbindFramePtrOnExit =
+        llvm::make_scope_exit([&] { FramePtrBinder.unbind(CGF); });
+
+    SmallVector<llvm::Value *, 3> SuspendHelperCallArgs;
+    SuspendHelperCallArgs.push_back(
+        CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
+    SuspendHelperCallArgs.push_back(
+        CGF.getOrCreateOpaqueRValueMapping(S.getOpaqueFramePtr())
+            .getScalarVal());
+    SuspendHelperCallArgs.push_back(SuspendHelper);
+
+    auto IID = llvm::Intrinsic::coro_await_suspend;
+    if (S.getSuspendExpr()->getType()->isBooleanType())
+      IID = llvm::Intrinsic::coro_await_suspend_bool;
+    else if (S.getSuspendExpr()->getType()->isVoidPointerType())
+      IID = llvm::Intrinsic::coro_await_suspend_handle;
+
+    llvm::Function *AwaitSuspendIntrinsic = CGF.CGM.getIntrinsic(IID);
+    // FIXME: add call attributes?
+    if (S.isSuspendNoThrow()) {
+      SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
+                                               SuspendHelperCallArgs);
+    } else {
+      SuspendRet =
+          CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendHelperCallArgs);
+    }
+
+    CGF.CurCoro.InSuspendBlock = false;
+  }
+
+  if (SuspendRet != nullptr) {
+    if (SuspendRet->getType()->isIntegerTy(1)) {
+      // Veto suspension if requested by bool returning await_suspend.
+      BasicBlock *RealSuspendBlock =
+          CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
+      CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
+      CGF.EmitBlock(RealSuspendBlock);
+    } else if (SuspendRet->getType()->isPointerTy()) {
+      auto ResumeIntrinsic = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_resume);
+      Builder.CreateCall(ResumeIntrinsic, SuspendRet);
+    }
   }
 
   // Emit the suspend point.
@@ -338,6 +380,85 @@ static QualType getCoroutineSuspendExprReturnType(const ASTContext &Ctx,
 }
 #endif
 
+llvm::Function *
+CodeGenFunction::generateAwaitSuspendHelper(Twine const &CoroName,
+                                            Twine const &SuspendPointName,
+                                            CoroutineSuspendExpr const &S) {
+  std::string FuncName = "__await_suspend_helper_";
+  FuncName += CoroName.str();
+  FuncName += SuspendPointName.str();
+
+  ASTContext &C = getContext();
+
+  FunctionArgList args;
+
+  ImplicitParamDecl AwaiterDecl(C, C.VoidPtrTy, ImplicitParamKind::Other);
+  ImplicitParamDecl FrameDecl(C, C.VoidPtrTy, ImplicitParamKind::Other);
+  QualType ReturnTy = S.getSuspendExpr()->getType();
+
+  if (ReturnTy->isBooleanType()) {
+    ReturnTy = C.BoolTy;
+  } else if (ReturnTy->isVoidPointerType()) {
+    ReturnTy = C.VoidPtrTy;
+  } else {
+    ReturnTy = C.VoidTy;
+  }
+
+  args.push_back(&AwaiterDecl);
+  args.push_back(&FrameDecl);
+
+  const CGFunctionInfo &FI =
+      CGM.getTypes().arrangeBuiltinFunctionDeclaration(ReturnTy, args);
+
+  llvm::FunctionType *LTy = CGM.getTypes().GetFunctionType(FI);
+
+  llvm::Function *Fn = llvm::Function::Create(
+      LTy, llvm::GlobalValue::PrivateLinkage, FuncName, &CGM.getModule());
+
+  Fn->addParamAttr(0, llvm::Attribute::AttrKind::NonNull);
+  Fn->addParamAttr(0, llvm::Attribute::AttrKind::NoUndef);
+
+  Fn->addParamAttr(1, llvm::Attribute::AttrKind::NoUndef);
+
+  Fn->setMustProgress();
+  Fn->addFnAttr(llvm::Attribute::AttrKind::AlwaysInline);
+
+  if (S.isSuspendNoThrow()) {
+    Fn->addFnAttr(llvm::Attribute::AttrKind::NoUnwind);
+  }
+
+  StartFunction(GlobalDecl(), ReturnTy, Fn, FI, args);
+
+  llvm::Value *AwaiterPtr = Builder.CreateLoad(GetAddrOfLocalVar(&AwaiterDecl));
+  auto AwaiterLValue =
+      MakeNaturalAlignAddrLValue(AwaiterPtr, AwaiterDecl.getType());
+
+  // FIXME: mark as aliasing with awaiter?
+  // FIXME: TBAA?
+  // FIXME: emit in a better way (maybe egenerate AST in SemaCoroutine)?
+  auto FramePtrRValue =
+      RValue::get(Builder.CreateLoad(GetAddrOfLocalVar(&FrameDecl)));
+
+  auto AwaiterBinder = CodeGenFunction::OpaqueValueMappingData::bind(
+      *this, S.getOpaqueValue(), AwaiterLValue);
+  auto FramePtrBinder = CodeGenFunction::OpaqueValueMappingData::bind(
+      *this, S.getOpaqueFramePtr(), FramePtrRValue);
+
+  auto *SuspendRet = EmitScalarExpr(S.getSuspendExpr());
+
+  auto UnbindCommonOnExit =
+      llvm::make_scope_exit([&] { AwaiterBinder.unbind(*this); });
+  auto UnbindFramePtrOnExit =
+      llvm::make_scope_exit([&] { FramePtrBinder.unbind(*this); });
+  if (SuspendRet != nullptr) {
+    Fn->addRetAttr(llvm::Attribute::AttrKind::NoUndef);
+    Builder.CreateStore(SuspendRet, ReturnValue);
+  }
+
+  FinishFunction();
+  return Fn;
+}
+
 LValue
 CodeGenFunction::EmitCoawaitLValue(const CoawaitExpr *E) {
   assert(getCoroutineSuspendExprReturnType(getContext(), E)->isReferenceType() &&
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 143ad64e8816b12..811dbd9aed44e46 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -350,6 +350,10 @@ class CodeGenFunction : public CodeGenTypeCache {
     return isCoroutine() && CurCoro.InSuspendBlock;
   }
 
+  llvm::Function *generateAwaitSuspendHelper(Twine const &CoroName,
+                                             Twine const &SuspendPointName,
+                                             CoroutineSuspendExpr const &S);
+
   /// CurGD - The GlobalDecl for the current function being compiled.
   GlobalDecl CurGD;
 
diff --git a/clang/lib/Sema/SemaCoroutine.cpp b/clang/lib/Sema/SemaCoroutine.cpp
index a969b9383563b22..a46744205d0f298 100644
--- a/clang/lib/Sema/SemaCoroutine.cpp
+++ b/clang/lib/Sema/SemaCoroutine.cpp
@@ -31,6 +31,8 @@
 using namespace clang;
 using namespace sema;
 
+static bool isNoThrow(Sema &S, const Stmt *E);
+
 static LookupResult lookupMember(Sema &S, const char *Name, CXXRecordDecl *RD,
                                  SourceLocation Loc, bool &Res) {
   DeclarationName DN = S.PP.getIdentifierInfo(Name);
@@ -266,7 +268,7 @@ static ExprResult buildOperatorCoawaitCall(Sema &SemaRef, Scope *S,
 }
 
 static ExprResult buildCoroutineHandle(Sema &S, QualType PromiseType,
-                                       SourceLocation Loc) {
+                                       Expr *FramePtr, SourceLocation Loc) {
   QualType CoroHandleType = lookupCoroutineHandleType(S, PromiseType, Loc);
   if (CoroHandleType.isNull())
     return ExprError();
@@ -280,9 +282,6 @@ static ExprResult buildCoroutineHandle(Sema &S, QualType PromiseType,
     return ExprError();
   }
 
-  Expr *FramePtr =
-      S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_frame, {});
-
   CXXScopeSpec SS;
   ExprResult FromAddr =
       S.BuildDeclarationNameExpr(SS, Found, /*NeedsADL=*/false);
@@ -296,6 +295,8 @@ struct ReadySuspendResumeResult {
   enum AwaitCallType { ACT_Ready, ACT_Suspend, ACT_Resume };
   Expr *Results[3];
   OpaqueValueExpr *OpaqueValue;
+  OpaqueValueExpr *OpaqueFramePtr;
+  bool IsSuspendNoThrow;
   bool IsInvalid;
 };
 
@@ -380,66 +381,7 @@ static Expr *maybeTailCall(Sema &S, QualType RetType, Expr *E,
   // __builtin_coro_resume so that the cleanup code are not inserted in-between
   // the resume call and return instruction, which would interfere with the
   // musttail call contract.
-  JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
-  return S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_resume,
-                                JustAddress);
-}
-
-/// The await_suspend call performed by co_await is essentially asynchronous
-/// to the execution of the coroutine. Inlining it normally into an unsplit
-/// coroutine can cause miscompilation because the coroutine CFG misrepresents
-/// the true control flow of the program: things that happen in the
-/// await_suspend are not guaranteed to happen prior to the resumption of the
-/// coroutine, and things that happen after the resumption of the coroutine
-/// (including its exit and the potential deallocation of the coroutine frame)
-/// are not guaranteed to happen only after the end of await_suspend.
-///
-/// See https://github.com/llvm/llvm-project/issues/56301 and
-/// https://reviews.llvm.org/D157070 for the example and the full discussion.
-///
-/// The short-term solution to this problem is to mark the call as uninlinable.
-/// But we don't want to do this if the call is known to be trivial, which is
-/// very common.
-///
-/// The long-term solution may introduce patterns like:
-///
-///  call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
-///                                ptr @awaitSuspendFn)
-///
-/// Then it is much easier to perform the safety analysis in the middle end.
-/// If it is safe to inline the call to awaitSuspend, we can replace it in the
-/// CoroEarly pass. Otherwise we could replace it in the CoroSplit pass.
-static void tryMarkAwaitSuspendNoInline(Sema &S, OpaqueValueExpr *Awaiter,
-                                        CallExpr *AwaitSuspend) {
-  // The method here to extract the awaiter decl is not precise.
-  // This is intentional. Since it is hard to perform the analysis in the
-  // frontend due to the complexity of C++'s type systems.
-  // And we prefer to perform such analysis in the middle end since it is
-  // easier to implement and more powerful.
-  CXXRecordDecl *AwaiterDecl =
-      Awaiter->getType().getNonReferenceType()->getAsCXXRecordDecl();
-
-  if (AwaiterDecl && AwaiterDecl->field_empty())
-    return;
-
-  FunctionDecl *FD = AwaitSuspend->getDirectCallee();
-
-  assert(FD);
-
-  // If the `await_suspend()` function is marked as `always_inline` explicitly,
-  // we should give the user the right to control the codegen.
-  if (FD->hasAttr<NoInlineAttr>() || FD->hasAttr<AlwaysInlineAttr>())
-    return;
-
-  // This is problematic if the user calls the await_suspend standalone. But on
-  // the on hand, it is not incorrect semantically since inlining is not part
-  // of the standard. On the other hand, it is relatively rare to call
-  // the await_suspend function standalone.
-  //
-  // And given we've already had the long-term plan, the current workaround
-  // looks relatively tolerant.
-  FD->addAttr(
-      NoInlineAttr::CreateImplicit(S.getASTContext(), FD->getLocation()));
+  return S.MaybeCreateExprWithCleanups(JustAddress);
 }
 
 /// Build calls to await_ready, await_suspend, and await_resume for a co_await
@@ -461,7 +403,11 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
 
   // Assume valid until we see otherwise.
   // Further operations are responsible for setting IsInalid to true.
-  ReadySuspendResumeResult Calls = {{}, Operand, /*IsInvalid=*/false};
+  ReadySuspendResumeResult Calls = {{},
+                                    Operand,
+                                    /*OpaqueFramePtr=*/nullptr,
+                                    /*IsSuspendNoThrow=*/false,
+                                    /*IsInvalid=*/false};
 
   using ACT = ReadySuspendResumeResult::AwaitCallType;
 
@@ -495,8 +441,17 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
       Calls.Results[ACT::ACT_Ready] = S.MaybeCreateExprWithCleanups(Conv.get());
   }
 
+  Expr *GetFramePtr =
+      S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_frame, {});
+
+  OpaqueValueExpr *FramePtr = new (S.Context)
+      OpaqueValueExpr(Loc, GetFramePtr->getType(), VK_PRValue,
+                      GetFramePtr->getObjectKind(), GetFramePtr);
+
+  Calls.OpaqueFramePtr = FramePtr;
+
   ExprResult CoroHandleRes =
-      buildCoroutineHandle(S, CoroPromise->getType(), Loc);
+      buildCoroutineHandle(S, CoroPromise->getType(), FramePtr, Loc);
   if (CoroHandleRes.isInvalid()) {
     Calls.IsInvalid = true;
     return Calls;
@@ -513,10 +468,6 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
     //     type Z.
     QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
-    // We need to mark await_suspend as noinline temporarily. See the comment
-    // of tryMarkAwaitSuspendNoInline for details.
-    tryMarkAwaitSuspendNoInline(S, Operand, AwaitSuspend);
-
     // Support for coroutine_handle returning await_suspend.
     if (Expr *TailCallSuspend =
             maybeTailCall(S, RetType, AwaitSuspend, Loc))
@@ -542,6 +493,10 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
     }
   }
 
+  if (Calls.Results[ACT::ACT_Suspend]) {
+    Calls.IsSuspendNoThrow = isNoThrow(S, Calls.Results[ACT::ACT_Suspend]);
+  }
+
   BuildSubExpr(ACT::ACT_Resume, "await_resume", std::nullopt);
 
   // Make sure the awaiter object gets a chance to be cleaned up.
@@ -694,6 +649,59 @@ static FunctionScopeInfo *checkCoroutineContext(Sema &S, SourceLocation Loc,
   return ScopeInfo;
 }
 
+/// Recursively check \p E and all its children to see if any call target
+/// (including constructor call) is declared noexcept. Also any value returned
+/// from the call has a noexcept destructor.
+static bool isNoThrow(Sema &S, const Stmt *E) {
+  auto isDeclNoexcept = [&](const Decl *D, bool IsDtor = false) -> bool {
+    // In the case of dtor, the call to dtor is implicit and hence we should
+    // pass nullptr to canCalleeThrow.
+    if (Sema::canCalleeThrow(S, IsDtor ? nullptr : cast<Expr>(E), D)) {
+      return false;
+    }
+    return true;
+  };
+
+  if (auto *CE = dyn_cast<CXXConstructExpr>(E)) {
+    CXXConstructorDecl *Ctor = CE->getConstructor();
+    if (!isDeclNoexcept(Ctor)) {
+      return false;
+    }
+    // Check the corresponding destructor of the constructor.
+    if (!isDeclNoexcept(Ctor->getParent()->getDestructor(), /*IsDtor=*/true)) {
+      return false;
+    }
+  } else if (auto *CE = dyn_cast<CallExpr>(E)) {
+    if (CE->isTypeDependent())
+      return false;
+
+    if (!isDeclNoexcept(CE->getCalleeDecl())) {
+      return false;
+    }
+
+    QualType ReturnType = CE->getCallReturnType(S.getASTContext());
+    // Check the destructor of the call return type, if any.
+    if (ReturnType.isDestructedType() ==
+        QualType::DestructionKind::DK_cxx_destructor) {
+      const auto *T =
+          cast<RecordType>(ReturnType.getCanonicalType().getTypePtr());
+      if (!isDeclNoexcept(cast<CXXRecordDecl>(T->getDecl())->getDestructor(),
+                          /*IsDtor=*/true)) {
+        return false;
+      }
+    }
+  }
+  for (const auto *Child : E->children()) {
+    if (!Child)
+      continue;
+    if (!isNoThrow(S, Child)) {
+      return false;
+    }
+  }
+
+  return true;
+}
+
 /// Recursively check \p E and all its children to see if any call target
 /// (including constructor call) is declared noexcept. Also any value returned
 /// from the call has a noexcept destructor.
@@ -992,9 +1000,9 @@ ExprResult Sema::BuildR...
[truncated]

Copy link

github-actions bot commented Jan 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@fpasserby fpasserby changed the title Implement llvm.coro.await.suspend intrinsic [coroutine] Implement llvm.coro.await.suspend intrinsic Jan 27, 2024
@ChuanqiXu9 ChuanqiXu9 self-requested a review January 29, 2024 01:58
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. I haven't looked it in details. Given this is complex, it should take a relative longer time.

Here is some quick feedbacks:

  • Every time we change the non-static data members of AST nodes, we need to update the serializations. Otherwise, modules won't work.
  • Every time we change the LLVM intrinsics, we need to change the documentation.
  • Every time we change the LLVM with a functional change, we need a test in the LLVM part.

And there is something I haven't fully understand:

  • I feel odd about helperFunction. I don't understand the necessity and I feel we should make it in the middle end.
  • I feel odd about IsSuspendNoThrow. Can't we make it simply during the CodeGen time by the attribute of await_suspend?.
  • I am wondering if we can get rid of the introduction of OpaqueFramePtr from CoroutineSuspendExpr either in CodeGen part or in Sema part.

But I am not sure if my feelings are correct. I need more time looking into it.

@llvmbot llvmbot added the clang:modules C++20 modules and Clang Header Modules label Jan 29, 2024
@fpasserby
Copy link
Contributor Author

Thank you for replying so quickly.
Your suggestion about IsSuspendNoThrow makes sense, it's totally possible to perform the check during codegen.
About helperFunction - this is how I understood the comment in SemaCoroutine.cpp near the place where std::coroutine_handle::address() is marked as always inline. As far as I can see, it makes perfect sense to hide all the temporary objects till coroutine frame is formed, given the history of optimizations and sanitizers messing it up. For example, current implementation seems to work correctly with the reproducer from #72006.
For other suggestions - I hope, I've fixed up AST serialization and added some documentation. Will add LLVM transformation tests a bit later.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the reason why I felt __await_suspend_helper_ was odd. In my imagination, we only need to emit the function await_suspend(handle) to an LLVM function and pass that to llvm.coro.await.suspend directly.

But the conversion to the actual code may need some additional code. So the idea to wrap them into a helper function makes sense. I feel the name wrapper sounds better to me. Helper is too broad.

clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
llvm/docs/Coroutines.rst Outdated Show resolved Hide resolved
llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
clang/test/CodeGenCoroutines/pr59181.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
Comment on lines 271 to 277
// FIXME: add call attributes?
if (AwaitSuspendCanThrow)
SuspendRet =
CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendHelperCallArgs);
else
SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
SuspendHelperCallArgs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CGF.EmitCallOrInvoke should be sufficient. We don't need to perform such optimizations here.

@ChuanqiXu9
Copy link
Member

BTW, in my original comment, I said:

Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase.

But this patch only implements the otherwise part. This is not a blocking issue. But we need a TODO or a FIXME here.

@fpasserby fpasserby force-pushed the await-suspend-intrinisics branch 2 times, most recently from 1a824f2 to f651c50 Compare February 10, 2024 14:24
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks much better now.

Given the CoroAwaitSuspendInst is lowered before splitting coroutines, I think we don't need to handle it specially in CoroSplit any more.

clang/include/clang/AST/ExprCXX.h Outdated Show resolved Hide resolved
clang/include/clang/AST/ExprCXX.h Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
Comment on lines +422 to +419
FuncName += CoroName.str();
FuncName += '_';
FuncName += SuspendPointName.str();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is better to concat the name of the awaiter instead of the coroutines and suspend number.

clang/lib/Sema/SemaCoroutine.cpp Show resolved Hide resolved
llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks good to me except few nit comments.

Have you tested on a real world workloads?

llvm/lib/Transforms/Coroutines/CoroSplit.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGCoroutine.cpp Outdated Show resolved Hide resolved
@fpasserby fpasserby force-pushed the await-suspend-intrinisics branch 2 times, most recently from e208bf4 to 164998d Compare March 10, 2024 12:22
@fpasserby
Copy link
Contributor Author

Thanks. This looks good to me except few nit comments.

Have you tested on a real world workloads?

I've tested it on seastar and libcppcoro tests and on the example from #72006.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM then. Thanks.

@ChuanqiXu9
Copy link
Member

I'll merge this since I find you may not have commit access.

@ChuanqiXu9 ChuanqiXu9 merged commit f786881 into llvm:main Mar 11, 2024
5 checks passed
// FIXME: add TBAA metadata to the loads
llvm::Value *AwaiterPtr = Builder.CreateLoad(GetAddrOfLocalVar(&AwaiterDecl));
auto AwaiterLValue =
MakeNaturalAlignAddrLValue(AwaiterPtr, AwaiterDecl.getType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether the type passed to MakeNaturalAlignAddrLValue is correct. AwaiterDecl.getType() is void *, but shouldn't S.getOpaqueValue() be passed?

I think this was causing a libc++ test to crash when ubsan was enabled (see #86898).

S.getOpaqueValue() and AwaiterLValue are passed to OpaqueValueMappingData::bind. clang mis-compiles when the lvalue has a larger alignment than the type of S.getOpaqueValue().

https://github.com/llvm/llvm-project/pull/79712/files#diff-da3676b91275038b801aabe1e10b6fdedfbe50b679eac152080c2a26fbd8ec7dR458

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.. it looks like it would be better to pass the type of S.getOpaqueValue(). The void* is simply a placeholder.

Would you like to handle this? I feel you are the better option since you can test it with your own patch. Otherwise I'll try to make it myself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a patch: #87134

ahatanak added a commit to ahatanak/llvm-project that referenced this pull request Mar 30, 2024
…pendWrapper

Use the correct type for AwaiterDecl instead of using `void *`.

See the discussion here:
llvm#79712 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category coroutines C++20 coroutines llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants