[Coroutines] Drop dead instructions more aggressively in addMustTailToCoroResumes() #85271

zmodem · 2024-03-14T17:22:43Z

The old code used isInstructionTriviallyDead() when walking the path from a resume call to function return to check if the call is in tail position.

However, since the code was walking forwards it was not able to get past instructions such as:

%gep = getelementptr inbounds i64, ptr %alloc.var, i32 0
%foo = ptrtoint ptr %gep to i64

This patch calls SimplifyInstructionsInBlock() before walking a block to remove any dead instructions in a more rigorous fashion.

…oCoroResumes() The old code used isInstructionTriviallyDead() when walking the path from a resume call to function return to check if the call is in tail position. However, since the code was walking forwards it was not able to get past instructions such as: %gep = getelementptr inbounds i64, ptr %alloc.var, i32 0 %foo = ptrtoint ptr %gep to i64 This patch calls SimplifyInstructionsInBlock() before walking a block to remove any dead instructions in a more rigorous fashion.

llvmbot · 2024-03-14T17:23:15Z

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-coroutines

@llvm/pr-subscribers-llvm-transforms

Author: Hans (zmodem)

Changes

The old code used isInstructionTriviallyDead() when walking the path from a resume call to function return to check if the call is in tail position.

However, since the code was walking forwards it was not able to get past instructions such as:

%gep = getelementptr inbounds i64, ptr %alloc.var, i32 0
%foo = ptrtoint ptr %gep to i64

This patch calls SimplifyInstructionsInBlock() before walking a block to remove any dead instructions in a more rigorous fashion.

Full diff: https://github.com/llvm/llvm-project/pull/85271.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+2-4)
(modified) llvm/test/Transforms/Coroutines/coro-split-musttail7.ll (+6-1)

diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 0fce596d3e2ca0..5d3106f157edcd 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -1203,10 +1203,6 @@ static bool simplifyTerminatorLeadingToRet(Instruction *InitialInst) {
       if (isa<BitCastInst>(I) || I->isDebugOrPseudoInst() ||
           I->isLifetimeStartOrEnd())
         I = I->getNextNode();
-      else if (isInstructionTriviallyDead(I))
-        // Duing we are in the middle of the transformation, we need to erase
-        // the dead instruction manually.
-        I = &*I->eraseFromParent();
       else
         break;
     }
@@ -1245,6 +1241,7 @@ static bool simplifyTerminatorLeadingToRet(Instruction *InitialInst) {
       }
 
       BasicBlock *Succ = BR->getSuccessor(SuccIndex);
+      SimplifyInstructionsInBlock(Succ);
       scanPHIsAndUpdateValueMap(I, Succ, ResolvedValues);
       I = GetFirstValidInstruction(Succ->getFirstNonPHIOrDbgOrLifetime());
 
@@ -1288,6 +1285,7 @@ static bool simplifyTerminatorLeadingToRet(Instruction *InitialInst) {
         return false;
 
       BasicBlock *BB = SI->findCaseValue(Cond)->getCaseSuccessor();
+      SimplifyInstructionsInBlock(BB);
       scanPHIsAndUpdateValueMap(I, BB, ResolvedValues);
       I = GetFirstValidInstruction(BB->getFirstNonPHIOrDbgOrLifetime());
       continue;
diff --git a/llvm/test/Transforms/Coroutines/coro-split-musttail7.ll b/llvm/test/Transforms/Coroutines/coro-split-musttail7.ll
index 2257d5aee473ee..241de5cfb25a7d 100644
--- a/llvm/test/Transforms/Coroutines/coro-split-musttail7.ll
+++ b/llvm/test/Transforms/Coroutines/coro-split-musttail7.ll
@@ -1,6 +1,6 @@
 ; Tests that sinked lifetime markers wouldn't provent optimization
 ; to convert a resuming call to a musttail call.
-; The difference between this and coro-split-musttail5.ll and coro-split-musttail5.ll
+; The difference between this and coro-split-musttail5.ll and coro-split-musttail6.ll
 ; is that this contains dead instruction generated during the transformation,
 ; which makes the optimization harder.
 ; RUN: opt < %s -passes='cgscc(coro-split),simplifycfg,early-cse' -S | FileCheck %s
@@ -27,6 +27,11 @@ await.suspend:
   %save2 = call token @llvm.coro.save(ptr null)
   call fastcc void @fakeresume1(ptr align 8 null)
   %suspend2 = call i8 @llvm.coro.suspend(token %save2, i1 false)
+
+  ; These (non-trivially) dead instructions are in the way.
+  %gep = getelementptr inbounds i64, ptr %alloc.var, i32 0
+  %foo = ptrtoint ptr %gep to i64
+
   switch i8 %suspend2, label %exit [
     i8 0, label %await.ready
     i8 1, label %exit

aeubanks · 2024-03-14T17:26:57Z

llvm/lib/Transforms/Coroutines/CoroSplit.cpp

-      else if (isInstructionTriviallyDead(I))
-        // Duing we are in the middle of the transformation, we need to erase
-        // the dead instruction manually.
-        I = &*I->eraseFromParent();


can we use RecursivelyDeleteTriviallyDeadInstructions() here instead? simplify seems overkill.

also do you know why we would end up with dead instructions at this point? we should have already simplified the function when we get to corosplit

can we use RecursivelyDeleteTriviallyDeadInstructions() here instead? simplify seems overkill.

In my test case the first instruction isn't trivially dead, so I don't think that would work.

do you know why we would end up with dead instructions at this point? we should have already simplified the function when we get to corosplit

I think such instructions can emerge during corosplit. Maybe @ChuanqiXu9 can answer that one.

can we use RecursivelyDeleteTriviallyDeadInstructions() here instead? simplify seems overkill.

In my test case the first instruction isn't trivially dead, so I don't think that would work.

if we call it on the second instruction, it'll delete the first one as well

do you know why we would end up with dead instructions at this point? we should have already simplified the function when we get to corosplit

I think such instructions can emerge during corosplit. Maybe @ChuanqiXu9 can answer that one.

understanding this a bit better would be good to see if we're doing extra unnecessary work, or if those instructions should be cleaned up earlier in the pass

Yeah, since these instructions become dead during CoroSplit pass. The internals are

%save2 = call token @llvm.coro.save(ptr null) call fastcc void @fakeresume1(ptr align 8 null) %suspend2 = call i8 @llvm.coro.suspend(token %save2, i1 false) ; These (non-trivially) dead instructions are in the way. %gep = getelementptr inbounds i64, ptr %alloc.var, i32 0 %foo = ptrtoint ptr %gep to i64 switch i8 %suspend2, label %exit [ i8 0, label %await.ready i8 1, label %exit ] await.ready: call void @consume(ptr %alloc.var) call void @llvm.lifetime.end.p0(i64 1, ptr %alloc.var) br label %exit exit: call i1 @llvm.coro.end(ptr null, i1 false, token none) ret void }

During the splitting, the value %suspend2 will become a constant in different splitted functions. So it makes sense if the previous passes can't catch such things.

understanding this a bit better would be good to see if we're doing extra unnecessary work, or if those instructions should be cleaned up earlier in the pass

+1 I'd like to understand this better too :)

In the original code I was debugging, the presplit coroutine has a return value:

169: ; preds = %166, %163, %135, %68, %48 %170 = getelementptr inbounds i8, ptr %17, i64 -16 %171 = ptrtoint ptr %170 to i64 %172 = call i1 @llvm.coro.end(ptr null, i1 false) #19 ret i64 %171 }

corosplit clones the function into various versions, and the modifications of those clones are what causes dead code, makes some values constant, etc.

In my case, CoroCloner::create() will make the original return unreachable:

switch (Shape.ABI) { // In these ABIs, the cloned functions always return 'void', and the // existing return sites are meaningless. Note that for unique // continuations, this includes the returns associated with suspends; // this is fine because we can't suspend twice. case coro::ABI::Switch: case coro::ABI::RetconOnce: // Remove old returns. for (ReturnInst *Return : Returns) changeToUnreachable(Return); break;

and replaceCoroEnds() will replace the @llvm.coro.end by ret void.

So perhaps CoroCloner is the one who should clean up these instructions.

On the other hand, it seems corosplit relies on later passes to clean up this stuff, which is maybe fine (especially since it also runs at -O0), except that it's a problem for addMustTailToCoroResumes().

simplify seems overkill

Yes, maybe it's too big a hammer. On the other hand, the ad-hoc dce and constant folding in simplifyTerminatorLeadingToRet() makes me a little nervous.

Instead of deleting those dead instructions, maybe we could just look past them. What the code is really trying to do is follow the control flow to a return instruction. I think it should be able to skip any side-effect free instructions on the way as long as the values aren't needed for a conditional branch or such. I'll push a version which does that. Let me know what you think.

So perhaps CoroCloner is the one who should clean up these instructions.

Generally this is not something we'd like to do for the middle end passes. However, this is (was) a defect in the clang/llvm coroutines implementations. Since the symmetric transfer is semantics about C++ and according to our design for modules, we have to make it in the middle end. So this is the why we have this... But probably we can never get rid of this if we (Clang/LLVM) don't want to be conforming to C++ anymore.

Yes, maybe it's too big a hammer. On the other hand, the ad-hoc dce and constant folding in simplifyTerminatorLeadingToRet() makes me a little nervous.

So my line is, if the problem comes from a particular C++ programs and the corresponding LLVM passes applied, yes we need to solve it. However, if it comes from a manual constructed IR only, then possibly we shouldn't do that.

ChuanqiXu9

LGTM. Thanks.

zmodem · 2024-03-18T12:42:15Z

Any thoughts on the new commit? @ChuanqiXu9 @aeubanks

ChuanqiXu9

I like the previous one version. And if the problem comes from a C++ program (I believe it may be the case), we can add the (reduced) C++ program as a test. If it is not the case, I guess we shouldn't do this OR we need to revisit the motivation or the issue or the use cases of switching IR.

ChuanqiXu9 · 2024-03-18T13:16:52Z

llvm/lib/Transforms/Coroutines/CoroSplit.cpp

      continue;
    }

-    return false;
+    break;


IIUC, we shouldn't do this. This is the pattern we generally do for optimizations. But here, it is not an pure optimizations. We are implementing a C++ semantics. (See my above comments for some backgrounds.) So it is better to break early if we have something we are not able to handle.

Oh, I took another look on this to make sure that I understood this patch correctly. And it looks like my previous comments were confusing and maybe unrelated. So ignore the above comments if it is confusing.

And the comment about wanting a new test from C++ still works.

Since the dirty properties of the corosplit pass mentioned above, it is hard to tell if this patch is correct or not easily. It depends on how the code will be generated from C++ frontend and other passes. So conservatively, this change doesn't cover the original codes since the trivally dead instructions may have side effects. So the conservative change is to make this patch covers previous behavior.

Otherwise, if we want to proceed aggressively, we may need to understand the original motivation issue and the corresponding C++ codes and invest if this is regression or new code patterns we never thought before. But as I said, it may be annoying since the root cause is dirty.

I can accept either the conservative or the aggressive method to proceed.

Thanks for the update. I've added a C++ reproducer.

If an instruction has side effects, it will not be considered trivially dead, so I think my patch covers the original cases, and more.

If an instruction has side effects, it will not be considered trivially dead, so I think my patch covers the original cases, and more.

It is true abstractly. But for LLVM, at least from the comment, it says:

/// Return true if the result produced by the instruction is not used, and the /// instruction will return. Certain side-effecting instructions are also /// considered dead if there are no uses of the instruction. bool isInstructionTriviallyDead(Instruction *I, const TargetLibraryInfo *TLI = nullptr);

Okay, I'll change it to call wouldInstructionBeTriviallyDead() instead.

zmodem · 2024-03-19T15:53:25Z

My test case does come from a C++ program. Here is a reasonably small reproducer:

$ cat /tmp/b.cc
#include <coroutine>
struct Task {
  struct promise_type {
    Task get_return_object() { return Task(std::coroutine_handle<promise_type>::from_promise(*this)); }
    std::suspend_always initial_suspend() { return {}; }
    struct FinalAwaiter {
      bool await_ready() noexcept { return false; }
      std::coroutine_handle<> await_suspend(std::coroutine_handle<promise_type> handle) {
        return handle.promise().continuation_;
      }
      void await_resume() noexcept {}
    };
    FinalAwaiter final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() { }

    std::coroutine_handle<> continuation_;
  };

  explicit Task(std::coroutine_handle<promise_type> handle) : handle_(handle) {}

  std::coroutine_handle<promise_type> handle_;
  int x = 42;
};

void g();
extern "C" Task foo() {
  g();
  co_return;
}

$ build/bin/clang++ -c -fno-exceptions /tmp/b.cc -std=c++20 -O1 -mllvm -print-after=coro-split -mllvm -filter-print-funcs=foo.resume -o /dev/null -w
; *** IR Dump After CoroSplitPass on (foo.resume) ***
; Function Attrs: mustprogress nounwind uwtable
define internal fastcc void @foo.resume(ptr nocapture noundef nonnull align 8 dereferenceable(32) %0) #0 {
entry.resume:
  %__promise.reload.addr = getelementptr inbounds i8, ptr %0, i64 16
  call void @_Z1gv() #2
  store ptr null, ptr %0, align 8
  %retval.sroa.0.0.copyload.i.i = load ptr, ptr %__promise.reload.addr, align 8, !tbaa !5
  %1 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.copyload.i.i, i8 0)
  call fastcc void %1(ptr %retval.sroa.0.0.copyload.i.i) #2
  ret void
}

Note the missing musttail.

(In this case, later passes will make that a tail call anyway, at least on X86, but that wasn't the case in the original program which overflowed the stack instead.)

(The trick to the reproducer is that the coroutine has a return value the requires a couple of instructions to put together, which blocks simplifyTerminatorLeadingToRet()'s search for the return. At -O0 those instructions come after the llvm.coro.end call so they don't block the search for return, but at -O1 they come before.)

ChuanqiXu9 · 2024-03-20T05:37:58Z

My test case does come from a C++ program. Here is a reasonably small reproducer:

$ cat /tmp/b.cc
#include <coroutine>
struct Task {
  struct promise_type {
    Task get_return_object() { return Task(std::coroutine_handle<promise_type>::from_promise(*this)); }
    std::suspend_always initial_suspend() { return {}; }
    struct FinalAwaiter {
      bool await_ready() noexcept { return false; }
      std::coroutine_handle<> await_suspend(std::coroutine_handle<promise_type> handle) {
        return handle.promise().continuation_;
      }
      void await_resume() noexcept {}
    };
    FinalAwaiter final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() { }

    std::coroutine_handle<> continuation_;
  };

  explicit Task(std::coroutine_handle<promise_type> handle) : handle_(handle) {}

  std::coroutine_handle<promise_type> handle_;
  int x = 42;
};

void g();
extern "C" Task foo() {
  g();
  co_return;
}

$ build/bin/clang++ -c -fno-exceptions /tmp/b.cc -std=c++20 -O1 -mllvm -print-after=coro-split -mllvm -filter-print-funcs=foo.resume -o /dev/null -w
; *** IR Dump After CoroSplitPass on (foo.resume) ***
; Function Attrs: mustprogress nounwind uwtable
define internal fastcc void @foo.resume(ptr nocapture noundef nonnull align 8 dereferenceable(32) %0) #0 {
entry.resume:
  %__promise.reload.addr = getelementptr inbounds i8, ptr %0, i64 16
  call void @_Z1gv() #2
  store ptr null, ptr %0, align 8
  %retval.sroa.0.0.copyload.i.i = load ptr, ptr %__promise.reload.addr, align 8, !tbaa !5
  %1 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.copyload.i.i, i8 0)
  call fastcc void %1(ptr %retval.sroa.0.0.copyload.i.i) #2
  ret void
}

Note the missing musttail.

(In this case, later passes will make that a tail call anyway, at least on X86, but that wasn't the case in the original program which overflowed the stack instead.)

(The trick to the reproducer is that the coroutine has a return value the requires a couple of instructions to put together, which blocks simplifyTerminatorLeadingToRet()'s search for the return. At -O0 those instructions come after the llvm.coro.end call so they don't block the search for return, but at -O1 they come before.)

Thanks. The explanation looks slightly odd since I believe we have such cases before. I am wondering if this is a regression from f786881 (this may not be important)

And could you put the reduced example into the patch in somewhere of clang/test/CodeGenCoroutines as a regression test?

zmodem · 2024-03-20T09:45:13Z

I am wondering if this is a regression from f786881 (this may not be important)

No, it failed before that commit also.

And could you put the reduced example into the patch in somewhere of clang/test/CodeGenCoroutines as a regression test?

Added coro-symmetric-transfer-04.cpp.

Please take another look.

ChuanqiXu9

LGTM. Thanks.

clang/test/CodeGenCoroutines/coro-symmetric-transfer-04.cpp

…roResumes() (llvm#85271) The old code used isInstructionTriviallyDead() and removed instructions when walking the path from a resume call to function return to check if the call is in tail position. However, since the code was walking forwards it was not able to get past instructions such as: %gep = getelementptr inbounds i64, ptr %alloc.var, i32 0 %foo = ptrtoint ptr %gep to i64 This patch instead ignores such instructions as long as their values are not needed. This enables the code to emit tail calls in more situations.

zmodem requested review from aeubanks and ChuanqiXu9 March 14, 2024 17:22

llvmbot added coroutines C++20 coroutines llvm:transforms labels Mar 14, 2024

aeubanks reviewed Mar 14, 2024

View reviewed changes

ChuanqiXu9 approved these changes Mar 15, 2024

View reviewed changes

skip any side-effect free instructions when looking for the return

e268280

ChuanqiXu9 requested changes Mar 18, 2024

View reviewed changes

zmodem added 3 commits March 20, 2024 10:16

return %foo in the lit test

bbf499c

add clang codegen test

98f917b

use wouldInstructionBeTriviallyDead() instead

9b30f17

llvmbot added the clang Clang issues not falling into any other category label Mar 20, 2024

ChuanqiXu9 approved these changes Mar 20, 2024

View reviewed changes

clang/test/CodeGenCoroutines/coro-symmetric-transfer-04.cpp Show resolved Hide resolved

add link in coro-symmetric-transfer-04.cpp

e1ec1a5

zmodem merged commit 9d1cb18 into llvm:main Mar 20, 2024
3 of 4 checks passed

zmodem deleted the coro_musttail_fixing branch March 20, 2024 13:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Coroutines] Drop dead instructions more aggressively in addMustTailToCoroResumes() #85271

[Coroutines] Drop dead instructions more aggressively in addMustTailToCoroResumes() #85271

zmodem commented Mar 14, 2024

llvmbot commented Mar 14, 2024 •

edited

aeubanks Mar 14, 2024

zmodem Mar 14, 2024

aeubanks Mar 14, 2024

ChuanqiXu9 Mar 15, 2024

zmodem Mar 15, 2024

ChuanqiXu9 Mar 18, 2024

ChuanqiXu9 left a comment

zmodem commented Mar 18, 2024

ChuanqiXu9 left a comment

ChuanqiXu9 Mar 18, 2024

ChuanqiXu9 Mar 18, 2024

zmodem Mar 19, 2024

ChuanqiXu9 Mar 20, 2024

zmodem Mar 20, 2024

zmodem commented Mar 19, 2024

ChuanqiXu9 commented Mar 20, 2024 •

edited

zmodem commented Mar 20, 2024

ChuanqiXu9 left a comment

[Coroutines] Drop dead instructions more aggressively in addMustTailToCoroResumes() #85271

[Coroutines] Drop dead instructions more aggressively in addMustTailToCoroResumes() #85271

Conversation

zmodem commented Mar 14, 2024

llvmbot commented Mar 14, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChuanqiXu9 left a comment

Choose a reason for hiding this comment

zmodem commented Mar 18, 2024

ChuanqiXu9 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zmodem commented Mar 19, 2024

ChuanqiXu9 commented Mar 20, 2024 • edited

zmodem commented Mar 20, 2024

ChuanqiXu9 left a comment

Choose a reason for hiding this comment

llvmbot commented Mar 14, 2024 •

edited

ChuanqiXu9 commented Mar 20, 2024 •

edited