[Clang] stack frame is way too large in coroutine at low optimization levels #57638

Open
@jacobsa

Description

Our internal build system just enabled opaque pointers by removing -Xclang=-no-opaque-pointers from our build arguments. When this happened I noticed a regression: at the default optimization level, clang gives coroutine resume functions much larger stack frames than necessary.

Here is a simple program with a function ArrayOnCoroutineFrame that creates a large local array that must go on the coroutine frame because it may need to survive a suspension:

#include <array>
#include <coroutine>
#include <optional>

struct MyTask{
  struct promise_type {
    MyTask get_return_object() { return {std::coroutine_handle<promise_type>::from_promise(*this)}; }
    std::suspend_always initial_suspend() { return {}; }

    void unhandled_exception();
    void return_void() {} 

    auto await_transform(MyTask task) {
      struct Awaiter {
        bool await_ready() { return false; }
        std::coroutine_handle<promise_type> await_suspend(std::coroutine_handle<promise_type> h) {
          caller.resume_when_done = h;
          return std::coroutine_handle<promise_type>::from_promise(callee);
        }

        void await_resume() {
          std::coroutine_handle<promise_type>::from_promise(callee).destroy();
        }

        promise_type& caller;
        promise_type& callee;
      };

      return Awaiter{*this, task.handle.promise()};
    }
    
    auto final_suspend() noexcept {
      struct Awaiter {
        bool await_ready() noexcept { return false; }
        std::coroutine_handle<promise_type> await_suspend(std::coroutine_handle<promise_type> h) noexcept {
          return to_resume;
        }

        void await_resume() noexcept;

        std::coroutine_handle<promise_type> to_resume;
      };

      return Awaiter{resume_when_done};
    }

    // The coroutine to resume when we're done.
    std::coroutine_handle<promise_type> resume_when_done;
  };


  // A handle for the coroutine that returned this task.
  std::coroutine_handle<promise_type> handle;
};

MyTask DoSomething();

MyTask ArrayOnCoroutineFrame() {
  std::array<std::optional<int>, 10'000> vals;
  for (auto& val : vals) {
    (void)val;
    co_await DoSomething();
  }
}

When compiled with -std=c++20 -Xclang=-no-opaque-pointers, clang correctly observes that ArrayOnCoroutineFrame.resume needs only a small stack size, since the array is on the coroutine frame:

ArrayOnCoroutineFrame() [clone .resume]:      # @ArrayOnCoroutineFrame() [clone .resume]
        push    rbp
        mov     rbp, rsp
        sub     rsp, 304
        mov     qword ptr [rbp - 168], rdi      # 8-byte Spill
        mov     qword ptr [rbp - 8], rdi
[...]

But when compiled with just -std=c++20, clang fails to do this and gives the resume function a huge stack frame, even though it still appears to build the array on the coroutine frame:

ArrayOnCoroutineFrame() [clone .resume]:      # @ArrayOnCoroutineFrame() [clone .resume]
        push    rbp
        mov     rbp, rsp
        sub     rsp, 80368
        mov     qword ptr [rbp - 80240], rdi    # 8-byte Spill
        mov     qword ptr [rbp - 8], rdi
        mov     rax, rdi
        add     rax, 80081
        mov     qword ptr [rbp - 80232], rax    # 8-byte Spill
        mov     rax, rdi
        add     rax, 80
        mov     qword ptr [rbp - 80224], rax    # 8-byte Spill
[...]
        mov     rdi, qword ptr [rbp - 80224]    # 8-byte Reload
        call    std::array<std::optional<int>, 10000ul>::array() [base object constructor]
[...]
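
As a rough sanity check on the numbers in the listing above, the sketch below compares the size of the array with the offsets and stack adjustment that appear there. The constant and function names are hypothetical and only for illustration. The offsets are copied from this particular build, and the arithmetic assumes a standard library where sizeof(std::optional<int>) is 8, which is common but not guaranteed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>

// Same type as `vals` in ArrayOnCoroutineFrame.
using Vals = std::array<std::optional<int>, 10'000>;

// Values read off the listing above; they are specific to that build.
constexpr std::size_t kArrayOffsetInFrame = 80;    // `add rax, 80`
constexpr std::size_t kNextOffsetInFrame = 80081;  // `add rax, 80081`
constexpr std::size_t kResumeStackBytes = 80368;   // `sub rsp, 80368`

// Offset one past the last byte of the array, if it starts at
// kArrayOffsetInFrame within the coroutine frame.
constexpr std::size_t ArrayEndInFrame() {
  return kArrayOffsetInFrame + sizeof(Vals);
}

// Stack bytes left over after subtracting one array-sized slot
// (0 if the array is larger than the whole stack frame).
constexpr std::size_t StackBytesBeyondArray() {
  return kResumeStackBytes > sizeof(Vals) ? kResumeStackBytes - sizeof(Vals)
                                          : 0;
}

// Prints the figures so they can be compared with the assembly by hand.
void PrintEstimate() {
  // The array holds 10'000 elements, so it is at least this large.
  assert(sizeof(Vals) >= 10'000 * sizeof(std::optional<int>));
  std::printf("sizeof(vals)            = %zu\n", sizeof(Vals));
  std::printf("array end in frame      = %zu (next offset seen: %zu)\n",
              ArrayEndInFrame(), kNextOffsetInFrame);
  std::printf("stack bytes beyond array = %zu\n", StackBytesBeyondArray());
}
```

Calling PrintEstimate() from main prints the three figures. If sizeof(std::optional<int>) is 8, the array is 80000 bytes, it would end at frame offset 80080 (just before the 80081 offset seen in the listing), and the resume function's stack would be 368 bytes larger than one array. That is in the same range as the 304 bytes used with -Xclang=-no-opaque-pointers, which would be consistent with the stack holding roughly one extra array-sized slot, though the listing alone does not confirm that.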

This is less a bug than a missed optimization. I don't know whether it's reasonable to expect this optimization at the default optimization level, but it was done before opaque pointers shipped, so it is effectively a regression. Is it easy to make it work again?
