Clang misaligns variables stored in coroutine frames #56671

jacobsa · 2022-07-22T11:56:40Z

Clang seems to fail to correctly align objects in a coroutine frame that need more than 8 bytes of alignment on x86-64. Here is a program that demonstrates this by having a 64-byte-aligned object persist across a co_await (Compiler Explorer):

#include <stdlib.h>

#include <coroutine>
#include <cstdint>
#include <iostream>

struct Promise;

struct Task {
  using promise_type = Promise;
};

// A promise type that never suspends.
struct Promise {
  std::suspend_never initial_suspend() { return {}; }
  void unhandled_exception() {}
  std::suspend_never final_suspend() noexcept { return {}; }
  Task get_return_object() { return Task{}; }
  void return_value(const bool value) {}
};

// An awaitable that is always immediately ready with the value true.
struct Awaitable final {
  // __attribute__((noinline)) apparently prevents the compiler from seeing that
  // co_await can be a no-op.
  bool __attribute__((noinline)) await_ready() { return true; }

  void await_suspend(std::coroutine_handle<> h) {}
  bool await_resume() { return true; }
};

// Crash if p is not aligned to 64 bytes.
void __attribute__((noinline)) CheckAlignment(void* const p) {
  if (reinterpret_cast<intptr_t>(p) % 64 == 0) {
    return;
  }

  std::cerr << "Bad alignment: " << p << "\n";
  exit(1);
}

// A class that needs to be aligned to 64 bytes, and crashes if it isn't.
class HighlyAligned final {
 public:
  explicit HighlyAligned() { CheckAlignment(this); }

 private:
  alignas(64) char storage[16];
};

static_assert(alignof(HighlyAligned) == 64);

// A coroutine that creates an object that needs to be aligned to 64 bytes in
// the coroutine frame, ensuring it exists across a co_await.
Task Coroutine() {
  auto a = HighlyAligned();
  const bool result = co_await Awaitable{};
  static_cast<void>(a);

  co_return result;
}

int main(int argc, char** argv) {
  Coroutine();
}

When built from trunk with -std=c++20, this dies pretty reliably with a Bad alignment message. If you instead change the program so that it creates HighlyAligned objects on the stack in main, it doesn't crash.

(For the record, here is a gtest version that is a little easier to work with but has more dependencies.)

The text was updated successfully, but these errors were encountered:

yuanfang-chen · 2022-07-22T20:36:43Z

wg21.link/p2014r0 ? I didn't track the most recent development though, nor I'm aware of any consensus yet.

Also see https://reviews.llvm.org/D102147

llvmbot · 2022-07-22T20:37:26Z

@llvm/issue-subscribers-c-20

jacobsa · 2022-07-22T21:44:25Z

@yuanfang-chen Ah nice, thank you very much for the context. I'm surprised to find that this was left out of the standard. But also surprised that it needs to be in the standard in order to be fixed: wouldn't it be possible to achieve an alignment of N by allocating size + N - 1 bytes and shifting appropriately? It's unfortunate to over-allocate, but correctly aligning types seems more important than that.

Anyway failing that, it sounds like maybe I'll get what I want if I wait for https://reviews.llvm.org/D102147, set -fcoroutines-aligned-alloc, and define an overload of operator new that accepts the appropriate argument. Won't that defeat the heap allocation elision optimization though?

yuanfang-chen · 2022-07-23T04:57:02Z

But also surprised that it needs to be in the standard in order to be fixed: wouldn't it be possible to achieve an alignment of N by allocating size + N - 1 bytes and shifting appropriately?

It is possible. https://reviews.llvm.org/D97915 (parent patch of https://reviews.llvm.org/D102147) will do that when aligned allocator is not available.

Anyway failing that, it sounds like maybe I'll get what I want if I wait for https://reviews.llvm.org/D102147, set -fcoroutines-aligned-alloc, and define an overload of operator new that accepts the appropriate argument. Won't that defeat the heap allocation elision optimization though?

As mentioned above, define an overload of operator **new`` is optional. I'm not very familiar with HALO but my intuition is that the patch shouldn't prevent it. Could you elaborate on it?

@rjmccall @ChuanqiXu9

jacobsa · 2022-07-23T13:48:53Z

I'm not sure I see why overloading operator new would be optional. You need the version that accepts the alignment argument, right? Or are you saying it would use the free-standing operator new otherwise?

I'm also not very familiar with HALO. My thought was that if there is a user-provided operator new then the compiler would be required to use it, in case it has side effects. I could definitely be wrong about this.

jacobsa · 2022-07-24T21:17:39Z

@yuanfang-chen I just noticed that https://reviews.llvm.org/D97915 hasn't had a comment in over a year. :-( Is there any hope for getting the over-allocation strategy committed any time soon? This is a correctness problem, so it seems worth doing that before the standard is perfect.

ChuanqiXu9 · 2022-07-25T03:46:21Z

Won't that defeat the heap allocation elision optimization though?

This should be irrelevant with HALO.

I prefer the direction of D102147 than D97915 since this is a standard problem instead of an implementation problem. As we could find, GCC has the same problem. I would try to push this forward in 3 months.

For the practical side, I think you could use the workaround of MLIR. MLIR is also a user of LLVM Coroutines. Previously, MLIR met a similar problem, and their solution in C++ may look like: https://godbolt.org/z/eYderhTTW

jacobsa · 2022-07-25T19:00:53Z

Yes, I have a similar workaround. It's not totally correct though, because there is no upper bound on the alignment that might be needed (right?).

I agree it's a problem that the standard doesn't let you define an operator new that takes std::align_val_t. But that's a bug in customization, not correctness. It seems like if the promise doesn't define an operator new, clang could at least give the correct behavior while being compatible with the current standard.

yuanfang-chen · 2022-07-25T21:39:03Z

You need the version that accepts the alignment argument, right? Or are you saying it would use the free-standing operator new otherwise?

Yes. Either the standard library-provided version or a user-defined version. If the user-defined version is not found, it should use the standard library-provided version.

I agree it's a problem that the standard doesn't let you define an operator new that takes std::align_val_t. But that's a bug in customization, not correctness.

It concerns both correctness and customization because the current standard wording prevents the compiler from searching for and using the aligned allocator (either the standard library-provided version or a user-defined version). So it is the standard that caused this bug (the compiler could fix it as an extension though, like D97915).

I prefer the direction of D102147 than D97915 since this is a standard problem instead of an implementation problem. As we could find, GCC has the same problem. I would try to push this forward in 3 months.

Agreed. I'll find some time to rebase the patch.

ChuanqiXu9 · 2022-07-26T02:46:13Z

Yes, I have a similar workaround. It's not totally correct though, because there is no upper bound on the alignment that might be needed (right?).

Strictly, there is no upper bound on the alignment. But from my experiences, '64' works in most cases of X86_64. I saw some use cases (not coroutines) in AArch64 which uses '128' to speedup hardware cache accessing. But all of these are low-level softwares like kernels. And many use cases for coroutines are the higher level applications.
So I feel like the workaround won't be too bad in a short time.

ChuanqiXu9 · 2022-08-08T03:33:15Z

Update: Wg21 is going to discuss P2014(cplusplus/papers#750) in 8/18.

ChuanqiXu9 · 2022-08-19T05:45:26Z

Update: No consensus in the meeting. And we feel good to have some implementation experience and use experience first.

And option1 has some problems:
(1) the ambiguous overloading. Consider the case for coro foo(std::align_val_t).
(2) we need the middle end to do the overload resolution instead of the frontend. This may bring some problems. For example, if the two allocation functions are template functions, we must instantiate both of the template functions. However, it is problematic if one of the templates are not instantiable. (This is common as a SFINAE pattern).
(3) Some other overload resolution problems.
(4) The complexity in wording and implementation.

And option2 is much simpler and option2 wouldn't meet these problems. But option2 will break more existing codes. Also we concern about the ABI problems.

But personally I strongly prefer option2. Since for the coroutine uses, the number of coroutine type definitions are really rare. And although option2 is going to break existing codes, we need to edit rare places. Also the option1 may break existing codes too.

@yuanfang-chen due to D102147 wants to implement option1. So we may implement a new revision. How do you think about this? If you don't have time, I'm Ok to implement option2 seperately.

yuanfang-chen · 2022-08-19T06:27:31Z

@ChuanqiXu9 Thanks for the updates. Glad to see that the discussion on this issue resumed. Option2 looks better to me too. Making it consistent with normal new/delete rules seems too much burden to carry without justifiable benefits. Let's just implement option2 and see what the user experiences are. I won't have the time to implement this though but happy to help review patches.

rjmccall · 2022-08-22T20:56:41Z

I don't think the middle-end strictly needs to do the overload resolution as long as the language says that it's permitted to ODR-use both functions — basically, the language needs to authorize the frontend to emit an if instead of completely statically resolving it.

ChuanqiXu9 · 2022-08-23T03:29:54Z

Yes, as long as the language permits us, it might not be a problem for the vendor. The reason why ewg don't get consensus is that:
(1) The complexity in wording and implementation. For example, what if the user provides an aligned/non-aligned allocation function only? Should the frontend provides the default non-aligned/aligned allocation function?
(2) The end user is hard to know which allocation function is going to be called. Unless we say that the user shouldn't/couldn't care about it.
(3) The user may be forced to provide redundant allocation functions.
(4) And a potential ambiguous problem.

And option2 don't have these problems except it will break more existing user codes. So if option2 is in the standard, the user need to edit their codes when upgrading. But personally I feel it might not be a big deal. And we don't any consensus yet. So I think it may be good to try to implement option2 and see the feedbacks at first.

jacobsa · 2022-08-27T04:17:43Z

For what it’s worth I’d be happy with option 2 even though it’s going to break my code. Will it be difficult to implement?

ChuanqiXu9 · 2022-08-29T02:20:52Z

Will it be difficult to implement?

As far as I can see, it wouldn't be difficult to implement. I just have too many TODOs to start working on it...

jacobsa · 2022-09-22T08:52:29Z

Nice work, thank you!

implement the option2 of P2014R0 This implements the option2 of https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2014r0.pdf. This also fixes llvm/llvm-project#56671. Although wg21 didn't get consensus for the direction of the problem, we're happy to have some implementation and user experience first. And from issue56671, the option2 should be the pursued one. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D133341

github-actions bot added the new issue label Jul 22, 2022

EugeneZelenko added coroutines C++20 coroutines and removed new issue labels Jul 22, 2022

yuanfang-chen added the c++20 label Jul 22, 2022

usx95 added this to C++ 20 in Clang Aug 21, 2022

ChuanqiXu9 self-assigned this Aug 23, 2022

ChuanqiXu9 closed this as completed in 327141f Sep 22, 2022

ChuanqiXu9 moved this to Done in C++ 20 in Clang Sep 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clang misaligns variables stored in coroutine frames #56671

Clang misaligns variables stored in coroutine frames #56671

jacobsa commented Jul 22, 2022 •

edited

Loading

yuanfang-chen commented Jul 22, 2022

llvmbot commented Jul 22, 2022

jacobsa commented Jul 22, 2022

yuanfang-chen commented Jul 23, 2022

jacobsa commented Jul 23, 2022

jacobsa commented Jul 24, 2022

ChuanqiXu9 commented Jul 25, 2022

jacobsa commented Jul 25, 2022

yuanfang-chen commented Jul 25, 2022

ChuanqiXu9 commented Jul 26, 2022

ChuanqiXu9 commented Aug 8, 2022

ChuanqiXu9 commented Aug 19, 2022

yuanfang-chen commented Aug 19, 2022

rjmccall commented Aug 22, 2022

ChuanqiXu9 commented Aug 23, 2022

jacobsa commented Aug 27, 2022

ChuanqiXu9 commented Aug 29, 2022

jacobsa commented Sep 22, 2022

Clang misaligns variables stored in coroutine frames #56671

Clang misaligns variables stored in coroutine frames #56671

Comments

jacobsa commented Jul 22, 2022 • edited Loading

yuanfang-chen commented Jul 22, 2022

llvmbot commented Jul 22, 2022

jacobsa commented Jul 22, 2022

yuanfang-chen commented Jul 23, 2022

jacobsa commented Jul 23, 2022

jacobsa commented Jul 24, 2022

ChuanqiXu9 commented Jul 25, 2022

jacobsa commented Jul 25, 2022

yuanfang-chen commented Jul 25, 2022

ChuanqiXu9 commented Jul 26, 2022

ChuanqiXu9 commented Aug 8, 2022

ChuanqiXu9 commented Aug 19, 2022

yuanfang-chen commented Aug 19, 2022

rjmccall commented Aug 22, 2022

ChuanqiXu9 commented Aug 23, 2022

jacobsa commented Aug 27, 2022

ChuanqiXu9 commented Aug 29, 2022

jacobsa commented Sep 22, 2022

jacobsa commented Jul 22, 2022 •

edited

Loading