Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang misaligns variables stored in coroutine frames #56671

Closed
jacobsa opened this issue Jul 22, 2022 · 18 comments
Closed

Clang misaligns variables stored in coroutine frames #56671

jacobsa opened this issue Jul 22, 2022 · 18 comments
Assignees
Labels
c++20 coroutines C++20 coroutines

Comments

@jacobsa
Copy link
Contributor

jacobsa commented Jul 22, 2022

Clang seems to fail to correctly align objects in a coroutine frame that need more than 8 bytes of alignment on x86-64. Here is a program that demonstrates this by having a 64-byte-aligned object persist across a co_await (Compiler Explorer):

#include <stdlib.h>

#include <coroutine>
#include <cstdint>
#include <iostream>

struct Promise;

struct Task {
  using promise_type = Promise;
};

// A promise type that never suspends.
struct Promise {
  std::suspend_never initial_suspend() { return {}; }
  void unhandled_exception() {}
  std::suspend_never final_suspend() noexcept { return {}; }
  Task get_return_object() { return Task{}; }
  void return_value(const bool value) {}
};

// An awaitable that is always immediately ready with the value true.
struct Awaitable final {
  // __attribute__((noinline)) apparently prevents the compiler from seeing that
  // co_await can be a no-op.
  bool __attribute__((noinline)) await_ready() { return true; }

  void await_suspend(std::coroutine_handle<> h) {}
  bool await_resume() { return true; }
};

// Crash if p is not aligned to 64 bytes.
void __attribute__((noinline)) CheckAlignment(void* const p) {
  if (reinterpret_cast<intptr_t>(p) % 64 == 0) {
    return;
  }

  std::cerr << "Bad alignment: " << p << "\n";
  exit(1);
}

// A class that needs to be aligned to 64 bytes, and crashes if it isn't.
class HighlyAligned final {
 public:
  explicit HighlyAligned() { CheckAlignment(this); }

 private:
  alignas(64) char storage[16];
};

static_assert(alignof(HighlyAligned) == 64);

// A coroutine that creates an object that needs to be aligned to 64 bytes in
// the coroutine frame, ensuring it exists across a co_await.
Task Coroutine() {
  auto a = HighlyAligned();
  const bool result = co_await Awaitable{};
  static_cast<void>(a);

  co_return result;
}

int main(int argc, char** argv) {
  Coroutine();
}

When built from trunk with -std=c++20, this dies pretty reliably with a Bad alignment message. If you instead change the program so that it creates HighlyAligned objects on the stack in main, it doesn't crash.

(For the record, here is a gtest version that is a little easier to work with but has more dependencies.)

@EugeneZelenko EugeneZelenko added coroutines C++20 coroutines and removed new issue labels Jul 22, 2022
@yuanfang-chen
Copy link
Collaborator

wg21.link/p2014r0 ? I didn't track the most recent development though, nor I'm aware of any consensus yet.

Also see https://reviews.llvm.org/D102147

@llvmbot
Copy link

llvmbot commented Jul 22, 2022

@llvm/issue-subscribers-c-20

@jacobsa
Copy link
Contributor Author

jacobsa commented Jul 22, 2022

@yuanfang-chen Ah nice, thank you very much for the context. I'm surprised to find that this was left out of the standard. But also surprised that it needs to be in the standard in order to be fixed: wouldn't it be possible to achieve an alignment of N by allocating size + N - 1 bytes and shifting appropriately? It's unfortunate to over-allocate, but correctly aligning types seems more important than that.

Anyway failing that, it sounds like maybe I'll get what I want if I wait for https://reviews.llvm.org/D102147, set -fcoroutines-aligned-alloc, and define an overload of operator new that accepts the appropriate argument. Won't that defeat the heap allocation elision optimization though?

@yuanfang-chen
Copy link
Collaborator

But also surprised that it needs to be in the standard in order to be fixed: wouldn't it be possible to achieve an alignment of N by allocating size + N - 1 bytes and shifting appropriately?

It is possible. https://reviews.llvm.org/D97915 (parent patch of https://reviews.llvm.org/D102147) will do that when aligned allocator is not available.

Anyway failing that, it sounds like maybe I'll get what I want if I wait for https://reviews.llvm.org/D102147, set -fcoroutines-aligned-alloc, and define an overload of operator new that accepts the appropriate argument. Won't that defeat the heap allocation elision optimization though?

As mentioned above, define an overload of operator **new`` is optional. I'm not very familiar with HALO but my intuition is that the patch shouldn't prevent it. Could you elaborate on it?

@rjmccall @ChuanqiXu9

@jacobsa
Copy link
Contributor Author

jacobsa commented Jul 23, 2022

I'm not sure I see why overloading operator new would be optional. You need the version that accepts the alignment argument, right? Or are you saying it would use the free-standing operator new otherwise?

I'm also not very familiar with HALO. My thought was that if there is a user-provided operator new then the compiler would be required to use it, in case it has side effects. I could definitely be wrong about this.

@jacobsa
Copy link
Contributor Author

jacobsa commented Jul 24, 2022

@yuanfang-chen I just noticed that https://reviews.llvm.org/D97915 hasn't had a comment in over a year. :-( Is there any hope for getting the over-allocation strategy committed any time soon? This is a correctness problem, so it seems worth doing that before the standard is perfect.

@ChuanqiXu9
Copy link
Member

Won't that defeat the heap allocation elision optimization though?

This should be irrelevant with HALO.

I prefer the direction of D102147 than D97915 since this is a standard problem instead of an implementation problem. As we could find, GCC has the same problem. I would try to push this forward in 3 months.

For the practical side, I think you could use the workaround of MLIR. MLIR is also a user of LLVM Coroutines. Previously, MLIR met a similar problem, and their solution in C++ may look like: https://godbolt.org/z/eYderhTTW

@jacobsa
Copy link
Contributor Author

jacobsa commented Jul 25, 2022

Yes, I have a similar workaround. It's not totally correct though, because there is no upper bound on the alignment that might be needed (right?).

I agree it's a problem that the standard doesn't let you define an operator new that takes std::align_val_t. But that's a bug in customization, not correctness. It seems like if the promise doesn't define an operator new, clang could at least give the correct behavior while being compatible with the current standard.

@yuanfang-chen
Copy link
Collaborator

You need the version that accepts the alignment argument, right? Or are you saying it would use the free-standing operator new otherwise?

Yes. Either the standard library-provided version or a user-defined version. If the user-defined version is not found, it should use the standard library-provided version.

I agree it's a problem that the standard doesn't let you define an operator new that takes std::align_val_t. But that's a bug in customization, not correctness.

It concerns both correctness and customization because the current standard wording prevents the compiler from searching for and using the aligned allocator (either the standard library-provided version or a user-defined version). So it is the standard that caused this bug (the compiler could fix it as an extension though, like D97915).

I prefer the direction of D102147 than D97915 since this is a standard problem instead of an implementation problem. As we could find, GCC has the same problem. I would try to push this forward in 3 months.

Agreed. I'll find some time to rebase the patch.

@ChuanqiXu9
Copy link
Member

Yes, I have a similar workaround. It's not totally correct though, because there is no upper bound on the alignment that might be needed (right?).

Strictly, there is no upper bound on the alignment. But from my experiences, '64' works in most cases of X86_64. I saw some use cases (not coroutines) in AArch64 which uses '128' to speedup hardware cache accessing. But all of these are low-level softwares like kernels. And many use cases for coroutines are the higher level applications.
So I feel like the workaround won't be too bad in a short time.

@ChuanqiXu9
Copy link
Member

Update: Wg21 is going to discuss P2014(cplusplus/papers#750) in 8/18.

@ChuanqiXu9
Copy link
Member

Update: No consensus in the meeting. And we feel good to have some implementation experience and use experience first.

And option1 has some problems:
(1) the ambiguous overloading. Consider the case for coro foo(std::align_val_t).
(2) we need the middle end to do the overload resolution instead of the frontend. This may bring some problems. For example, if the two allocation functions are template functions, we must instantiate both of the template functions. However, it is problematic if one of the templates are not instantiable. (This is common as a SFINAE pattern).
(3) Some other overload resolution problems.
(4) The complexity in wording and implementation.

And option2 is much simpler and option2 wouldn't meet these problems. But option2 will break more existing codes. Also we concern about the ABI problems.

But personally I strongly prefer option2. Since for the coroutine uses, the number of coroutine type definitions are really rare. And although option2 is going to break existing codes, we need to edit rare places. Also the option1 may break existing codes too.

@yuanfang-chen due to D102147 wants to implement option1. So we may implement a new revision. How do you think about this? If you don't have time, I'm Ok to implement option2 seperately.

@yuanfang-chen
Copy link
Collaborator

@ChuanqiXu9 Thanks for the updates. Glad to see that the discussion on this issue resumed. Option2 looks better to me too. Making it consistent with normal new/delete rules seems too much burden to carry without justifiable benefits. Let's just implement option2 and see what the user experiences are. I won't have the time to implement this though but happy to help review patches.

@rjmccall
Copy link
Contributor

I don't think the middle-end strictly needs to do the overload resolution as long as the language says that it's permitted to ODR-use both functions — basically, the language needs to authorize the frontend to emit an if instead of completely statically resolving it.

@ChuanqiXu9
Copy link
Member

Yes, as long as the language permits us, it might not be a problem for the vendor. The reason why ewg don't get consensus is that:
(1) The complexity in wording and implementation. For example, what if the user provides an aligned/non-aligned allocation function only? Should the frontend provides the default non-aligned/aligned allocation function?
(2) The end user is hard to know which allocation function is going to be called. Unless we say that the user shouldn't/couldn't care about it.
(3) The user may be forced to provide redundant allocation functions.
(4) And a potential ambiguous problem.

And option2 don't have these problems except it will break more existing user codes. So if option2 is in the standard, the user need to edit their codes when upgrading. But personally I feel it might not be a big deal. And we don't any consensus yet. So I think it may be good to try to implement option2 and see the feedbacks at first.

@ChuanqiXu9 ChuanqiXu9 self-assigned this Aug 23, 2022
@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 27, 2022

For what it’s worth I’d be happy with option 2 even though it’s going to break my code. Will it be difficult to implement?

@ChuanqiXu9
Copy link
Member

Will it be difficult to implement?

As far as I can see, it wouldn't be difficult to implement. I just have too many TODOs to start working on it...

@jacobsa
Copy link
Contributor Author

jacobsa commented Sep 22, 2022

Nice work, thank you!

veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue May 20, 2024
implement the option2 of P2014R0

This implements the option2 of
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2014r0.pdf.

This also fixes llvm/llvm-project#56671.

Although wg21 didn't get consensus for the direction of the problem,
we're happy to have some implementation and user experience first. And
from issue56671, the option2 should be the pursued one.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D133341
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++20 coroutines C++20 coroutines
Projects
Status: Done
Development

No branches or pull requests

6 participants