Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++20 coroutine miscompiled if it contains a dispatch table + computed goto #56436

Closed
vogelsgesang opened this issue Jul 7, 2022 · 9 comments
Assignees

Comments

@vogelsgesang
Copy link
Member

Repro: https://godbolt.org/z/j3o96sYd1
clang trunk accepts the program, but miscompiles it.
gcc trunk compiles the program correctly.

The program implements a simple interpreter using the a dispatch table as described, e.g., in https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables.

The problem only occurs when combining coroutines with a dispatch table and a computed goto. A switch-based interpreter works as expected.

@vogelsgesang
Copy link
Member Author

vogelsgesang commented Jul 8, 2022

Looking at the output of -O0 -emit-llvm, I think the problem is that:

  • The pointers stored inside the _ZZ9interpretP11InstructionRiE14dispatch_table, i.e. the static dispatch table, refer to labels from the original, pre-split coroutine
  • The post-split resume clone of the coroutine tries to dispatch from this table, though. The resume function would do an indirect jump back into the pre-split coroutine - which makes no sense
  • The SimplifyCFG correctly identifies those jumps as non-sensical and therefore replaces the complete resume function by unreachable.

I think the underlying root cause here is that the labels stored in the static void* dispatch_table are duplicated by coroutine splitting. I.e., after the coro-split pass, there are in fact two inc labels: one in the ramp function, and one in the resume function. It is not clear which of the both labels the dispatchTable should refer to.

The problem becomes even more apparent in https://godbolt.org/z/7aYaG9xfE, where the inital_suspend was replaced by a suspend_never. In this case, we would need two separate dispatch tables: one storing the label pointers from the ramp function and a second one containing the label pointers for the resume function.

Judging by the assembly generated by gcc (https://godbolt.org/z/PMsqY5e5Y), it seems that they are able to avoid this problem, by splitting the coroutine in a different way: While LLVM duplicates the complete interpreter loop and its labels in both the ramp and the resume function. gcc instead only emits the interpreter loop in the "actor" function (== LLVM's resume function), and calls the actor function from the ramp function.

@ChuanqiXu9 FYI

@ChuanqiXu9
Copy link
Member

Thanks for reporting. I'll take a look.

@ChuanqiXu9 ChuanqiXu9 self-assigned this Jul 8, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 8, 2022

@llvm/issue-subscribers-c-20

@ChuanqiXu9 ChuanqiXu9 added the coroutines C++20 coroutines label Aug 12, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Aug 12, 2022

@llvm/issue-subscribers-coroutines

@ChuanqiXu9
Copy link
Member

Your analysis is basically correct. And this one is not easy to fix. The biggest difference of coroutines between clang and gcc is that gcc handles coroutines in the frontend but clang handles it in the middle end. This brings many differences.

For example, in the frontend, we could scope informations. But in the middle end, we will store them in a global variable and there is no information about scopes. And it doesn't make a lot of sense to copy a global variable when splitting coroutines.

Also the computed goto are not standard C++. It is a GCC extension although clang supports it too. So I prefer to add a warning/error instead of doing some hacks for the non-stantandard use cases.

@ChuanqiXu9
Copy link
Member

And from the link you pasted, computed goto looks like primarily for performance but it doesn't relate to correctness. So I think you want an interpretable interpreter? If yes, I think we could make it by: https://godbolt.org/z/5Y8b48r6G

@vogelsgesang
Copy link
Member Author

I prefer to add a warning/error instead of doing some hacks for the non-stantandard use cases.

I agree. A warning should be fine for the time being. In case there is larger demand for combining computed goto with coroutines, we can still add support for that later.

It took a pretty long time to understand why the interpreter didn't work as expected. (I was trying this on a much bigger, real-world production interpreter for SQL queries originally...) A clear warning/error would have saved me a lot of time.

I think, we should furthermore disable inlining of any function which contains computed-gotos into a coroutine function.

computed goto looks like primarily for performance but it doesn't relate to correctness

right

So I think you want an interpretable interpreter? If yes, I think we could make it by: https://godbolt.org/z/5Y8b48r6G

yes, that works, but it does not give me the performance I needed.
In the actual production interpreter, I solved the problem as shown in https://godbolt.org/z/17G8jGbse:
I did not use C++ coroutines in the interpreter loop itself, but instead kept track of the current suspension point. Doing so is a trivial change because the interpreter has to keep track of the next instruction anyway. When encountering a suspension point, I return from the interpreter-function and signal that I want to be called by returning false instead of true.

For https://godbolt.org/z/17G8jGbse, it is vital that LLVM does not inline the doInterpret function into the interpret function. Afaict, this is currently not forbidden by LLVM, and I was just lucky that code was not inlined in this case. We should probably change LLVM to actually enforce that it never inlines computed-gotos into a coroutine function

@ChuanqiXu9
Copy link
Member

ChuanqiXu9 commented Aug 15, 2022

I think, we should furthermore disable inlining of any function which contains computed-gotos into a coroutine function.

Currently, the LLVM will not inline these kind of function by default: https://godbolt.org/z/v4Kvqrfzj. So we could only emit an error for it simply. I'll try to make it.

yes, that works, but it does not give me the performance I needed.

Just out of interesting, how much is the performance gap?

ChuanqiXu9 added a commit that referenced this issue Jan 17, 2023
Closing #56436

We can't support the GNU address of label extension in coroutines well
in current architecture. Since the coroutines are going to split into
pieces in the middle end so the address of labels are ambiguous that
time.

To avoid any further misunderstanding, we try to emit an error here.

Differential Revision: https://reviews.llvm.org/D131938
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Jan 17, 2023
Closing llvm/llvm-project#56436

We can't support the GNU address of label extension in coroutines well
in current architecture. Since the coroutines are going to split into
pieces in the middle end so the address of labels are ambiguous that
time.

To avoid any further misunderstanding, we try to emit an error here.

Differential Revision: https://reviews.llvm.org/D131938
@vogelsgesang
Copy link
Member Author

Closing, as this was fixed in cc526e3 which will ship with clang-16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

3 participants