Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang should provide annotations to force or verify tail call optimization #46765

Closed
maximecb mannequin opened this issue Sep 4, 2020 · 10 comments
Closed

Clang should provide annotations to force or verify tail call optimization #46765

maximecb mannequin opened this issue Sep 4, 2020 · 10 comments
Labels
bugzilla Issues migrated from bugzilla c

Comments

@maximecb
Copy link
Mannequin

maximecb mannequin commented Sep 4, 2020

Bugzilla Link 47421
Resolution FIXED
Resolved on Apr 17, 2021 10:35
Version unspecified
OS All
CC @davidbolvansky,@dwblaikie,@DougGregor,@efriedma-quic,@zygoloid
Fixed by commit(s) rG8344675908424ee532d4ae30e5043c5a5834e02c

Extended Description

It's possible to use tail calls to implement bytecode interpreters in C/C++ in a way that is much cleaner than with computed GOTOs, and potentially more efficient as well. This is because, with computed GOTOs, you end up with a large function body with very many labeled code blocks, such that every code block can potentially jump to every other. This makes it hard for any compiler to do register allocation effectively.

With tail calls, you can define a tail-recursive function for every opcode in your interpreter such that every instruction simply calls the next one. Tail calls also potentially give us a clean ABI (function call boundaries) where an interpreter could efficiently dispatch to code generated by a JIT compiler. However, this is only possible if one can ensure that tail recursion is in fact optimized by the C/C++ compiler.

Clang currently does a fairly good job at optimizing tail calls, however, it's not something developers can rely on. It happens behind the scenes, and to know whether tail call optimization was performed or not, one has to carefully study the disassembly. In our case, we have tried to convert the Ruby interpreter to a tail recursive implementation, and found that tail call optimization was performed for some, but not all Ruby opcodes. The Ruby codebase is very large, which makes it difficult to tell which opcodes did not get optimized and why.

The usefulness of tail call optimization is clearly recognized in other programming languages, but the fact that we currently can't rely on them severely limits their usefulness. It would be very desirable to have some kind of annotation to force the compiler to make specific function calls into tail calls (eg: must_tail), or to force every control flow path leaving a given function to become tail calls, or to at least produce a compilation error if this cannot be done, so that the outcome is visible to the programmer.

I believe it's in the spirit of C to allow "unsafe" operations to be performed if the programmer specifies this is what must happen, and so forcing tail calls seems acceptable to me, so long as function call signatures are compatible, but if that isn't possible, at least forcing a compiler error message (or even just a warning) to be produced would already be a useful step forward.

@davidbolvansky
Copy link
Collaborator

Maybe there are some optimization remarks in -Rpass=.* / -Rpass-missed=.* related to tail call optimization (so you can see why tco failed for some functions)

@maximecb
Copy link
Mannequin Author

maximecb mannequin commented Sep 4, 2020

Maybe there are some optimization remarks in -Rpass=.* / -Rpass-missed=.*
related to tail call optimization (so you can see why tco failed for some
functions)

I've tried compiling with:
-foptimize-sibling-calls -Rpass='tailcallelim'

Unfortunately, it doesn't tell you every function that got optimized with tail calls, for some reason, only functions that are self-recursive?

I think this approach isn't great. It would be much more robust if there could be annotations in the code. At the very least to verify that tail call optimization happens for a specific function, but ideally to force tail calls in that function.

@efriedma-quic
Copy link
Collaborator

The IR has support for "musttail", which unconditionally forces a tail call. We currently don't expose that directly to C code, though. Maybe we could expose it somehow, I guess?

Note that this only works if the types of the caller and callee functions precisely match. Would that be sufficient for your use-case?

@maximecb
Copy link
Mannequin Author

maximecb mannequin commented Sep 8, 2020

The IR has support for "musttail", which unconditionally forces a tail call.
We currently don't expose that directly to C code, though. Maybe we could
expose it somehow, I guess?

Note that this only works if the types of the caller and callee functions
precisely match. Would that be sufficient for your use-case?

Yes, that sounds perfect for the use case of implementing an interpreter using tail calls. Our interpreter, and every interpreter looking to do this will likely implement instructions that all have the same function signature.

Exposing "musttail" to force tail calls to happen would do the trick. Is there a way to annotate individual return call() statements in LLVM, or can we only allocate whole functions? Either way, for our use case (and most interpreters), signalling that every return from a given function must be a tail call would be sufficient.

Can you implement something like this? This could definitely have an impact on the way interpreters are typically implemented in C/C++.

@efriedma-quic
Copy link
Collaborator

It's per-call on the LLVM side.

Strawman syntax: "void foo(int x) { __attribute((musttail)) return bar(x); }"

I don't think I'll have time to work on this in the near future, but I can answer any questions.

@maximecb
Copy link
Mannequin Author

maximecb mannequin commented Sep 10, 2020

It's per-call on the LLVM side.

Strawman syntax: "void foo(int x) { __attribute((musttail)) return bar(x); }"

I don't think I'll have time to work on this in the near future, but I can
answer any questions.

That syntax would work very well for us.

Do you think the clang maintainers would be likely to accept a pull request for this feature? How difficult do you think this is to implement?

@efriedma-quic
Copy link
Collaborator

Not hard to implement, relatively speaking. It should be possible to leverage the existing attribute infrastructure without making any big changes to existing code.

In terms of how likely it is to be accepted, see https://clang.llvm.org/get_involved.html . I see two sticking points:

  1. It's sort of niche; unless you're writing a few specific sorts of compilers/interpreters, it's not useful. But usefulness isn't just raw number of users; if we expect that major projects like the Ruby interpreter would take advantage of it, that would be enough to justify it.
  2. There isn't any C/C++ standard proposal. But obviously that could change.

Probably would want to send an email to cfe-dev before getting too deep into it.

@davidbolvansky
Copy link
Collaborator

https://reviews.llvm.org/D99517

So seems like resolved?

@maximecb
Copy link
Mannequin Author

maximecb mannequin commented Apr 16, 2021

https://reviews.llvm.org/D99517

So seems like resolved?

That looks exactly like what we wanted, awesome. This is merged then? It's official?

@davidbolvansky
Copy link
Collaborator

Yes, it should be in clang 13.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla c
Projects
None yet
Development

No branches or pull requests

2 participants