-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clang should provide annotations to force or verify tail call optimization #46765
Comments
Maybe there are some optimization remarks in -Rpass=.* / -Rpass-missed=.* related to tail call optimization (so you can see why tco failed for some functions) |
I've tried compiling with: Unfortunately, it doesn't tell you every function that got optimized with tail calls, for some reason, only functions that are self-recursive? I think this approach isn't great. It would be much more robust if there could be annotations in the code. At the very least to verify that tail call optimization happens for a specific function, but ideally to force tail calls in that function. |
The IR has support for "musttail", which unconditionally forces a tail call. We currently don't expose that directly to C code, though. Maybe we could expose it somehow, I guess? Note that this only works if the types of the caller and callee functions precisely match. Would that be sufficient for your use-case? |
Yes, that sounds perfect for the use case of implementing an interpreter using tail calls. Our interpreter, and every interpreter looking to do this will likely implement instructions that all have the same function signature. Exposing "musttail" to force tail calls to happen would do the trick. Is there a way to annotate individual return call() statements in LLVM, or can we only allocate whole functions? Either way, for our use case (and most interpreters), signalling that every return from a given function must be a tail call would be sufficient. Can you implement something like this? This could definitely have an impact on the way interpreters are typically implemented in C/C++. |
It's per-call on the LLVM side. Strawman syntax: "void foo(int x) { __attribute((musttail)) return bar(x); }" I don't think I'll have time to work on this in the near future, but I can answer any questions. |
That syntax would work very well for us. Do you think the clang maintainers would be likely to accept a pull request for this feature? How difficult do you think this is to implement? |
Not hard to implement, relatively speaking. It should be possible to leverage the existing attribute infrastructure without making any big changes to existing code. In terms of how likely it is to be accepted, see https://clang.llvm.org/get_involved.html . I see two sticking points:
Probably would want to send an email to cfe-dev before getting too deep into it. |
https://reviews.llvm.org/D99517 So seems like resolved? |
That looks exactly like what we wanted, awesome. This is merged then? It's official? |
Yes, it should be in clang 13. |
Extended Description
It's possible to use tail calls to implement bytecode interpreters in C/C++ in a way that is much cleaner than with computed GOTOs, and potentially more efficient as well. This is because, with computed GOTOs, you end up with a large function body with very many labeled code blocks, such that every code block can potentially jump to every other. This makes it hard for any compiler to do register allocation effectively.
With tail calls, you can define a tail-recursive function for every opcode in your interpreter such that every instruction simply calls the next one. Tail calls also potentially give us a clean ABI (function call boundaries) where an interpreter could efficiently dispatch to code generated by a JIT compiler. However, this is only possible if one can ensure that tail recursion is in fact optimized by the C/C++ compiler.
Clang currently does a fairly good job at optimizing tail calls, however, it's not something developers can rely on. It happens behind the scenes, and to know whether tail call optimization was performed or not, one has to carefully study the disassembly. In our case, we have tried to convert the Ruby interpreter to a tail recursive implementation, and found that tail call optimization was performed for some, but not all Ruby opcodes. The Ruby codebase is very large, which makes it difficult to tell which opcodes did not get optimized and why.
The usefulness of tail call optimization is clearly recognized in other programming languages, but the fact that we currently can't rely on them severely limits their usefulness. It would be very desirable to have some kind of annotation to force the compiler to make specific function calls into tail calls (eg: must_tail), or to force every control flow path leaving a given function to become tail calls, or to at least produce a compilation error if this cannot be done, so that the outcome is visible to the programmer.
I believe it's in the spirit of C to allow "unsafe" operations to be performed if the programmer specifies this is what must happen, and so forcing tail calls seems acceptable to me, so long as function call signatures are compatible, but if that isn't possible, at least forcing a compiler error message (or even just a warning) to be produced would already be a useful step forward.
The text was updated successfully, but these errors were encountered: