Explicit Tail Calls #3407
Conversation
thanks a ton @phi-go for all the work you put into writing the RFC so far! Really appreciated! 🎉 |
While I personally know the abbreviation TCO, I think that it would be helpful to expand the acronym in the issue title for folks who might not know it at first glance. |
An alternative is to mark a function (like in OCaml), not a

```rust
recursive fn x() {
    let a = Box::new(());
    let b = Box::new(());
    y(a);
}
```
|
The RFC already highlights an alternative design with markers on function declarations, and states that tail calls are a property of the function call, not of the function declaration, since there are use cases where the same function is used both in a normal call and in a tail call. |
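To make that call-site-vs-declaration point concrete, here is a sketch using the RFC's proposed syntax (illustrative only: `become` is not valid Rust today, and `helper` and both callers are made-up names):

```rust
// Sketch in the proposed syntax -- NOT compilable today. The same function
// is the target of both an ordinary call and a tail call, which a marker on
// the declaration of `helper` could not distinguish.
fn helper(n: u64) -> u64 {
    n * 2
}

fn normal_caller(n: u64) -> u64 {
    let r = helper(n); // ordinary call: control returns here
    r + 1
}

fn tail_caller(n: u64) -> u64 {
    become helper(n) // tail call: replaces tail_caller's frame
}
```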
Note: this may be suitable either as a comment on #2691 or here. I'm assuming interested parties are watching both anyway. The restriction on caller and callee having the exact same signature sounds quite restrictive in practice. Comparing it with [Pre-RFC] Safe goto with value (which does a similar thing to the If we translate it to the
I don't think this should be handled by an actual calling convention label though. Calling these "rusttail" functions for discussion purposes, we have the following limitations:
The user doesn't have to see either of these limitations directly though; the compiler can just generate a shim with the usual (or specified) calling convention which calls the "rusttail" function if required. Instead, they just observe the following restrictions, which are strictly weaker than the one in the RFC:
|
In I don't see what's so bad about reasoning in terms of bounded stack usage. Consider termination as an analogy. Just as the compiler can legally reduce or increase stack usage, the compiler can legally make a program faster or slower. In fact, it can make the program arbitrarily slower. But it can't make it infinitely slower, i.e. change a program from terminating to non-terminating. @digama0 said before that "the AM generally does not deal in asymptotics, either something is observable or it's not". And later, in reference to stack usage: "It is growing unboundedly, but at what point can you say that it has violated its requirements?" But in the case of termination, there is likewise no point at which you can say "this program is taking too long to run, so it must have been miscompiled". Only if you can prove that the compiled program will run for infinitely long, and the original program would not, can you say it was miscompiled. The same should apply to stack usage. Consider a program that never terminates but, contrary to this comment, might have input and output. For example, this could include a program that keeps serving requests until it's interrupted – where the interesting case would be if it uses tail recursion between requests:

```rust
fn server() {
    serve_next_request();
    become server();
}
```

There is no finite level of stack consumption at which you can say "this is too much, so the program was miscompiled". But if you can prove that stack consumption will increase indefinitely, despite the program only making a bounded number of nested non- |
TL;DR at the end. I've (hopefully) said my part in full.
The problem with this is that under no proposal is Even locals without drop glue can pretty easily prevent TCO, because their valid lifetime still extends until after the function call. [playground] If any such potentially escaping locals are This makes TCO surprisingly difficult to perform in Rust. These cases aren't that the optimizer doesn't think TCO would be desired/beneficial, but that it cannot prove that performing TCO would not change the behavior of the program. Case 3 (TCE required for validity) may be a more pressing need than case 1 (TCO preferred for performance), sure. But case 1 is still a valid use case, and shouldn't be outright dismissed. I think it's fairly clear that we want both That bringing it up led to such vocal debate only reinforces that idea.
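As a minimal illustration of the escaping-locals point (my own sketch, not the playground example linked above): a local with no drop glue can still pin the caller's frame when a reference to it is passed into the would-be tail call.

```rust
// Hypothetical names for illustration. `local` has no drop glue, yet its
// storage must remain valid for as long as `read` runs, because `read`
// borrows it. The caller's frame therefore cannot be replaced by the
// callee's, so TCO is not a legal transformation here.
fn read(p: &i32) -> i32 {
    *p
}

fn caller() -> i32 {
    let local = 7;
    read(&local) // syntactically in tail position, but not TCO-able
}
```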
If that's the only difference, it's just case 1 from the beginning, not case 2 (two different code paths, one which requires TCE and one which doesn't). Which is fundamentally why I would like to be able to just write case 1 with I would expect authored examples of case 2 to be extremely few, as it would require authoring two different implementations of some functionality, one of which relies on TCE, and one of which doesn't, with the TCE impl beating the iterative, but the iterative beating a trampolined compilation of the TCE version. Generating case 2 from case 3 out of necessity, e.g. by the use of a trampoline, is the likely scenario in which it would exist.
The stability pledge of Rust 1.0 is important. That code which compiles today will continue to successfully compile with any future compiler (with very few exceptions carved out for stability without stagnation, and assuming the same library versions are used) shouldn't be trespassed against by the language without strong motivation. It's okay if things that sit on top of/outside the language like This is why I'm hesitant to attach codegen-dependent semantics to core language functionality — it's an unnecessary mixing of layers. Compilers do in practice end up mixing layers, because doing so confers practical benefit, but it's still preferable if it's avoidable. Which is why my main question remains if case 3 can be satisfied by a codegen guarantee (if you use
There's a big difference between a static error from the compiler and a runtime error, though. Yes, it's ideal if the compiler tooling can guarantee the lack of the runtime error and/or warn you if it's possible you might hit it. But where a runtime error only occurs when you get to it, a static error prevents the program from doing anything. It doesn't matter if the stack blowup only happens in a single code path; because you've made it into a static error, you can't even use the program to perform other tasks which aren't at risk of triggering the error condition. Again, this is a reasonable choice to make, but it isn't a great fit for the language to force this onto users who want to be able to use some functionality.
The compiler can indeed transform code into a trampolined form, but so can a sufficiently advanced proc macro. Given the restriction of exactly matching function signatures, it's not even that advanced of a macro. Trampolines aren't like Now, if we were asking
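For what it's worth, the trampolined shape such a macro would expand to is easy to write by hand in stable Rust today. A sketch (the names are mine, not from any proposal): each would-be tail call returns a value describing the next call, and a driver loop performs it, so stack depth stays constant no matter how many iterations run.

```rust
// Each "tail call" is reified as a `Step::Call` value instead of an actual
// call; the driver loop in `fact` performs it. Stack depth stays O(1).
enum Step {
    Call(u64, u64), // next (n, acc) pair to process
    Done(u64),      // final result
}

fn fact_step(n: u64, acc: u64) -> Step {
    if n == 0 {
        Step::Done(acc)
    } else {
        // stands in for "become fact_step(n - 1, acc * n)"
        Step::Call(n - 1, acc * n)
    }
}

fn fact(n: u64) -> u64 {
    let mut state = Step::Call(n, 1);
    loop {
        match state {
            Step::Call(n, acc) => state = fact_step(n, acc),
            Step::Done(r) => return r,
        }
    }
}
```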
Ah right, that does pose a problem to trying to make stack non-exhaustion consistent. Loop unrolling at least isn't separately problematic (since unrolled loop locals would hopefully be trivially overlapped). But this is a discussion for the UCG, if/when I actually get around to opening it.
I'd double-check that, actually; Rust doesn't have the forward progress guarantee that C++ does. It's actually quite tricky to both guarantee that the next observable effect will occur while also permitting an infinite loop without observable effects to exist. As far as I'm aware, we haven't done anything to address it.

Forward progress and stack usage can be QOI issues while still being guaranteed by the compiler. And to the end user, the difference needn't matter much, especially if the compiler and language are developed in tandem instead of separately. If rustc provides a guarantee that Rust code using

Because of that perspective, I'd much prefer the language to not needlessly make things weirder by mixing layers (e.g. by leaking codegen concerns into language validity rules) without good reason. But with good reason, it's justifiable.

Rust governance isn't a popularity contest, nor is it democratic — we've made our points, and the lang team will consider them and make a decision one way or another. I'd obviously prefer one way, and others can prefer another, but it's the lang team's call in the end. So there's no need to continue to litigate this further unless bringing new information to the discussion. (E.g. data comparing how often
|
@CAD97 Well, I'd expect forward progress to be guaranteed at least if you stay clear of the problem spots of (a) infinite loops with no side effects and (b) atomics. But it's true the issue hasn't seen much UCG discussion. |
Just wanted to share a potential syntax idea I had that would fit all mentioned use cases:

- `become <expr>` as demanded by the RFC for
- `become? <expr>` as demanded by @CAD97 for

Both versions are short enough to be not unnecessarily verbose in either case. The only downside I see is that something like However, I still think that we should for now concentrate on |
tail calls generally don't cause problems for branch target prediction. what happens is the call/ret branch target prediction stack just doesn't get modified by the tail call, so when the tail-called function finally returns, it pops the correct return address off the prediction stack and goes to the right place. e.g.:

```rust
pub fn foo() {
    // before bar call
    bar();
    // back in foo
}

pub fn bar() {
    // before baz call
    become baz();
}

pub fn baz() {
    // in baz
}
```
|
@comex If the original program would terminate, and the compiled program does not (and I assume it does so without taking any observable steps either, since taking more observable steps than the original program would be an obs. eqv. violation), then that means that the original program took some step which took "infinitely long" to be completed in the compiled program. That step was not completed after all, and this is a violation. There is no point at which you can say that the compiled program is definitely off the rails, but the overall execution failed to complete that step.
The difference between the previous example and this example is that now there are an infinite number of observable AM steps involved, where previously we had a finite number of AM steps and then a hang on the last one (where an infinite number of CM steps got shoved in the middle and the AM step was not completed). As long as the CM continues to be "responsive", in the sense that it is taking finitely many steps to perform each AM observable step, then there isn't anything to complain about. We are only interested in the behavior of the machine up to finitely many observable AM steps anyway, since every program delivers its results in finitely many observable steps. This is actually encoded in the notion of observational equivalence (it is a bisimulation relation, meaning that programs which are equivalent on each finite initial segment of the trace are also equivalent on the whole trace), which means in particular that you could periodically delete some stack frames that will never again be revisited in a runaway non-tail recursive loop (with an effect in it, like your server loop example but without the The conclusion is what I said at the start: the AM and the tools we use to analyze it do not make it possible to measure stack usage, asymptotically or otherwise. Changing this situation would require some rather deep opsem changes, and I will await @CAD97 's proposal. |
Unfortunately it still ends up breaking inlining if applied to imperative code, so it's essentially dead outside of potentially it might still work for musttail. Since it's become meaningless without some form of
This makes dynamically JITing code involved in a musttail call cycle difficult, since doing so could increase stack consumption. But for a purely AOT compiler, I think this is both a weak enough requirement as to not prevent optimizations and (barely) strong enough to create the desired stack usage guarantee. I'm not fully confident, but it looks to potentially be workable, and to not interfere with inlining after loop unrolling anymore. I'm actually more worried that it doesn't quite get to a proof of finite stack consumption because there could be an infinite set of potential control flow paths through a given function. |
this doesn't work for dynamically-sized |
@digama0 I created a thread in the unsafe-code-guidelines repo (for lack of a better place) to discuss how feasible it is to formally require tail call elimination. Going back to this RFC, I agree that it doesn't need to be blocked by a precise definition. In lieu of one, I would prefer to go with a handwavy requirement to bound stack usage rather than just calling it QoI. But whatever. There probably won't be some smart-aleck author of an alternative implementation who comes in and says "I'm not implementing proper tail calls because the spec doesn't require it". Probably. I agree with others that it's critical for rustc, at least, to implement explicit tail calls in a way that guarantees bounded stack usage on all supported targets. So either the 'MVP WebAssembly' target needs to be meaningfully deprecated (in favor of WebAssembly with the more recent tail call feature), or we need to mark tail-callable functions with an attribute. I don't see what's so bad about the attribute. I don't want to go in circles here, but… the RFC says of requiring an attribute that "while quite noisy it is also less flexible than the chosen approach." But requiring an exact function signature match is drastically less flexible! Regardless of whether the initial implementation allows non-matching signatures, we should want to support them eventually. Without an attribute on tail-callable functions, supporting non-signature-matched calls would require changing the default If we require an attribute for now, then we preserve the ability to support non-signature-matched calls (either in the initial implementation or in the future), regardless of the feasibility of changing the default calling convention. And we also make it possible to support tail calls on MVP WebAssembly via trampolines, at least during the deprecation process. 
If in the future changing the default calling convention turns out to be feasible, and MVP WebAssembly has been properly deprecated, then the attribute can just become a no-op at that point. (Regarding a C backend, I've always wanted Rust to have one, but I don't think it's important enough to block tail calls, particularly given that it currently does not exist.) |
I'll take a moment here to pitch again my earlier suggestion to use same-crate calls instead of an attribute. The compiler can make this Just Work™ by analyzing the call structure and making shims to interface with any other code that wants to take a reference to the function pointer or call the function from outside the crate. The overall effect is that you can |
@comex: So there is actually a trick one can use to avoid needing the attribute: the only functions that need to be able to be tail called are the ones that themselves contain tail calls. Therefore, one can include “has tail calls” in the ABI of a function, and use callee-pops calling conventions/trampolines/etc for precisely the functions that themselves contain |
i doubt that will work because those functions can be called through function pointers and the cast to |
While somewhat annoying to keep track of in the compiler, the workaround is quite simple: you have two entry points to the function. When called statically it uses the tail call convention, but when a function pointer is taken, it gets a small additional shim to the standard call convention. Unless the attribute would change the function to not be Though it should still be noted explicitly that such a scheme of course prevents tail calling a function pointer, since it's using the standard ABI rather than the tail ABI. |
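The two-entry-point idea can be pictured in today's Rust (purely illustrative: here both entries use the normal Rust ABI, whereas in the scheme described above the compiler would give the inner entry a special tail-call convention and emit the shim automatically):

```rust
// Illustrative only; `count_down_impl` stands in for the tail-convention
// entry point, and `count_down` for the compiler-generated standard-ABI shim.
fn count_down_impl(n: u32) -> u32 {
    if n == 0 { 0 } else { count_down_impl(n - 1) }
}

// Standard-convention shim: the only thing it does is forward to the
// real entry point.
fn count_down(n: u32) -> u32 {
    count_down_impl(n)
}

// Taking a function pointer grabs the shim, never the tail-convention
// entry, which is why tail-calling *through* such a pointer is ruled out.
fn get_ptr() -> fn(u32) -> u32 {
    count_down
}
```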
but tail calling function pointers is necessary for fast interpreters which is a major motivation for

```rust
union Imm {
    branch_target: *const Inst,
    value: isize,
}

struct Inst {
    run: unsafe fn(pc: *const Inst, stack: *mut u64, mem: *mut u8),
    imm: Imm,
}

unsafe fn add_imm(pc: *const Inst, stack: *mut u64, mem: *mut u8) {
    *stack = (*stack).wrapping_add((*pc).imm.value as u64);
    let pc = pc.add(1);
    become (*pc).run(pc, stack, mem)
}

unsafe fn branch_if(pc: *const Inst, stack: *mut u64, mem: *mut u8) {
    let v = *stack;
    let stack = stack.add(1);
    let pc = if v != 0 {
        (*pc).imm.branch_target
    } else {
        pc.add(1)
    };
    become (*pc).run(pc, stack, mem)
}

// more instructions...
```
|
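For contrast, here is roughly what one writes on stable Rust today without tail calls: a single dispatch loop (a minimal sketch with made-up instruction names). It stays within bounded stack, but funnels every dispatch through one central `match`/indirect branch, which is exactly the branch-prediction behavior that tail-call-threaded per-instruction handlers avoid.

```rust
// Minimal loop-based interpreter sketch (illustrative, not from the comment
// above). All dispatch happens at a single point inside `run`, rather than
// via tail calls between per-instruction handler functions.
#[derive(Clone, Copy)]
enum Inst {
    AddImm(i64), // add an immediate to the accumulator
    Halt,        // stop and return the accumulator
}

fn run(prog: &[Inst]) -> i64 {
    let mut acc = 0i64;
    let mut pc = 0usize;
    loop {
        match prog[pc] {
            Inst::AddImm(v) => {
                acc = acc.wrapping_add(v);
                pc += 1;
            }
            Inst::Halt => return acc,
        }
    }
}
```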
Wow, I'm feeling quite un-heard here. You can have both! There is no technical restriction to having both same-sig function pointer calls powered by the mechanism in the RFC, and same-crate arbitrary-sig tail calls powered by a compiler transform and a shim for interfacing with code that needs the standard calling convention. You don't even need an annotation to disambiguate them, the compiler can figure out what case you are in automatically. Your example is fine since it is covered by the mechanism specified in the RFC. And I think it will be a general pattern with |
Having the same-crate restriction in place without the restriction of same-signature for those calls would be another game changer for interpreters built on tail calls as it would allow us to avoid

```rust
type Register = usize;

enum Trap {
    DivisionByZero,
}

enum Op {
    I32Add { result: Register, lhs: Register, rhs: Register },
    I32Div { result: Register, lhs: Register, rhs: Register },
    BrIf { condition: Register, offset: isize },
}

struct Executor {
    stack: Vec<i32>,
    ops: Vec<Op>,
    sp: usize,
    pc: usize,
}

impl Executor {
    fn dispatch(&mut self) -> Result<(), Trap> {
        match self.ops[self.pc] {
            Op::I32Add { result, lhs, rhs } => become self.execute_i32_add(result, lhs, rhs),
            Op::I32Div { result, lhs, rhs } => become self.execute_i32_div(result, lhs, rhs),
            Op::BrIf { condition, offset } => become self.execute_br_if(condition, offset),
        }
    }

    fn execute_i32_add(&mut self, result: Register, lhs: Register, rhs: Register) -> Result<(), Trap> {
        self.stack[self.sp + result] = self.stack[self.sp + lhs].wrapping_add(self.stack[self.sp + rhs]);
        self.pc += 1;
        become self.dispatch()
    }

    fn execute_i32_div(&mut self, result: Register, lhs: Register, rhs: Register) -> Result<(), Trap> {
        let rhs = self.stack[self.sp + rhs];
        if rhs == 0 {
            return Err(Trap::DivisionByZero)
        }
        self.stack[self.sp + result] = self.stack[self.sp + lhs].wrapping_div(rhs);
        self.pc += 1;
        become self.dispatch()
    }

    fn execute_br_if(&mut self, condition: Register, offset: isize) -> Result<(), Trap> {
        if self.stack[self.sp + condition] != 0 {
            self.pc = self.pc.wrapping_add_signed(offset);
        } else {
            self.pc += 1;
        }
        become self.dispatch()
    }
}
```
|
@comex wrote:
To be clear, the RFC currently only discusses guaranteed TCE given matching function signatures to keep the feature as small as possible. This is also the context this sentence in the RFC should be seen in. Regarding the attribute on the function declaration: are you thinking of requiring it in addition to (2) is the version discussed in the RFC and I hope it is clear why it is described as less flexible. (1) has not really been discussed as far as I remember. However, to my understanding, even LLVM only supports functions that have nearly identical ABIs. WebAssembly currently does not even allow TCO. In both cases I expect the push to support non-matching (or matching for WebAssembly) function signatures to be done at a later point; at that time we can still require attributes on function declarations if needed. In the meantime I would expect the WebAssembly backend to raise a compiler error as per the RFC. Though, I can see that it could be unappealing to introduce this attribute later, so I'm not sure this would be the right approach. Note that I expect the attribute is mainly interesting for tail calling function pointers, and its "marker" would need to become part of the function signature. For same-crate static calls I think the suggestion by @digama0 should be preferred. This, however, would imply that the attribute need only be added for functions that are tail-called via function pointer with a mismatched function signature, which could be confusing. |
It would seem entirely reasonable to me to add a I would expect that to be less -- or at least simpler -- work than all the syntax and drop order changes in the experiment, so adding more options to try out and see how they fit sounds like a good way to help find out which ways work more smoothly for the things people want to be able to do. (We could have
Is there a proposal that actually requires having a separate ABI for tail-callable functions (or is it tail-calling functions?)? The RFC proposal works just fine with I think @DemiMarie mentioned that a caller-pops calling convention could be more flexible wrt varying arguments, but I'm not sure this is a good default choice and if we have a specific |
Well, given that LLVM has a specific
Do we need it? I don't know. But it at least seems like a plausible experiment, like we have an existing experiment for But |
@digama0 the
I'll try to implement those relaxations in the experiment (once the things described in the RFC right now are fully implemented and merged...), but I also want to highlight @phi-go's mention that those relaxations can be RFC-ed separately from this RFC (and they probably should, to keep the scope smaller!). |
@Robbepop Using |
Un-nominating the RFC for now. To be explicitly clear on next steps: 👍 for going ahead with one or more experiments (including a tail-call with placeholder keyword, and a tail call calling-convention as @scottmcm suggested), blocking concerns for an experiment or RFC that uses the demo-looks-done approach of |
I wouldn't say it is a non-starter. The same reasoning could have been applied to the highly debated
If during all the time of the RFC thread I had seen a single alternative syntax that was actually cutting it, I'd wholeheartedly agree. Yet, we do not even have consensus on what an agreeable placeholder syntax could look like. A bit of direction from the people who are deciding over nomination could probably help here. In light of the RFC's un-nomination I would like to know what the status of the work behind the RFC is, as this RFC thread has been very silent for quite a few weeks now. What in particular is the stance of @WaffleLapkin (RFC implementer) and @phi-go (RFC author) about the next steps provided by @joshtriplett? I personally just hope that this RFC won't die due to too much bikeshedding. It would be really sad to lose the momentum that has been built up for this long-awaited Rust feature. |
To my understanding we are just waiting for the implementation of the experiment. Here is the tracking issue: rust-lang/rust#112788. So un-nominating until the implementation is done seems fine to me. Regarding the placeholder syntax, we are indeed waiting on a decision for the actual syntax. The current state is that the current implementation uses I find |
feature: Add basic support for `become` expr/tail calls

This follows rust-lang/rfcs#3407 and my WIP implementation in the compiler. Notice that I haven't even *opened* a compiler PR (although I plan to soon), so this feature doesn't really exist outside of my WIP branches. I've used this to help me test my implementation; opening a PR before I forget. (feel free to ignore this for now, given all of the above)
This RFC proposes a feature to provide a guarantee that function calls are tail-call eliminated via the `become` keyword. If this guarantee cannot be provided, an error is generated instead.

Rendered
For reference, previous RFCs #81 and #1888, as well as an earlier issue #271, and the currently active issue #2691.