Explicit Tail Calls #3407

Open
wants to merge 86 commits into
base: master

@phi-go phi-go commented Apr 6, 2023

This RFC proposes a feature that guarantees function calls are tail-call eliminated via the become keyword. If this guarantee cannot be provided, an error is generated instead.

Rendered

For reference, see the previous RFCs #81 and #1888, the earlier issue #271, and the currently active issue #2691.

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Apr 6, 2023
@Robbepop
Contributor

Robbepop commented Apr 6, 2023

thanks a ton @phi-go for all the work you put into writing the RFC so far! Really appreciated! 🎉

@clarfonthey
Contributor

While I personally know the abbreviation TCO, I think that it would be helpful to expand the acronym in the issue title for folks who might not know it at first glance.

@phi-go phi-go changed the title Guaranteed TCO Guaranteed TCO (tail call optimization) Apr 6, 2023
@VitWW

VitWW commented Apr 6, 2023

An alternative is to mark a function (like in OCaml), not a return keyword:

recursive fn x() {
    let a = Box::new(());
    let b = Box::new(());
    y(a);
}

@Robbepop
Contributor

Robbepop commented Apr 6, 2023

An alternative is to mark a function (like in OCaml), not a return keyword:

recursive fn x() {
    let a = Box::new(());
    let b = Box::new(());
    y(a);
}

The RFC already highlights an alternative design with markers on function declarations and states that tail calls are a property of the function call and not a property of a function declaration since there are use cases where the same function is used in a normal call and a tail call.

@digama0
Contributor

digama0 commented Apr 6, 2023

Note: this may be suitable either as a comment on #2691 or here. I'm assuming interested parties are watching both anyway.

The restriction on caller and callee having the exact same signature sounds quite restrictive in practice. Comparing it with [Pre-RFC] Safe goto with value (which does a similar thing to the become keyword but for intra-function control flow instead of across functions), the reason that proposal doesn't have any requirements on labels having the same arguments is because all parameters in all labels are part of the same call frame, and when a local is used by a different label than the current one it is still there in memory, just uninitialized.

If we translate it to the become approach, that basically means that each function involved in a mutual recursion would (internally) take the union of all the parameters of all of the functions, and any parameters not used by the current function would just be uninitialized. There are two limitations to this approach:

  • It changes the calling convention of the function (since you have to pad out the arguments with these uninitialized locals)
  • The calling convention depends on what other functions are called in the body

I don't think this should be handled by an actual calling convention label though. Calling these "rusttail" functions for discussion purposes, we have the following limitations:

  • You can't get a function pointer with "rusttail" convention
  • You can't use a "rusttail" function cross-crate

The user doesn't have to see either of these limitations directly though; the compiler can just generate a shim with the usual (or specified) calling convention which calls the "rusttail" function if required. Instead, they just observe the following restrictions, which are strictly weaker than the one in the RFC:

  • If you become a function in another crate, the arguments of caller and callee have to match exactly
  • If you only become a function in the same crate, there are no restrictions
  • (If f tail-calls a cross-crate function g and also an internal function h, then f,g,h must all have the same arguments)
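As a rough illustration of the "union of all parameters" idea, here is a minimal sketch in stable Rust. The names `start_tail`, `step_tail`, `finish_tail` and `digit_sum` are hypothetical, the padding arguments are passed as `0` rather than left uninitialized, and the calls in tail position are ordinary calls (stable Rust does not guarantee that they are eliminated). It only shows how three functions with different logical signatures can share one internal argument list behind a shim that keeps the public signature.

```rust
// Logical signatures (what the user would write):
//   start(n)       -> tail-calls step(n, 0)
//   step(n, acc)   -> tail-calls step(..) or finish(acc)
//   finish(acc)    -> returns acc
//
// Internal form: every function in the group takes the union (n, acc),
// so caller and callee always agree on the argument list.

fn start_tail(n: u64, _acc_unused: u64) -> u64 {
    // `_acc_unused` is padding; under the proposal it would be uninitialized.
    step_tail(n, 0)
}

fn step_tail(n: u64, acc: u64) -> u64 {
    if n == 0 {
        // `n` is padding for `finish_tail`.
        finish_tail(0, acc)
    } else {
        step_tail(n / 10, acc + n % 10)
    }
}

fn finish_tail(_n_unused: u64, acc: u64) -> u64 {
    acc
}

/// Shim with the ordinary public signature; this is the entry point that
/// function pointers and other crates would see.
pub fn digit_sum(n: u64) -> u64 {
    start_tail(n, 0)
}
```

Calling `digit_sum(1234)` walks through `step_tail` once per digit and finishes in `finish_tail`; only the shim's signature is visible to outside code.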

@comex

comex commented Jul 2, 2023

I don't particularly like it, but there might be a possible way to specify that musttail semantics prevents stack consumption from increasing without causing issues. Specifically, by making stack exhaustion an observable state of the AM that can be entered at any point, we can add a guarantee that in f(); f();, if the same operations are done both times and the first did not encounter stack exhaustion, that the repetition is guaranteed not to either. We could then extend the same guarantee to reentrantly calling the same function through exclusively musttail calls.

In f(); f();, one call might be inlined and the other not, resulting in different stack usage. Even in the case where the same call is performed twice in a loop (for _ in 0..2 { f(); }), the loop might be unrolled into two separate calls, at which point you have the same problem.

I don't see what's so bad about reasoning in terms of bounded stack usage. Consider termination as an analogy. Just as the compiler can legally reduce or increase stack usage, the compiler can legally make a program faster or slower. In fact, it can make the program arbitrarily slower. But it can't make it infinitely slower, i.e. change a program from terminating to non-terminating.

@digama0 said before that "the AM generally does not deal in asymptotics, either something is observable or it's not". And later, in reference to stack usage: "It is growing unboundedly, but at what point can you say that it has violated its requirements?" But in the case of termination, there is likewise no point at which you can say "this program is taking too long to run, so it must have been miscompiled". Only if you can prove that the compiled program will run for infinitely long, and the original program would not, can you say it was miscompiled.

The same should apply to stack usage. Consider a program that never terminates but, contrary to this comment, might have input and output. For example, this could include a program that keeps serving requests until it's interrupted – where the interesting case would be if it uses tail recursion between requests:

fn server() {
    serve_next_request();
    become server();
}

There is no finite level of stack consumption at which you can say "this is too much, so the program was miscompiled". But if you can prove that stack consumption will increase indefinitely, despite the program only making a bounded number of nested non-become function calls, then that should count as a miscompilation.

@CAD97

CAD97 commented Jul 2, 2023

TL;DR at the end. I've (hopefully) said my part in full.

If some code SHOULD be tail call, and only for save a little memory under its expected inputs, then we SHOULD NOT use become. [@oxalica]

The problem with this is that under no proposal is become exclusively a musttail requirement. become also carries the semantic change in meaning compared to return that any locals and temporaries are dropped before calling the called function, instead of after.

Even locals without drop glue can pretty easily prevent TCO, because their valid lifetime still extends until after the function call. [playground] If any such potentially escaping locals are Copy, the blocker can't even be solved by inserting code (e.g. drop calls), and the function would have to be largely restructured.

This makes TCO surprisingly difficult to perform in Rust. These cases aren't that the optimizer doesn't think TCO would be desired/beneficial, but that it cannot prove that performing TCO would not change the behavior of the program.
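The drop-order difference described above can be observed on stable Rust by emulating it by hand. The sketch below is only an illustration: `Guard`, `callee`, `with_return`, `become_like` and `trace` are hypothetical names, and `become_like` uses an explicit `drop` to mimic what the RFC says `become` would do, since `become` itself is not available on stable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// A local with drop glue that records when it is dropped.
struct Guard(Log);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.borrow_mut().push("guard dropped");
    }
}

fn callee(log: &Log) -> u32 {
    log.borrow_mut().push("callee ran");
    7
}

/// Ordinary `return`: `_guard` stays alive until `callee` has returned,
/// so the caller's frame cannot simply be reused for the call.
fn with_return(log: &Log) -> u32 {
    let _guard = Guard(Rc::clone(log));
    return callee(log);
}

/// Hand-written emulation of the `become` ordering: locals are dropped
/// first, then the call happens with nothing left alive in the caller.
fn become_like(log: &Log) -> u32 {
    let guard = Guard(Rc::clone(log));
    drop(guard);
    callee(log)
}

/// Runs `f` against a fresh log and returns the recorded events in order.
fn trace(f: fn(&Log) -> u32) -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    f(&log);
    let events = log.borrow().clone();
    events
}
```

Comparing `trace(with_return)` with `trace(become_like)` shows the two events in opposite order, which is the behavioral change an optimizer would have to prove unobservable before turning a plain `return` into a tail call.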

Case 3 (TCE required for validity) may be a more pressing need than case 1 (TCO preferred for performance), sure. But case 1 is still a valid use case, and shouldn't be outright dismissed.

I think it's fairly clear that we want both become k#maytail call() and become k#musttail call() semantics available. The precise syntax used doesn't particularly matter. The reason I brought it up at all was to note that it's not inherently obvious that become must have musttail semantics, and thus the request to use bikeshed avoidance syntax for the experimental implementation is justified.

That bringing it up led to such vocal debate only reinforces that idea.

(2) can be divided into (1) if the alternative is just the same code with return instead of become [@oxalica]

If that's the only difference, it's just case 1 from the beginning, not case 2 (two different code paths, one which requires TCE and one which doesn't). Which is fundamentally why I would like to be able to just write case 1 with become k#maytail call() initially, rather than have to make separate become and return versions just to satisfy the compiler, and to get subtly different behavior between the two implementations for the trouble.

I would expect authored examples of case 2 to be extremely few, as it would require authoring two different implementations of some functionality, one of which relies on TCE, and one of which doesn't, with the TCE impl beating the iterative, but the iterative beating a trampolined compilation of the TCE version.

Generating case 2 from case 3 out of necessity, e.g. by the use of a trampoline, is the likely scenario in which it would exist.

This means that a program that used to work in a given Rust version may fail to work in the next version, after some new optimization is merged (probably after upgrading llvm) and this is completely okay, expected, and appreciated ... [@dlight]

The stability pledge of Rust 1.0 is important. That code which compiles today will continue to successfully compile with any future compiler (with very few exceptions carved out for stability without stagnation, and assuming the same library versions are used) shouldn't be trespassed against by the language without strong motivation.

It's okay if things that sit on top of/outside the language like #[no_panic] can make compilation success rely on optimization, because they aren't the compiler. The compiler is held to a very high standard, but that doesn't extend beyond the scope of the language itself. (std, being effectively part of the language/compiler because it is versioned and distributed alongside it, has similarly strong guarantees, but they are weaker than the language's in some meaningful ways.)

This is why I'm hesitant to attach codegen-dependent semantics to core language functionality — it's an unnecessary mixing of layers. Compilers do in practice end up mixing layers, because doing so confers practical benefit, but it's still preferable if it's avoidable.

Which is why my main question remains if case 3 can be satisfied by a codegen guarantee (if you use become in specific ways, it's guaranteed to be TCOd) rather than a language guarantee (you're only allowed to use become in places where the codegen guarantee exists). But also, on the other hand, it's possible that maytail use cases would prefer not to have the TCE forced, leaving it to the optimizer's decision. (My experience has been that LLVM will generally always use a tail call when permissible, however.)

... because this kind of breakage may already happen (and nowadays it breaks at runtime rather than compile time) [@dlight]

There's a big difference between a static error from the compiler and a runtime error, though. Yes, it's ideal if the compiler tooling can guarantee the lack of the runtime error and/or warn you if it's possible you might hit it.

But where a runtime error only occurs when you get to it, a static error prevents the program from doing anything. It doesn't matter if the stack blowup only happens in a single code path; because you've made it into a static error, you can't even use the program to perform other tasks which aren't at risk of triggering the error condition.

Again, this is a reasonable choice to make, but it isn't a great fit for the language to force this onto users who want to be able to use some functionality.

I think that transforming become into trampolines or otherwise pessimizing execution is okay as long as the stack doesn't actually grow. [@dlight]

The compiler can indeed transform code into a trampolined form, but so can a sufficiently advanced proc macro. Given the restriction of exactly matching function signatures, it's not even that advanced of a macro.

Trampolines aren't like async, where the compiler is doing something that would be unreasonable to ask the user to do in a sound manner. Manual trampoline TCE is fairly straightforward, and I've actually written macros that do so, utilizing both single and independent compilation for different occasions.
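For readers unfamiliar with the technique, a hand-written trampoline for functions that share one signature might look like the sketch below. This is a generic illustration, not the macro output mentioned above; `Step`, `is_even`, `is_odd` and `run` are hypothetical names.

```rust
/// One bounce of the trampoline: either a final value or the next call to make.
enum Step {
    Done(bool),
    Call(fn(u64) -> Step, u64),
}

/// Instead of tail-calling `is_odd`, return a description of that call.
fn is_even(n: u64) -> Step {
    if n == 0 {
        Step::Done(true)
    } else {
        Step::Call(is_odd, n - 1)
    }
}

fn is_odd(n: u64) -> Step {
    if n == 0 {
        Step::Done(false)
    } else {
        Step::Call(is_even, n - 1)
    }
}

/// The trampoline itself: a loop that performs each requested call in turn,
/// so the stack depth stays constant no matter how long the chain is.
fn run(mut step: Step) -> bool {
    loop {
        match step {
            Step::Done(value) => return value,
            Step::Call(f, arg) => step = f(arg),
        }
    }
}
```

`run(is_even(1_000_000))` completes with a fixed stack depth, at the cost of an indirect call and a match per step. The shared `fn(u64) -> Step` type is what makes this easy, which mirrors the exactly-matching-signature restriction.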

Now, if we were asking become to create trampolines to support arbitrary signature TCE on all targets, then it'd be a bit more interesting, since that does require compiler information and can't be done by macros in the face of separate compilation. (As far as I'm aware, anyway. I know some pretty involved and powerful macro_rules! tricks, but not all of them.)

one call might be inlined and the other not, resulting in different stack usage [@comex]

Ah right, that does pose a problem to trying to make stack non-exhaustion consistent. Loop unrolling at least isn't separately problematic (since unrolled loop locals would hopefully be trivially overlapped). But this is a discussion for the UCG, if/when I actually get around to opening it.

Consider termination as an analogy. Just as the compiler can legally reduce or increase stack usage, the compiler can legally make a program faster or slower. In fact, it can make the program arbitrarily slower. But it can't make it infinitely slower, i.e. change a program from terminating to non-terminating. [@comex]

I'd double-check that, actually; Rust doesn't have the forward progress guarantee that C++ does. It's actually quite tricky to both guarantee that the next observable effect will occur while also permitting an infinite loop without observable effects to exist. As far as I'm aware, we haven't done anything to address it.

Forward progress and stack usage can be QOI issues while still being guaranteed by the compiler. And to the end user, the difference needn't matter much, especially if the compiler and language are developed in tandem instead of separately.

If rustc provides a guarantee that Rust code using become k#musttail call() will get TCE and not increase stack usage unboundedly, it doesn't matter to 99.999% of developers how that guarantee is classified, just that it exists. It matters to the people trying to be exact about what the guarantee actually means, but that's our problem, not the compiler users'.

Because of that perspective, I'd much prefer the language to not needlessly make things weirder by mixing layers (e.g. by leaking codegen concerns into language validity rules) without good reason. But with good reason, it's justifiable.


Rust governance isn't a popularity contest, nor is it democratic[1][2]; we've made our points, and the lang team will consider them and make a decision one way or another. I'd obviously prefer one way, and others can prefer another, but it's the lang team's call in the end.

So there's no need to continue to litigate this further unless bringing new information to the discussion. (E.g. data comparing how often become k#maytail call() would enable further optimization but become k#musttail call() wouldn't be valid, which would be good evidence for an argument for either to be the default mode, depending on results.)

Footnotes

  1. Yes, it's "democratic" in that the responsible teams ask for the input of the general populace through the RFC process, but they still get final call, completely independent from the general user opinions. The team decision will typically align with the general consensus (when it exists), since they want to find the ideal solution for the most people, but it doesn't have to, and the lang team has certainly made controversial-at-the-time decisions before (e.g. .await).

  2. Broadly speaking, this is a good thing, actually! Since the outcome of a more democratic process is a gradual dilution of vision and the artifact known as design by committee.

@comex

comex commented Jul 2, 2023

@CAD97 Well, I'd expect forward progress to be guaranteed at least if you stay clear of the problem spots of (a) infinite loops with no side effects and (b) atomics. But it's true the issue hasn't seen much UCG discussion.

@Robbepop
Contributor

Robbepop commented Jul 2, 2023

Just wanted to share a potential syntax idea I had that would fit all mentioned use cases:

become <expr>

as demanded by the RFC for musttail use cases and

become? <expr>

as demanded by @CAD97 for maytail use cases.

Both versions are short enough to be not unnecessarily verbose in either case. The only downside I see is that something like become? kinda is a "new" syntactic structure in Rust that people might find confusing at first.

However, I still think that we should for now concentrate on become <expr> as proposed by the RFC and propose something like become? <expr> as proposed by @CAD97 as a potential future extension to the feature. Not all at once but step by step!

@programmerjake
Member

JMP instead of CALL/RET also has penalties due to the breakage of the hardwired CALL/RET branch target prediction stack.

tail calls generally don't cause problems for branch target prediction. what happens is the call/ret branch target prediction stack just doesn't get modified by the tail call, so when the tail-called function finally returns, it pops the correct return address off the prediction stack and goes to the right place.

e.g.:

pub fn foo() {
    // before bar call
    bar();
    // back in foo
}

pub fn bar() {
    // before baz call
    become baz();
}

pub fn baz() {
    // in baz
}

| step | stack op to run next | stack before running stack op |
|---|---|---|
| before bar call | push "back in foo" | foo's caller |
| before baz call | no-op | foo's caller, back in foo |
| in baz | pop "back in foo" | foo's caller, back in foo |
| back in foo | pop "foo's caller" | foo's caller |
| foo's caller | ... | ... |
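
The same walkthrough can be replayed with a toy model of the return-address prediction stack. This is only a sketch of the bookkeeping under the assumption that a tail call is a plain jump; `ReturnStack` and `simulate` are hypothetical names and nothing here models real hardware.

```rust
/// Toy model of a CPU's call/ret return-address prediction stack.
struct ReturnStack {
    entries: Vec<&'static str>,
}

impl ReturnStack {
    /// A CALL pushes the address execution should return to.
    fn call(&mut self, return_to: &'static str) {
        self.entries.push(return_to);
    }

    /// A tail call is a jump, so the prediction stack is left untouched.
    fn tail_call(&mut self) {}

    /// A RET pops the predicted return target.
    fn ret(&mut self) -> Option<&'static str> {
        self.entries.pop()
    }
}

/// Replays foo -> bar -> (tail call) baz and returns the predicted targets
/// of the two RET instructions that follow.
fn simulate() -> Vec<Option<&'static str>> {
    let mut stack = ReturnStack {
        entries: vec!["foo's caller"],
    };
    stack.call("back in foo"); // foo calls bar
    stack.tail_call(); // bar becomes baz
    let after_baz = stack.ret(); // baz returns
    let after_foo = stack.ret(); // foo returns
    vec![after_baz, after_foo]
}
```

In this model the return from `baz` is predicted to land back in `foo`, and the return from `foo` lands in its caller, matching the steps listed above.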

@digama0
Contributor

digama0 commented Jul 3, 2023

@digama0 said before that "the AM generally does not deal in asymptotics, either something is observable or it's not". And later, in reference to stack usage: "It is growing unboundedly, but at what point can you say that it has violated its requirements?" But in the case of termination, there is likewise no point at which you can say "this program is taking too long to run, so it must have been miscompiled". Only if you can prove that the compiled program will run for infinitely long, and the original program would not, can you say it was miscompiled.

@comex If the original program would terminate, and the compiled program does not (and I assume it does so without taking any observable steps either, since taking more observable steps than the original program would be an obs. eqv. violation), then that means that the original program took some step which took "infinitely long" to be completed in the compiled program. That step was not completed after all, and this is a violation. There is no point at which you can say that the compiled program is definitely off the rails, but the overall execution failed to complete that step.

The same should apply to stack usage. Consider a program that never terminates but, contrary to this comment, might have input and output. For example, this could include a program that keeps serving requests until it's interrupted – where the interesting case would be if it uses tail recursion between requests:

fn server() {
    serve_next_request();
    become server();
}

There is no finite level of stack consumption at which you can say "this is too much, so the program was miscompiled". But if you can prove that stack consumption will increase indefinitely, despite the program only making a bounded number of nested non-become function calls, then that should count as a miscompilation.

The difference between the previous example and this example is that now there are an infinite number of observable AM steps involved, where previously we had a finite number of AM steps and then a hang on the last one (where an infinite number of CM steps got shoved in the middle and the AM step was not completed). As long as the CM continues to be "responsive", in the sense that it is taking finitely many steps to perform each AM observable step, then there isn't anything to complain about. We are only interested in the behavior of the machine up to finitely many observable AM steps anyway, since every program delivers its results in finitely many observable steps.

This is actually encoded in the notion of observational equivalence (it is a bisimulation relation, meaning that programs which are equivalent on each finite initial segment of the trace are also equivalent on the whole trace), which means in particular that you could periodically delete some stack frames that will never again be revisited in a runaway non-tail recursive loop (with an effect in it, like your server loop example but without the become) to turn an O(n) stack usage program into an O(1) stack usage program, while preserving observational equivalence.

The conclusion is what I said at the start: the AM and the tools we use to analyze it do not make it possible to measure stack usage, asymptotically or otherwise. Changing this situation would require some rather deep opsem changes, and I will await @CAD97 's proposal.

@CAD97

CAD97 commented Jul 3, 2023

@CAD97 's proposal [for making memory non-exhaustion observable]

Unfortunately it still ends up breaking inlining if applied to imperative code, so it's essentially dead, except that it might still work for musttail. Since it's become meaningless without some form of become, I'll outline the proposal here:

  • Consider stack exhaustion as a special AM state.
  • Any AM operation is allowed to result in entering the stack exhaustion state instead of its usual semantics.
  • The effect of entering the stack exhaustion state is left unspecified. [Note: On managed OSes, this is typically a controlled abort; on unmanaged, UB.]
  • When exiting a function via a musttail call, record the control flow path that was taken while evaluating the function.
    • If this function was itself called without musttail, this record starts a fresh musttail chain.
    • If this function was itself called with musttail, this record adds to the same musttail chain as the calling function.
  • When entering a function called with musttail, check the active musttail chain. While the control flow path evaluating the function exactly matches a prefix of a path already recorded, the stack exhaustion state must not be entered. [Note: Once the control flow path has diverged, the restriction is no longer active.] [Note: the control flow path is more than just branch coverage; e.g. a differing number of loop repetitions is a divergence between two control flow paths.]
  • [Note: Once a function entered by a musttail call exits by means other than a musttail call, the record of the musttail chain may be discarded, as it will not be referenced again.]

This makes dynamically JITing code involved in a musttail call cycle difficult, since doing so could increase stack consumption. But for a purely AOT compiler, I think this is both a weak enough requirement as to not prevent optimizations and (barely) strong enough to create the desired stack usage guarantee. I'm not fully confident, but it looks to potentially be workable, and to not interfere with inlining after loop unrolling anymore. I'm actually more worried that it doesn't quite get to a proof of finite stack consumption because there could be an infinite set of potential control flow paths through a given function.

@programmerjake
Member

  • When entering a function called with musttail, check the active musttail chain. While the control flow path evaluating the function exactly matches a prefix of a path already recorded, the stack exhaustion state must not be entered.

this doesn't work for dynamically-sized allocas, such as unsized locals, since they can take the exact same control flow path and yet allocate a different amount of stack space. it could be modified to work by adding allocas with their sizes to the control flow paths that are compared.
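
To make the bookkeeping in this proposal concrete, here is a minimal sketch of the "musttail chain" record with the alloca modification folded in. It models only the prefix check; `Event`, `MusttailChain`, `record` and `exhaustion_forbidden` are hypothetical names, and nothing here reflects how an abstract machine or a compiler would actually represent control flow paths.

```rust
/// One step along a control flow path through a function.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Event {
    /// Hypothetical identifier of a branch that was taken.
    Branch(u32),
    /// A dynamically sized stack allocation, with its size in bytes.
    Alloca(usize),
}

/// The record kept for one chain of musttail calls.
#[derive(Default)]
struct MusttailChain {
    recorded: Vec<Vec<Event>>,
}

impl MusttailChain {
    /// Called when a function exits via a musttail call: remember the path it took.
    fn record(&mut self, path: Vec<Event>) {
        self.recorded.push(path);
    }

    /// True while the path taken so far exactly matches a prefix of some
    /// recorded path, i.e. while stack exhaustion must not be entered.
    fn exhaustion_forbidden(&self, current: &[Event]) -> bool {
        self.recorded.iter().any(|path| path.starts_with(current))
    }
}
```

Because allocation sizes are part of each event, two runs that take the same branches but allocate different amounts of stack count as diverged, and the restriction lapses for the second one.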

@comex

comex commented Jul 4, 2023

@digama0 I created a thread in the unsafe-code-guidelines repo (for lack of a better place) to discuss how feasible it is to formally require tail call elimination.

Going back to this RFC, I agree that it doesn't need to be blocked by a precise definition. In lieu of one, I would prefer to go with a handwavy requirement to bound stack usage rather than just calling it QoI. But whatever. There probably won't be some smart-aleck author of an alternative implementation who comes in and says "I'm not implementing proper tail calls because the spec doesn't require it". Probably.

I agree with others that it's critical for rustc, at least, to implement explicit tail calls in a way that guarantees bounded stack usage on all supported targets. So either the 'MVP WebAssembly' target needs to be meaningfully deprecated (in favor of WebAssembly with the more recent tail call feature), or we need to mark tail-callable functions with an attribute.

I don't see what's so bad about the attribute. I don't want to go in circles here, but… the RFC says of requiring an attribute that "while quite noisy it is also less flexible than the chosen approach." But requiring an exact function signature match is drastically less flexible! Regardless of whether the initial implementation allows non-matching signatures, we should want to support them eventually.

Without an attribute on tail-callable functions, supporting non-signature-matched calls would require changing the default extern "Rust" calling convention on all targets from caller-cleanup to callee-cleanup. I remember someone claiming somewhere that there might be a performance cost to that. Maybe there isn't. But right now we don't know.

If we require an attribute for now, then we preserve the ability to support non-signature-matched calls (either in the initial implementation or in the future), regardless of the feasibility of changing the default calling convention. And we also make it possible to support tail calls on MVP WebAssembly via trampolines, at least during the deprecation process.

If in the future changing the default calling convention turns out to be feasible, and MVP WebAssembly has been properly deprecated, then the attribute can just become a no-op at that point.

(Regarding a C backend, I've always wanted Rust to have one, but I don't think it's important enough to block tail calls, particularly given that it currently does not exist.)

@digama0
Contributor

digama0 commented Jul 4, 2023

I don't see what's so bad about the attribute. I don't want to go in circles here, but… the RFC says of requiring an attribute that "while quite noisy it is also less flexible than the chosen approach." But requiring an exact function signature match is drastically less flexible! Regardless of whether the initial implementation allows non-matching signatures, we should want to support them eventually.

I'll take a moment here to pitch again my earlier suggestion to use same-crate calls instead of an attribute. The compiler can make this Just Work™ by analyzing the call structure and making shims to interface with any other code that wants to take a reference to the function pointer or call the function from outside the crate. The overall effect is that you can become any function in the same crate with an arbitrary signature, and become any function in any other crate with the same signature. This can be done with no impact on extern "Rust" fn calling convention.

@DemiMarie

@comex: So there is actually a trick one can use to avoid needing the attribute: the only functions that need to be able to be tail called are the ones that themselves contain tail calls. Therefore, one can include “has tail calls” in the ABI of a function, and use callee-pops calling conventions/trampolines/etc for precisely the functions that themselves contain become.

@programmerjake
Member

programmerjake commented Jul 5, 2023

@comex: So there is actually a trick one can use to avoid needing the attribute: the only functions that need to be able to be tail called are the ones that themselves contain tail calls. Therefore, one can include “has tail calls” in the ABI of a function, and use callee-pops calling conventions/trampolines/etc for precisely the functions that themselves contain become.

i doubt that will work because those functions can be called through function pointers and the cast to extern "Rust" function pointers loses the tail/not-tail ABI distinction.

@CAD97

CAD97 commented Jul 5, 2023

While somewhat annoying to keep track of in the compiler, the workaround is quite simple: you have two entry points to the function. When called statically it uses the tail call convention, but when a function pointer is taken, it gets a small additional shim to the standard call convention.

Unless the attribute would change the function to not be extern "Rust" and/or prevent making annotated functions into function pointers, the same thing needs to be handled there as well. And the same goes for any sort of trampoline transform; the standard entry looks normal from the outside and sets up the trampoline for the real implementation.

Though it should still be noted explicitly that such a scheme of course prevents tail calling a function pointer, since it's using the standard ABI rather than the tail ABI.

@programmerjake
Member

Though it should still be noted explicitly that such a scheme of course prevents tail calling a function pointer, since it's using the standard ABI rather than the tail ABI.

but tail calling function pointers is necessary for fast interpreters which is a major motivation for become:

union Imm {
    branch_target: *const Inst,
    value: isize,
}

struct Inst {
    run: unsafe fn(pc: *const Inst, stack: *mut u64, mem: *mut u8),
    imm: Imm,
}

unsafe fn add_imm(pc: *const Inst, stack: *mut u64, mem: *mut u8) {
    *stack = (*stack).wrapping_add((*pc).imm.value as u64);
    let pc = pc.add(1);
    become (*pc).run(pc, stack, mem)
}

unsafe fn branch_if(pc: *const Inst, stack: *mut u64, mem: *mut u8) {
    let v = *stack;
    let stack = stack.add(1);
    let pc = if v != 0 {
        (*pc).imm.branch_target
    } else {
        pc.add(1)
    };
    become (*pc).run(pc, stack, mem)
}

// more instructions...

@digama0
Contributor

digama0 commented Jul 5, 2023

Though it should still be noted explicitly that such a scheme of course prevents tail calling a function pointer, since it's using the standard ABI rather than the tail ABI.

but tail calling function pointers is necessary for fast interpreters which is a major motivation for become:

Wow, I'm feeling quite un-heard here. You can have both! There is no technical restriction to having both same-sig function pointer calls powered by the mechanism in the RFC, and same-crate arbitrary-sig tail calls powered by a compiler transform and a shim for interfacing with code that needs the standard calling convention. You don't even need an annotation to disambiguate them, the compiler can figure out what case you are in automatically.

Your example is fine since it is covered by the mechanism specified in the RFC. And I think it will be a general pattern that a become through a function pointer satisfies the same-signature restriction, because you need all the targets to have the same function pointer type to put them in an array or what have you. The correct form of the restriction @CAD97 is highlighting is that because function pointers use the standard ABI, they must unconditionally satisfy the same-signature restriction, even for same-crate calls.
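To make the "same function pointer type to put them in an array" point concrete, here is a small sketch in stable Rust. The `Vm`, `Handler`, and `op_*` names are hypothetical and not taken from the RFC. It uses a plain loop for dispatch because `become` is not available on stable; the point is only that every handler stored in the table necessarily has one identical signature.

```rust
// Hypothetical mini-VM; all names are illustrative only.
struct Vm {
    acc: i64,
    pc: usize,
    halted: bool,
}

/// Every handler has this one signature, so they all fit in one table.
type Handler = fn(&mut Vm, i64);

fn op_add(vm: &mut Vm, imm: i64) {
    vm.acc = vm.acc.wrapping_add(imm);
    vm.pc += 1;
}

fn op_mul(vm: &mut Vm, imm: i64) {
    vm.acc = vm.acc.wrapping_mul(imm);
    vm.pc += 1;
}

fn op_halt(vm: &mut Vm, _imm: i64) {
    vm.halted = true;
}

/// Runs a program of (handler, immediate) pairs and returns the accumulator.
/// With `become`, each handler would instead tail call the next handler.
fn run(program: &[(Handler, i64)]) -> i64 {
    let mut vm = Vm { acc: 0, pc: 0, halted: false };
    while !vm.halted && vm.pc < program.len() {
        // Tuples of a function pointer and an integer are `Copy`.
        let (handler, imm) = program[vm.pc];
        handler(&mut vm, imm);
    }
    vm.acc
}
```

A program is then just a slice such as `[(op_add as Handler, 2), (op_mul as Handler, 21), (op_halt as Handler, 0)]`.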

@Robbepop
Contributor

Robbepop commented Jul 5, 2023

Having the same-crate restriction in place without the same-signature restriction for those calls would be another game changer for interpreters built on tail calls, as it would allow us to avoid unsafe code altogether:

type Register = usize;

enum Trap {
    DivisionByZero,
}

enum Op {
    I32Add { result: Register, lhs: Register, rhs: Register },
    I32Div { result: Register, lhs: Register, rhs: Register },
    BrIf { condition: Register, offset: isize },
}

struct Executor {
    stack: Vec<i32>,
    ops: Vec<Op>,
    sp: usize,
    pc: usize,
}

impl Executor {
    fn dispatch(&mut self) -> Result<(), Trap> {
        match self.ops[self.pc] {
            Op::I32Add { result, lhs, rhs } => become self.execute_i32_add(result, lhs, rhs),
            Op::I32Div { result, lhs, rhs } => become self.execute_i32_div(result, lhs, rhs),
            Op::BrIf { condition, offset } => become self.execute_br_if(condition, offset),
        }
    }

    fn execute_i32_add(&mut self, result: Register, lhs: Register, rhs: Register) -> Result<(), Trap> {
        self.stack[self.sp + result] = self.stack[self.sp + lhs].wrapping_add(self.stack[self.sp + rhs]);
        self.pc += 1;
        become self.dispatch()
    }

    fn execute_i32_div(&mut self, result: Register, lhs: Register, rhs: Register) -> Result<(), Trap> {
        let rhs = self.stack[self.sp + rhs];
        if rhs == 0 {
            return Err(Trap::DivisionByZero)
        }
        self.stack[self.sp + result] = self.stack[self.sp + lhs].wrapping_div(rhs);
        self.pc += 1;
        become self.dispatch()
    }

    fn execute_br_if(&mut self, condition: Register, offset: isize) -> Result<(), Trap> {
        if self.stack[self.sp + condition] != 0 {
            self.pc = self.pc.wrapping_add_signed(offset);
        } else {
            self.pc += 1;
        }
        become self.dispatch()
    }
}
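For comparison, a loop-driven variant of the same design runs on stable Rust today. This is a simplified sketch, not the code above: it drops `sp` and `BrIf`, derives a few traits for convenience, and assumes execution stops when `pc` runs past the end of `ops` (the original snippet has no halting instruction). Each handler returns to the driver loop instead of tail calling `dispatch`, which is the indirection that `become` is meant to remove.

```rust
type Register = usize;

#[derive(Debug, PartialEq)]
enum Trap {
    DivisionByZero,
}

#[derive(Clone, Copy)]
enum Op {
    I32Add { result: Register, lhs: Register, rhs: Register },
    I32Div { result: Register, lhs: Register, rhs: Register },
}

struct Executor {
    stack: Vec<i32>,
    ops: Vec<Op>,
    pc: usize,
}

impl Executor {
    /// Driver loop: assumed to stop once `pc` passes the last instruction.
    fn run(&mut self) -> Result<(), Trap> {
        while self.pc < self.ops.len() {
            // Copy the instruction out so `self` can be borrowed mutably below.
            let op = self.ops[self.pc];
            match op {
                Op::I32Add { result, lhs, rhs } => self.execute_i32_add(result, lhs, rhs)?,
                Op::I32Div { result, lhs, rhs } => self.execute_i32_div(result, lhs, rhs)?,
            }
        }
        Ok(())
    }

    fn execute_i32_add(&mut self, result: Register, lhs: Register, rhs: Register) -> Result<(), Trap> {
        self.stack[result] = self.stack[lhs].wrapping_add(self.stack[rhs]);
        self.pc += 1;
        Ok(())
    }

    fn execute_i32_div(&mut self, result: Register, lhs: Register, rhs: Register) -> Result<(), Trap> {
        let divisor = self.stack[rhs];
        if divisor == 0 {
            return Err(Trap::DivisionByZero);
        }
        self.stack[result] = self.stack[lhs].wrapping_div(divisor);
        self.pc += 1;
        Ok(())
    }
}
```

The handler signatures here differ from that of `run`, just as they differ from `dispatch` in the snippet above, which is why the relaxed same-crate rule matters for this style.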

@phi-go
Author

phi-go commented Jul 5, 2023

@comex wrote:

I don't see what's so bad about the attribute. I don't want to go in circles here, but… the RFC says of requiring an attribute that "while quite noisy it is also less flexible than the chosen approach." But requiring an exact function signature match is drastically less flexible! Regardless of whether the initial implementation allows non-matching signatures, we should want to support them eventually.

To be clear the RFC currently only discusses guaranteed TCE given matching function signatures to keep the feature as small as possible. This is also the context this sentence in the RFC should be seen in.

Regarding the attribute on the function declaration are you thinking of requiring it in addition to become (1) or automatically promote a tail call of those functions (2)?

(2) is the version discussed in the RFC and I hope it is clear why it is described as less flexible.

(1) Has not really been discussed as far as I remember. However, to my understanding, even LLVM only supports functions that have nearly identical ABIs. WebAssembly currently does not even allow TCO. In both cases I expect the push to support non-matching (or matching for WebAssembly) function signatures to be done at a later point; at that time we can still require attributes on function declarations if needed. In the meantime I would expect the WebAssembly backend to raise a compiler error as per the RFC. Though, I can see that it could be unappealing to introduce this attribute later, so I'm not sure this would be the right approach. Note that I expect that the attribute is mainly interesting for tail calling function pointers and its "marker" would need to become part of the function signature. For same-crate static calls I think the suggestion by @digama0 should be preferred. This, however, would imply that the attribute need only be added for functions that are tail called via function pointer with a mismatched function signature, which could be confusing.

@scottmcm
Member

scottmcm commented Jul 5, 2023

It would seem entirely reasonable to me to add an extern "rust-tail" fn ABI to the tail call experiment -- under a separate feature flag! -- that would allow greater flexibility in signatures that can be used in tail calls, even through that-ABI function pointers, at the cost of breaking signature compatibility with normal extern "Rust" fn pointers.

I would expect that to be less -- or at least simpler -- work than all the syntax and drop order changes in the experiment, so adding more options to try out and see how they fit with different things that people wish to be able to do sounds like a good way to help find out which ways work more smoothly for things people want to be able to do.

(We could have tail_call_syntax, tail_call_same_signature, tail_call_abi feature gates, perhaps, and figure out which parts of things to RFC & stabilize, if any.)

@digama0
Contributor

digama0 commented Jul 6, 2023

It would seem entirely reasonable to me to add an extern "rust-tail" fn ABI to the tail call experiment -- under a separate feature flag! -- that would allow greater flexibility in signatures that can be used in tail calls, even through that-ABI function pointers, at the cost of breaking signature compatibility with normal extern "Rust" fn pointers.

Is there a proposal that actually requires having a separate ABI for tail-callable functions (or is it tail-calling functions?)? The RFC proposal works just fine with extern "Rust" fn, and the same-crate compilation strategy relies on the internal functions not being directly referenced at all (it is for compiler-only use), and as such wouldn't fit as extern "rust-tail" fn since different individual functions with the same argument and returns could nevertheless have a different call ABI, depending on what other functions are involved in the call graph strongly connected component they participate in.

I think @DemiMarie mentioned that a caller-pops calling convention could be more flexible wrt varying arguments, but I'm not sure this is a good default choice and if we have a specific extern "rust-tail" fn ABI then it seems inevitable that people would use it for callee-pops since otherwise what is the point of having a separate calling convention if it's just the same as the old one? I guess there should be some plan for what this calling convention should actually be and what tradeoffs it is making before making it a mandatory part of the become syntax.

@scottmcm
Member

scottmcm commented Jul 6, 2023

Is there a proposal that actually requires having a separate ABI for tail-callable functions (or is it tail-calling functions?)?

Well, LLVM has a specific tailcc calling convention, which ensures that calls in tail position are always tail-optimized.

rustc currently uses LLVM's ccc (as can be seen by the lack of an explicit cc in the LLVM IR output) for extern "Rust", so assuming tailcc doesn't do nothing, it seems plausibly useful for things like tables of function pointers or cross-crate becomes or something.

Do we need it? I don't know. But it at least seems like a plausible experiment, like we have an existing experiment for coldcc (rust-lang/rust#97544).

But tailcc (or fastcc) uses callee-pops, so calls that aren't TCO'd need to adjust the stack, and thus it sounds like we wouldn't necessarily want to change everything to that cc.

@WaffleLapkin
Member

@digama0 the extern "rust-tail" fn proposal is similar to your "same-crate" one, in that it allows more code in some cases. I.e. I can imagine the end-game requirement of (musttail) become to be something like

become f(...) requires at least one of the following:

  • Caller and callee function signatures match exactly (modulo lifetimes)
  • Caller and callee are defined in the same crate
  • Caller and callee both use "rust-tail" calling convention
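
The three conditions listed above can be read as a simple disjunction. The following is only a toy model of that rule, not compiler code: `FnInfo`, `Abi`, and `become_allowed` are hypothetical names, and a real check would compare types rather than signature strings.

```rust
// Toy model of the "at least one of the following" rule; all names are
// hypothetical and exist only for this illustration.
#[derive(PartialEq)]
enum Abi {
    Rust,
    RustTail,
}

struct FnInfo<'a> {
    /// Signature with lifetimes erased, represented as plain text here.
    signature: &'a str,
    /// Name of the defining crate.
    krate: &'a str,
    abi: Abi,
}

/// Returns true if `become callee(...)` inside `caller` would be accepted
/// under the relaxed rules sketched in the list above.
fn become_allowed(caller: &FnInfo<'_>, callee: &FnInfo<'_>) -> bool {
    let same_signature = caller.signature == callee.signature;
    let same_crate = caller.krate == callee.krate;
    let both_tail_abi = caller.abi == Abi::RustTail && callee.abi == Abi::RustTail;
    same_signature || same_crate || both_tail_abi
}
```

Under this model, a cross-crate call with mismatched signatures is accepted only when both sides opt into the hypothetical "rust-tail" ABI.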

I'll try to implement those relaxations in the experiment (once the things described in the RFC right now are fully implemented and merged...), but I also want to highlight @phi-go's mention that those relaxations can be RFC-ed separately from this RFC (and they probably should, to keep the scope smaller!).

@joshtriplett
Member

@Robbepop Using ? with that meaning seems like a non-starter given Rust's existing usage of ?.

@joshtriplett joshtriplett removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Aug 1, 2023
@joshtriplett
Member

Un-nominating the RFC for now.

To be explicitly clear on next steps: 👍 for going ahead with one or more experiments (including a tail-call with placeholder keyword, and a tail call calling-convention as @scottmcm suggested), blocking concerns for an experiment or RFC that uses the demo-looks-done approach of become (not dead-set against it but very much currently feeling there are better alternatives).

@Robbepop
Contributor

Robbepop commented Aug 10, 2023

@Robbepop Using ? with that meaning seems like a non-starter given Rust's existing usage of ?.

I wouldn't say it is a non-starter. The same reasoning could have been applied to the highly debated .await syntax, where .something only referred to field accesses before. I know why it isn't good to introduce "new" syntax, but sometimes it takes courage to try something new. And new doesn't always mean it is a bad thing.

blocking concerns for an experiment or RFC that uses the demo-looks-done approach of become (not dead-set against it but very much currently feeling there are better alternatives).

If during all the time of the RFC thread I'd seen a single alternative syntax that was actually cutting it, I'd wholeheartedly agree. Yet, we do not even have consensus on what an agreeable placeholder syntax could look like. A bit of direction from the people who are deciding on nomination could probably help here.

In light of the RFC's un-nomination I would like to know what the status of the work behind the RFC is, as this RFC thread has been very quiet for quite a few weeks now. What in particular is the stance of @WaffleLapkin (RFC implementer) and @phi-go (RFC author) on the next steps provided by @joshtriplett ?

I personally just hope that this RFC won't die due to too much bikeshedding. It would be really sad to lose the momentum that has been built up for this long-awaited Rust feature.

@phi-go
Author

phi-go commented Aug 10, 2023

To my understanding we are just waiting for the implementation of the experiment. Here is the tracking issue: rust-lang/rust#112788. So un-nominating until the implementation is done seems fine to me.

Regarding the placeholder syntax, we are indeed waiting on a decision for the actual syntax. The current state is that the current implementation uses become and it seems non-trivial to change but @WaffleLapkin will know better.

I find become? quite intuitive, so I would have hoped for some contemplation. The only other candidate syntaxes that have been discussed for longer are using an attribute, and the return variation.

bors added a commit to rust-lang/rust-analyzer that referenced this pull request Feb 14, 2024
feature: Add basic support for `become` expr/tail calls

This follows rust-lang/rfcs#3407 and my WIP implementation in the compiler.

Notice that I haven't even *opened* a compiler PR (although I plan to soon), so this feature doesn't really exist outside of my WIP branches. I've used this to help me test my implementation; opening a PR before I forget.

(feel free to ignore this for now, given all of the above)
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.