Proper tail calls #1888

Closed
wants to merge 8 commits into
from

Conversation

DemiMarie commented Feb 7, 2017

Rendered

camlorn commented Feb 7, 2017

It should be possible to implement tail calls as some sort of transformation in rustc itself, predicated on the backend allowing manipulation of the stack and supporting some form of goto. I assume that WebAssembly at least would allow for us to write our own method calls, but haven't looked at it.

C is a problem, except in the case where the become keyword is used to call the function we are already in (there is a name for this that I'm forgetting). In that case, you may be able to just reassign the arguments and reuse variables. More advanced constructions might be possible by wrapping the tail calls in some sort of outer driver loop and returning codes indicating which function to call next, but this doesn't adequately address how to go about passing arguments around. I wouldn't rule out being able to support this in ANSI C; it's just incredibly tricky.

camlorn commented Feb 7, 2017

Okay, an outline for a specific scheme in C:

  • Post-monomorphization, find all functions that use become. Build a list of the possible tail calls that may be reached from each of these functions.

  • Declare a C variable for all variables in all the functions making up the set. Add a variable, code, that says which function we're in; a code of 0 means we're done. Add another variable, ret, to hold the return value.

  • Give each function a nonzero code and copy the body into a while loop that dispatches based on the code variable. When the while loop exits, return ret. Any instances of return are translated into an assignment to ret and setting code to 0.

  • When entering a tailcall function, redirect the call to the special version instead.

I believe this scheme works in all cases. We can deal with the issue of things that have Drop impls by dropping them: the variable can stay around without a problem, as long as the impl gets called. The key point is that we're declaring the slots up front so that the stack doesn't keep growing. The biggest disadvantage is that we have to declare all the slots for all the functions, so the combined stack frame is potentially (much?) larger than if we had done it the way the RFC currently proposes. If parity in performance across backends is a concern, this could be the scheme used in all backends. In any case, it works for any backend that compiles via C.

Unless I'm missing something obvious, anyway.
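
A minimal sketch (in Rust, for illustration; the actual output would be C) of what the merged dispatcher is morally equivalent to, using hypothetical mutually tail-recursive functions is_even/is_odd rather than anything from the RFC:

fn is_even_entry(n: u64) -> bool {
    // Slots for the variables of every function in the set, declared once.
    let mut n = n;
    let mut ret = false;
    // code: 0 = done, 1 = is_even, 2 = is_odd.
    let mut code = 1u8;
    while code != 0 {
        match code {
            // Body of is_even: `become is_odd(n - 1)` turns into reassigning
            // the argument slots and setting a new code.
            1 => if n == 0 { ret = true; code = 0 } else { n -= 1; code = 2 },
            // Body of is_odd.
            2 => if n == 0 { ret = false; code = 0 } else { n -= 1; code = 1 },
            _ => unreachable!(),
        }
    }
    ret
}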

glaebhoerl (Contributor) commented Feb 7, 2017

@camlorn Does that work for function pointers? I don't immediately see any reason it wouldn't; the only part that sticks out is the "Build a list of the possible tail calls that may be reached from each of these functions" step, because in the interesting cases that's presumably "all of them"?

(Also, this feels very like defunctionalization? Is it?)

camlorn commented Feb 7, 2017

@glaebhoerl
Can you become a function pointer? This was not how I read the RFC, though it would make sense if this were the case. Nonetheless, you are correct: if function pointers are allowed, this probably does indeed break my scheme. It might be possible to get around it, somehow.

I don't know what defunctionalization is. Is this defunctionalization? I'll get back to you once I learn a new word.

ranma42 (Contributor) commented Feb 7, 2017

Is there any benchmarking data regarding the callee-pops calling convention?
AFAICT Windows uses such a calling convention (stdcall) for most APIs.
I have repeatedly looked for benchmarks comparing stdcall to cdecl, but I have only found minor differences (in either direction, possibly related to the interaction with optimisations) and I was unable to find something providing a conclusive answer on which one results in better performance.

camlorn commented Feb 7, 2017

@ranma42
I'm not sure why there would be a difference: either you do your jmp for return and then pop or you pop and then do your jmp for return, but in either case someone is popping the same amount of stuff?

Also, why does it matter here?

camlorn commented Feb 7, 2017

@glaebhoerl
Apparently today is idea day:

Instead of making the outer loop be inside a function that declares all the needed variables, make the outer loop something that expects a struct morally equivalent to the tuple (int code, void* ptr, void* args), then have it cast ptr to the appropriate function pointer type by switching on code, cast args to a function-pointer-specific argument structure, then call the function pointer. It should be possible to get the args struct to be inline as opposed to an additional level of indirection somehow, but I'm not sure how to do it without violating strict aliasing. This has the advantage of making the stack frame roughly the same size as what it would be in the LLVM backend, but the disadvantage of being slower (but maybe we can sometimes use the faster while-loop with switch statement approach).

I don't think this is defunctionalization, based off a quick google of that term.

ranma42 (Contributor) commented Feb 7, 2017

@camlorn That is my opinion, too, but it is mentioned as "one major drawback of proper tail calls" in the current RFC.

0000-template.md
Later phases in the compiler assert that these requirements are met.
New nodes are added in HIR and HAIR to correspond to `become`. In MIR, however,
a new flag is added to the `TerminatorKind::Call` varient. This flag is only

mglagla commented Feb 7, 2017

Typo: varient -> variant

DemiMarie commented Feb 7, 2017

@camlorn @ranma42 The drawback of a callee-pops calling convention is that, with a caller-pops convention, much of the stack-pointer motion can be eliminated by the optimizer, since it all happens within one function. With a callee-pops convention you might be able to do the same thing in the callee, but I don't think you gain anything except on Windows, due to the red zone, which Windows doesn't have.

I really don't know what I am talking about on the performance front, though. An easy way to find out would be to patch the LLVM bindings that Rust uses to always enable tail calls at the LLVM level, build the compiler, and see whether the modified compiler is faster or slower than the original.

DemiMarie commented Feb 7, 2017

@camlorn My intent was that one can become any function or method that uses the Rust ABI or the rust-call ABI (both of which lower to LLVM fastcc), provided that the return types match. Haven't thought about function pointers, but I believe that tail calls on trait object methods are an equivalent problem.

camlorn commented Feb 7, 2017

@DemiMarie
Good point. They are.

I think my latest idea works out, but I'm not quite sure where you put the supporting structs without heap allocation. I do agree that not being able to do it in all backends might sadly be a deal breaker.

Is there a reason that Rustc doesn't already always enable tail calls in release mode?

0000-template.md
[implementation]: #implementation
A current, mostly-functioning implementation can be found at
[DemiMarie/rust/tree/explicit-tailcalls](/DemiMarie/rust/tree/explicit-tailcalls).

cramertj (Member) commented Feb 7, 2017

This 404s for me.

cramertj (Member) commented Feb 8, 2017

Is there any particular reason this RFC specifies that become should be implemented at an LLVM level rather than through some sort of MIR transformation? I don't know how they work, but it seems like maybe StorageLive and StorageDead could be used to mark the callee's stack as expired prior to the function call.

@archshift

Just a note:

You shouldn't be changing the template file, but rather copying the template to a new file (0000-proper-tail-calls.md) and changing that!

archshift (Contributor) commented Feb 8, 2017

I wonder if one can simulate the behavior of computed goto dispatch using these tail calls. That would be pretty neat indeed!

DemiMarie commented Feb 8, 2017

@archshift There is a better way to do that (get rustc to emit the appropriate LLVM IR for a loop wrapped around a match when told to do so, perhaps by an attribute).

@DemiMarie

@archshift done.

Stebalien (Contributor) commented Feb 8, 2017

As a non-FP/non-PL person, it would be really nice to see some concrete examples of where become is nicer than a simple while loop. Personally, I only ever use recursion when I want a stack.

ranma42 (Contributor) commented Feb 8, 2017

@Stebalien a case where they are typically nicer than a loop is when they are used to encode (the states of a) state machine. That is because instead of explicitly looping and changing the state, it is sufficient to call the appropriate function (i.e. the state is implicitly encoded by the function being run at that time). Note that this often makes it easier for the compiler to detect optimisation opportunities, as in some cases a state can trivially be inlined.
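
A small illustration of this, using the RFC's proposed become syntax (so not compilable today): a hypothetical word-counting machine whose two states are simply two functions, and whose transitions are tail calls.

fn count_words(s: &[u8]) -> usize {
    // State "between words": the current state is just "which function is running".
    fn between(s: &[u8], n: usize) -> usize {
        match s.split_first() {
            None => n,
            Some((&b' ', rest)) => become between(rest, n),
            Some((_, rest)) => become in_word(rest, n + 1),
        }
    }
    // State "inside a word".
    fn in_word(s: &[u8], n: usize) -> usize {
        match s.split_first() {
            None => n,
            Some((&b' ', rest)) => become between(rest, n),
            Some((_, rest)) => become in_word(rest, n),
        }
    }
    between(s, 0)
}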

Stebalien (Contributor) commented Feb 8, 2017

@ranma42 I see. Usually, I'd just put the state in an enum and use a while + match loop but I can see how become with a bunch of individual functions could be cleaner. Thanks!
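
For comparison, a sketch of the loop + match encoding of the same hypothetical word-counting machine shown above; this compiles today and needs no become:

fn count_words_loop(s: &[u8]) -> usize {
    enum State { Between, InWord }
    let (mut state, mut n) = (State::Between, 0);
    for &b in s {
        // The state lives in a value instead of in "which function is running".
        state = match (state, b) {
            (State::Between, b' ') => State::Between,
            (State::Between, _) => { n += 1; State::InWord },
            (State::InWord, b' ') => State::Between,
            (State::InWord, _) => State::InWord,
        };
    }
    n
}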

sgrif (Contributor) commented Feb 8, 2017

Should this RFC include at least one example of what this syntax looks like in use? (e.g. an entire function body)

arthurprs commented Feb 8, 2017

A good example snippet would go a long way. 👍 overall, as the surface area is fairly small and it really helps Rust's functional mojo.

DemiMarie commented Feb 8, 2017

Pinging @thepowersgang because, to the best of my knowledge, they are the only person working on an alternative Rust compiler, and because their compiler (mrustc) compiles via C, so they would need to implement one of the above solutions.

mrhota commented Feb 9, 2017

Isn't it clearer and more correct to call this explicit tail call optimization/elimination? A "proper" or "explicit" tail call is nothing more than a tail call, perhaps with explicit annotation.

But what the RFC discusses is optimizing explicitly annotated tail calls, right?

My confusion compounds: in the Portability section, we learn that LLVM does not support "proper tail calls" for MIPS and WebAssembly. Does that mean LLVM will not accept a call as the final instruction before a ret on those platforms? Or does that mean that it will not optimize the call in the way described above?

mrhota commented Feb 9, 2017

Do we want to support optimizing explicit mutual tail recursion? If so, can we see an example in the RFC using become?

DemiMarie commented Feb 9, 2017

@mrhota LLVM will fail to turn a call into a jump for MIPS and WebAssembly, even if the call has the tail prefix and is followed immediately by a ret.

@aturon aturon added the T-lang label Feb 9, 2017

mrhota commented Feb 10, 2017

@DemiMarie I see. My nit was with the choice of terminology. call ...; ret (or tail call ...; ret) is the very definition of a ("proper") tail call, no matter what LLVM or some other compiler does (or doesn't do) with it. Optimizing it to a jump is called tail call optimization/elimination.

andrestesti commented Feb 10, 2017

A #[tailrec] attribute decorating the function doesn't require a reserved word, and is easier to read and declare than an intrusive statement like become. It is also declarative: you don't need to modify your executable code to request an optimization. If the compiler couldn't optimize a #[tailrec] function into a loop, it would raise an error (and maybe a suggestion). You would never get a non-optimizable function, since it wouldn't compile. The tailrec annotation works fine in Scala, and I think Rust should follow the same approach.
Another quirk with the become keyword is that it is not symmetric with return keyword omission. It clashes with expression-oriented/functional code style, even though recursion is a very functional construct.

fn foo(x: i32, accu: i32) -> i32 {
    if x < 0 {
        // return omission, functional-like style
        foo(-x, 1) + accu
    } else {
        // asymmetry, imperative-like style
        become foo(x-1, x*accu);
    }
}

thepowersgang (Contributor) commented Aug 31, 2017

musttail appears to have a very restricted set of valid uses - mainly that the caller and callee must have (almost) the same signature. I assume that outside of that set it can't be defined for all platforms (and will probably error in IR validation?)

  • The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space.

jhjourdan commented Sep 1, 2017

Right, sorry, I should have read the whole paragraph following musttail in the docs. It seems like the purpose of this attribute is to force tail call optimization even if not using the fastcc calling convention.

But still, the tail attribute does not have this restriction and has the guarantee of succeeding under some not-so-terrible restrictions.

thepowersgang (Contributor) commented Sep 2, 2017

There is the option of accepting this RFC in a weaker form - where become is an optimisation that moves all destructor calls to before the returning function call (opening up the way for LLVM/other to make it a tail call).
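
A sketch of that weaker semantics, with hypothetical helper/with_* functions; the observable difference is only when the local is dropped:

fn helper(n: usize) -> usize { n }

fn with_return(v: Vec<u8>) -> usize {
    return helper(v.len()); // v is dropped after helper returns
}

fn with_become(v: Vec<u8>) -> usize {
    become helper(v.len()); // proposed syntax: v is dropped before the call
}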

jhjourdan commented Sep 2, 2017

I don't think that this is a better solution compared to supporting become only on platforms that do support TCO. Indeed, what you are proposing essentially corresponds to delaying a failure from compile time (i.e., become is rejected) to runtime (i.e., stack overflow), which is usually not Rust's philosophy.

rkruppe (Contributor) commented Sep 2, 2017

Accepting only the become keyword and its largest semantic impact (early drops) right now does not mean we can't later guarantee tail calls, either everywhere or in certain cases (platforms, restrictions on calling conventions, only for known callees, etc.). In the latter case, there could be an opt-in warning/error for code that doesn't fall in those cases, so that programmers can make sure that code they write gets TCO.
And in the meantime, or even afterwards in cases where we don't guarantee TCO, become could still be useful to sometimes allow a slight optimization.

I do think, though, that an accepted RFC along those lines should come with a commitment that some sort of guaranteed TCO is coming (and ideally, an idea of when it will be possible). Reserving a keyword and messing with drop order doesn't seem worthwhile if it would just occasionally, maybe, shave a few instructions off a call.


To be quite honest, after the implementation difficulties described in this thread, I am personally unsure if there realistically can be a satisfactory implementation of guaranteed TCO. At the least, I would consider anything that places strict requirements on function signatures, or introduces costs such as trampolines, to be deeply unsatisfactory. So I'm not really arguing for anything here, I'm just saying provisional acceptance without a settled implementation could potentially be useful.

le-jzr commented Sep 3, 2017

I don't see a reason to add a keyword for something you can already do with an extra pair of curly braces. Making sure that rustc generates tail-call-optimizable code wherever possible is a separate issue that can be solved without adding keywords (maybe an attribute would work better, like #[tail], used the same way as the existing #[inline]).

RalfJung (Member) commented Sep 3, 2017

I don't see a reason a add a keyword for something you can already do with an extra pair of curly braces.

Can it, though? You also need to manually drop all your arguments that are not forwarded.
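
Roughly the idiom under discussion, with hypothetical process/next_step functions: the call has to be the final expression, and every local that is not forwarded has to be dropped by hand first; even then, turning the call into a jump is only a hope, not a guarantee.

fn next_step(len: usize) -> usize { len }

fn process(buf: Vec<u8>, log: String) -> usize {
    let len = buf.len();
    // Arguments that are not forwarded must be dropped manually; otherwise
    // they are still live across the call and it is not a tail call.
    drop(log);
    drop(buf);
    next_step(len) // last expression: a candidate for TCO, but not guaranteed
}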

le-jzr commented Sep 3, 2017

Good point. I forgot about those. Still, a keyword seems a bit much.

RalfJung (Member) commented Sep 3, 2017

However, an attribute doesn't solve the problem where some returns are tail calls and some are not.

cramertj (Member) commented Sep 3, 2017

I'm not knowledgeable enough about trans to know the answer: would it be possible to compile a tail-recursive function to "musttail" when possible, and a trampoline otherwise? It sounds complicated, for sure, but is it possible? It'd be nice to offer trampolines as a less performant but stack-destroying fallback from native tail calls.
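
A minimal sketch of such a trampoline in today's Rust, with hypothetical is_even/is_odd functions: each step returns either a result or the next call as a boxed closure, and a driver loop keeps the stack flat at the cost of an allocation per step.

enum Step<T> {
    Done(T),
    Call(Box<dyn FnOnce() -> Step<T>>),
}

fn trampoline<T>(mut step: Step<T>) -> T {
    loop {
        match step {
            Step::Done(v) => return v,
            Step::Call(next) => step = next(),
        }
    }
}

fn is_even(n: u64) -> Step<bool> {
    if n == 0 { Step::Done(true) } else { Step::Call(Box::new(move || is_odd(n - 1))) }
}

fn is_odd(n: u64) -> Step<bool> {
    if n == 0 { Step::Done(false) } else { Step::Call(Box::new(move || is_even(n - 1))) }
}

// usage: trampoline(is_even(1_000_001)) runs in constant stack space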

rkruppe (Contributor) commented Sep 3, 2017

Detecting whether musttail will be applicable seems feasible. However, caller and callee both need to be clued in on the trampoline (you can't call a thunk-returning function expecting it to return its return value, or vice versa), so this strategy would still have some unfortunate limitations. For example, it couldn't in general support tail calls to function pointers or trait object methods (unless we eat the huge cost of generating a trampoline-enabled variant of every function that has its address taken).

Besides these technical problems, I am also philosophically unhappy with paying the cost of trampolines at all, especially if it happens silently and as commonly as it would with the severe restrictions on musttail. While some people may only care about not overflowing the stack, in many cases (e.g., for state machines) the tail call must be cheaper than some alternative implementation strategy (in the state machine case, loop + match) to be really useful.

jhjourdan commented Sep 4, 2017

@rkruppe The point is that tail has fewer restrictions than musttail, but still, when the llvm compiler is given the right options on the right architectures (and when using the right calling convention), the tail calls are guaranteed.

le-jzr commented Sep 4, 2017

(unless we eat the huge cost of generating a trampoline-enabled variant of every function that has its address taken)

A combination of an attribute and a keyword/intrinsic could work. So you'd use become or analogous to make the tail call, but the caller/callee/both would additionally have to be annotated with #[tail], which would generate the necessary sauce on platforms where LLVM doesn't support tail calls natively. Naturally, the compiler would select the most efficient strategy on a given platform.

I don't think the overhead of trampolines is a dealbreaker here. It's better than just failing to compile on some platforms, and vastly better than exhausting stack at runtime.

rkruppe (Contributor) commented Sep 4, 2017

@jhjourdan

@rkruppe The point is that tail has fewer restrictions than musttail, but still, when the llvm compiler is given the right options on the right architectures (and when using the right calling convention), the tail calls are guaranteed.

I'm not sure which point you're referring to, are you talking about @cramertj's proposal?

@le-jzr

A combination of an attribute and a keyword/intrinsic could work. So you'd use become or analogous to make the tail call, but the caller/callee/both would additionally have to be annotated with #[tail], which would generate the necessary sauce on platforms where LLVM doesn't support tail calls natively. Naturally, the compiler would select the most efficient strategy on a given platform.

This still wouldn't work with unknown callees (as you don't know if the callee has been annotated with that attribute), which is what I was talking about in the part you quote. If the callee is known, you can already generate the trampoline variant lazily (i.e., when there's actually a become call to that callee).

I don't think the overhead of trampolines is a dealbreaker here. It's better than just failing to compile on some platforms, and vastly better than exhausting stack at runtime.

It may not be for your use cases, but as I said, for other use cases -- such as efficient state machines -- trampolines are unsuitable. That is not to say exhausting the stack or failing to compile would be better, but there are other alternatives that use constant stack space, work on all platforms (with consistent performance, unlike sometimes-automagically generated trampolines), and are likely faster than trampolines.

Furthermore, while trampolines are annoying to write by hand, they don't require nearly as much integration with the compiler (edit: ... as proper tail calls) to generate automatically. If you're okay with trampolines, and would be okay with modifying functions that would return thunks (e.g., adding an attribute), you can already generate working trampolines with some macros and slightly uglier syntax. So I am not convinced that we need the become keyword and its semantics implications if it would only satisfy the people who are okay with trampolines.

jhjourdan commented Sep 4, 2017

@rkruppe

I'm not sure which point you're referring to, are you talking about @cramertj's proposal?

No. I am just saying that musttail is not the only way to get guaranteed tail calls in LLVM. More precisely, I am referring to the following paragraph in LLVM docs:

Tail call optimization for calls marked tail is guaranteed to occur if the following conditions are met:

  • Caller and callee both have the calling convention fastcc.
  • The call is in tail position (ret immediately follows call and ret uses value of call or is void).
  • Option -tailcallopt is enabled, or llvm::GuaranteedTailCallOpt is true.
  • Platform-specific constraints are met.

rkruppe (Contributor) commented Sep 4, 2017

Okay. I am aware of that, but have (lazily) talked about musttail because @cramertj did. I don't believe the additional cases where TCO is guaranteed by tail significantly shift the balance.

scottmcm (Member) commented Sep 4, 2017

What about an initial version of this that only allows tail recursion, not tail calls? That trivially meets the musttail requirements, and is quite useful when dealing with exclusive borrows, particularly &mut [T].
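
A sketch of that direct-recursion case in the proposed syntax, with a hypothetical function threading an exclusive borrow; caller and callee are the same function, so the musttail signature restriction is trivially satisfied:

fn zero_all(xs: &mut [u32]) {
    match xs.split_first_mut() {
        None => {}
        Some((head, rest)) => {
            *head = 0;
            become zero_all(rest); // proposed syntax, not compilable today
        }
    }
}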

likeabbas commented Jan 24, 2018

I apologize if this is not the place for me to post this, but I was wondering if there have been any updates on this RFC? A basic version that only allows for tail recursion, as described by @scottmcm, would be a huge improvement to the language IMO.

@isiahmeadows isiahmeadows referenced this pull request in fantasyland/fantasy-land Jan 25, 2018

Open

Fantasy Land proposal process for ECMAScript #204

nikomatsakis (Contributor) commented Jan 26, 2018

I think that if we were to do tail-calls, this RFC is roughly how I would want to do them. For example, I prefer the idea of running destructors and dropping state early, and I like using the become keyword.

However, I do not think it's really an option to change to a "callee pops" style. I think it's a crucial selling point of Rust that it compiles down to code that is basically the same as C -- it's ok for us to diverge slightly in our calling convention, but we have to be very careful there, particularly if it can lead to a performance hit.

That said, I'd like a point of clarification. Could we in some way "contain" the callee-pops convention? For example, imagine that we had to declare functions as tail recursive, and that allowed the function to contain a become or to be the target of a become, and we disallow 'indirect' become for now. Then perhaps the "callee pops" effect could be quarantined to the tail recursive bit of your program?

I am still kind of wary in general, just because this seems like a semi-niche feature that will add complexity and maintenance burden across the board. The portability hazards are significant as well. Then again, JS supposedly has tail recursion now, so maybe people are becoming familiar with the concept (and I know some things are much nicer when you can tail recurse). (One final point is that I am not sure how problematic the borrowing restrictions and so forth would prove to be, though obviously I see why we need some such restrictions.)

aturon (Member) commented Feb 7, 2018

Given our goals for 2018, I don't think work in this area is on the docket near-term. Thus, I move to postpone:

@rfcbot fcp postpone

rfcbot commented Feb 7, 2018

Team member @aturon has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented Feb 14, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

likeabbas commented Feb 23, 2018

There are a few defining characteristics that stick out when I think of a language: how to define a variable, how to write an if statement, and how to iterate through an array. In fact, Rust is introduced by iterating through a list of greetings. That introduction has a lasting impact on what people believe is the correct way to write Rust code.

With regards to the 2018 goal

Ship an epoch release: Rust 2018

I believe adding tail calls would be one of the most defining characteristics of Rust. It is a characteristic that further separates Rust from C and C++, and gives it a cleaner feeling that I associate with functional programming. If one of the major goals of 2018 is giving Rust its defining characteristics, then I believe this is an RFC that cannot be ignored.

Pauan (Member) commented Feb 24, 2018

@nikomatsakis Then again, JS supposedly has tail recursion now, so maybe people are becoming familiar with the concept.

Technically the ES6 spec mandates tail-calls, but the situation in reality is more complicated than that.

The only browser that actually supports tail calls is Safari (and Webkit). And the Edge team has said that it's unlikely that they will implement tail calls (for similar reasons as Rust: they currently use the Windows ABI calling convention, which doesn't work well with tail calls).

Therefore, tail calls in JS are a very controversial thing, even to this day:

Microsoft/ChakraCore#796
kangax/compat-table#819
https://github.com/tc39/proposal-ptc-syntax
tc39/proposal-ptc-syntax#22
https://v8project.blogspot.com/2016/04/es6-es7-and-beyond.html
https://www.chromestatus.com/features/5516876633341952
https://bugs.chromium.org/p/v8/issues/detail?id=4698#c75
https://github.com/rwaldron/tc39-notes/blob/master/es7/2016-05/may-24.md#syntactic-tail-calls-bt

So for now you cannot rely upon tail calls in JS, and given the controversy you might never be able to rely upon them.

Personally I love tail calls, but I can accept the technical reasons for not implementing them.

P.S. Just to be clear, the Edge team is against implicit tail-calls for all functions, but they're in favor of tail-calls-with-an-explicit-keyword (similar to this RFC).

rfcbot commented Feb 24, 2018

The final comment period is now complete.

@Centril Centril added the postponed label Feb 24, 2018

Centril (Contributor) commented Feb 24, 2018

Closing since FCP with a motion to postpone is now complete.

petrochenkov (Contributor) commented Feb 24, 2018

"Postponed" issue - #271.

@petrochenkov petrochenkov removed the postponed label Feb 24, 2018

@Centril Centril added the postponed label May 15, 2018

ehaliewicz commented Jul 30, 2018

@bbarker yep, that's a similar solution to what Webkit does. It's a classic trick.
And also used by Chicken Scheme.
