Epic: Tail calls optimizations #6914
The goal is to make our tail call optimization match netcore/.NET.
The four main platforms we care about at first are: x86, amd64, arm and arm64; the optimization should work equally well on all of them.
Collecting notes for somewhat future tailcall work.
csc can emit things like:
or if you say csc -optimize:
The JIT should recognize the second and possibly the first pattern and attempt a tail call, even without the tail. prefix.
The first pattern is somewhat generalizable in that "effectively nop" is unbounded -- a general optimization problem. Some of this work might involve more surveying of what csc/mcs tends to emit, and of whether people tend to use csc -optimize. The second pattern is trivial.
If these are already handled, then just add tests to assert it.
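To illustrate the kind of source that is amenable to this, here is a minimal C# sketch. The class and method names are hypothetical and not taken from this issue. The recursive call in `SumTo` is in tail position: nothing happens after it except returning its result. My understanding is that an optimized build tends to emit the `call` immediately followed by `ret`, while an unoptimized build tends to route the result through a local first, but the exact IL depends on the compiler and should be checked with a disassembler.

```csharp
using System;

public static class TailCallDemo
{
    // Hypothetical example: accumulator-style recursion whose recursive
    // call is the last thing the method does (tail position).
    public static long SumTo(long n, long acc)
    {
        if (n == 0)
            return acc;

        // Nothing is done with the result except returning it, so a JIT
        // could in principle reuse the current frame for this call.
        return SumTo(n - 1, acc + n);
    }

    public static void Main()
    {
        // Small input, so this works whether or not the call is optimized.
        Console.WriteLine(SumTo(10, 0)); // prints 55
    }
}
```

With a large enough `n`, a version that is not tail-called would be expected to exhaust the stack, which is one way a test could tell the two outcomes apart.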
Meta-point is that csc never outputs "tail." yet programmers might deliberately write code amenable to tail optimizations. Until/unless C#/csc are fixed, the burden is all on the JIT. C# should gain a syntax though, and then this bug can be skipped.
Apr 4, 2018
Pretty much everything is believed to work. There are a few known problems. Traversing the test matrix takes a long time.
Many, many scenarios improved.
Mar 19, 2019
See around here https://github.com/mono/mono/tree/master/mono/tests/tailcall
A lot of these changes had to be reverted due to the breakages they introduced. I wouldn't say the situation has changed much since before @jaykrell's work, and the existing set of tests, even if somehow exhaustive, doesn't give a clear picture as to what exactly is passing.
Very little had to be reverted.
The main problem was ARM32 could often not be fixed due to the use of an extra
Similar problem with interface calls, still ARM32-specific:
This could be trivially fixed for "bitcode" targets (watch, TV) as they work differently but that was rejected:
Though again, the problem is ARM32-specific.
There is also a problem with gsharedvt.
This disabling is only under FullAOT. The JIT works, as does regular AOT and, I think, HybridAOT.
Also, value types passed by value on AMD64 and ARM64 are a problem, as they are actually passed by reference.
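A minimal C# sketch of the shape of code this affects, assuming a struct large enough that the ABI passes it through a hidden reference (the type and method names are hypothetical, and whether a given struct is passed that way depends on the platform and its size). The difficulty, as I understand it, is that the by-value copy lives in the caller's frame, which a tail call would give up before the callee reads it.

```csharp
using System;

// Hypothetical value type with four 8-byte fields (32 bytes of data).
public struct Big
{
    public long A, B, C, D;
}

public static class ByValueDemo
{
    static long Sum(Big b) => b.A + b.B + b.C + b.D;

    public static long Forward(Big b)
    {
        b.A += 1;
        // Call in tail position with a struct argument passed by value.
        // If the ABI passes a pointer to a copy in this frame, the frame
        // cannot simply be discarded before Sum runs.
        return Sum(b);
    }

    public static void Main()
    {
        var big = new Big { A = 1, B = 2, C = 3, D = 4 };
        Console.WriteLine(Forward(big)); // prints 11
    }
}
```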
Also, mono is stricter than CoreCLR about type matching.
Lots of tests used to fail and now pass.
Then there is the "auto tailcall" scenario, tailcalling without a tail. prefix.
I didn't change that much or at all.
The problems here are:
There are tests that depend on stackwalk to form strings for logging.
Debugging. People complain about lost frames.
Social. People don't think it is important.
See here for a start, uncommitted PRs:
#9620 probably ignore this in deference to the next two.
and then redone later more conservatively:
and then just focusing on the sensitivity separately:
The best/easiest thing is to try the F# tests.
And then there is a very large matrix to run them, which is largely covered by CI:
Some of the cells don't make sense or are unimportant but many are valid.
The test infrastructure splits up the F# tests and runs them all individually.
I'm just repeating myself. Yes the summary from #6914 (comment) is accurate.
It is confusing, I agree, because there are many variables, more than you might expect. Many cases were improved, and some improvements are blocked per-architecture (arm32) or codegen variant (FullAOT gsharedvt).
Let me try and split the difference between @luhenry and @jaykrell - our tail call support is better than it used to be, but it is difficult to characterize concisely when we will compile a tailcall successfully, and it is also difficult to diagnose why a particular tailcall had to fall back to a regular call.
The first line means that
if we can't make the tailcall, there may be a message like:
For future reference: This summary is from mono HEAD 86e7400.
@lambdageek Summary tweeted here, thanks again! https://twitter.com/dsyme/status/1108486082109362178
That is a pretty accurate rendition of the source, yes.
And despite all the conditions, this does let through a lot more than before.
Close but not quite.
Aside: There is a separate jmp instruction that is pretty broken.
I'm fuzzy on what I did with calli.
This is actually the main obvious limitation and CoreCLR
The inlining thing is a dilemma.
You could try to adjust inlining to encourage tailcall
We might actually inhibit inlining of anything with tail. prefix?