Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
bdeca9b
commit 31cf2d2
Showing
1 changed file
with
326 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,326 @@ | ||
Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz> | ||
Status: Draft | ||
Type: Standards Track | ||
Erlang-Version: R14B04 | ||
Content-Type: text/plain | ||
Created: 27-May-2011 | ||
Post-History: | ||
**** | ||
EEP 37: Funs with names | ||
--- | ||
|
||
|
||
|
||
Abstract | ||
======== | ||
|
||
The syntax of funs is extended to allow a variable name to | ||
be consistently present before each argument list. This | ||
allows funs to be recursive. The knot is tied in the | ||
opcodes that apply a fun to its arguments, so no change to | ||
the garbage collector design is required. | ||
|
||
|
||
|
||
Specification | ||
============= | ||
|
||
Currently, there are three forms for a fun: | ||
|
||
fun Name/Arity | ||
fun Module:Name/Arity | ||
|
||
and | ||
|
||
fun Fun_Clause {; Fun_Clause}... end | ||
|
||
We add another form: | ||
|
||
fun Variable Fun_Clause {; Variable Fun_Clause}... end | ||
|
||
If any `Fun_Clause` has a `Variable`, all must, and they must all | ||
be the same variable. Like the variables in the argument list(s), | ||
this variable is local to the fun, not shared with its context. | ||
Within the fun, the variable is bound to the value of the fun | ||
expression. | ||
|
||
There are several possible ways to implement this. One is | ||
rather neat because it preserves the cycle-freedom of the | ||
data structures the garbage collector has to deal with. | ||
|
||
One way to implement existing funs is this: | ||
|
||
- **a** Create an auxiliary function with a generated name | ||
|
||
<foo>(...,X1,...,Xk) ...; | ||
... | ||
<foo>(...,X1,...,Xk) .... | ||
|
||
having the same argument lists, guards, and clause bodies | ||
as the fun, except that each variable shared with the context | ||
appears as an extra argument. | ||
|
||
- **b** Translate the fun expression as | ||
|
||
'%mk-fun'({fun <foo>/n+k, X1, ..., Xk}) | ||
|
||
which gives the tuple a special tag to say that it represents | ||
a fun value. | ||
|
||
- **c** Translate `Foo(E1,...,Em)` | ||
as `A1 := E1, ..., Am := Em; funcall_m(Foo)` | ||
where the `funcall_m` instruction checks that its argument is | ||
a closure expecting `m` arguments, moves the `X1,...,Xk` fields | ||
of the tuple to argument registers `A<m+1>..A<m+k>`, and then | ||
jumps to the address in the first field. | ||
|
||
All it takes to implement recursive funs is | ||
|
||
- **a'** Create an auxiliary function | ||
|
||
<foo>(...,X1,...,Xk,Variable) ...; | ||
... | ||
<foo>(...,X1,...,Xk,Variable) .... | ||
|
||
- **b'** Translate the fun expression as | ||
|
||
'%mk-rec-fun'({fun <foo>/<n+k+1>, X1, ..., Xk}) | ||
|
||
which simply applies a second special tag. | ||
|
||
- **c'** The `funcall_m` opcode acts the same for both old and | ||
recursive funs, except that just before jumping, it | ||
adds tne fun value `Foo` itself as argument `A<m+k+1>`. | ||
This "ties the knot". | ||
|
||
So a recursive fun takes no more space or time to create than | ||
an existing one, and does not involve creating any cycles of | ||
pointers. Its code can be inserted into the failure path for | ||
the `funcall_m` instructions, whatever their form. | ||
|
||
|
||
|
||
|
||
Motivation | ||
========== | ||
|
||
Fun names can serve three purposes. | ||
|
||
First, they can simply be documentation. For example, | ||
|
||
cfun_files(CFun) -> | ||
fun(F1, F2) -> | ||
[[?OBJ(T1,_) | _] | _] = F1, | ||
[[?OBJ(T2,_) | _] | _] = F2, | ||
CFun(T1, T2) | ||
end. | ||
|
||
can be written as | ||
|
||
cfun_files(CFun) -> | ||
fun Compare([[?OBJ(T1,_)|_]|_], [[?OBJ(T2,_)|_]|_]) -> | ||
CFun(T1, T2) | ||
end. | ||
|
||
A named fun whose name is not used can be implemented as if | ||
the name were not there. | ||
|
||
Second, the fun's name can be built into its generated name. | ||
At the time of writing, we might have | ||
|
||
'-F/N-fun-K-' | ||
|
||
where `F/N` is the name of the function that includes the fun | ||
and `K` is the number of earlier funs in `F/N`. We could build | ||
the name in instead, using | ||
|
||
'-F/N-fun-Name-[K-]' | ||
|
||
where `K` is present only if the outer function contains more | ||
than one fun with the same name. The point of this is that | ||
such names are more likely to be useful after hot loading. | ||
For example, if we start with | ||
|
||
f(...Xs, Ys, ...) -> | ||
... | ||
sort(Xs, fun X_Key({_,N,_}) -> N end), | ||
sort(Ys, fun Y_Key({N,_,_}) -> N end), | ||
... | ||
|
||
and then we revise it, swapping the two calls to `sort/2`. | ||
With named funs, the two funs retain their generated names, | ||
and the module is safe. With anonymous functions, the | ||
chances are that the two funs with swap names; oops! | ||
|
||
Third, a frequently asked question in the Erlang mailing | ||
list is "why can't I have recursive funs?" to which we | ||
will now be able to rely, "you can; here is what they | ||
look like." | ||
|
||
This still does not permit mutually recursive funs, but | ||
people do not seem to ask for that much. | ||
|
||
Finally, the next time someone argues that Erlang syntax | ||
is inconsistent because function clauses have repeated | ||
names and fun clauses do not, we shall be able to reply | ||
"but fun clauses CAN have repeated names and probably | ||
should." | ||
|
||
|
||
|
||
|
||
Rationale | ||
========= | ||
|
||
There really seemed to be only two main questions. | ||
|
||
What should the scope of the fun name variable be? | ||
Some variables in a fun are shared between the fun | ||
and its context. Doing that would let us write | ||
|
||
f(...) -> | ||
fun G(...) -> ... end, | ||
fun H(...) -> ... end, | ||
... use G and H ... | ||
|
||
rather like using nested "define" in Scheme, except that | ||
while `H` could use `G`, `G` couldn't use `H`. | ||
|
||
Since you do not get mutual recursion this way, you should | ||
not be tricked into thinking you might. It's better that | ||
you have to write | ||
|
||
f(...) -> | ||
GG = fun G(...) -> ... end, | ||
HH = fun H(...) -> ... end, | ||
... use GG and HH ... | ||
|
||
so that you understand clearly what you are getting. | ||
|
||
While variables in the body of a fun clause may be shared | ||
with the context, variables in the arguments are not, | ||
something I have found confusing. At least this way the | ||
fun name follows the same scope rule as the variables in | ||
the argument list right next to it. | ||
|
||
The other main question was whether recursive fun values | ||
should be exactly the same representation as existing | ||
fun values, but with a cycle in it (tying the knot at | ||
construction time), or whether to introduce a new tag | ||
(tying the knot at call time). The lack of cycles in | ||
Erlang heaps has been a major factor in the design of | ||
several garbage collectors. I would expect changing | ||
that to be an order of magnitude harder than the | ||
changes required for this proposal. It was seeing that | ||
the knot could be tied at call with (without slowing | ||
down calls to existing funs) that made me dare to hope | ||
that this proposal might some day be accepted. | ||
|
||
The main issue now is that this does not let us define | ||
a group of mutually recursive local functions. | ||
Adopting this proposal now might get in the way of a | ||
better proposal that handles mutual recursion as well. | ||
|
||
I don't see such a proposal as being likely to arrive soon. | ||
|
||
There is a special case of this where the fun name is used | ||
only in tail call positions, which can be handled entirely | ||
by the compiler generating a jump back to the beginning. | ||
This need not have any consequences for the run time system | ||
at all. | ||
|
||
|
||
|
||
Backwards Compatibility | ||
======================= | ||
|
||
Code that does not use the new feature does not change its | ||
meaning. There may be code that relies on the form of | ||
generated function names; that would need changing. | ||
|
||
All syntax tools would need to be revised to handle the new form. | ||
Existing parse transforms might well fail on code containing the | ||
new form, but would work unchanged on code that does not. | ||
|
||
At least one new instruction is needed to create suitably | ||
distinguished closures. Existing programs that analyse BEAM | ||
files will not understand this until they are revised. | ||
|
||
As described under 'motivation', naming functions is | ||
useful even if you do not use the name in any clause body. | ||
This means that we can have a staged delivery of the feature. | ||
|
||
1. Make the parser recognise fun names and check their identity. | ||
Have it report an error if the fun name is used in a body. | ||
Have it erase the fun names from the AST before any | ||
downstream tool sees it. | ||
|
||
At this stage, fun names may serve as documentation. | ||
|
||
2. Upgrade the downstream tools to recognise an extended `'fun'` | ||
AST node with two extra fields: the fun name as an atom and | ||
a flag saying whether it is not used, used only in tail | ||
position, or used more generally. | ||
|
||
Upgrade the parser to report fun names, but retain the | ||
check that they are not used. Test the down stream tools. | ||
|
||
3. Modify the compiler to use the new, safer, form of generated | ||
name. Ensure that the generated names are accessed only | ||
through an interface, so all is consistent. | ||
|
||
At this stage, fun names help to reduce the danger from | ||
code revisions that add, remove, or re-order funs; a | ||
change that does not alter the number of funs with a | ||
particular name in a function should not change its name. | ||
|
||
4. (Optional.) Revise the code generator to accept the fun | ||
name in tail call position and generate a jump. Modify | ||
the parser to allow this. | ||
|
||
At this point, it is possible to pass a loop as a parameter, | ||
like a list traversal or a binary search. No changes to the | ||
representation of Erlang terms or the BEAM engine have been | ||
required yet. | ||
|
||
5. Add a new tag. Revise the funcall instructions to check for | ||
it if the existing check fails, and push the closure itself. | ||
Add a new instruction to make a new closure. Revise the | ||
Erlang term representation to encode recursive funs. Revise | ||
the type test instructions to recognise the new values. | ||
Teach HiPE what to do. | ||
|
||
This is the last stage. | ||
|
||
|
||
|
||
Reference Implementation | ||
======================== | ||
|
||
None in this draft. Stage 1 can be done fairly easily. | ||
Stage 2 would be hard for me because I'm not even sure what | ||
all the relevant modules are. | ||
|
||
|
||
|
||
References | ||
========== | ||
|
||
None. | ||
|
||
|
||
|
||
Copyright | ||
========= | ||
|
||
This document has been placed in the public domain. | ||
|
||
|
||
|
||
[EmacsVar]: <> "Local Variables:" | ||
[EmacsVar]: <> "mode: indented-text" | ||
[EmacsVar]: <> "indent-tabs-mode: nil" | ||
[EmacsVar]: <> "sentence-end-double-space: t" | ||
[EmacsVar]: <> "fill-column: 70" | ||
[EmacsVar]: <> "coding: utf-8" | ||
[EmacsVar]: <> "End:" |