Skip to content

Commit

Permalink
Add EEP 37: Funs with names
Browse files Browse the repository at this point in the history
  • Loading branch information
RaimoNiskanen committed May 30, 2011
1 parent bdeca9b commit 31cf2d2
Showing 1 changed file with 326 additions and 0 deletions.
326 changes: 326 additions & 0 deletions eeps/eep-0037.md
@@ -0,0 +1,326 @@
Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
Status: Draft
Type: Standards Track
Erlang-Version: R14B04
Content-Type: text/plain
Created: 27-May-2011
Post-History:
****
EEP 37: Funs with names
---



Abstract
========

The syntax of funs is extended to allow a variable name to
be consistently present before each argument list. This
allows funs to be recursive. The knot is tied in the
opcodes that apply a fun to its arguments, so no change to
the garbage collector design is required.



Specification
=============

Currently, there are three forms for a fun:

fun Name/Arity
fun Module:Name/Arity

and

fun Fun_Clause {; Fun_Clause}... end

We add another form:

fun Variable Fun_Clause {; Variable Fun_Clause}... end

If any `Fun_Clause` has a `Variable`, all must, and they must all
be the same variable. Like the variables in the argument list(s),
this variable is local to the fun, not shared with its context.
Within the fun, the variable is bound to the value of the fun
expression.

There are several possible ways to implement this. One is
rather neat because it preserves the cycle-freedom of the
data structures the garbage collector has to deal with.

One way to implement existing funs is this:

- **a** Create an auxiliary function with a generated name

<foo>(...,X1,...,Xk) ...;
...
<foo>(...,X1,...,Xk) ....

having the same argument lists, guards, and clause bodies
as the fun, except that each variable shared with the context
appears as an extra argument.

- **b** Translate the fun expression as

'%mk-fun'({fun <foo>/n+k, X1, ..., Xk})

which gives the tuple a special tag to say that it represents
a fun value.

- **c** Translate `Foo(E1,...,Em)`
as `A1 := E1, ..., Am := Em; funcall_m(Foo)`
where the `funcall_m` instruction checks that its argument is
a closure expecting `m` arguments, moves the `X1,...,Xk` fields
of the tuple to argument registers `A<m+1>..A<m+k>`, and then
jumps to the address in the first field.

All it takes to implement recursive funs is

- **a'** Create an auxiliary function

<foo>(...,X1,...,Xk,Variable) ...;
...
<foo>(...,X1,...,Xk,Variable) ....

- **b'** Translate the fun expression as

'%mk-rec-fun'({fun <foo>/<n+k+1>, X1, ..., Xk})

which simply applies a second special tag.

- **c'** The `funcall_m` opcode acts the same for both old and
recursive funs, except that just before jumping, it
adds tne fun value `Foo` itself as argument `A<m+k+1>`.
This "ties the knot".

So a recursive fun takes no more space or time to create than
an existing one, and does not involve creating any cycles of
pointers. Its code can be inserted into the failure path for
the `funcall_m` instructions, whatever their form.




Motivation
==========

Fun names can serve three purposes.

First, they can simply be documentation. For example,

cfun_files(CFun) ->
fun(F1, F2) ->
[[?OBJ(T1,_) | _] | _] = F1,
[[?OBJ(T2,_) | _] | _] = F2,
CFun(T1, T2)
end.

can be written as

cfun_files(CFun) ->
fun Compare([[?OBJ(T1,_)|_]|_], [[?OBJ(T2,_)|_]|_]) ->
CFun(T1, T2)
end.

A named fun whose name is not used can be implemented as if
the name were not there.

Second, the fun's name can be built into its generated name.
At the time of writing, we might have

'-F/N-fun-K-'

where `F/N` is the name of the function that includes the fun
and `K` is the number of earlier funs in `F/N`. We could build
the name in instead, using

'-F/N-fun-Name-[K-]'

where `K` is present only if the outer function contains more
than one fun with the same name. The point of this is that
such names are more likely to be useful after hot loading.
For example, if we start with

f(...Xs, Ys, ...) ->
...
sort(Xs, fun X_Key({_,N,_}) -> N end),
sort(Ys, fun Y_Key({N,_,_}) -> N end),
...

and then we revise it, swapping the two calls to `sort/2`.
With named funs, the two funs retain their generated names,
and the module is safe. With anonymous functions, the
chances are that the two funs with swap names; oops!

Third, a frequently asked question in the Erlang mailing
list is "why can't I have recursive funs?" to which we
will now be able to rely, "you can; here is what they
look like."

This still does not permit mutually recursive funs, but
people do not seem to ask for that much.

Finally, the next time someone argues that Erlang syntax
is inconsistent because function clauses have repeated
names and fun clauses do not, we shall be able to reply
"but fun clauses CAN have repeated names and probably
should."




Rationale
=========

There really seemed to be only two main questions.

What should the scope of the fun name variable be?
Some variables in a fun are shared between the fun
and its context. Doing that would let us write

f(...) ->
fun G(...) -> ... end,
fun H(...) -> ... end,
... use G and H ...

rather like using nested "define" in Scheme, except that
while `H` could use `G`, `G` couldn't use `H`.

Since you do not get mutual recursion this way, you should
not be tricked into thinking you might. It's better that
you have to write

f(...) ->
GG = fun G(...) -> ... end,
HH = fun H(...) -> ... end,
... use GG and HH ...

so that you understand clearly what you are getting.

While variables in the body of a fun clause may be shared
with the context, variables in the arguments are not,
something I have found confusing. At least this way the
fun name follows the same scope rule as the variables in
the argument list right next to it.

The other main question was whether recursive fun values
should be exactly the same representation as existing
fun values, but with a cycle in it (tying the knot at
construction time), or whether to introduce a new tag
(tying the knot at call time). The lack of cycles in
Erlang heaps has been a major factor in the design of
several garbage collectors. I would expect changing
that to be an order of magnitude harder than the
changes required for this proposal. It was seeing that
the knot could be tied at call with (without slowing
down calls to existing funs) that made me dare to hope
that this proposal might some day be accepted.

The main issue now is that this does not let us define
a group of mutually recursive local functions.
Adopting this proposal now might get in the way of a
better proposal that handles mutual recursion as well.

I don't see such a proposal as being likely to arrive soon.

There is a special case of this where the fun name is used
only in tail call positions, which can be handled entirely
by the compiler generating a jump back to the beginning.
This need not have any consequences for the run time system
at all.



Backwards Compatibility
=======================

Code that does not use the new feature does not change its
meaning. There may be code that relies on the form of
generated function names; that would need changing.

All syntax tools would need to be revised to handle the new form.
Existing parse transforms might well fail on code containing the
new form, but would work unchanged on code that does not.

At least one new instruction is needed to create suitably
distinguished closures. Existing programs that analyse BEAM
files will not understand this until they are revised.

As described under 'motivation', naming functions is
useful even if you do not use the name in any clause body.
This means that we can have a staged delivery of the feature.

1. Make the parser recognise fun names and check their identity.
Have it report an error if the fun name is used in a body.
Have it erase the fun names from the AST before any
downstream tool sees it.

At this stage, fun names may serve as documentation.

2. Upgrade the downstream tools to recognise an extended `'fun'`
AST node with two extra fields: the fun name as an atom and
a flag saying whether it is not used, used only in tail
position, or used more generally.

Upgrade the parser to report fun names, but retain the
check that they are not used. Test the down stream tools.

3. Modify the compiler to use the new, safer, form of generated
name. Ensure that the generated names are accessed only
through an interface, so all is consistent.

At this stage, fun names help to reduce the danger from
code revisions that add, remove, or re-order funs; a
change that does not alter the number of funs with a
particular name in a function should not change its name.

4. (Optional.) Revise the code generator to accept the fun
name in tail call position and generate a jump. Modify
the parser to allow this.

At this point, it is possible to pass a loop as a parameter,
like a list traversal or a binary search. No changes to the
representation of Erlang terms or the BEAM engine have been
required yet.

5. Add a new tag. Revise the funcall instructions to check for
it if the existing check fails, and push the closure itself.
Add a new instruction to make a new closure. Revise the
Erlang term representation to encode recursive funs. Revise
the type test instructions to recognise the new values.
Teach HiPE what to do.

This is the last stage.



Reference Implementation
========================

None in this draft. Stage 1 can be done fairly easily.
Stage 2 would be hard for me because I'm not even sure what
all the relevant modules are.



References
==========

None.



Copyright
=========

This document has been placed in the public domain.



[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"

0 comments on commit 31cf2d2

Please sign in to comment.