Add EEP 37: Funs with names

erlang · May 30, 2011 · 31cf2d2
1 parent bdeca9b
commit 31cf2d2
Showing 1 changed file with 326 additions and 0 deletions.
diff --git a/eeps/eep-0037.md b/eeps/eep-0037.md
@@ -0,0 +1,326 @@
+    Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
+    Status: Draft
+    Type: Standards Track
+    Erlang-Version: R14B04
+    Content-Type: text/plain
+    Created: 27-May-2011
+    Post-History:
+****
+EEP 37: Funs with names
+---
+
+
+
+Abstract
+========
+
+The syntax of funs is extended to allow a variable name to
+be consistently present before each argument list.  This
+allows funs to be recursive.  The knot is tied in the
+opcodes that apply a fun to its arguments, so no change to
+the garbage collector design is required.
+
+
+
+Specification
+=============
+
+Currently, there are three forms for a fun:
+
+    fun Name/Arity
+    fun Module:Name/Arity
+
+and
+
+    fun Fun_Clause {; Fun_Clause}... end
+
+We add another form:
+
+    fun Variable Fun_Clause {; Variable Fun_Clause}... end
+
+If any `Fun_Clause` has a `Variable`, all must, and they must all
+be the same variable.  Like the variables in the argument list(s),
+this variable is local to the fun, not shared with its context.
+Within the fun, the variable is bound to the value of the fun
+expression.
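+
+As an illustration only, here is a minimal sketch of the new form,
+assuming a compiler that implements this proposal.  The module name
+`named_fun_demo` and the functions in it are hypothetical.
+
+```erlang
+-module(named_fun_demo).
+-export([factorial/1, length_of/1]).
+
+%% Fact is visible only inside the fun, where it is bound to the
+%% fun value itself, so the second clause can recurse through it.
+factorial(N) when is_integer(N), N >= 0 ->
+    F = fun Fact(0) -> 1;
+            Fact(K) -> K * Fact(K - 1)
+        end,
+    F(N).
+
+%% Every clause repeats the same variable before its argument list.
+length_of(List) when is_list(List) ->
+    Count = fun Len([], Acc) -> Acc;
+                Len([_ | T], Acc) -> Len(T, Acc + 1)
+            end,
+    Count(List, 0).
+```
+
+Outside the funs, `Fact` and `Len` are not bound; the enclosing
+functions reach the fun values through `F` and `Count`.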
+
+There are several possible ways to implement this.  One is
+rather neat because it preserves the cycle-freedom of the
+data structures the garbage collector has to deal with.
+
+One way to implement existing funs is this:
+
+- **a** Create an auxiliary function with a generated name
+
+        <foo>(...,X1,...,Xk) ...;
+        ...
+        <foo>(...,X1,...,Xk) ....
+
+    having the same argument lists, guards, and clause bodies
+    as the fun, except that each variable shared with the context
+    appears as an extra argument.
+
+- **b** Translate the fun expression as
+
+        '%mk-fun'({fun <foo>/<n+k>, X1, ..., Xk})
+
+    which gives the tuple a special tag to say that it represents
+    a fun value.
+
+- **c** Translate  `Foo(E1,...,Em)`
+    as `A1 := E1, ..., Am := Em; funcall_m(Foo)`
+    where the `funcall_m` instruction checks that its argument is
+    a closure expecting `m` arguments, moves the `X1,...,Xk` fields
+    of the tuple to argument registers `A<m+1>..A<m+k>`, and then
+    jumps to the address in the first field.
+
+All it takes to implement recursive funs is
+
+- **a'** Create an auxiliary function
+
+        <foo>(...,X1,...,Xk,Variable) ...;
+        ...
+        <foo>(...,X1,...,Xk,Variable) ....
+
+- **b'** Translate the fun expression as
+
+        '%mk-rec-fun'({fun <foo>/<n+k+1>, X1, ..., Xk})
+
+    which simply applies a second special tag.
+
+- **c'** The `funcall_m` opcode acts the same for both old and
+    recursive funs, except that just before jumping, it
+    adds the fun value `Foo` itself as argument `A<m+k+1>`.
+    This "ties the knot".
+
+So a recursive fun takes no more space or time to create than
+an existing one, and does not involve creating any cycles of
+pointers.  The check for the new tag can be placed on the failure
+path of the `funcall_m` instructions, whatever their form, so calls
+to existing funs are not slowed.
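+
+The scheme above can be modelled in Erlang itself.  This is only a
+sketch: tagged tuples stand in for the two closure tags, `funcall/2`
+stands in for the `funcall_m` opcodes, and all the names used
+(`knot_demo`, `mk_fun/2`, `mk_rec_fun/2`, `sum_aux/3`) are
+hypothetical.  A real implementation would live in the emulator,
+not in Erlang code.
+
+```erlang
+-module(knot_demo).
+-export([mk_fun/2, mk_rec_fun/2, funcall/2, demo/0]).
+
+%% Steps b and b': a closure is a tagged tuple holding the auxiliary
+%% function and the captured variables X1..Xk.  No field points back
+%% at the closure, so the structure stays cycle-free.
+mk_fun(Aux, Env) when is_function(Aux), is_list(Env) ->
+    {closure, Aux, Env}.
+
+mk_rec_fun(Aux, Env) when is_function(Aux), is_list(Env) ->
+    {rec_closure, Aux, Env}.
+
+%% Steps c and c': pass the m call arguments, then X1..Xk.  For a
+%% recursive closure, also pass the closure itself as the last
+%% argument; this is where the knot is tied.
+funcall({closure, Aux, Env}, Args) ->
+    apply(Aux, Args ++ Env);
+funcall({rec_closure, Aux, Env} = Self, Args) ->
+    apply(Aux, Args ++ Env ++ [Self]).
+
+%% Step a': the auxiliary function for
+%%     fun Sum([]) -> Base; Sum([H | T]) -> H + Sum(T) end
+%% takes the shared variable Base and the fun's own value as extras.
+sum_aux([], Base, _Self) -> Base;
+sum_aux([H | T], _Base, Self) -> H + funcall(Self, [T]).
+
+demo() ->
+    Base = 100,
+    Sum = mk_rec_fun(fun sum_aux/3, [Base]),
+    funcall(Sum, [[1, 2, 3]]).
+```
+
+Here `demo()` evaluates to 106.  The closure tuple is built once and
+never contains itself; the self-reference exists only as an argument
+for the duration of each call.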
+
+
+
+
+Motivation
+==========
+
+Fun names can serve three purposes.
+
+First, they can simply be documentation.  For example,
+
+    cfun_files(CFun) ->
+        fun(F1, F2) ->
+            [[?OBJ(T1,_) | _] | _] = F1,
+            [[?OBJ(T2,_) | _] | _] = F2,
+            CFun(T1, T2)
+        end.
+
+can be written as
+
+    cfun_files(CFun) ->
+        fun Compare([[?OBJ(T1,_)|_]|_], [[?OBJ(T2,_)|_]|_]) ->
+            CFun(T1, T2)
+        end.
+
+A named fun whose name is not used can be implemented as if
+the name were not there.
+
+Second, the fun's name can be built into its generated name.
+At the time of writing, we might have
+
+    '-F/N-fun-K-'
+
+where `F/N` is the name of the function that includes the fun
+and `K` is the number of earlier funs in `F/N`.  We could build
+the name in instead, using
+
+    '-F/N-fun-Name-[K-]'
+
+where `K` is present only if the outer function contains more
+than one fun with the same name.  The point of this is that
+such names are more likely to be useful after hot loading.
+For example, if we start with
+
+    f(...Xs, Ys, ...) ->
+        ...
+        sort(Xs, fun X_Key({_,N,_}) -> N end),
+        sort(Ys, fun Y_Key({N,_,_}) -> N end),
+        ...
+
+and then revise it by swapping the two calls to `sort/2`, then
+with named funs the two funs retain their generated names,
+and the module is safe.  With anonymous funs, the
+chances are that the two funs will swap names; oops!
+
+Third, a frequently asked question on the Erlang mailing
+list is "why can't I have recursive funs?" to which we
+will now be able to reply, "you can; here is what they
+look like."
+
+This still does not permit mutually recursive funs, but
+people do not seem to ask for that much.
+
+Finally, the next time someone argues that Erlang syntax
+is inconsistent because function clauses have repeated
+names and fun clauses do not, we shall be able to reply
+"but fun clauses CAN have repeated names and probably
+should."
+
+
+
+
+Rationale
+=========
+
+There really seemed to be only two main questions.
+
+What should the scope of the fun name variable be?
+Some variables in a fun are shared between the fun
+and its context.  Doing that would let us write
+
+    f(...) ->
+        fun G(...) -> ... end,
+        fun H(...) -> ... end,
+        ... use G and H ...
+
+rather like using nested "define" in Scheme, except that
+while `H` could use `G`, `G` couldn't use `H`.
+
+Since you do not get mutual recursion this way, you should
+not be tricked into thinking you might.  It's better that
+you have to write
+
+    f(...) ->
+        GG = fun G(...) -> ... end,
+        HH = fun H(...) -> ... end,
+        ... use GG and HH ...
+
+so that you understand clearly what you are getting.
+
+While variables in the body of a fun clause may be shared
+with the context, variables in the arguments are not,
+something I have found confusing.  At least this way the
+fun name follows the same scope rule as the variables in
+the argument list right next to it.
+
+The other main question was whether recursive fun values
+should have exactly the same representation as existing
+fun values, but with a cycle in it (tying the knot at
+construction time), or whether to introduce a new tag
+(tying the knot at call time).  The lack of cycles in
+Erlang heaps has been a major factor in the design of
+several garbage collectors.  I would expect changing
+that to be an order of magnitude harder than the
+changes required for this proposal.  It was seeing that
+the knot could be tied at call time (without slowing
+down calls to existing funs) that made me dare to hope
+that this proposal might some day be accepted.
+
+The main issue now is that this does not let us define
+a group of mutually recursive local functions.
+Adopting this proposal now might get in the way of a
+better proposal that handles mutual recursion as well.
+
+I don't see such a proposal as being likely to arrive soon.
+
+There is a special case of this where the fun name is used
+only in tail call positions, which can be handled entirely
+by the compiler generating a jump back to the beginning.
+This need not have any consequences for the run time system
+at all.
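+
+A sketch of that tail-call case, again assuming a compiler that
+implements this proposal; `tail_loop_demo` and `binary_search/2`
+are hypothetical names.  `Loop` is used only in tail position, so
+each use could be compiled as a jump back to the start of the fun.
+
+```erlang
+-module(tail_loop_demo).
+-export([binary_search/2]).
+
+%% Search a sorted tuple for Key; returns {ok, Index} or not_found.
+%% Key and Tuple are shared with the context; Lo and Hi are local
+%% to the fun, as is the fun name Loop.
+binary_search(Key, Tuple) when is_tuple(Tuple) ->
+    Search = fun Loop(Lo, Hi) when Lo > Hi -> not_found;
+                 Loop(Lo, Hi) ->
+                     Mid = (Lo + Hi) div 2,
+                     case element(Mid, Tuple) of
+                         Key -> {ok, Mid};
+                         V when V < Key -> Loop(Mid + 1, Hi);
+                         _ -> Loop(Lo, Mid - 1)
+                     end
+             end,
+    Search(1, tuple_size(Tuple)).
+```
+
+For example, `binary_search(5, {1,3,5,7})` gives `{ok,3}` and
+`binary_search(4, {1,3,5,7})` gives `not_found`.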
+
+
+
+Backwards Compatibility
+=======================
+
+Code that does not use the new feature does not change its
+meaning.  There may be code that relies on the form of
+generated function names; that would need changing.
+
+All syntax tools would need to be revised to handle the new form.
+Existing parse transforms might well fail on code containing the
+new form, but would work unchanged on code that does not.
+
+At least one new instruction is needed to create suitably
+distinguished closures.  Existing programs that analyse BEAM
+files will not understand this until they are revised.
+
+As described under 'motivation', naming functions is
+useful even if you do not use the name in any clause body.
+This means that we can have a staged delivery of the feature.
+
+1.  Make the parser recognise fun names and check their identity.
+    Have it report an error if the fun name is used in a body.
+    Have it erase the fun names from the AST before any
+    downstream tool sees it.
+
+    At this stage, fun names may serve as documentation.
+
+2.  Upgrade the downstream tools to recognise an extended `'fun'`
+    AST node with two extra fields:  the fun name as an atom and
+    a flag saying whether it is not used, used only in tail
+    position, or used more generally.
+
+    Upgrade the parser to report fun names, but retain the
+    check that they are not used.  Test the downstream tools.
+
+3.  Modify the compiler to use the new, safer, form of generated
+    name.  Ensure that the generated names are accessed only
+    through an interface, so all is consistent.
+
+    At this stage, fun names help to reduce the danger from
+    code revisions that add, remove, or re-order funs; a
+    change that does not alter the number of funs with a
+    particular name in a function should not change its name.
+
+4.  (Optional.)  Revise the code generator to accept the fun
+    name in tail call position and generate a jump.  Modify
+    the parser to allow this.
+
+    At this point, it is possible to pass a loop as a parameter,
+    like a list traversal or a binary search.  No changes to the
+    representation of Erlang terms or the BEAM engine have been
+    required yet.
+
+5.  Add a new tag.  Revise the funcall instructions to check for
+    it if the existing check fails, and push the closure itself.
+    Add a new instruction to make a new closure.  Revise the
+    Erlang term representation to encode recursive funs.  Revise
+    the type test instructions to recognise the new values.
+    Teach HiPE what to do.
+
+    This is the last stage.
+
+
+
+Reference Implementation
+========================
+
+None in this draft.  Stage 1 can be done fairly easily.
+Stage 2 would be hard for me because I'm not even sure what
+all the relevant modules are.
+
+
+
+References
+==========
+
+None.
+
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+[EmacsVar]: <> "Local Variables:"
+[EmacsVar]: <> "mode: indented-text"
+[EmacsVar]: <> "indent-tabs-mode: nil"
+[EmacsVar]: <> "sentence-end-double-space: t"
+[EmacsVar]: <> "fill-column: 70"
+[EmacsVar]: <> "coding: utf-8"
+[EmacsVar]: <> "End:"