Skip to content
Browse files

Add EEP 41: Pseudo-assignment for Erlang

  • Loading branch information...
1 parent 3e4b05a commit a1595c12e5d79ac703e3ff9ab9b2ef98a6b39024 @RaimoNiskanen RaimoNiskanen committed Feb 5, 2013
Showing with 626 additions and 0 deletions.
  1. +626 −0 eeps/eep-0041.md
View
626 eeps/eep-0041.md
@@ -0,0 +1,626 @@
+ Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
+ Status: Draft
+ Type: Standards Track
+ Erlang-Version: R16A
+ Created: 04-Feb-2013
+ Post-History:
+****
+EEP 41: Pseudo-assignment for Erlang
+----
+
+
+
+Abstract
+========
+
+Add the infix token ':=' to Erlang with the purely
+functional update semantics of '<-' in R.
+
+Example
+=======
+
+Given the declarations
+
+ -record(rect, {top,left,bottom,right}).
+ -record(whatever, {region, ...}).
+
+ centre(#rect{top=T,left=L,bottom=B,right=R}) ->
+ {(L+R)/2, (T+B)/2}.
+
+ 'centre:='(#record{top=T,left=L,bottom=B,right=R}, {X,Y}) ->
+ DX = X - (L+R)/2,
+ DY = Y - (T+B)/2,
+ #rect{top=T+DY,left=L+DX,bottom=B+DY,right=R+DX}.
+
+the pseudo-assignment
+
+ centre(W#whatever.area) := P
+
+expands to
+
+ W' = W#whatever{
+ area = 'centre:='(W#whatever.area, P)}
+
+with W' automatically replacing downstream mentions of W.
+
+
+Specification
+=============
+
+A new token ':=' is introduced. It may only be used in
+the form
+
+ Lhs := Rhs
+
+where Rhs is any Erlang expression, called the source, and
+Lhs, called the target, is
+
+- a variable (not a wildcard),
+- Lhs' #record.field, or
+- ~f(Lhs'), or
+- f(Lhs', E2, ..., En), or
+- m:f(Lhs', E2, ..., En)
+
+where E2 ... En are any Erlang expressions, Lhs'
+is another instance of the same form, f is an atom,
+and the module prefix m may only be an atom or a variable.
+
+The "ultimate target"
+
+- of a variable is that variable,
+- of L#r.f is the ultimate target of L,
+- of ~f(L) is the ultimate target of L,
+- of f(L,...) is the ultimate target of L,
+- of m:f(L,...) is the ultimate target of L.
+
+Any pseudo-assignment is basically a (re)binding of its
+ultimate target and has as its value the value
+given to that variable, *not* the source right hand side.
+
+A pseudo-assignment is equivalent to a sequence of
+simple variable=expression bindings joined by comma,
+and may appear anywhere in an expression that such a
+sequence of bindings may appear, except that if it
+occurs inside a list comprehension, the ultimate
+target must not be mentioned outside that comprehension.
+
+The semantics of pseudo-assignment is defined using
+three conceptual stages: protection, expansion, and
+renaming.
+
+Protection
+----------
+
+The basic idea is that
+
+ f(T, E2, ..., En) := S
+
+is syntactic sugar for
+
+ T := 'f:='(T, E2, ..., En, S)
+
+This form of pseudo-assignment comes from S ([S][] [R][]),
+although Pop-2 ([P][]) had an analogous approach much earlier,
+and somewhat similar "sinister function calls" were found in
+SETL ([M][]) (which looks imperative but whose values are
+semantically immutable).
+
+Where things get slightly complicated is that we want
+subexpressions of T1, E2, ..., En, S evaluated exactly
+once and in order. This is like the way the Common Lisp
+([L][]) macros that work with generalised variables "[evaluate]
+the subforms of the macro call [...] exactly once in
+left-to-right order". Let's start with an example:
+
+ f(g(T, E1), E2) := E3
+
+ => V1 = E1,
+ g(T, V1) := 'f:='(g(T, V1), E2, E3)
+
+ => V1 = E1,
+ T := 'g:='(T, V1, 'f:='(g(T, V1), E2, E3))
+
+so that E1 is not evaluated twice.
+
+This step is defined using Erlang pseudo-code, in which
+<[...]> brackets are "quasi-quotes" enclosing source
+syntax representations of abstract syntax trees.
+Informally, do a pre-order walk over the AST adding
+V=Arg bindings for every non-first argument Arg of each
+function but the top-most, for any Arg that needs it.
+Which arguments do not need this protection? Ones whose
+evaluation cannot produce any observable effects, which
+we can approximate well enough by saying that variables
+and constants don't need protection and everything else does.
+
+ % protect(ast()) -> ast()
+
+ protect(<[ Lhs := Rhs ]>) ->
+ {Lhs', Bindings} = protect(Lhs, 0, []),
+ prepend_bindings(Bindings, <[ Lhs' := Rhs ]>).
+
+ % prepend_bindings([ast()], ast()) -> ast().
+
+ prepend_bindings([Binding|Bindings], E) ->
+ E' = prepend_bindings(Bindings, E),
+ <[ Binding, E' ]>;
+ prepend_bindings([], E) ->
+ E.
+
+ % protect(Expr::ast(), Depth::int(), [ast()]) ->
+ % {ast(), [ast()].
+
+ protect(<[ Var ]>, _, B) ->
+ {<[ Var ]>, B};
+ protect(<[ ~F(T) ]>, D, B) ->
+ {T', B'} = protect(T, D+1, B),
+ {<[ ~F(T') ]>, B'};
+ protect(<[ T#R.F ]>, D, B) ->
+ {T', B'} = protect(T, D+1, B),
+ {<[ T'@R.F ]>, B'};
+ protect(<[ F(T,E2,...,En) ]>, D = 0, B) ->
+ {T', B'} = protect(T, D+1, B),
+ {<[ F(T',E2,...,En) ]>, B');
+ protect(<[ F(T,E2,...,En) ]>, D, B) when D > 0 ->
+ {[E2',...,En'], B'} = protect_args([E2,...,En], B),
+ {T', B''} = protect(T, D+1, B),
+ {F(T',E2',...,En'), B''};
+ protect(<[ M:F(T,E2,...,En) ]>, D = 0, B) ->
+ {T', B'} = protect(T, D+1, B),
+ {<[ M:F(T',E2,...,En) ]>, B'');
+ protect(<[ M:F(T,E2,...,En) ]>, D, B) when D > 0 ->
+ {[E2',...,En'], B'} = protect_args([E2,...,En], B),
+ {T', B''} = protect(T, D+1, B),
+ {M:F(T',E2',...,En'), B''};
+
+ % protect_args([ast()], [ast()]) -> {[ast()], [ast()]}.
+
+ protect_args([], B) ->
+ {[], B};
+ protect_args([<[ Var ]>|Args], B) ->
+ {Args', B'} = protect_args(Args, B),
+ {[< Var ]>|Args'], B'};
+ protect_args([<[ Const ]>|Args], B) ->
+ {Args', B'} = protect_args(Args, B),
+ {[< Const ]>|Args'], B'};
+ protect_args([<[ E ]>|Args], B) ->
+ V = a new variable,
+ {Args', B'} = protect_args(Args, [<[ V = E ]>|B]),
+ {[<[ V ]>|Args'], B'}.
+
+
+Expansion
+---------
+
+Expansion recursively rewrites pseudo-assignments until
+the target is a simple variable.
+
+ L#r.f := E
+ => L := L#r{f = E}
+
+ ~f(L) := E
+ => L := <{f ~ E | L}>
+
+ f(L, E2, ..., En) := E
+ => L := 'f:='(L, E2, ..., En, E)
+
+ m:f(L, E2, ..., En) := E
+ => L := m:'f:='(L, L2, ..., En, E)
+
+An assignment function is not a special kind of function but
+an ordinary function with a special form of name. They can
+be exported, imported, remote-called, passed around in or as
+funs, using existing Erlang means.
+
+In particular, there is no automatic connection between
+f/n and 'f:=/(n+1). Importing or exporting one does not
+automatically import or export the other.
+
+Renaming
+--------
+
+After expansion and renaming, there are exactly as many
+pseudo-assignments as there were before, but each one now
+has a simple variable as its entire target.
+
+This is handled by renaming. Instead of thinking of a
+variable as identified by a name, think of it as identified
+by a «name,version» pair. So the assignment
+
+ V := E
+
+is to be thought of (and indeed transformed to)
+
+ «V,n+1» = E
+
+where n is the highest version of V appearing on the
+execution path to this rebinding. If there is no
+such version, n = 0. So
+
+ X := f(...),
+ X := g(..X..),
+ X := h(..X..),
+
+becomes
+
+ «X,1» = f(...),
+ «X,2» = g(..«X,1»..),
+ «X,3» = h(..«X,2»..),
+
+Sequenceas are easy. The difficulty is control
+paths that split and rejoin, like 'if' or 'case'.
+
+If E is a split-join control path, and X is a variable that appears in
+in E and is live after it, and the last occurrences of X in each branch
+of E do not all have the same version, then let «X,m» be the highest
+version of X in E. On each branch of E where a version of X is created,
+replace the highest version of X by «X,m». If a branch does not
+create a version of X and X is not live on entry to E, this is already
+an error in Erlang, and we don't change that. If «X,p» is the version
+of X that is live on entry to E, then add
+
+ «X,m» = «X,p»
+
+just after the -> arrow of each branch that does not update X.
+Here's an exmaple.
+
+ W = 137,
+ if X < Y -> Z = X-1, Z := Z*(Y+1)
+ ; X >= Y -> Z = 42, W := 3145
+ end,
+ f(Z, W)
+
+becomes
+
+ «W,1» = 137,
+ if «X,1» < «Y,1» ->
+ «W,2» = «W,1», % patch
+ «Z,1» = «X,1» - 1,
+ «Z,2» = «Z,1»*(«Y,1»+1)
+ ; «X,1» >= «Y,1» ->
+ «Z,2» = 42, % patch
+ «W,2» = 3145
+ end,
+ f(«Z,2», «W,2»)
+
+The first patch line is added because that branch does not
+update W, and it is added where it is so as not to interfere
+with the result of the rest of the branch.
+The second patch line would have bound «Z,1» except that
+the version was pushed up to to match the other branch.
+
+In effect, we are working with static single assignment form,
+and the patches are pushing the phi-function back into the
+branches.
+
+The semantic analyser and code generator of the compiler never
+get to hear about pseudo-assignment. There is no reason why
+different versions of a variable should be allocated the same
+virtual register or memory cell; it's up to the register
+allocator to do that if it is useful or to do otherwise if
+that's more useful.
+
+Motivation
+==========
+
+Several people have complained on the Erlang mailing list
+that having to write
+
+ X = f(...),
+ X1 = g(..X..),
+ X2 = h(..X1...)
+
+is error prone as well as tedious because if they have to
+reorder the sequence of transformations, add a transformation,
+or remove one, they have to rename the variables.
+
+The fact that "assignment" to whole variables can be modelled
+in a pure declarative language using renaming has been known
+for a long time. I knew it when writing "The Craft of Prolog",
+and it was folklore then. The question was not *could* we
+support
+
+ X := f(...),
+ X := g(..X..),
+ X := h(..X..),
+
+but *should* we?
+
+Loïc Hoguin has argued strongly that "[he] just want[s]
+primitives to easily update deep data structures" (26 Jan 2013),
+saying that Erlang's handling of records is inadequate because it
+makes this difficult. He wrote (25 Jan 2013):
+> Assume a variable Character.
+> This variable contains everything about the character.
+> How do you best access and modify Character?
+> The answer must not involve the process dictionary, processes or
+> message passing.
+> Today I have to write each access and modification function.
+> Or I can generate it,
+> but either way I end up with hundreds of functions in many modules.
+> Or I could use records, and have one line per sub-record per
+> modification function I write.
+> That's not *easy* nor *practical*.
+> Easy and practical is:
+
+> Character.weapon.ability.cost
+
+> for access, and:
+
+> Character.weapon.ability.cost = 123
+
+> for modification.
+
+I don't propose to give him that, but
+
+ C = cost(ability(Character#cinfo.weapon)),
+ cost(ability(Character#cinfo.weapon)) := C + 123
+
+he can have, where all functions might be inlined, or
+
+ C = ~cost ~ability ~weapon Character,
+ ~cost ~ability ~weapon Character := C + 123
+
+in that bright future when we have frames.
+
+The good part, from my point of view, is that this brief
+syntax can be had without introducing mutable data structures.
+This is *pseudo*-assignment. And it is a proven technique
+that has been used for over 25 years.
+
+The bad part, for die-hard assignment fans, is that updating
+deep paths this way requires allocating modified copies of
+records along the way, but we can't change that without
+altering fundamental properties of Erlang.
+
+Rationale
+=========
+
+The questions are: what kind of "assignment" should be offered,
+what syntax should be used for assignment and what targets should`
+be allowed.
+
+Without adding a type system that would permit Haskell-style
+monads or Clean/Mercury-style uniqueness tracking,
+there are two ways to add assignment to Erlang: the Lisp way
+and the S way. The Lisp way is to offer the real thing in
+all its destructive power. That would have the huge benefit
+of making Erlang much more comfortable for C/Java/JavaScript
+programmers, and we could look forward to the day when Erlang
+syntax is finally reformed to be JavaScript with threads. It
+would also have the huge price of requiring major changes to
+the Erlang compiler and runtime system and of voiding one of
+the major guarantees ("your data is safe with us") cherished
+by Erlang programmers. It would make Erlang programs harder
+to get right. Frankly, if we want JavaScript with threads,
+we'd do much better to add threads to JavaScript.
+
+The other way is the S way. S is a programming language
+devised by John Chambers at AT&T for programming statistics
+algorithms. The revised language definition was published
+in 1988. The syntax of S looks like slightly deranged
+C, but the semantics is astonishingly functional. In particular,
+at least up to S3, S values did not detectably share mutable
+parts. An S assignment like
+
+ a[i,j] <- 0
+
+is equivalent to
+
+ "["(a, i, j) <- 0
+
+which is in turn equivalent to
+
+ a <- "[<-"(a, i, j, value = 0)
+
+and this is not merely a fashion of speaking, there really is a
+function named "[<-" which is really called. With its C-like
+syntax, immutable data structures, and lazily evaluated
+function arguments, the S language is is definitely strange.
+But it is highly *practical*. The R repository has a huge
+range of packages doing amazing and useful things; it is used
+in Statistics courses around the world; and there is an
+abundance of excellent books teaching and using S/R. So while
+the idea of "assignment" being syntactic sugar for computing
+a new whole value and rebinding it to a value may seem unfamiliar,
+it is demonstrably both *workable* and *usable*.
+
+Since there is a battle-tested form of "assignment" that does
+not require mutable data structures, that's clearly the way for
+Erlang to go.
+
+As for syntax, I am familiar with
+
+- Lhs = Rhs (Fortran, COBOL, BASIC, C)
+- Lhs := Rhs (Algol, Pascal, Modula, Ada, ANSI Smalltalk)
+- Lhs left-arrow Rhs (APL, classic Smalltalk)
+- Lhs <- Rhs (S)
+- Rhs -> Lhs (S, Pop-2)
+- (set! Lhs Rhs) (Scheme)
+- (setf Lhs Rhs) (Lisp)
+
+Of these, Erlang already uses =, <-, and -> for other purposes,
+and the Unicode left arrow remains difficult to type. Erlang
+syntax is not Lisp syntax, and while LFE has Lispy macros,
+plain Erlang does not. This leaves := as the sole credible contender.
+
+As for the targets of pseudo-assignments, we could simply allow
+Erlang variables. The ability to do renaming-style assignments
+has been frequently requested in the Erlang mailing list. It is
+important to understand that the renaming approach to variable
+"assignment" does not require the ability to rewrite a memory
+cell: different "versions" of a variable may well occupy different
+cells, and whether they do or not is up to the register allocator.
+
+We have also seen Erlang criticised for being unable to express
+chained record updates clearly. Instead of
+
+ X1 = X#r{f = X#r.f#s{g = X#r.f#s.g#t{h = 42}}}
+
+many people would rather write
+
+ X#r.f#s.g#t.h := 42
+
+and who can blame them? (Well, *me*. I think they should not
+be writing code like that whatever the syntax, and found in my
+own C code that purging it of pointer chains uncovered a
+scary number of present and potential bugs. The basic nature
+of the problem is excessive coupling.) So we want to allow
+field references as targets.
+
+The ~f(L) syntax comes from the frames proposal. If record
+fields are pseudo-assignable, so should frame slots be.
+
+If we want to pseudo-assign to elements of hash tables or
+array-like structures, we have to allow function calls.
+The example of S shows that we *can* include function calls on
+the left of assignments, meaning that we can do
+
+ at(Dict, Key) ->
+ case dict:find(Dict, Key)
+ of {ok,V} -> V
+ ; error -> 0
+ end.
+
+ 'at:='(Dict, Key, Value) ->
+ dict:store(Key, Value, Dict).
+
+ ...
+
+ at(D, K) := at(D, K) + 1
+ ...
+ 'element:='(N, Tuple, V) ->
+ setelement(N, Tuple, V).
+ ...
+ A := {0,0,0},
+ element(1, A) := 3,
+ element(2, A) := 1,
+ element(3, A) := 4
+
+Some languages let you assign to substrings.
+If we can pseudo-assign to function calls, we need no
+extra machinery for that:
+
+ 'substr:='(String, Start, New) ->
+ string:substr(String, 1, Start - 1) ++ New.
+
+ 'substr:='(String, Start, Length, New) ->
+ string:substr(String, 1, Start - 1) ++ New ++
+ string:substr(String, Start + Length - 1).
+
+
+The next step in generality would be to follow Algol 68 and
+allow
+
+ if G1 -> B1, L1
+ ; ...
+ ; Gn -> Bn, Ln
+ end := E
+
+meaning
+
+ if G1 -> B1, L1 := E
+ ; ...
+ ; Gn -> Bn, Ln := E
+ end
+
+with similar definitions for 'case' &c. I can't see a
+straightforward way to implement that without either
+duplicating E or doing computations out of order, so it
+seemed like a good idea to stop just before this point.
+
+The 'expansion' step above says that importing or exporting
+f/n does not automatically import or export 'f:='/(n+1).
+This could be a source of unimportant but annoying errors.
+I expect using assignment functions to be more common than
+defining them, and you can write a remote call to a function
+without explicitly importing it, so *writing*
+
+ string:substr(Line, Comment_Start) := ""
+
+does not require any special declaration. The possible
+mistake, then, is to define f/n and 'f:='/(n+1) in a module
+and export the first without exporting the second. If this
+does turn out to be a problem, it will be easy enough to add
+a rule that if f/n is exported and 'f:='/(n+1) is defined,
+the assignment form is exported too. Let's wait and see if
+it's really needed.
+
+Should pseudo-assignment syntax be allowed for a variable's
+initial binding? There does not seem to be any compelling
+reason to forbid it.
+
+There *is* a compelling reason to forbid variables being
+pseudo-assigned inside a list comprehension that are used
+outside it. List comprehensions could be compiled inline
+instead of generating out-of-line recursive functions, as
+they currently do. And variable assignment by actual
+honest-to-goodness smash-that-memory-cell assignment could
+be used to implement such assignments. And the formal
+semantics could remain renaming. But the code generator
+would have to know about it. The renaming semantics
+*could* be implemented with the out-of-line approach, but
+the current values of such variables would have to be
+passed in and returned, creating overheads that would
+surprise me, let alone other programmers. Simpler by far
+just to forbid it. This is not entirely unlike the
+somewhat fiddly scope rules for variables in anonymous
+functions.
+
+
+Backwards Compatibility
+=======================
+
+The token ':=' and the token sequence ':' '=' are not currently
+legal anywhere in Erlang source code, so no existing code is
+directly affected.
+
+Pseudo-assignment is defined as a source to source transformation.
+This transformation is local to the function clause affected and
+can done entirely within the parser.
+
+This means that anything in the Erlang tool chain downstream from
+the parser is unaffected by this change. In particular, profiling,
+monitoring, and debugging tools just see plain old Erlang.
+
+
+Reference Implementation
+========================
+
+None in this draft, though implementation hints are given.
+
+
+
+References
+==========
+
+There are five references here. I do not know why they do not
+appear. Using Markdown is not like dancing around eggs. It
+is like dancing around hand grenades in the dark.
+
+[S]: http://en.wikipedia.org/wiki/S_%28programming_language%29
+ "The S programming language"
+[R]: http://www.r-project.org/
+ "The R Project for Statistical Computing"
+[P]: http://en.wikipedia.org/wiki/POP-2
+ "Pop-2"
+[L]: http://www.lispworks.com/documentation/common-lisp.html
+ "The Common Lisp HyperSpec"
+[M]: http://www.setl.org
+ "GNU SETL"
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+[EmacsVar]: <> "Local Variables:"
+[EmacsVar]: <> "mode: indented-text"
+[EmacsVar]: <> "indent-tabs-mode: nil"
+[EmacsVar]: <> "sentence-end-double-space: t"
+[EmacsVar]: <> "fill-column: 70"
+[EmacsVar]: <> "coding: latin1"
+[EmacsVar]: <> "End:"

0 comments on commit a1595c1

Please sign in to comment.
Something went wrong with that request. Please try again.