/
eep-0019.txt
337 lines (243 loc) · 10.8 KB
/
eep-0019.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
EEP: 19
Title: Comprehension multigenerators
Version: $Revision$
Last-Modified: $Date$
Author: Richard A. O'Keefe <ok@cs.otago.ac.nz>
Status: Draft
Type: Standards Track
Erlang-Version: R12B-4
Content-Type: text/plain
Created: 14-Aug-2008
Post-History:
Abstract
Add Clean-inspired multi-sequence generators to comprehensions,
making code more intention-revealing and reducing the need to zip.
This is related to EEP 12 [1], but is independent of it.
Specification
Currently, Erlang has
Pattern <- Expr
to enumerate over the elements of a single list and
Pattern <= Expr
to enumerate over a binary. EEP 12 [1] adds
Pattern [<-] List
Pattern {<-} Tuple
Pattern <<<->> Binary
This proposal changes that to
generator: term_generator | binary_generator;
binary_generator: pattern '<=' expression;
term_generator: term_generator '&&' term_generator
| pattern '<-' expression;
if we otherwise stick with current Erlang, or
generator: term_generator | binary_generator;
binary_generator: pattern '<=' expression
| pattern '<<' '<-' '>>' expression;
term_generator: term_generator '&&' term_generator
| pattern '<-' expression
| pattern '[' '<-' ']' expression
| pattern '{' '<-' '}' expression;
if we go with EEP 12.
Roughly speaking, ignoring errors and side effects,
the effect of P1 <- E1 && ... Pn <- En
is the effect of {P1,...,Pn} <- zip(E1, ..., En)
where
zip([X1|Xs1], ..., [Xn|Xsn]) ->
[{X1,...,Xn} | zip(Xs1, ..., Xsn)];
zip([], ..., []) ->
[].
However, it is expected that there will NOT be any extra list
or tuples created by the implementation; this specifies the
effect but NOT how it is to be implemented.
The effect of a term generator using the new notations of EEP 12
is that which would be obtained by first replacing
P {<-} E with P <- tuple_to_list(E)
P [<-] E with P <- E
and then applying the translation above.
In the presence of errors, the behaviour of && is not precisely
the same as using zip. We need to specify the actual behaviour
more precisely. For brevity, I ignore binary enumeration. Both
tuple enumeration and tuple comprehension are currently defined
by rewriting to plain list comprehension, so that's all we need
to worry about for now.
A list comprehension has the form [E || C1, ..., Cn]
where each Ci is
- a generator Pattern <- List_Expression
- a binding Pattern = Any_Expression
- a "guard" Other_Expression that should give true or false.
This acts like
R = [],
<| E || [C1, ..., Cn] |>(R),
reverse(R)
where
<| E || [] |>(R)
=> R = [E | R] % reassign R
<| E || [Pi <- Ei|Cs] |>(R)
=> Ti = Ei
Label: case Ti
of [Pi|X] -> Ti = X % reassign Ti
<| E || Cs |>(R)
goto Label
; [_ |X] -> Ti = X % reassign Ti
goto Label
; [] -> ok
end
<| E || [Pi = Ei|Cs] |>(R)
=> case Ei
of Pi -> <| E || Cs |>(R)
; _ -> ok
end
<| E || [Ei|Cs] |>(R)
=> case Ei
of true -> <| E || Cs |>(R)
; false -> ok
end
In these translations, pattern matching syntax is used, with the
intent that the variables which are unbound according to the
normal rules of Erlang, and thus get bound by the Pi <- or Pi =
matching, are treated *as if* unbound in the code to be generated,
ignoring whatever values they might previous have had. That also
applies when R or Ti appears on the left of a pattern match; the
fact that the variable really was bound is ignored and a simple
assignment is done.
This does involve (re-)assignment to local variables in the code
to be generated, but it does NOT involve user-visible assignment
and it does NOT involve mutable data structures. It is no more
problematic for the language or the runtime system than reusing a
dead register is.
Handling multi-list enumeration is a simple, albeit schematic,
change to the rule for enumeration.
<| E || [Pi1 <- Ei1 && Pi2 <- Ei2 && ... && Pik <- Eik|Cs] |>(R)
=> Ti1 = Ei1
...
Tik = Eik
Label: case {Ti1,...,Tik}
of {[Pi1|X1], ..., [Pik,Xk]} ->
Ti1 = X1 % reassign
...
Tik = Xk % reassign
<| E || Cs |>(R)
goto label
; {[_|X1], ..., [_|Xk]} ->
Ti1 = X1 % reassign
...
Tik = Xk % reassign
; {[], ..., []} ->
ok
end
Note that the use of tuple syntax in the case expression and the
case clauses does not imply the literal creation of a tuple in
the generated code, only that k values are to be matched against
k patterns in each case clause.
Motivation
"How do I iterate over several lists at once?" is a moderately
common question from Erlang and Haskell beginners. The stock
answer, "use zip", is almost tolerable for Haskell, where the
the zipping family goes up to 7 lists and the compiler works
hard to eliminate the intermediate data structures by using
deforestation. For Erlang, where even zip4 is missing, and
where the apparent cost of creating the unwanted list and
tuples is all too real, the fact that the use of zips makes
the code harder to read means that there is no good to
outweigh the bad.
With the new notation,
zip4(As, Bs, Cs, Ds) ->
[{A,B,C,D} || A <- As && B <- Bs && C <- Cs && D <- Ds].
zipwith4(F, As, Bs, Cs, Ds) ->
[F(A,B,C,D) || A <- As && B <- Bs && C <- Cs && D <- Ds].
dot(Xs, Ys) ->
sum([X*Y || X <- Xs && Y <- Ys]).
ifelse(Tests, Xs, Ys) -> % Simulate R's ifelse(,,)
[ case T of true -> X ; false -> Y end
|| T <- Tests && X <- Xs && Y <- Ys
].
This code from module dialyzer_dep
merge_outs([#output{type=list, content=L1}|Left],
#output{type=list, content=L2}) ->
NewList = [merge_outs([X, Y]) || {X, Y} <- lists:zip(L1, L2)],
merge_outs(Left, output(NewList));
would become
merge_outs([#output{type=list, content=L1}|Left],
#output{type=list, content=L2]) ->
merge_outs(Left, output(
[merge_outs([X,Y]) || X <- L1 && Y <- L2]));
This code from forward_args/3 in module dialyzer_dataflow
NewArgTypes = [t_sup(X, Y) ||
{X, Y} <- lists:zip(ArgTypes, OldArgTypes)],
would become
NewArgTypes = [t_sup(X, Y) || X <- ArgTypes && Y <- OldArgTypes],
Rationale
This is a case where no invention is required, really.
Clean has
Qualifier = Generators {|Guard}
Generators = {Generator}-list
| Generator {& Generator}
Generator = Selector <- ListExpr // lazy list
| Selector <|- ListExpr // overloaded list
| Selector <-: ArrayExpr // array
All I have to do is bend this a little to fit it into Erlang
syntax. Since we use "||" for list comprehensions, "&&" was
the obvious spelling for generators that step together.
I do not yet understand in detail what the Erlang compiler
does, but it seems to involve generating an auxiliary function.
Let's take
[f(X) || X <- Xs, X > 0]
as an example. This seems to be compiled as
foo(Xs)
where
foo([X|Xs]) when X > 0 -> [f(X) | foo(Xs)];
foo([_|Xs]) -> foo(Xs);
foo([]) -> [].
With a multi-sequence generator, the translation is similar.
[g(X, Y) || X <- Xs && Y <- Ys, X > Y]
can be compiled as
bar(Xs, Ys)
where
bar([X|Xs], [Y|Ys]) when X > Y ->
[g(X, Y) | bar(Xs, Ys)];
bar([_|Xs], [_|Ys]) -> bar(Xs, Ys);
bar([], []) -> [].
The specification above gives the kind of translation I would like
to see; I do have an implementation in mind (based on Pop-2) that
doesn't need the reversal but don't know how it would fit in BEAM.
One obvious question is whether we need this at all. Why not just
get people to write calls to lists:zip and get the compiler to
optimise them? One answer is that this notation is much clearer;
the programmer's *intent* is to advance along two or more lists
at the same time, not to create a list of pairs. When you want to
create a list of pairs, lists:zip/2 is the perfect way to do it.
A more important answer is that the proposed notation is NOT a
simple optimisation of equivalent code using lists:zip/2.
[E || {P,Q} <- lists:zip(A, B)] % "zip version"
fails at once if A and B are not proper lists of the same length.
[E || P <- A && Q <- B] % "Clean version"
eventually fails if A and B are not proper lists of the same
length, but may have evaluated E (which may have had side effects)
many times before that. So an Erlang compiler would not be
allowed to replace the "zip version" by the "Clean version" unless
it could prove both that A and B were lists (which may be within
the abilities of the Dialyzer) and that they were exactly the same
length (which as far as I know isn't).
However, a multi-sequence generator and a single-sequence one
using calls to lists:zip/2 are clearly *similar*, so they should
eventually react to lists of different length the same way.
In Haskell, zipping two lists of different length acts as if the
longer were truncated to the length of the shorter. Since
Haskell has lazy evaluation, lists may be infinite, so you can't
afford to wait until the end to start a comprehension. Since
Erlang is strict, and since mistakes are common, lists:zip/2 in
Erlang makes sense as it is.
Backwards Compatibility
The "operator" '&&' is not legal syntax anywhere in Erlang
at the moment, so no existing code can be affected.
Reference Implementation
None yet, but I'd like to do it when I can figure out how.
References
[1] EEP 12, http://www.erlang.org/eeps/eep-0012.html
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: