Variable bindings in comprehensions #4862

kostis · 2021-05-20T17:34:45Z

Describe the bug
I was playing with comprehensions involving multiple generators and I got some list/binary comprehension results which were news to me. I am posting it here to initiate some discussion.

To Reproduce

Eshell V10.3.4  (abort with ^G)
1> [X || X <- [1,2,3], X <- [3,4,5]].
[3,4,5,3,4,5,3,4,5]
2> [X || X <- [1,2,3], X <- [a,b,c]].
[a,b,c,a,b,c,a,b,c]
3> [X || X <- [1,2,3], Y=X <- [a,b,c]].
[a,b,c,a,b,c,a,b,c]
4> [{X,Y} || X <- [1,2,3], X=Y <- [a,b,c]].
[{a,a},{b,b},{c,c},{a,a},{b,b},{c,c},{a,a},{b,b},{c,c}]

Expected behavior
The above surprised me; I understand why it happens, of course, but I still find it surprising, esp. since in Erlang it's allowed for the term X in the "variable" position of a generator (X <- L) to be an already bound term and thus use it as a filter.

Affected versions
Probably all.

Additional context
The same behavior exists in compiled code, but at least there one gets warnings (a weird one about "an unused variable" and another one about shadowing that at least makes some sense).

nickva · 2021-05-20T18:29:38Z

I was surprised by this recently as well. The user guide describes it with a few examples: https://erlang.org/doc/programming_examples/list_comprehensions.html#variable-bindings-in-list-comprehensions and gives the advice to do the matching in the filters.

However, the expressions reference https://erlang.org/doc/reference_manual/expressions.html#list-comprehensions only mentions The variables in the generator patterns, shadow variables in the function clause, surrounding the list comprehensions. but doesn't say what happens when generator variables shadow other generator variables.

zuiderkwast · 2021-05-20T21:54:59Z

It seems to me that variables in the generator pattern shadow previous bindings of the variable, including previous generators within the same comprehension.

The other construct where we have shadowing is funs, so the examples are somewhat like ...

Eshell V11.1.8  (abort with ^G)
1> lists:flatmap(fun(X) -> lists:flatmap(fun(X) -> [X] end, [3,4,5]) end, [1,2,3]).
[3,4,5,3,4,5,3,4,5]
2> lists:flatmap(fun(X) -> lists:flatmap(fun(X) -> [X] end, [a,b,c]) end, [1,2,3]).
[a,b,c,a,b,c,a,b,c]
3> lists:flatmap(fun(X) -> lists:flatmap(fun(Y=X) -> [X] end, [a,b,c]) end, [1,2,3]).
[a,b,c,a,b,c,a,b,c]
4> lists:flatmap(fun(X) -> lists:flatmap(fun(X=Y) -> [{X,Y}] end, [a,b,c]) end, [1,2,3]).
[{a,a},{b,b},{c,c},{a,a},{b,b},{c,c},{a,a},{b,b},{c,c}]

... aren't they? I imagine the solution to this bug can be to update the documentation.

Or did I miss something here?

bjorng · 2021-05-24T06:21:40Z

@zuiderkwast List comprehensions inherited the variables shadowing rules from funs, because lists comprehensions were originally implemented using funs.

I will update the documentation.

Closes erlang#4862

…-17432 Clarify the rules for variable shadowing in comprehensions

kostis · 2021-05-26T07:46:19Z

As I wrote, I opened this issue to initiate some discussion on (what should be) the semantics of variable bindings in comprehensions (esp. since generators already permit filtering). My goal was to (hopefully) improve the language and (try to) make its semantics cleaner and more sane. Instead, all I got was some (unrelated IMO) answer about variable shadowing in funs (what do parameters in nested funs have to do with what I pointed out?) and a comment that "things are they way they are because once upon a time we used to implement comprehensions using funs." What does this have to do with what the semantics of fresh variables in list comprehensions should be?

Anyway, I find this issue sad :-( But I should have known better by now...

bjorng · 2021-05-26T08:26:07Z

What is your suggestion for improving the language?

Your only comment under "Expected behavior" was that the behavior surprised you. More useful would be to rewrite your example under "To Reproduce" to clearly show what your expected result was. Could you do it now to make it clear what you mean?

Also, could you clarify the following:

esp. since in Erlang it's allowed for the term X in the "variable" position of a generator (X <- L) to be an already bound term and thus use it as a filter.

It is not clear to me what you mean. An already bound variable cannot be used as a filter (it will be rebound):

1> X = a.
a
2> [X || X <- [a,b,c]].
[a,b,c]

The mention of funs was to explain the history of the (surprising) semantics, that variables will be rebound instead of compared to the previous binding (in contrast to the behavior in the rest of the language).

We are open to improvements of the language if you can explain them in a way that we can understand and if there is as way to do that in a somewhat backward-compatible way (for example, silently changing the semantics of existing code would not be acceptable, but first introducing a warning that the behavior would change and change it in some future release could be).

kostis · 2021-05-26T08:58:09Z

It is not clear to me what you mean. An already bound variable cannot be used as a filter (it will be rebound):

1> X = a.
a
2> [X || X <- [a,b,c]].
[a,b,c]

The mention of funs was to explain the history of the (surprising) semantics, that variables will be rebound instead of compared to the previous binding (in contrast to the behavior in the rest of the language).

Right. So, at least we agree that the current status is inconsistent, don't we?

What is your suggestion for improving the language?

Make the language consistent, even at the expense of breaking backwards compatibility.

Scopes of variables and bindings in comprehensions should not be different from scopes of variables in the rest of the language.

In particular, my example showed that even fresh variables, introduced within the comprehension (i.e., not coming from outside) and used only there, behave in a way which is non-intuitive even to a seasoned Erlang programmer. (And given this was not even mentioned in the docs, it was probably even news to most Erlang programmers out there.)

To make what I mean explicit, given that Erlang allows pattern matching with already bound variables, I would expect that

[X || X <- [1,2,3], X <- [2,3,4]]

produces [2,3] in exactly the same way that the following two comprehensions already do:

[X || X <- [1,2,3], Y <- [2,3,4], X =:= Y]
[Z || X <- [1,2,3], Y <- [2,3,4], (Z = X) =:= Y].

Edit:
For completeness, I should mention that an alternative way to fix such issues is for the compiler to insist that any variable mentioned in a generator position in a list comprehension must be fresh.
This would at least not allow comprehensions like [X || X <- [1,2,3], X <- [2,3,4]] or X = ..., [... || X <- L, ...].

RaimoNiskanen · 2021-05-26T09:18:37Z

What about

X = a,
[X || X <- [a,b,c]].

?

Should the X in the list comprehension, ideally, be the preceding X that is bound to a? This would result in [a], right?

lhoguin · 2021-05-26T09:21:01Z

Should the X in the list comprehension, ideally, be the preceding X that is bound to a? This would result in [a], right?

Yes. Please. :-)

kostis · 2021-05-26T09:26:09Z

What about
X = a,
[X || X <- [a,b,c]].
?

Should the X in the list comprehension, ideally, be the preceding X that is bound to a? This would result in [a], right?

Yes. In exactly the same way that the following list comprehension produces [a]:

Y = a,
[X || X <- [a,b,c], X =:= Y].

zuiderkwast · 2021-05-26T09:44:46Z

In X <- [a,b,c], the variable is bound multiple times, so it doesn't makes sense that its scope is the entire surrounding scope. [Edit] And for nested scopes we have shadowing.

RaimoNiskanen · 2021-05-26T09:51:44Z

[Y || {1,Y} <- [{1,a}, {2,b}]

produces [a]
Does not that make it reasonable that

X = 1,
[Y || {X,Y} <- [{1,a}, {2,b}]

also should produce [a]
?

The question is; should we have nested scopes?
Before fun:s were added, and later comprehensions, there were no such thing, The scope was the function.

lhoguin · 2021-05-26T09:57:59Z

The most important is that we have consistent scopes as @kostis said. It doesn't really matter if the scope is the function, or if we have nested scopes, though I would personally favor the function scope.

Though for funs since you have a function inside a function that's pretty much the definition of nested scopes...

KennethL · 2021-05-26T09:58:46Z

I think you must mean ? X = 1, [Y || {X,Y} <- [{1,a}, {2,b}] Should produce [a] /Kenneth

…

On Wed, May 26, 2021 at 11:52 AM Raimo Niskanen ***@***.***> wrote: [Y || {1,Y} <- [{1,a}, {2,b}] produces [a] Does not that make it reasonable that X = a, [Y || {X,Y} <- [{1,a}, {2,b}] also should produce [a] ? The question is; should we have nested scopes? Before fun:s were added, and later list comprehensions, there were no such thing, The scope was the function. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#4862 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABFWSGC7H2JQV7UD4DNFB3TPTAERANCNFSM45HQDZFA> .

bjorng · 2021-05-26T10:12:15Z

@kostis Nice explanation!

Right. So, at least we agree that the current status is inconsistent, don't we?

Yes, we do.

Make the language consistent, even at the expense of breaking backwards compatibility.

I am all for making language consistent, if (as I already said in other words) it can be done in a way that will allow existing code bases to be gradually migrated.

For completeness, I should mention that an alternative way to fix such issues is for the compiler to insist that any variable mentioned in a generator position in a list comprehension must be fresh.

That would probably be possible to do. At first, reusing a variable would result in a warning (distinct from the current shadowing warning), and several releases later it could be forbidden.

zuiderkwast · 2021-05-26T11:29:52Z

The pattern matching in generators is consistent with funs, not with case/receive/of/catch. If it's changed to be consistent with case/etc., it will be inconsistent with funs. What do we win?

I do think shadowing is a source of errors and that a warning is nice to have. Why doesn't the interpreter/shell emit warnings?

RaimoNiskanen · 2021-05-26T11:45:30Z

@KennethL: I realized my mistake and edited my post after about 30 s to your corrected code.

I doubt we will get consensus in the community for any kind of change in this domain. This should maybe be something for a language extension/variant mechanism, that we do not yet have.

@lhoguin: A fun can only be defined in a function, so we can choose to say that a fun does not have a scope of its own. The current solution that a fun has a scope of its own, but not quite, feels half baked.

@zuiderkwast: We would win consistency. And I think consistency with case/receive/if is the right way since they are the original constructs.

lhoguin · 2021-05-26T11:53:21Z

@lhoguin: A fun can only be defined in a function, so we can choose to say that a fun does not have a scope of its own. The current solution that a fun has a scope of its own, but not quite, feels half baked.

I do not think we can choose to say a fun does not have a scope of its own, because people have been using funs specifically to have a scope (in macros for example). We can say that any variable defined before must match (instead of shadowing; and I would prefer that), but the variables defined in the fun will not be available outside anyway, even if the fun is ran immediately, so you do have a scope regardless.

I suppose LCs are the same in that regard (variables defined before should match IMO, but variables defined in the LC shouldn't be available after the LC).

RaimoNiskanen · 2021-05-26T12:00:46Z

That is kind of like an unsafe variable from a case. In this case it is unsafe because the fun may not have execute yet.
Still, we do not say that a case has got a variable scope. (or do we?)
Terminology is hard.

I think the macros instead need a feature to create macro local variables, or rather variables unique to a macro invocation. Using funs to solve that problem is a bit like using a pipe wrench as a hammer.

lhoguin · 2021-05-26T12:10:42Z

But the fun may execute in a different function, so what happens then? Are the variables defined in the fun "exported" to the function that executes it?

RaimoNiskanen · 2021-05-26T12:31:38Z

@lhoguin: You lost me there.

Variables introduced in a fun can only be "exported" to the surrounding syntactical scope, and there they must always be "unsafe" because the fun may not have executed yet. So, a variable introduced in a fun can not be used outside the fun. And still, one can have the view that the variable scope is the surrounding function; the variable is merely unusable after the fun's end.

I just compared with introducing a variable in some but not all case clauses. Then the variable exists after the case's end, but is unsafe and therefore unusable.

zuiderkwast · 2021-05-26T21:03:38Z

@lhoguin If you want to remove shadowing in fun heads, would you then want

X = 1,
Fun = fun (X) -> X + 1 end.

to be equivalent to

Fun = fun (1) -> 1 + 1 end.

?

essen · 2021-05-26T21:20:03Z

@zuiderkwast Yes. Of course that example is nonsensical. It becomes more interesting when X is matched as part of a tuple or a some other structure.

A slightly better example could be:

List = [...],
X = abc,
Fun = fun ({X, Y}) -> do_something(Y);
          ({_, _}) -> ok
end,
lists:foreach(Fun, List).

It avoids the whole when X =:= Y that is unnatural compared to the rest of Erlang.

zuiderkwast · 2021-05-27T20:06:24Z

@lhoguin @essen I agree that when X =:= Y is unnatural, just like similar filters in comprehensions. However, if someone mistakes fun(X) -> X + 1 end for a fun with shadowing while it isn't, this too can be a source of error or confusion, especially since that's how it used to work before. Perhaps emitting a warning or an error for shadowing is the best option then, to avoid the situation, or---if it's possible to detect that new kind of probably-not-what-you-want situation---some other warning for that.

lhoguin · 2021-05-28T07:33:27Z

@zuiderkwast I suppose it goes back to the discussion about ^ we had not too long ago.

RaimoNiskanen · 2021-05-28T11:09:36Z

I have been itching to bring up how much clarity the pinning operator possibly may bring to variable shadowing... ;-)

zuiderkwast · 2021-05-28T12:02:21Z

For reference:

Add a new operator ^ for pinning of pattern variables #2951, the pinning operator PR + discussion
https://erlang.org/eeps/eep-0055.html, the pinning operator EEP

kostis added the bug Issue is reported as a bug label May 20, 2021

kostis mentioned this issue May 20, 2021

Map comprehensions #4856

Closed

bjorng self-assigned this May 24, 2021

bjorng added the team:VM Assigned to OTP team VM label May 24, 2021

bjorng added a commit to bjorng/otp that referenced this issue May 24, 2021

Clarify the rules for variables shadowing in comprehensions

fb52bb6

Closes erlang#4862

bjorng mentioned this issue May 24, 2021

Clarify the rules for variable shadowing in comprehensions #4878

Merged

bjorng linked a pull request May 24, 2021 that will close this issue

Clarify the rules for variable shadowing in comprehensions #4878

Merged

rickard-green added the in progress label May 24, 2021

bjorng added a commit that referenced this issue May 26, 2021

Merge pull request #4878 from bjorng/bjorn/ref-manual-lcs/GH-4862/OTP…

0758a4c

…-17432 Clarify the rules for variable shadowing in comprehensions

bjorng closed this as completed in a9f1d75 May 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Variable bindings in comprehensions #4862

Variable bindings in comprehensions #4862

kostis commented May 20, 2021

nickva commented May 20, 2021

zuiderkwast commented May 20, 2021

bjorng commented May 24, 2021

kostis commented May 26, 2021

bjorng commented May 26, 2021

kostis commented May 26, 2021 •

edited

Loading

RaimoNiskanen commented May 26, 2021

lhoguin commented May 26, 2021

kostis commented May 26, 2021

zuiderkwast commented May 26, 2021 •

edited

Loading

RaimoNiskanen commented May 26, 2021 •

edited

Loading

lhoguin commented May 26, 2021

KennethL commented May 26, 2021 via email

bjorng commented May 26, 2021

zuiderkwast commented May 26, 2021 •

edited

Loading

RaimoNiskanen commented May 26, 2021

lhoguin commented May 26, 2021

RaimoNiskanen commented May 26, 2021

lhoguin commented May 26, 2021

RaimoNiskanen commented May 26, 2021

zuiderkwast commented May 26, 2021

essen commented May 26, 2021

zuiderkwast commented May 27, 2021 •

edited

Loading

lhoguin commented May 28, 2021

RaimoNiskanen commented May 28, 2021

zuiderkwast commented May 28, 2021

Variable bindings in comprehensions #4862

Variable bindings in comprehensions #4862

Comments

kostis commented May 20, 2021

nickva commented May 20, 2021

zuiderkwast commented May 20, 2021

bjorng commented May 24, 2021

kostis commented May 26, 2021

bjorng commented May 26, 2021

kostis commented May 26, 2021 • edited Loading

RaimoNiskanen commented May 26, 2021

lhoguin commented May 26, 2021

kostis commented May 26, 2021

zuiderkwast commented May 26, 2021 • edited Loading

RaimoNiskanen commented May 26, 2021 • edited Loading

lhoguin commented May 26, 2021

KennethL commented May 26, 2021 via email

bjorng commented May 26, 2021

zuiderkwast commented May 26, 2021 • edited Loading

RaimoNiskanen commented May 26, 2021

lhoguin commented May 26, 2021

RaimoNiskanen commented May 26, 2021

lhoguin commented May 26, 2021

RaimoNiskanen commented May 26, 2021

zuiderkwast commented May 26, 2021

essen commented May 26, 2021

zuiderkwast commented May 27, 2021 • edited Loading

lhoguin commented May 28, 2021

RaimoNiskanen commented May 28, 2021

zuiderkwast commented May 28, 2021

kostis commented May 26, 2021 •

edited

Loading

zuiderkwast commented May 26, 2021 •

edited

Loading

RaimoNiskanen commented May 26, 2021 •

edited

Loading

zuiderkwast commented May 26, 2021 •

edited

Loading

zuiderkwast commented May 27, 2021 •

edited

Loading