Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable bindings in comprehensions #4862

Closed
kostis opened this issue May 20, 2021 · 26 comments · Fixed by #4878
Closed

Variable bindings in comprehensions #4862

kostis opened this issue May 20, 2021 · 26 comments · Fixed by #4878
Assignees
Labels
bug Issue is reported as a bug in progress team:VM Assigned to OTP team VM

Comments

@kostis
Copy link
Contributor

kostis commented May 20, 2021

Describe the bug
I was playing with comprehensions involving multiple generators and I got some list/binary comprehension results which were news to me. I am posting it here to initiate some discussion.

To Reproduce

Eshell V10.3.4  (abort with ^G)
1> [X || X <- [1,2,3], X <- [3,4,5]].
[3,4,5,3,4,5,3,4,5]
2> [X || X <- [1,2,3], X <- [a,b,c]].
[a,b,c,a,b,c,a,b,c]
3> [X || X <- [1,2,3], Y=X <- [a,b,c]].
[a,b,c,a,b,c,a,b,c]
4> [{X,Y} || X <- [1,2,3], X=Y <- [a,b,c]].
[{a,a},{b,b},{c,c},{a,a},{b,b},{c,c},{a,a},{b,b},{c,c}]

Expected behavior
The above surprised me; I understand why it happens, of course, but I still find it surprising, esp. since in Erlang it's allowed for the term X in the "variable" position of a generator (X <- L) to be an already bound term and thus use it as a filter.

Affected versions
Probably all.

Additional context
The same behavior exists in compiled code, but at least there one gets warnings (a weird one about "an unused variable" and another one about shadowing that at least makes some sense).

@kostis kostis added the bug Issue is reported as a bug label May 20, 2021
@nickva
Copy link
Contributor

nickva commented May 20, 2021

I was surprised by this recently as well. The user guide describes it with a few examples: https://erlang.org/doc/programming_examples/list_comprehensions.html#variable-bindings-in-list-comprehensions and gives the advice to do the matching in the filters.

However, the expressions reference https://erlang.org/doc/reference_manual/expressions.html#list-comprehensions only mentions The variables in the generator patterns, shadow variables in the function clause, surrounding the list comprehensions. but doesn't say what happens when generator variables shadow other generator variables.

@zuiderkwast
Copy link
Contributor

It seems to me that variables in the generator pattern shadow previous bindings of the variable, including previous generators within the same comprehension.

The other construct where we have shadowing is funs, so the examples are somewhat like ...

Eshell V11.1.8  (abort with ^G)
1> lists:flatmap(fun(X) -> lists:flatmap(fun(X) -> [X] end, [3,4,5]) end, [1,2,3]).
[3,4,5,3,4,5,3,4,5]
2> lists:flatmap(fun(X) -> lists:flatmap(fun(X) -> [X] end, [a,b,c]) end, [1,2,3]).
[a,b,c,a,b,c,a,b,c]
3> lists:flatmap(fun(X) -> lists:flatmap(fun(Y=X) -> [X] end, [a,b,c]) end, [1,2,3]).
[a,b,c,a,b,c,a,b,c]
4> lists:flatmap(fun(X) -> lists:flatmap(fun(X=Y) -> [{X,Y}] end, [a,b,c]) end, [1,2,3]).
[{a,a},{b,b},{c,c},{a,a},{b,b},{c,c},{a,a},{b,b},{c,c}]

... aren't they? I imagine the solution to this bug can be to update the documentation.

Or did I miss something here?

@bjorng bjorng self-assigned this May 24, 2021
@bjorng bjorng added the team:VM Assigned to OTP team VM label May 24, 2021
@bjorng
Copy link
Contributor

bjorng commented May 24, 2021

@zuiderkwast List comprehensions inherited the variables shadowing rules from funs, because lists comprehensions were originally implemented using funs.

I will update the documentation.

bjorng added a commit to bjorng/otp that referenced this issue May 24, 2021
@bjorng bjorng linked a pull request May 24, 2021 that will close this issue
bjorng added a commit that referenced this issue May 26, 2021
…-17432

Clarify the rules for variable shadowing in comprehensions
@bjorng bjorng closed this as completed in a9f1d75 May 26, 2021
@kostis
Copy link
Contributor Author

kostis commented May 26, 2021

As I wrote, I opened this issue to initiate some discussion on (what should be) the semantics of variable bindings in comprehensions (esp. since generators already permit filtering). My goal was to (hopefully) improve the language and (try to) make its semantics cleaner and more sane. Instead, all I got was some (unrelated IMO) answer about variable shadowing in funs (what do parameters in nested funs have to do with what I pointed out?) and a comment that "things are they way they are because once upon a time we used to implement comprehensions using funs." What does this have to do with what the semantics of fresh variables in list comprehensions should be?

Anyway, I find this issue sad :-( But I should have known better by now...

@bjorng
Copy link
Contributor

bjorng commented May 26, 2021

What is your suggestion for improving the language?

Your only comment under "Expected behavior" was that the behavior surprised you. More useful would be to rewrite your example under "To Reproduce" to clearly show what your expected result was. Could you do it now to make it clear what you mean?

Also, could you clarify the following:

esp. since in Erlang it's allowed for the term X in the "variable" position of a generator (X <- L) to be an already bound term and thus use it as a filter.

It is not clear to me what you mean. An already bound variable cannot be used as a filter (it will be rebound):

1> X = a.
a
2> [X || X <- [a,b,c]].
[a,b,c]

The mention of funs was to explain the history of the (surprising) semantics, that variables will be rebound instead of compared to the previous binding (in contrast to the behavior in the rest of the language).

We are open to improvements of the language if you can explain them in a way that we can understand and if there is as way to do that in a somewhat backward-compatible way (for example, silently changing the semantics of existing code would not be acceptable, but first introducing a warning that the behavior would change and change it in some future release could be).

@kostis
Copy link
Contributor Author

kostis commented May 26, 2021

It is not clear to me what you mean. An already bound variable cannot be used as a filter (it will be rebound):

1> X = a.
a
2> [X || X <- [a,b,c]].
[a,b,c]

The mention of funs was to explain the history of the (surprising) semantics, that variables will be rebound instead of compared to the previous binding (in contrast to the behavior in the rest of the language).

Right. So, at least we agree that the current status is inconsistent, don't we?

What is your suggestion for improving the language?

Make the language consistent, even at the expense of breaking backwards compatibility.

Scopes of variables and bindings in comprehensions should not be different from scopes of variables in the rest of the language.

In particular, my example showed that even fresh variables, introduced within the comprehension (i.e., not coming from outside) and used only there, behave in a way which is non-intuitive even to a seasoned Erlang programmer. (And given this was not even mentioned in the docs, it was probably even news to most Erlang programmers out there.)

To make what I mean explicit, given that Erlang allows pattern matching with already bound variables, I would expect that

[X || X <- [1,2,3], X <- [2,3,4]]

produces [2,3] in exactly the same way that the following two comprehensions already do:

[X || X <- [1,2,3], Y <- [2,3,4], X =:= Y]
[Z || X <- [1,2,3], Y <- [2,3,4], (Z = X) =:= Y].

Edit:
For completeness, I should mention that an alternative way to fix such issues is for the compiler to insist that any variable mentioned in a generator position in a list comprehension must be fresh.
This would at least not allow comprehensions like [X || X <- [1,2,3], X <- [2,3,4]] or X = ..., [... || X <- L, ...].

@RaimoNiskanen
Copy link
Contributor

What about

X = a,
[X || X <- [a,b,c]].

?

Should the X in the list comprehension, ideally, be the preceding X that is bound to a? This would result in [a], right?

@lhoguin
Copy link
Contributor

lhoguin commented May 26, 2021

Should the X in the list comprehension, ideally, be the preceding X that is bound to a? This would result in [a], right?

Yes. Please. :-)

@kostis
Copy link
Contributor Author

kostis commented May 26, 2021

What about

X = a,
[X || X <- [a,b,c]].

?

Should the X in the list comprehension, ideally, be the preceding X that is bound to a? This would result in [a], right?

Yes. In exactly the same way that the following list comprehension produces [a]:

Y = a,
[X || X <- [a,b,c], X =:= Y].

@zuiderkwast
Copy link
Contributor

zuiderkwast commented May 26, 2021

In X <- [a,b,c], the variable is bound multiple times, so it doesn't makes sense that its scope is the entire surrounding scope. [Edit] And for nested scopes we have shadowing.

@RaimoNiskanen
Copy link
Contributor

RaimoNiskanen commented May 26, 2021

[Y || {1,Y} <- [{1,a}, {2,b}]

produces [a]
Does not that make it reasonable that

X = 1,
[Y || {X,Y} <- [{1,a}, {2,b}]

also should produce [a]
?

The question is; should we have nested scopes?
Before fun:s were added, and later comprehensions, there were no such thing, The scope was the function.

@lhoguin
Copy link
Contributor

lhoguin commented May 26, 2021

The most important is that we have consistent scopes as @kostis said. It doesn't really matter if the scope is the function, or if we have nested scopes, though I would personally favor the function scope.

Though for funs since you have a function inside a function that's pretty much the definition of nested scopes...

@KennethL
Copy link
Contributor

KennethL commented May 26, 2021 via email

@bjorng
Copy link
Contributor

bjorng commented May 26, 2021

@kostis Nice explanation!

Right. So, at least we agree that the current status is inconsistent, don't we?

Yes, we do.

Make the language consistent, even at the expense of breaking backwards compatibility.

I am all for making language consistent, if (as I already said in other words) it can be done in a way that will allow existing code bases to be gradually migrated.

For completeness, I should mention that an alternative way to fix such issues is for the compiler to insist that any variable mentioned in a generator position in a list comprehension must be fresh.

That would probably be possible to do. At first, reusing a variable would result in a warning (distinct from the current shadowing warning), and several releases later it could be forbidden.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented May 26, 2021

The pattern matching in generators is consistent with funs, not with case/receive/of/catch. If it's changed to be consistent with case/etc., it will be inconsistent with funs. What do we win?

I do think shadowing is a source of errors and that a warning is nice to have. Why doesn't the interpreter/shell emit warnings?

@RaimoNiskanen
Copy link
Contributor

@KennethL: I realized my mistake and edited my post after about 30 s to your corrected code.

I doubt we will get consensus in the community for any kind of change in this domain. This should maybe be something for a language extension/variant mechanism, that we do not yet have.

@lhoguin: A fun can only be defined in a function, so we can choose to say that a fun does not have a scope of its own. The current solution that a fun has a scope of its own, but not quite, feels half baked.

@zuiderkwast: We would win consistency. And I think consistency with case/receive/if is the right way since they are the original constructs.

@lhoguin
Copy link
Contributor

lhoguin commented May 26, 2021

@lhoguin: A fun can only be defined in a function, so we can choose to say that a fun does not have a scope of its own. The current solution that a fun has a scope of its own, but not quite, feels half baked.

I do not think we can choose to say a fun does not have a scope of its own, because people have been using funs specifically to have a scope (in macros for example). We can say that any variable defined before must match (instead of shadowing; and I would prefer that), but the variables defined in the fun will not be available outside anyway, even if the fun is ran immediately, so you do have a scope regardless.

I suppose LCs are the same in that regard (variables defined before should match IMO, but variables defined in the LC shouldn't be available after the LC).

@RaimoNiskanen
Copy link
Contributor

That is kind of like an unsafe variable from a case. In this case it is unsafe because the fun may not have execute yet.
Still, we do not say that a case has got a variable scope. (or do we?)
Terminology is hard.

I think the macros instead need a feature to create macro local variables, or rather variables unique to a macro invocation. Using funs to solve that problem is a bit like using a pipe wrench as a hammer.

@lhoguin
Copy link
Contributor

lhoguin commented May 26, 2021

But the fun may execute in a different function, so what happens then? Are the variables defined in the fun "exported" to the function that executes it?

@RaimoNiskanen
Copy link
Contributor

@lhoguin: You lost me there.

Variables introduced in a fun can only be "exported" to the surrounding syntactical scope, and there they must always be "unsafe" because the fun may not have executed yet. So, a variable introduced in a fun can not be used outside the fun. And still, one can have the view that the variable scope is the surrounding function; the variable is merely unusable after the fun's end.

I just compared with introducing a variable in some but not all case clauses. Then the variable exists after the case's end, but is unsafe and therefore unusable.

@zuiderkwast
Copy link
Contributor

@lhoguin If you want to remove shadowing in fun heads, would you then want

X = 1,
Fun = fun (X) -> X + 1 end.

to be equivalent to

Fun = fun (1) -> 1 + 1 end.

?

@essen
Copy link
Contributor

essen commented May 26, 2021

@zuiderkwast Yes. Of course that example is nonsensical. It becomes more interesting when X is matched as part of a tuple or a some other structure.

A slightly better example could be:

List = [...],
X = abc,
Fun = fun ({X, Y}) -> do_something(Y);
          ({_, _}) -> ok
end,
lists:foreach(Fun, List).

It avoids the whole when X =:= Y that is unnatural compared to the rest of Erlang.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented May 27, 2021

@lhoguin @essen I agree that when X =:= Y is unnatural, just like similar filters in comprehensions. However, if someone mistakes fun(X) -> X + 1 end for a fun with shadowing while it isn't, this too can be a source of error or confusion, especially since that's how it used to work before. Perhaps emitting a warning or an error for shadowing is the best option then, to avoid the situation, or---if it's possible to detect that new kind of probably-not-what-you-want situation---some other warning for that.

@lhoguin
Copy link
Contributor

lhoguin commented May 28, 2021

@zuiderkwast I suppose it goes back to the discussion about ^ we had not too long ago.

@RaimoNiskanen
Copy link
Contributor

I have been itching to bring up how much clarity the pinning operator possibly may bring to variable shadowing... ;-)

@zuiderkwast
Copy link
Contributor

For reference:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug in progress team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants