Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new operator ^ for pinning of pattern variables #2951

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

richcarl
Copy link
Contributor

@richcarl richcarl commented Dec 24, 2020

This allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code. An new compiler flag 'warn_unpinned_vars' (disabled by default) provides warnings for all uses of already bound variables in patterns that have not been marked with ^.

Note that this branch is a preview, and is based on another branch (#2944) to avoid future complications. Only the last commit pertains to this PR.

@josevalim
Copy link
Contributor

josevalim commented Dec 25, 2020

Hi @richcarl, I have a couple questions based on places where we need to handle ^ specially in Elixir:

  1. What happens when you are binding the variable for the first time but it is repeated?

     {X, X} = {ok, ok}
    

    In Elixir we don't require ^ because ^ is meant to bind to the previous value before the assignment and there is none here. This is also to avoid left-to-right semantics in pattern matching. For example, pinning only the right one would be weird.

  2. There are two places in patterns that support guard expression instead of patterns: the size expression in binaries and map keys. Are you warning and requiring variables to be bound for them? So you have to write:

     %{^X => Value} = Map,
    

    In binaries, you will only warn if you are matching on a previous value:

     Size = ...,
     <<Foo:^Size>>,
    

    But not on this:

     <<X:X>>
    

Regardless of the direction to go, you should probably have tests around those scenarios.


There is another warning we have added to Elixir which I found extremely helpful which is to warn if an underscored variable is matched or repeated in a pattern. For example, if you are matching on Erlang AST, you may write:

some_fun({match, _Anno, {var, _Anno, Var}, _}) ->

And then this doesn't work because the underscored _Anno variables need to match. This will provide similar guarantees to the pin variable but only for underscored variables - but that's something I would say can be done by default because 99% of the cases where this happens, it is accidental.

%% the stack var must be unused after processing the pattern;
%% it can be used either if bound/unsafe before the try, or
%% if it occurs in the class or term part of the pattern
add_error(L, {stacktrace_bound,V}, St);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a commit from #2944 in this PR. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, in the hurry I forgot to mention that in the description. This is to avoid future conflicts.

@richcarl
Copy link
Contributor Author

  1. What happens when you are binding the variable for the first time but it is repeated?

     {X, X} = {ok, ok}
    

    In Elixir we don't require ^ because ^ is meant to bind to the previous value before the assignment and there is none here. This is also to avoid left-to-right semantics in pattern matching. For example, pinning only the right one would be weird.

The current semantics does not change, so the meaning is still that these are both instances of the same new variable, with the constraint that it has to have the same value in both positions. (The compiler implements this using temporary names for both fields, adds a guard to check that they are equal, and then if they match, the value is assigned to X.)

Pinning does not apply here, because that says "evaluate this thing in the environment surrounding the pattern", and there is no previous X there. There is also no specific left/right evaluation order in patterns: conceptually, both variable instances are bound in parallel given the same current environment, and then they are checked for equality.

@richcarl
Copy link
Contributor Author

  1. There are two places in patterns that support guard expression instead of patterns: the size expression in binaries and map keys. Are you warning and requiring variables to be bound for them? So you have to write:
     %{^X => Value} = Map,
    

These subexpressions have the semantics of guard expressions. Their variables are already evaluated in the surrounding environment and can never be previously unbound, so pinning is never required (or even allowed). But you are right that I should add some more test cases.

@richcarl
Copy link
Contributor Author

There is another warning we have added to Elixir which I found extremely helpful which is to warn if an underscored variable is matched or repeated in a pattern. For example, if you are matching on Erlang AST, you may write:

some_fun({match, _Anno, {var, _Anno, Var}, _}) ->

And then this doesn't work because the underscored _Anno variables need to match. This will provide similar guarantees to the pin variable but only for underscored variables - but that's something I would say can be done by default because 99% of the cases where this happens, it is accidental.

I think this is a good idea (the AST example has bitten me once or twice), but that's for another PR.

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Jan 4, 2021
@alavrik
Copy link

alavrik commented Jan 11, 2021

A new EEP 55 was added to consider inclusion of this change: https://github.com/erlang/eep/blob/master/eeps/eep-0055.md

@attah
Copy link
Contributor

attah commented Jan 12, 2021

Doesn't the purpose of this get a bit broken when the operator is optional?
Would it not match that fact better if unbound variables can be annotated to ask the compiler to tell you if it becomes an unintentional match instead?

@richcarl
Copy link
Contributor Author

Doesn't the purpose of this get a bit broken when the operator is optional?

No, that's the way it needs to be introduced, to allow people with existing codebases to start using the new feature and preparing their code, but not breaking their builds with new warnings unless they enable the flag. In a subsequent major release, the warnings can be made on by default and off by option instead.

Would it not match that fact better if unbound variables can be annotated to ask the compiler to tell you if it becomes an unintentional match instead?

No, unbound variables is by far the most common case in patterns, being how you introduce new variable bindings in Erlang. You don't want to annotate each and every one of those.

@RaimoNiskanen
Copy link
Contributor

As others have said: for Elixir this operator is essential, since they
rebind variables without it.

For Erlang, if using a pinning operator had been required from the start;
I think that would have been a bit better than the current "match
if already bound". It is hard to be sure by looking at the code
if the variable is already bound - you have to make a machine search.

Introducing a pinning operator now is trickier...

Having a compiler option to choose if pinning is allowed/required makes it
hard to know what to expect from the code. The compiler option is set in
some Makefile far away from the source code.

I think I would prefer that instead there should be a compiler pragma
(I wish it would not be allowed from an include file but that is probably
impossible to enforce) so it is visible in the current module what to
expect about operator pinning. Without the pragma the pinning operator is
not allowed, with it pinning is mandatory; not a warning - an error if
a pinning operator is missing.

You get the idea: it should be possible from the source code how to read
it, at least on the module level.

How to take the next step i.e when code not using pinning is the exception,
to remove the compiler pragma, I have not thought about yet...

@ferd
Copy link
Contributor

ferd commented Jan 14, 2021

No, that's the way it needs to be introduced, to allow people with existing codebases to start using the new feature and preparing their code, but not breaking their builds with new warnings unless they enable the flag. In a subsequent major release, the warnings can be made on by default and off by option instead.

I would absolutely hate that stated end state. I don't necessarily have a cogent argument there, if only that the base pattern matching semantics behind Erlang are one of the things I found the most useful and impressive about the language and getting away from there is something I can't find myself agreeing with. It's the kind of thing other languages do and then I'm forced to use them and think "it's much nicer in Erlang."

@josevalim
Copy link
Contributor

josevalim commented Jan 14, 2021

stated end state

And I think this is the one of the most important aspect to discuss. If this is to be added but there is no agreement on the end state, then I would say it is best not to add it in the first place, rather than having two ways of writing the same code based on a compiler flag (or a pragma).

@richcarl
Copy link
Contributor Author

richcarl commented Jan 14, 2021 via email

@essen
Copy link
Contributor

essen commented Jan 14, 2021

I don't quite see what it is that you're against, though, Fred. On average, you'll only find a couple of instances of already-bound variables in any module (and the relative rarety is what makes it so much harder to spot when it actually is an important detail of the algorithm).

Not true in tests. And tests are often included in the same module as the code when using eunit so you can't just disable this for tests.

@richcarl
Copy link
Contributor Author

richcarl commented Jan 14, 2021 via email

@ferd
Copy link
Contributor

ferd commented Jan 14, 2021

I don't quite see what it is that you're against, though, Fred. On average, you'll only find a couple of instances of already-bound variables in any module (and the relative rarety is what makes it so much harder to spot when it actually is an important detail of the algorithm). You'd only need to mark these up as being intentional, and everything else stays exactly the same.

I generally like the power of Prolog-style unification and feel that moving towards it would be more desirable than moving away from it. I'd rather have to explicitly mark cases where you want to re-bind and be told "look, this one isn't a match", than cases where you don't want to rebind, and therefore to me this EEP moves in a direction opposite to what I'd prefer.

This is a stance that I take in a vacuum about my own preferences, where I don't have to care for other people's opinions and how confused they might feel by the unfamiliar semantics. Their views are valid and I can understand them, but I won't spend the energy to defend them here. If everything had to be in line with what felt the most welcoming and easy, we'd all be stuck using Go right now and mostly always building from a foundation of familiarity that wouldn't have given rise to Erlang itself; on the other hand without that perspective, there's probably gonna be a more limited community and adoption wouldn't pick up at all, and the sustainability would be in jeopardy.

It's a difficult balance to strike, and I don't know where the proper tradeoff is to be done. So amidst that uncertainty, I'm just voicing my own preferences.

@richcarl
Copy link
Contributor Author

richcarl commented Jan 14, 2021 via email

@kjnilsson
Copy link
Contributor

Automatic matching on already bound variables can make short pieces of code
quite elegant, but I have come to see this sort of thing as a cuteness that
a language for large systems with many developers cannot afford - and I
think it's also one of the little unnecessary obstacles that can turn off
beginners who are trying to make sense of the examples they stumble over:
"oh, you should simply have noted that X was bound up there, better luck
next time".

Absolutely agree here. This happens all the time in substantial code-bases. I even know of developers that have started favouring guards (ugh) for matching on previously bound values to avoid confusion of whether a value in a match is bound or not or if it should be or not. This way it match can be made explicit which is really valuable.

@ferd
Copy link
Contributor

ferd commented Jan 14, 2021

Sounds like you're focusing on the effects in fun heads and list comprehensions, where ^-annotation would add possibilities which don't exist today. But this is far from the main point of the suggestion, and could even be dropped (but it would be an assymmetry if ^ was not allowed in those patterns as well). But for the common case of patterns in functions, =, and case/if/receive/try, there is no shadowing involved, and yet that is where ^-annotations are most important. Without annotations, you cannot read and understand a pattern until you have parsed enough of the context to know which variables are already bound - is it an assertion or just a binding?

I don't really find too many of these "but what if you are reading code out of context" situations realistic. The first thing I always do in a code base is trying to figure out context because it drives everything. Performance optimizations are premature unless contextualized, fixes may break things at a distance unless they're contextualized, and naming of variables can be confusing unless they're contextualized in the domain they steep.

I never gave a crap about the arguments about "I'm refactoring and swapping lines and the , ; . stuff makes it hard to do" precisely because a change in return value never happens in isolation and always has to consider the caller to work. I won't give a crap about the argument of whether a thing is a rebinding match or not because whether it is a rebinding match or not will still need to be placed in its current context to know if a thing is useful or not, and if it's systematic I'm just asked to always provide markers for that context and intent that would be implicit otherwise.

Nothing we ever do here will remove the need for me to understand context. It will ask me to always declare my intent which arguably can be beneficial at times, but in my 10 years of Erlang I can remember 1 bug where this was very significant in a rare edge case situation (one that was too deeply nested and tricky to trigger for it to have been properly tested).

It's very possible other people feel this is very significant and important, but I personally don't.

And the status of this can change without warning if someone does a simple renaming or adds what they thought was a new variable further up. When you're looking at a dozen pull requests every day, this sort of boobytrap is not fun to have around. Automatic matching on already bound variables can make short pieces of code quite elegant, but I have come to see this sort of thing as a cuteness that a language for large systems with many developers cannot afford - and I think it's also one of the little unnecessary obstacles that can turn off beginners who are trying to make sense of the examples they stumble over: "oh, you should simply have noted that X was bound up there, better luck next time".

I haven't found that to be a significant case personally. Again, I'm not denying that experience in other people, but I'm speaking from my own position of comfort from this process. I likely have internalized the pitfall enough that I never mind it anymore.

Either way, the better systematic fix here if we really want to address the unexpected surprises would be to make these conditional expressions with matches have their own scope. I have found that almost everyone I introduced to the language just assumed that the scope of a new conditional was a new scope the way they would be in C-like languages, and that the problem with the pattern match was one of being surprised of that scope.

Arguably, introducing per-expression scopes would remove the very rare "I'm exporting the variable from all clauses so I can use it outside of the expression" case, bring in the shadowing warnings of funs and list comprehensions to case/if/receive/try, and make the use of a ^ pinning marker extremely obvious in intent with warnings otherwise. It would remove any need to specify stuff unless there is an actual risk of confusion (because the scoping rules tell you so), and require the least amount of work to get much clearer semantics. I've got this gut feeling that this current EEP hopes to get to that final result without actually fixing the scoping issues that actually underpin the perceived problem, and they'll be making me do work for the compiler that language semantics could instead address better than by having me toggle an operator every time.

@ferd
Copy link
Contributor

ferd commented Jan 14, 2021

I no kidding have had a lot more cases where the problem was that one of the thing I matched in the clause of a case expression is something I wanted to use externally and had to rename variables inside the case to not clash with usage after the case than when it happens beforehand, and per-expression scope there would fix that on top of the perceived problem behind this EEP. It would be a much more general solution.

Fixing the scoping creates a need for the operator. It feels backwards to create the operator to address the needs you'd have if you had the actual scoping in place, and doing one without the other feels misguided.

@essen
Copy link
Contributor

essen commented Jan 14, 2021

I suspect people who find ^ most useful are also those who have to deal with larger function clauses. I do not see this being useful for my own code bases, but this is most likely very appealing for other code bases.

@RaimoNiskanen
Copy link
Contributor

@ferd wrote:

the problem was that one of the thing I matched in the clause of a case expression is something I wanted to use externally and had to rename variables inside the case to not clash with usage after the case than when it happens beforehand,

I also encounter this every now and then, which raises a question: If you have Y partially matched in a case (not from all clauses), after the case end using ^Y would fail compilation, but a Y could be seen as either a) a deliberately new variable with no compile time problem, or, b) as both using a partially matched variable and missing the pinning operator.

I think a) could work since if you one day would match Y in all clauses you would get an error for missing the pinning operator. Ugly, though, since to utilize this you would have to conciously make sure Y is not matched in all clauses, so b) is probably the safe choice...

@richcarl
Copy link
Contributor Author

Either way, the better systematic fix here if we really want to address the unexpected surprises would be to make these conditional expressions with matches have their own scope. I have found that almost everyone I introduced to the language just assumed that the scope of a new conditional was a new scope the way they would be in C-like languages, and that the problem with the pattern match was one of being surprised of that scope.

Arguably, introducing per-expression scopes would remove the very rare "I'm exporting the variable from all clauses so I can use it outside of the expression" case, bring in the shadowing warnings of funs and list comprehensions to case/if/receive/try, and make the use of a ^ pinning marker extremely obvious in intent with warnings otherwise. It would remove any need to specify stuff unless there is an actual risk of confusion (because the scoping rules tell you so), and require the least amount of work to get much clearer semantics. I've got this gut feeling that this current EEP hopes to get to that final result without actually fixing the scoping issues that actually underpin the perceived problem, and they'll be making me do work for the compiler that language semantics could instead address better than by having me toggle an operator every time.

You're getting ahead of me, but that's definitely something I and the team at WhatsApp have been discussing. But it would be a really big change to the language, and to have any hope of succeeding it would need to be done in stages, with a smooth migration path. The pinning stage is a good first step, making the meaning of each pattern exactly clear regardless of whether shadowing is used or not. I'll be following this up with other suggestions, and local scopes is one of the things we've been trying out - very little code actually uses variables that are exported from subexpressions. But if all of that was done in a big bang change, it might as well be a fork of the language because of all the adoption problems.

@RaimoNiskanen
Copy link
Contributor

@ferd wrote:

I generally like the power of Prolog-style unification and feel that moving towards it would be more desirable than moving away from it. I'd rather have to explicitly mark cases where you want to re-bind and be told "look, this one isn't a match", than cases where you don't want to rebind, and therefore to me this EEP moves in a direction opposite to what I'd prefer.

As I have understood Prolog (never used it myself), unification is quite different from Erlang's first bind then match approach, with more symmetry between the variable occurences.

I think it would be reasonable to have a syntax for matching, as opposed to binding, since they are fundamentally different operations (in contrast to in Prolog), and matching is the rare one.

@galdor
Copy link

galdor commented Jan 15, 2021

For reference, a discussion on this very topic is in progress on the erlang-questions mailing-list.

@pjk25
Copy link
Contributor

pjk25 commented Jan 15, 2021

stated end state

And I think this is the one of the most important aspect to discuss. If this is to be added but there is no agreement on the end state, then I would say it is best not to add it in the first place, rather than having two ways of writing the same code based on a compiler flag (or a pragma).

With regard to end state, I would be happy to see this be added but remain an optional annotation. It requires no changes to existing codebases as I understand it - which means that individual developers or teams can be free to adopt it should it be a net increase of clarity in their particular case.

@michaelklishin
Copy link
Contributor

Great to see some feedback from @josevalim. Pinning from my experience is very useful to have in Elixir.

This feature certainly would have saved team RabbitMQ hours in investigations of some subtle bugs. And sure, there were larger functions involved. It's reasonable to expect that some functions will need some nesting of cases, and there are functions that accept anonymous functions as arguments. So shadowing is a very real thing in Erlang, even in relatively small modules. And so are the pains and time waste associated with it.

Oh, and the "unified Prolog syntax" argument made me literally LOL. Erlang has a certain reputation in our industry for its syntax. Whether justified or not, let us be mindful of it. Unless we want the industrial Erlang user community to eventually reach the size of that of Prolog!

@fisher
Copy link

fisher commented Jan 15, 2021

I don't like this idea at all, the very existence of such a proposal makes me upset.
Instead, I'd like Elixir to fail on x = x + 1 because there is no variable, it is a binding, and it is not an assingment, it is a pattern matching, and this is a clear, concise default.

Adding this as an optional operator makes things even worse in my opinion, because it adds ambiguity and as a consequence, unnecessary complexity. Please keep it simple.

@michalmuskala
Copy link
Contributor

@fisher then you'd be happy that this proposal does not make this work, moreover it would make this fail at compile-time, rather than runtime as it does today. The compiler would warn you that if you indeed want to match (preserving current Erlang semantics) you should have used ^X = X + 1.

Observing the conversation around this proposal here and on the mailing list, I have a general remark. It seems that large number of people commenting have not read through the proposal and are opposing or commenting about things completely unrelated. This is not helpful.

@michaelklishin
Copy link
Contributor

@fisher how does an explicit, opt-in way to express the exact intent of the developer add meaningfully to the complexity? What really does add to the practical complexity of using Erlang is the way things work today. Several members of my team can attest to that.

@essen
Copy link
Contributor

essen commented Jan 17, 2021

@okeuday I don't think ROK has a very good reply. There's no reason there can't still be a shadow warning. When ^ is used the variable refers to the same value. Otherwise the shadow warning can still tell you to be careful about what you're doing.

Funs are not commonly used like this, so translate the example into a scoped case expression and imagine what things could be like. For example we could write something like this with scoped case expressions:

Config = case get_config() of
    no_config -> default_config();
    Config -> Config
end

Today we have to use an intermediate variable, despite Config being the same value. With scopes, we end up with two Config variables essentially, one for each scope, and they simply refer to the same value.

Now if we have scoped case (and other) expressions, we need some way to "import" the variable into the scope for matching. This would be one way. We could write the following, where the "case .. of" part is still in the parent scope and the clause is its own scope:

OldConfig = get_config(),
NewConfig = maybe_update_config(OldConfig),
case OldConfig of
    ^NewConfig -> ok;
    _ -> write_config(NewConfig)
end

There might be other ways than the operator, I don't know, but this is what I see as desirable.

Now I don't know if adding this operator without local scoping is a good idea, but I agree with Richard that local scoping would be good to have and that there needs to be some way to identify if a variable was previously bound. Perhaps the steps to get there should be something like:

  1. Add warning about matching existing variables (optional and would help some people)
  2. Completely disallow exporting variables from case/if expressions, make the warning an error
  3. Add local scoping with the operator (or some other solution, but what?) and clarify and make consistent existing scoping rules
  4. Allow the operator to be used anywhere

Steps 1 and 2 could be done today with little pushback I believe. Step 3 and 4 later on whenever there's consensus on what it would look like. Step 4 should remain optional. I personally have no strong feelings about these steps as long as usage of the operator remains optional.

@okeuday
Copy link
Contributor

okeuday commented Jan 17, 2021

@essen Your example is clearer like this (the example below is clear without the operator):

OldConfig = get_config(),
case maybe_update_config(OldConfig) of
    OldConfig -> ok;
    NewConfig -> write_config(NewConfig)
end

Even with the operator as an optional feature, an intended use to break scopes makes the feature worse. Keeping expressions referentially transparent is important. Benefiting from the isolation of scopes, to isolate complexity is also important (more of a concern in C/C++/Java, but still important here). I haven't seen any positive benefit that the operator could provide in Erlang source code.

@essen
Copy link
Contributor

essen commented Jan 17, 2021

@essen Your example is clearer like this (the example below is clear without the operator):

OldConfig = get_config(),
case maybe_update_config(OldConfig) of
    OldConfig -> ok;
    NewConfig -> write_config(NewConfig)
end

This wouldn't work under the conditions I laid out because you'd have to write ^OldConfig -> ok;, otherwise the second OldConfig is a different variable not bound and it would always match and never write the config.

@slepher
Copy link

slepher commented Jan 20, 2021

Rebinding Variables in erlang could be implemented by parse_transform
maybe you want to try this:
https://github.com/slepher/astranaut#rebinding

@paulo-ferraz-oliveira
Copy link
Contributor

EEP-55's Backwards Compatibility seems to refer to using old code on an Erlang/OTP version that already supports the pinning operator (e.g. OTP 24, 25, ...).

I'm wondering how we'd be able to use new code that uses it in older Erlang/OTP-supported software (e.g. OTP 19, 20, ...). Would a parse transform of any kind be possible, here?

@richcarl
Copy link
Contributor Author

richcarl commented Jan 20, 2021 via email

@michaelklishin
Copy link
Contributor

michaelklishin commented Jan 23, 2021

We now have some practical examples of what this feature could reveal in real world open source projects.

@richcarl experimented with annotating the OTP codebase and published his findings. This seems to be very promising, so @kjnilsson decided to do the same thing to one of the smaller but key RabbitMQ dependencies, Ra. We also found code that looks suspicious as a result: rabbitmq/ra#209.

In a few days worth of work, this helped reveal potential bugs in pretty complex Erlang codebases.

@ranjanified
Copy link
Contributor

ranjanified commented Jan 23, 2021 via email

This allows you to annotate already-bound pattern variables as ^X, like
in Elixir. This is less error prone when code is being refactored and
moved around so that variables previously new in a pattern may become
bound, or vice versa, and makes it easier for the reader to see the intent
of the code. An new compiler flag 'warn_unpinned_vars' (disabled by default)
provides warnings for all uses of already bound variables in patterns that
have not been marked with ^.
@paulo-ferraz-oliveira
Copy link
Contributor

I tried this today on a lib. I maintain. [will try in more if I have time for it]

I liked what I saw. The ^ does improve readability for intent. [mind you, this is a VERY superficial analysis on my behalf, but I'd though about some kind of mechanism like this, in the past, so am probably "more open" to the idea]

I also found a potential issue regarding dead code, as in:

try create_table(A) of
    A -> dead_code; % :-)
    _ -> something_else
...
end.

(even if it doesn't go forward, at least it will help me find some situations that were apparently there that shouldn't be).

Also, yeah, we tend to move code a lot in refactors (not saying that this is something everybody does) and I've been bit by pattern matching errors quite a few times: one can argue "But there should be tests!", but there's not always time to do the right thing (which I'm not, again, proposing is the best way to go).

I also do maintain functions that are sometimes a couple of dozens of lines and having to figure out if a given variable was already bound before the code I copied or wrote, is sometimes a pain (think _Result, which is rather common in our code and could otherwise cause a matching error).

All in all, I'd probably like it to exist in the language (optionally, as proposed) but have not a very strong opinion on the matter. [I tried going through the whole mailing list -stuff, but it got overwhelming after a while]

@fisher
Copy link

fisher commented Jan 31, 2021

I also do maintain functions that are sometimes a couple of dozens of lines and having to figure out if a given variable

It's not a variable, it does not change its value.

was already bound before the code I copied or wrote, is sometimes a pain (think _Result, which is rather common in our code and could otherwise cause a matching error).

If you don't care about the result, don't pattern match it.

@richcarl
Copy link
Contributor Author

richcarl commented Jan 31, 2021

I tried this today on a lib. I maintain. [will try in more if I have time for it]

Thanks for trying it out!

I also found a potential issue regarding dead code, as in:

try create_table(A) of
    A -> dead_code; % :-)
    _ -> something_else
...
end.

I'm not clear what you mean that the issue is - if the code is dead because the pattern A could never match the result of create_table(A) (but Dialyzer could warn about that)? Or if the author thought the A in the clause pattern was a new variable, but then the _ case would be dead instead and the compiler would say so.

@paulo-ferraz-oliveira
Copy link
Contributor

paulo-ferraz-oliveira commented Jan 31, 2021

It's not a variable, it does not change its value.

:-) I don't want to get into an argument over semantics (I'm well aware of how to use variables in Erlang). Let's just say it's documented as being "a variable", even if it is single-assignment.

If you don't care about the result, don't pattern match it.

That was an example, not the whole truth. There might be good reason to name an ignored (_) variable, for readability/maintainability. It's helped us before.

Why else would people do stuff like

    case compile:file(File, Opts) of
        {ok, _Mod} ->

otherwise? [this code is from rebar3]

To me, though, that code is much more readable than

    case compile:file(File, Opts) of
        {ok, _} ->

(and especially when the return is not a simple two-element tuple).

(I also mentioned copying code between modules, which, when left not-completely-tested, may only surface issues much later - which the pinning would help surface sooner).

@paulo-ferraz-oliveira
Copy link
Contributor

the code is dead because the pattern A could never match the result of create_table(A) (but Dialyzer could warn about that)

You are right that dialyzer "could" warn about that, but if my function depends on code not available for the analysis that'll not be so, right?

local_func(A) ->
    case remote:func(A) of
       A -> dead_code;
       _ -> ok
    end.

(@richcarl, maybe I'm missing something)

Note: I used rebar3 dialyzer with options (error_handling, underspecs, unmatched_returns) - unknown left out since the function will only be available after boot.


In the meantime, in another lib., it's helped me find another case where it was useful to re-examine the code:

A = do_this(),
case A of
    {ok, _} ->
        A = do_this(),
        done
    _ ->
        ok
end

... do_this() being executed twice for no reason (and also, with no error), but hindering performance.

@richcarl
Copy link
Contributor Author

You are right that dialyzer "could" warn about that, but if my function depends on code not available for the analysis that'll not be so, right?

local_func(A) ->
    case remote:func(A) of
       A -> dead_code;
       _ -> ok
    end.

Sure, but the thing is, we can't be sure it's dead unless we know what remote:func/1 does (at least its return type). It could be that it's the identity function, for some values of A at least, and then dead_code would in fact be used.

In the meantime, in another lib., it's helped me find another case where it was useful to re-examine the code:

A = do_this(),
case A of
    {ok, _} ->
        A = do_this(),
        done
    _ ->
        ok
end

... do_this() being executed twice for no reason (and also, with no error), but hindering performance.

Yeah, I found one of those in the xmerl library as well. It can happen when copying and pasting, but not cutting out the original line. And the code quietly keeps working. :-)

@paulo-ferraz-oliveira
Copy link
Contributor

paulo-ferraz-oliveira commented Jan 31, 2021

Sure, but the thing is, we can't be sure it's dead unless we know what remote:func/1 does (at least its return type). It could be that it's the identity function, for some values of A at least, and then dead_code would in fact be used.

You're right (I'm not stating we should have an automatic way of finding out if it's dead 😄 - unless I wrapped remote:func/1 in a locally -spec(_).ed function - which would least give me the static analysis guarantee). In this case, though:

  1. before pinning, dialyzer wouldn't complain,
  2. after pinning, dialyzer wouldn't complain, but warn_unpinned_vars would.

After examining that bit of code I found out that it was dead. I wouldn't have, otherwise, if not for this warning: I'm not stating pinning is for that, it just happened to have helped.


Edit:
I've now re-read my message and see how it can be misleading/confusing.

"I also found a potential issue regarding dead code, as in..."
should have read
"I also found dead code in one of our lib.s, using the warn_unpinned_vars, and after analysing the warning it issued".

(not a native English speaker, sorry about that)

@richcarl
Copy link
Contributor Author

  1. after pinning, dialyzer wouldn't complain, but warn_unpinned_vars would.

After examining that bit of code I found out that it was dead. I wouldn't have, otherwise, if not for this warning: I'm not stating pinning is for that, it just happened to have helped.

Yes, the code was not obviously wrong, but definitely suspicious looking.

@paulo-ferraz-oliveira
Copy link
Contributor

paulo-ferraz-oliveira commented Jan 31, 2021

Yes, the code was not obviously wrong, but definitely suspicious looking.

That's what I meant: not wrong (in the same module, the same pattern was used 3 more times), but suspicious 😄.

In the meantime, I've ran it (the warn_unpinned_vars) over 40 something lib.s we maintain internally. Most of the time I find the warnings were for stuff like lists:keyfind (and similar) or pattern-matching in catch, thus the code was OK as was, but still I had to re-read it to understand the developer's initial intention.

My only worries would now be forward compatibility (as expressed before) and the fact that it can/should remain optional.

@KennethL
Copy link
Contributor

KennethL commented Feb 1, 2021

The OTP team had a meeting about EEP 55 and this pull request, and we came to the following conclusion.

We will not approve this pull request for inclusion in OTP 24. OTP 24 will be released in May and before that there will be 3 release candidates with the first one on February 24:th.

We agree that there is a problem that binding when match was intended, and vice-versa,
is a source of error especially in big projects with many beginners and large functions.
This can be worth to solve.

But we need to investigate this more before OTP 25 i order to take a decision.
We also note that if a change like this should be released many other things must be in place before an introduction. Things such as: Erlang editor modes, other tools, documentation, examples, test suites. parse transforms, ...

@s-ledenev
Copy link

The rationale itself is absolutely concocted out of thin air. For anyone who face such "problems" I suggest to write smaller functions and think better on their code composition.

Arguments like "let's do X in our language because I like this at Y" with such flimsy arguments is absolute heresy. And defending arguments like "no context" and so on are weak!

For 20+ years I've seen this many times. Any small unnecessary addition to language itself is the first step for spoiling language. Look at C++ it was mess then and it is complete mess now! Look at the last changes of C# (ex., default implementation for interface) - mess is on the way. Erlang as the language is absolutely perfect! It has small vocabulary and simple usage rules. Don't spoil it! If anyone likes something in Elixir - use it! When in Rome, do as the Romans do!

And I am unhappily surprised that such EEP was registered at all. How could absolutely reasonable Erlang community do that? I hope this EEP would be rejected and this incident would be over and done with.

@lhoguin
Copy link
Contributor

lhoguin commented Apr 24, 2021

@s-ledenev EEPs being registered is not indicative of anything other than them being a proposal for Erlang. There are plenty of EEPs that never led to anything, regardless of their perceived merits.

@potatosalad potatosalad deleted the pinning-operator branch May 31, 2022 23:31
@TD5
Copy link
Contributor

TD5 commented Sep 29, 2022

I would personally have been saved a headache if I'd had this when refactoring some legacy code and accidentally reused a variable.

We agree that there is a problem that binding when match was intended, and vice-versa, is a source of error especially in big projects with many beginners and large functions.

...

We also note that if a change like this should be released many other things must be in place before an introduction.
Things such as: Erlang editor modes, other tools, documentation, examples, test suites. parse transforms, ...

Would putting this change behind an erl_features toggle make the switch more palatable? Then we have an onboarding path so those who want it can get it quickly, but it can stay disabled for maximum compatibility by default until there has been plenty of time to add support in community tooling.

@yushli
Copy link

yushli commented Jun 5, 2023

Stumble across this pull request and the corresponding EEP. Looks like it can nicely fix the variable shadowing issue in ets:fun2ms: https://erlangforums.com/t/variable-shadowing-constraint-of-match-spec-generated-using-ms-transform/2656

@yushli
Copy link

yushli commented Jun 5, 2023

Is it desirable to constraint this pinning operation to be allowed just inside fun header to solve this particular variable shadowing issue?
Also this is still a valid use case for allowing pinning inside a fun head:

You need an operator to avoid shadowing issues, regardless of other uses. This example:

f(X, Y) ->
    F = fun ({a, ^Y}) -> {ok, Y};
            (_) -> error
        end,
    F(X).

@jhogberg jhogberg added team:LG Assigned to OTP language group and removed team:VM Assigned to OTP team VM labels Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:LG Assigned to OTP language group
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet