Distinguishing Loads and Stores #1
Comments
This is an issue that most languages with a match expression/statement have had to solve. For example, in Scala:
It would be interesting to make a survey of what other languages do. IIRC most require the use of some special syntax to force the alternate interpretation. |
Yeah, it does elegantly solve both problems. What about simply adding a token to the name... like
Other things we could consider, piggybacking off of this:
|
I notice Rust uses something similar to Scala; their case |
In Elixir you can use the "pin" operator
binds a new variable
matches the value of the existing variable |
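To make the load/store distinction concrete, here is a minimal sketch in plain Python of the two interpretations a bare name could have inside a pattern. The Store and Load classes and the match_one helper are hypothetical names invented for this illustration; they are not part of any proposal in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Store:
    """Hypothetical pattern node: bind the subject to `name` (always matches)."""
    name: str


@dataclass(frozen=True)
class Load:
    """Hypothetical pattern node: compare the subject with an existing value."""
    value: Any


def match_one(pattern, subject: Any) -> Optional[Dict[str, Any]]:
    """Return the new bindings on success, or None if the match fails."""
    if isinstance(pattern, Store):
        return {pattern.name: subject}
    if isinstance(pattern, Load):
        return {} if subject == pattern.value else None
    raise TypeError(f"unsupported pattern: {pattern!r}")


existing = 1
print(match_one(Store("existing"), 2))  # {'existing': 2}  (store: rebinds)
print(match_one(Load(existing), 2))     # None             (load: 2 != 1)
print(match_one(Load(existing), 1))     # {}               (load: 1 == 1)
```

The same spelling of a name gives opposite results depending on which node it is taken to mean, which is why the syntax has to settle the question one way or the other.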
At the end of the day, it might not really be necessary to introduce any special syntax to distinguish between loads and stores here, anyway. Based on the syntax currently used in our discussion, we might start by agreeing that anything that is "called" must be a class with a If, in the example given above,
|
A similar question to consider is what happens if a variable appears twice in a pattern. Think, for instance, of In a function definition, At least in Scala, |
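For comparison with function definitions, current Python already rejects a repeated parameter name at compile time. The sketch below only demonstrates that existing behaviour; the compiles helper is a hypothetical name used for illustration, and nothing here says how patterns themselves should behave.

```python
def compiles(source: str) -> bool:
    """Report whether `source` gets through Python's compiler."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles("def f(a, b): pass"))  # True
print(compiles("def f(a, a): pass"))  # False: duplicate parameter name
```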
That's my POV, but I'd like to convince Brandt, and I do have some niggling doubts that I'll illustrate below.
All agree on that.
Here I'm not so sure. It is a common convention in Python to name constants so that the actual value of a constant can be changed without having to update every use site. This convention started at least in C, which uses UPPER_CASE for such names, and PEP 8 agrees -- but not everyone does (in particular, large parts of mypy don't). So if we have e.g.
then I would like to be able to write things like
This seems especially important when the constants are imported from elsewhere, e.g.
See above.
I think that's the case in this example. The definition is on L248 in the same file and it is clearly a constant (annotated with
That would look a bit clumsy in this case. As a compromise (where have I seen this before?) I would propose that if the name is all lowercase (including underscores and digits) it is an extraction target, and if it starts with or contains an uppercase letter it is considered a variable. We would then have to recommend that the mypy project rename things like Note that for constants prefixed with a module and/or class name (anything with a |
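The proposed case rule is simple enough to state as a predicate. This is only a sketch of the rule as worded above, applied to simple (undotted) names; is_extraction_target is a hypothetical helper name, not something from patma.py.

```python
def is_extraction_target(name: str) -> bool:
    """Apply the proposed case rule to a simple (undotted) name.

    True  -> the name would be a store (extraction target).
    False -> the name would be loaded and compared as a value.
    """
    if not name.isidentifier():
        raise ValueError(f"not a simple name: {name!r}")
    # All lowercase means: no uppercase letter anywhere
    # (lowercase letters, digits and underscores are all fine).
    return not any(ch.isupper() for ch in name)


for candidate in ("x", "none_rprimitive", "RUnion", "MAX_SIZE", "camelCase"):
    print(candidate, is_extraction_target(candidate))
```

Under this rule a lowercase constant such as none_rprimitive would be classified as a target, which is exactly the renaming concern raised for mypy.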
Regarding
Or what about
? |
Yes, I see your point with the constant values coming from other libraries, or the need for more complex constant objects. Although, as you pointed out, you could always write constants from other modules as attributes with a dot (e.g., Using upper- vs. lowercase to distinguish constants and variables/targets seems to me like a good compromise. If I recall correctly, Scala does the same, following the Java convention that classes and constants should start with an uppercase character, whereas variables should not. |
Concerning The |
It should not be hard for the parser to check that there can only be one assignment to any given variable. This should be similar to the check for duplicate parameter names in functions. For There is another issue regarding variable assignments -- suppose we have a pattern
and the target is
Clearly the variables have to be assigned before the guard is executed. Should they be "unset" if the guard fails? The simplest approach just sets variables as matching progresses and leaves them set if we fail at a later sub-pattern. This is similar to how the walrus operator works -- it just assigns to a local variable with standard local scope, e.g.
This behavior in turn comes from Python's general reluctance to introduce nested local scopes. (We do that for comprehensions so it's not entirely unheard of, but there are consequences, e.g. assignment to variables in the containing scope would have to be marked with |
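The walrus behaviour referred to above can be seen in a few lines. This is a minimal sketch with a hypothetical function name (probe); it shows only that an assignment expression leaves its target bound in the ordinary local scope even when the surrounding test fails.

```python
def probe(n: int):
    if (half := n // 2) > 10:
        return "big", half
    # The test failed, yet `half` is still bound in the function's local scope.
    return "small", half


print(probe(30))  # ('big', 15)
print(probe(4))   # ('small', 2)
```

The "simplest approach" described above would give pattern variables the same property: they stay set after a failed guard or a failed later sub-pattern.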
Using the casing of the names to distinguish these seems very un-Pythonic to me. Python's compiler does an excellent job of elegantly and intuitively sorting out local, nonlocal, and global names, while providing workarounds for the less-trivial cases. I would feel disappointed if we settled for case-sensitivity rather than finding a similarly elegant solution here. It seems to me there are four "easy-to-implement" options, and two "hard-to-implement" options. Easy:
Hard:
I feel that Easy 1 and Easy 2 are our best bets. It should, ideally, be trivial to refactor old code to use the new feature without having to change or resolve the names of other code elements. |
I have a slight preference for ?x over x?. The latter, to me, means 'optional'. $x also works for me, or any other single character that is traditionally interpreted as being a variable interpolation in other languages (shell, make, JS, etc.). Pattern matching is a kind of 'reverse interpolation'. (Although I wouldn't suggest using sscanf %s). I have a slight preference for using a token (Easy 2) over the other options. |
I recently surprised someone by showing them how easy this was:
    class _:
        for x in y:
            ...
It's just the Python equivalent of "wrap it in braces". 😉 |
This reminds me of an idea I learned nearly 40 years ago, in the context of the design of Python's predecessor, ABC (I wasn't the designer then :-). Too many languages were designed with the convenience of the compiler in mind, requiring the user to do extra work. Forcing the user to mark either loads or stores with extra syntax (either Easy 1 or Easy 2) because it would be too hard for the compiler to figure it out seems like a throwback to those pre-ABC days.
Hard 2 is unacceptable because it means the compiler cannot properly generate code for this. (Check out the translate() methods in my sample code in patma.py -- it translates every pattern type into simple code that works in today's Python. The idea would be to do this at the AST level so the bytecode compiler wouldn't have to know about patterns.)
I think Hard 1 is doable -- the compiler already knows the local namespace and all containing function namespaces (in case of nested functions). It could be extended to know what's (potentially) set in the global namespace as well, except for
But Hard 1 has another issue. There may be variables in outer scopes with simple names (such as
I admit that Easy 3 (the case distinction) is also a form of letting the user mark up the distinction, but it has the advantage that it uses an established convention for variable names and constant names, and I am still in favor of it -- it reads quite naturally to me, compared to
I am not all that worried about having to explain the inconsistency to beginners. Beginners usually don't see all that much consistency -- they see an overwhelming array of confusing notation, and they use their brain's pattern matching to sort it out (some more successfully than others :-).
It's like when you're explaining a board game to a new player: you inevitably end up mixing hard rules and important strategy concerns together, and beginners often don't know whether they can't make a certain move because it's forbidden or just because it would be a bad idea strategically. (Due to shelter in place I have witnessed this a fair bit, and experienced it myself as well. :-) There are tons of other things we're planning to do slightly different from other places in the language -- e.g. we all seem to agree that the target |
Okay, then let's seriously consider Hard 1. First, a huge data point which aligns with your observation: of all of the listed options, your rough rewriting of our examples just naturally works with both of the "Hard" options, and doesn't work with any of the "Easy" options (including case-sensitive contexts). That's probably because it's a very natural style to those already familiar with the way Python uses names, and builds on the same rules that the language has successfully used for decades. As a result, it can usually "just work" in any existing codebase.
Hm, maybe. We really have two cases to consider here:
If this is the biggest issue, I still think Hard 1 is attractive.
To clarify, are you suggesting the final implementation wouldn't involve changes to the VM/compiler? Or just the proof-of-concept? It seems that these statements will involve some specialized moves/mechanics that could be greatly aided by specialized bytecode. I'm not opposed to the idea of making this an AST transformation (I think it's too early to tell), but it seems surprising to be basing fundamental design decisions on that assumption. I think it's important to recognize that, unlike all of the third-party pattern-matching libraries we're drawing inspiration from, we will have the huge benefit of first-class language support.
Well, I'm not a huge fan of this line of reasoning. Even so, "case-sensitive name contexts" definitely puts the others to shame! |
It seems that we have gotten to a bunch of workable solutions and are debating aesthetics at this point - which is more unpalatable and less Pythonic, a prefix character or case-sensitive name contexts? To avoid an impasse, I suggest several approaches:
Also, aesthetics are affected by familiarity - something may seem ugly at first, but eventually may grow on you. I know this is true for many first-time Python users unfamiliar with the use of indentation to define block structure. So too with these ideas.
Of course, once this reaches the formal PEP stage, there will be an order of magnitude more bikeshedding... |
Writing or looking at real-world code is IMHO a great idea to think about how to move forward. Also because it seems like pattern matching has two wildly different parents. On the one hand we have the Now, if I may add my little bit of experience with pattern matching; although it is not written in Python itself, I have just had a look at how I used match/case-statements in my larger projects such as the Python-parser. To my own surprise, I predominantly use it as a glorified switch to check for enum-values. Interestingly enough, however, those are written as attributes, anyway (e.g., If nothing else, this might explain why I am so much more in favour of assigning to all names that are not used as "attributes" or "calls" :-). And the upper-/lowercase convention is very familiar to me, too, which means that I am hugely biased in this matter. |
For what it's worth, let me also respond to Brandt's great list.
A combination of 1 and 4 might be a viable compromise, though, in the following sense. I would still claim that in most cases of pattern matching, you use names as targets/stores to extract values from a data structure. Constant values are expressed as either literals or attributes. If you wanted to use a "regular" name as constant, you declare that in the beginning. But, again, I feel that something like this only makes sense if we are pretty sure that most use cases have no need for such an additional keyword (and my view on "regular use case" here is certainly biased). Finally, if we go for "hard 1", I would only consider global names that are directly assigned on the module level. This would not only exclude anything imported via |
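The restriction to names "directly assigned on the module level" could be approximated with a static pass over the module's AST. The following is a rough sketch under that reading, not a proposed implementation; module_level_assigned_names is a hypothetical helper, and it deliberately ignores imports, nested scopes, and anything set dynamically.

```python
import ast
from typing import Set


def module_level_assigned_names(source: str) -> Set[str]:
    """Collect names bound by plain assignment statements at module level."""
    names: Set[str] = set()
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, (ast.AnnAssign, ast.AugAssign)):
            targets = [node.target]
        else:
            continue  # imports, defs, classes, etc. are skipped
        for target in targets:
            for sub in ast.walk(target):
                # Only names being stored count, e.g. not `a` in `a.b = 1`.
                if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                    names.add(sub.id)
    return names


example = (
    "import os\n"
    "from math import pi\n"
    "LIMIT = 10\n"
    "low, high = 0, 1\n"
    "def f():\n"
    "    inner = 1\n"
)
print(sorted(module_level_assigned_names(example)))  # ['LIMIT', 'high', 'low']
```

Note that pi and os are excluded here because they arrive through import statements, which matches the exclusion mentioned above.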
My main desire is to be able to write |
Yeah, that's the one thing that redeems this for me. It also helps that no perfect solution seems to exist yet. I'm okay with moving forward using case-sensitivity for the POC, since it looks like it has 50% support here anyways and is relatively painless to change later. But just for the sake of moving forward. 🙂 |
I looked at a few other languages and how they handle this issue, and found an approach taken by Thorn to be interesting. Scala just uses backticks to mark an identifier as a constant value instead of a variable to be assigned to:
    case `pi` => print("This is 3.1415926...")
This is also used to allow keywords to pass as identifiers/attribute names, which is important because of interoperability with Java (an issue that also comes up with Jython).
Thorn introduces several concepts in this regard (however, I got the feeling that Thorn is a research language to try out various concepts, anyway). Their main reasoning, however, is that something is either a variable or an expression. Variables take on the value of whatever they match in their respective positions. Expressions always yield values, against which the object's values are then compared. However, to properly distinguish between expressions and patterns (including 'store' variables), they introduce the evaluation operator Although the syntax
Instead of having a symbol to discriminate load/store semantics as such, I think the notion of an interpolation operator might be something worth discussing or considering. On the one hand, it has a precedent in Python with string interpolation; on the other hand, it opens up many more possibilities than a simple load marker. On the flip side, though, it would mean thinking hard about evaluation order and about when variables are effectively assigned. And we should also be aware that basically the very same thing could be achieved using guards. |
I considered allowing
|
Just to clarify, the cognitive load for remembering the load/store context extends not only to plain names but also to all enclosing calls; for example, these two are very different:
This particular case was probably the main reason why I abandoned this idea. |
I agree that However, my main point is that if we want to consider a marker for loads, I would argue that we consider something more general than just a plain load marker. Although the dot-syntax in issue #19 is kind of nice, and would nullify this idea here. |
Question: is |
A related question: Is the following illegal?
The second match arm refers to 'x' because it was assigned a value in the first match arm (which is evaluated first), and we don't 'rewind' variable bindings if a match arm fails. I feel like it should be illegal, but I'm not sure how you would detect a case like this. |
The dot-syntax (and any name-marker syntax, in fact) is easily extensible in the future. E.g. suppose we currently just allow
I propose not to allow reusing variable bindings later in the pattern (of course it is allowed in guards). As this example shows it would be an invitation to cleverness. The Python compiler could easily detect and reject this.
That looks horrible, and should probably be rejected by a static checker, but I don't want to complicate the Python compiler for match statements to be saddled with the responsibility of detecting/rejecting it. But if someone finds an easy way of detecting it I wouldn't object. Python's current compiler does do a complete analysis of local name usage, and maybe we could add some kind of rule that states that if a variable is bound anywhere in a given match statement, it can only be used in guards (and blocks?) for cases that actually bind it. But traditionally this is the kind of thing where we tell users not to do it without making it impossible -- certainly the integrity of the virtual machine is not at stake here. |
I fully agree.
Yes. There are cases like
I think if we wanted to fully avoid this, we would have to introduce a separate scope for each match case to make sure that no bindings escape. However, some future implementation might be able to optimise pattern matching, reorder some cases, or check them "in parallel". It might therefore be a good idea to explicitly state that the
Although it might be interesting to consider this some time in the future, I am also happy with using the dot just as a load marker. A leading dot to mark the current namespace seems natural enough to me and a good compromise. |
@brandtbucher:
@gvanrossum:
I agree it's a bit verbose, but it seems like it clears up a lot of headaches for both the user and the compiler. Example 5, rewritten in the prevailing keyword flavor:
Otherwise, to me, it's not entirely clear where none_rprimitive, a, b, and maybe even RUnion are coming from / going to. Likely, this ambiguity would need to be resolved in the eval loop (like LOAD_NAME does), which is unfortunate. Here, everything is known at compile-time.
(This could open the door to capturing types like RUnion as well... but that's a separate discussion).
My opinion is that the headaches a construct like this saves vastly outnumber those due to mistakes like the one you mentioned, which a linter could probably catch in most cases.