Revisit load vs. store #90
Comments
Yeah, this will likely be the biggest sticking-point. I'm not personally even 80% satisfied with any proposed option, which is... frustrating. |
This is probably one of the most obvious cases where the two different heritages of pattern matching clash. Hence, whatever solution we come up with, half of the people won't like it because it contradicts their approach to this. One of the major problems I see here is that we often discuss this without context in an abstract space of possibilities. A single We should for the sake of this discussion probably rather stick to examples like the I think we quite agree that all dotted names must have load semantics. The rationale being that we only want to assign to local variables. Ideally, pattern matching has no side effects beyond these local variables (we cannot strictly enforce this in Python, of course) and assigning to dotted names would basically mean that we write to attributes and global variables. Flipping the semantics of
Given that we are moving inside a network of interdependent rules, customs, and use cases, there are realistically far fewer feasible options than you would naively expect. |
@JelleZijlstra brought up a good argument for marking extraction variables:
(Though I think this would be an even stronger argument with "?" instead of "$". :-) |
Hello Guido, Brandt and Tobias :) I would like to bring to attention an alternative for extraction variables that I posted in the Python-Dev mailing list, which is in the same vein as a proposal from @dmoisset (feel free to correct/elaborate on anything Daniel). Posting it here is motivated by a sense that it may have flown under the radar while the discussion comes to a head between established options (primarily '. / ? / $' as a prefix) - see quotes below from #92:
In short, the idea is to use the scope of a match block to define upfront variables which can be used for extraction purposes. Here are some examples that span across @dmoisset's post and my own, where 'x' and 'y' are extraction variables and the body of case statements is omitted:
The nomenclature/syntax is certainly open to debate. Some alternatives to kick off with:
@dmoisset's post (proposal 1.B): https://mail.python.org/archives/list/python-dev@python.org/message/43YOZUKP3GJ66Z2V2NKSJARL4CGKISEH/
Some (somewhat biased) pros and cons compared to special prefix characters...
PROS:
CONS:
P.S. English naturally lends itself to duplicitous terminology, as seen by the many different terms already used to refer to what the PEP calls a 'Name Pattern': [extraction/capture] variable, assignment [target/cue], placeholder. I'd like to get thoughts on whether some clarity is helpful in these scenarios, or if it's better to let language free range? (I am guilty of this with the 'placeholder' term.) |
I want to bring up one subtle point that I think has been overlooked: If we do define a prefix operator for named pattern variables, this does not affect the walrus operator. In other words, you can still use This means that |
I created an experimental branch https://github.com/gvanrossum/patma/blob/expr-qmark/examples/expr.py showing what my PRO: It is very obvious which names are pattern bindings (stores) and which are loads. |
Another variant to differentiate variable binding (store) from comparing to the value of a (global) variable, which I have not seen discussed elsewhere, is:
example 1
x = 'hello world'
# ...
match example:
    case global x: print('how boring...')  # load a global variable
    case x: print(f'How ingenious: {x}')  # store x
example 2
def check_example(example):
    default = 'hello world'
    match example:
        case nonlocal default: print('how boring...')  # load a variable from the outer scope
        case x: print(f'How ingenious: {x}')  # store x
I personally find this much clearer in intent than
|
@SaschaSchlemmer This is clean but a bit verbose, especially when the patterns are nested. Also, with your proposal, I wonder if we're supposed to mark it
|
match example:
    case Point(global x, global x): ...
....
match log_level:
    case global DEBUG: ....
    case global WARN: ...
    case global INFO: ...
I must admit I am fully in the FP camp where I will use |
You hit the nail on the head here. The main use case is definitely FP style structure unpacking. An early proposal didn’t even have constant value patterns. But named constants and enums are very much part of Python’s culture and we felt we had to support them. And now this has caused a dilemma. |
@gvanrossum Semantically I think |
! and ? are about equally ugly. |
They're both ugly but |
Given typical usage I still want to use unadorned names for capture variables. Maybe we can introduce some notation that allows any expression to be used as a value? What about ‘+x‘ or ‘+(x+y)‘ ? |
Yes, More generally, however, I would certainly welcome an operator to evaluate expressions, rather than a store marker. As I indicated before, there are other languages that actually use Using @ambientnuance We actually considered and briefly discussed the idea of explicitly listing the names of either the loads or stores, as evidenced here. Actually, this variant reminds me a lot of Pascal, where you had to declare all local variables upfront and it therefore feels very backward to me. But my main concern is that it does not scale well. When patterns get larger and more complex, it gets difficult to keep track of which names have load and which have store semantics, respectively. While the compiler could certainly handle it, it becomes hard for us humans to read code when the actual meaning of a symbol can no longer be determined by local cues. I would therefore very much prefer a solution that determines a name's semantics locally. |
Rightio, thanks for your follow-up @Tobias-Kohn. A good self-reminder to search through all GitHub issues, not just recent ones. Could this be added to the ‘Rejected Ideas’ in the PEP (presumably under ‘Alternatives for constant value pattern’)? I can understand if this may be deferred, given the active discussion of some options currently in that section. Regarding poor scaling in large match blocks (or indeed as you note, complex patterns), this is definitely the weak spot for a ‘declarative’ approach. There is certainly a strong appeal in being locally unambiguous. Nonetheless, this hadn’t weighed heavily in my mind, as I had mentally assigned the task of distinguishing store and load variables to a syntax highlighter - with unique names enforced. My personal choice would be to highlight store variables in some manner, since they are distinct from all other variables. But I was reminded today that many people prefer to have a muted theme, and also learnt that some choose to forgo syntax modification altogether. |
In any case, @aeros made a worthwhile comment in the dev mailing list that I think is relevant here. They emphasised the importance of easy readability for a special character modifier, particularly for those with any visual impairments (size being the dominant factor). They also made a good counter-argument to my syntax highlighter crutch, albeit directed at the use of smaller special characters such as '.' :
Their full comment: Thanks for your time amidst what seems to be a hot topic. |
The way I would characterize the current dilemma is that explicitly marking stores has a number of compelling advantages - such as avoiding the "foot gun" problem; the major sticking point is aesthetics. |
Maybe taking a step back and seeing whether there is an actual need to support constant value patterns (like
if log_level == DEBUG:
    ...  # do this
elif log_level == WARN:
    ...  # do that
else:
    ...  # do whatever
match log_level:
    case logging.DEBUG: ...
    case logging.WARN: ...
    case _: ...
|
@SaschaSchlemmer If the construct is intended for FP-like structure checks as a first-class use-case, then one way of making that clear might be going back to the ‘as’ or ‘case’ discussion. The former makes it much harder to confuse with switch/case usage while still coherent: “match an object as having this pattern”. Happily marrying the two worlds seems like something worthwhile for the significant expansion in functionality. I’ve mentioned this in another thread already, but I think @brandtbucher’s update to expr.py in #105 provides a low-friction avenue to do so. EDIT: To be clear, you do make a strong argument. The distinction between pattern matching and control flow is advantageous. |
doesn't read too bad and leaves the unadorned form for capturing variables. I think though that arbitrary expressions (like one caveat: |
@SaschaSchlemmer that has the same problem I identified above: it's very easy to just write |
That case can be made for any solution, regardless of the actual syntax. Mixing load and store semantics is maybe just not a good idea for these cases where the meaning is ambiguous. This (perceived) gain in functionality might just not be worth the potential for bugs and confusing users. |
I do think that ? is clearer in the case where you have a store on an attribute lookup in a match case, i.e.
from dataclasses import dataclass
@dataclass
class Record:
    a: int
    b: int
    c: int
r = Record(1, 2, 3)
match r:
    case Record(b=var?):
        print(var)
Without it someone might not expect a store; it would look like a keyword arg using a variable, when really it is just matching to the next sub-pattern. |
@SaschaSchlemmer I'm not sure which part you are giving the thumbs down to. Do you prefer the spelling in the current proposal and implementation? I personally find it more surprising on first read; we don't normally expect an assignment to happen on the right side of an equals sign.
from dataclasses import dataclass
@dataclass
class Record:
    a: int
    b: int
    c: int
r = Record(1, 2, 3)
match r:
    case Record(b=var):
        print(var)
Or are you saying this sort of thing should be taken out of the current proposal? |
I find it unacceptable that load vs. store applies to the whole clause. Also when I first read Sascha’s first example I didn’t understand it because I didn’t notice the ‘as’. Maybe we could debate the UPPERCASE rule, and decide first two preferences. In Scala the UPPER rule seems to work well. Does Rust have it? |
@Tobias-Kohn to be honest I have not seen a convincing example where I'd prefer load semantics (except for literal values) over using store semantics and an appropriate guard. |
I still dislike the uppercase rule, because the language is now enforcing a convention (and one that may not always be "correct" in this context). It also only enforces it in this very narrow use case. Both of these points make it feel more "bolted-on" than the other options. There are also cases that aren't obvious to me:
It's also worth considering how easy it is to correct unintentional stores when they're found. I've recently added a syntax warning for some trivial cases (prompted by a recent mailing list discussion, and not pushed yet):
>>> match 42:
...     case foo: pass
...     case bar: pass
...     case _: pass
...
<stdin>:2: SyntaxWarning: unguarded name capture pattern makes remaining cases unreachable; did you forget a leading dot?
This simple action of adding a
I'll need to think about this more. Right now I pretty strongly prefer "PEP" and "purist", and pretty strongly dislike "pragmatic" and "compromise" ("pragmatic" and "purist" are pretty loaded names when discussing Python language design, by the way... 😉). Strong strong strong dislike of |
Either way, I think it's important to constantly emphasize (especially when discussing name patterns) that we are creating this feature specifically for destructuring, not switching. That should help reduce pushback from people who want to adorn stores rather than loads (or feel that rules like "purist" aren't powerful enough). |
@gvanrossum I don't believe it does. To my knowledge, in Rust you must use either a match guard or their binding operator in cases like these (I am not a Rust expert though). The binding operator loads, compares, then stores; an example can be found here (it shows matching a pattern, but it may be a variable as well in a limited sense). In Python (using their same at symbol) that might be spelled
number_of_doors = 4
match random_new_car():
    case Car(color=color, doors=doors@number_of_doors):
        print(f"This is a {color} car guaranteed to have {doors} doors")
where the variable is stored in doors, and I guess could be left as Edit: |
Big +1 from me for @brandtbucher pointing out the difficulties with the uppercase rule! I hadn't thought of that, but I think these two issues (leading underscores and non-Latin names) are quite valid. Of course, a firm rule will answer these questions, but it shows nonetheless that it might not be quite as straightforward as I, at least, had thought. I also like the SyntaxWarning! Very nice indeed! While I am certainly not too eager to go for the backticks rule, I am not entirely sure whether the use of the language in markdown can be a strong concern. After all, it would be intended as a rather 'obscure' feature to be used sparingly. |
I knew you would like that.
Alright, you're in charge of writing the RST docs if we go this route. 😉 |
Just to double check, is there anyone here who is still against default bind (store) semantics and prefers evaluate (load)? I know I mentioned some misgivings at some point but I'm generally on board with binding by default (I'm asking because of Brandt's comment about «help reduce pushback from people who want to adorn stores rather than loads») |
The uppercase rule builds on the convention and idea that constants are written in uppercase letters. In Scala (where the uppercase rule is applied), there is also the Java convention of writing all classes with an uppercase letter. This means that, e.g., the load semantics of In Python, we face several difficulties with this rule:
In favour of the uppercase rule, we find that it is quite simple, and solves the load/store problem without additional syntactic clutter. It thus has the potential to be a viable compromise between the two groups. On the other hand, having load and compare semantics for dotted names seems to cover enough cases as far as I am concerned. |
We seem to have agreement that dot-in-middle ( Which leaves the choice: Do we use some form of the UPPERCASE rule or not? Let's have a vote among the authors. If it's accepted, I'd solve one minor issue by ruling that |
I vote no uppercase. I still like the leading dot, actually... though I recognize that adding it back later is painless. |
I also vote no uppercase. However: as I understand it, the Unicode standard seems to have an "UPPERCASE" flag for each character that specifies whether it is uppercase or not. Ignoring leading underscores also seems reasonable enough. But since my reservations are primarily about aspects other than whether we can determine if something is uppercase or not, I am still in favour of not implementing this rule. |
Note that even with the purist approach, there are ways to match unqualified names, using either guards or custom matchers. I recognize that using it this way is quite ugly and verbose - it might make sense if you had a match statement with a bunch of qualified names, and needed one special case for an unqualified name - but if you had a bunch of unqualified names the burden would be great enough as to force most people not to use a match statement at all. |
@viridia Could you vote? If you're against UPPER it's decided. If you're for we'll have to ask Ivan. Regarding uppercase for non-Latin alphabets, Unicode has several letter categories, and "Lo" (Letter, other) is neither lowercase nor uppercase -- and there are 127,004 of those. I'm not sure but it looks like that includes (almost) all CJK "letters". If we're still interested after the vote I can ask around. |
I am -0.5 on using uppercase. One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the Unicode issues. Alternatively, the rule could be "Conforms to the Python code formatting standard for an enumeration constant", in which case I would be +0. |
That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store. |
Agreed. There are many disambiguating characters available. I'm
personally in favor of ^ as a prefix, which would cause no ambiguity.
|
We could keep the door ajar for some variant of the uppercase rule by stipulating that capture variable names shouldn't start with a capital letter (after stripping leading underscores). This shouldn't affect users whose alphabets have no case distinction (though a future decision to use a leading uppercase letter to mark dot-free loads would). It also shouldn't affect any serious use of match/case -- PEP 8 is quite clear that locals should use all-lowercase, and while I've seen plenty of code that violates the recommendation of using UPPERCASE for named constants or CapWords for class names, I don't think I've seen much code using anything but lowercase for local variables except on a whim. I doubt that anyone would even notice if we snuck this into the implementation without telling anybody. :-) |
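The two candidate wordings of the uppercase rule from this part of the thread ("starts with a capital letter after stripping leading underscores" and "contains no lowercase letters") can be compared with a small sketch. Both function names are hypothetical, and neither reflects anything that was actually implemented.

```python
def starts_with_capital(name):
    """'Starts with a capital letter', ignoring leading underscores."""
    stripped = name.lstrip("_")
    return stripped[:1].isupper()


def has_no_lowercase(name):
    """'Contains no lowercase letters' variant."""
    return not any(ch.islower() for ch in name)


# Would each rule classify the name as a load (True) or a store (False)?
names = ["DEBUG", "_Foo", "foo", "变量"]
by_capital = {n: starts_with_capital(n) for n in names}
by_no_lower = {n: has_no_lowercase(n) for n in names}
```

The caseless name `变量` is a store under the first wording but a load under the second, which is the objection raised against "contains no lowercase letters".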
Glad to see the authors voting for no uppercase. Many, including me, believe that using uppercase this way is bad for Python.
Of course. It seems that "no uppercase" will be adopted, so there may be no need to consider alphabets without lowercase/uppercase concepts. In case you still want some information about CJK languages: usually a CJK language user will expect all characters to be either uppercase or lowercase (because this is a routine way). However, instead of using cases, CJK language users might prefer using |
Looks like we're going with just the dot-in-the-middle rule ("purist"). People can create a dummy class and put values in there:
|
I imagine most people will not use the dummy class and instead go for:
And it's what I've seen in Haskell and Rust (see for example Listing 18-27 here) |
This has been implemented. Still needs PEP though. |
I'm not so sure. That idiom looks backwards: We want to compare a value, but instead we extract it and then add a guard -- but for the human reader, a guard is much more expensive to understand, because there are many other things you could test for in a guard. Plus now the reader is wondering, is Also it would be repetitive if several case clauses need it. |
With the adoption of the dot-in-the-middle rule, it seems that the use of "." as a sigil, if one is ever needed in the future, could have a little extra weight to it by analogy. Surprisingly I haven't seen* it explicitly stated or discussed anywhere, although it seems likely it was the intention, that the two combined rules would have compared nicely with the traversal rules of relative imports (and the current working directory concept on the Linux file system). Essentially, in this context, if I understand correctly, as foo.bar is matching the constant bar loaded from the foo namespace, .bar could be said to be matching the constant loaded from the current one. Realizing this while reflecting on the PEP updates fulfilled a missing motivator that took the original PEP on constant patterns from arbitrary to making more sense to me personally, so I don't know if other readers might have missed this connection as well, not that other reasons for not maintaining it in the PEP don't exist, of course. Apologies if this is obvious and/or unwanted noise. *searching this repo, python-dev, python-ideas, the PEP versions |
A bunch of folks on python-dev brought up that it's confusing to have
case foo:
mean "assign to foo" and have
case .foo:
to alter the meaning to "compare to the value of foo".
I think we're going to need another round of this discussion.