-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make match patterns explicit in the AST #88058
Comments
In the SC submission ticket for PEP-642 (python/steering-council#43 ), Guido indicated he was in favour of the more explicit pattern matching AST changes suggested in that PEP. This ticket covers adapting the pattern matching implementation to explicitly separate pattern nodes from expr nodes in the AST, so the code generator doesn't need to be context dependent. |
Given that the AST is a public interface, I would advise to implement this before beta freeze of 3.10 to avoid making this a breaking change in the future for linters and the like |
Agreed. I had wanted the AST to be part of the PEPs specifically *because* it's a public API, but didn't check until today whether or not the original PEP-634 implementation had been merged as-is, or with a cleaned up AST definition. https://github.com/ncoghlan/cpython/pull/8/files is the initial implementation with the AST and Grammar file updated. I'll only create the CPython PR once the tests are passing, but I'll try to give Brandt at least a week to review it (I'll also ping him directly via email). |
It's specifically the definition of "match_case" in the AST that is affected: match_case = (pattern pattern, expr? guard, stmt* body)
pattern = MatchAlways
| MatchValue(expr value)
| MatchConstant(constant value)
| MatchSequence(pattern* patterns)
| MatchMapping(expr* keys, pattern* patterns)
| MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
Relative to the PEP-642 AST, the notion of a "matchop" is gone - MatchValue always compares by equality, and there's a MatchConstant node for the identity comparisons against None, True, and False. The grammar, code generator, AST validator, and unparser are then updated to produce or consume the new AST nodes. |
Hm, for some reason I was thinking that this was safe to do after the feature freeze. Let's get to it then! Batuhan, perhaps we should change the linked issue for your AST validator PR to this one. That way we can close the old catch-all PEP-634 implementation issue and keep the discussion focused here. Nick, you might find the discussion on this mypy PR (python/mypy#10191) helpful. In particular, Adrian raises some points about ways we could make type inference in the AST a bit neater. For example: not all patterns can be used for mapping keys. Perhaps there is a way to organize parts of the AST hierarchy that makes this a bit more concrete (though we should be careful to not limit reasonable syntax extensions in the future by doing so). A few notes, just from skimming the outline here:
MatchValue technically contains an expression node, although the actual expressions it can contain are quite limited (dotted names and negative/complex numbers, if I understand correctly). Can we gain anything by splitting this up into nodes for each of these (MatchValue(identifier* parts), etc...) instead?
If I remember correctly (this part was implemented a while ago), collecting the different positional and keyword arguments in the parser is sort of simple since we can pass around sequences of keyword nodes easily. I *think* that the new proposed design would require hacks like creating dummy MatchClass nodes with *only* the keyword parts and later combining them with the positional arguments and a class name, since it keeps the keyword names separate from their arguments. Maybe some of the parser folks on this thread can chime in on that. Also, continuing the theme above, I think we can be more specific with "expr cls" here (maybe "identifier* cls").
In the interest of brevity, maybe "MatchStar" or something?
I think we could just give MatchMapping a "identifier? rest" member, right? That way it becomes a bit easier to use, and we don't need to worry if it's last or not. It also keeps us from having to handle empty entries in the keys. Oh, also, the ast unparser will need to be updated as well. |
Also, I would like some more extensive argument or why not having context dependent nodes justifies the extra maintainance cost (I'm not against that of course, and I am sympathetic, but I think this needs some more extensive coverage or why we are doing this). Notice that we already have context dependent nodes and attributes, as well as implicit meaning of some values like None. |
I think it might be better as a separate issue that has a dependency on this one. I've created bpo-43897 for this. |
https://www.python.org/dev/peps/pep-0642/#representing-patterns-explicitly-in-the-abstract-syntax-tree covers the rationale for why it is potentially problematic to reuse expression nodes for match patterns: it doesn't semantically align with the fact that patterns are *not* expressions, so the AST ends up being closer to a concrete syntax tree than we would like. For the specifics of the AST nodes:
|
The draft PR moved because I messed up the previous branch name: https://github.com/ncoghlan/cpython/pull/9/files It's to the point where everything except symtable.c compiles. I put an initial skeleton of a validator in, but only enough to check that the parallel sequence lengths in the class and mapping pattern nodes are consistent - there's still plenty to be done in Batuhan's PR (e.g. because the AST constant folding now just delegates to the expression folding functions for MatchValue values and MatchMapping keys, this iteration only enforces the restrictions on permitted subexpressions in the surface syntax). In addition to Brandt's MatchStar suggestion, I've also tweaked the MatchClassparameter names to better match PEP-634 (the old names were based on PEP-642's attribute pattern concept, so "extra_" made sense, but "kwd_" makes more sense for PEP-634): pattern = MatchAlways
| MatchValue(expr value)
| MatchConstant(constant value)
| MatchSequence(pattern* patterns)
| MatchMapping(expr* keys, pattern* patterns)
| MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns)
I think the idea of making the MatchMapping node signature "MatchMapping(expr* keys, pattern* patterns, identifier? rest)" has merit, but I'd like to get this initial iteration reviewed first to minimise the diff in the compiler_pattern_mapping implementation (right now it is fairly clear that the code generation for mapping patterns hasn't changed, but that will become significantly less obvious if/when the "**rest" handling changes away from working the same way _PyAST_Dict works) |
+1 on MatchStar(). Much more similiar to the existing node names, and also less semantically-named.
I'm not really sure whether we should continue following this old tradition of int? end_* attributes for new sum types like pattern, considering that it was done for supporting the creation of old nodes 3.8< without any change on 3.8>=. pattern subclasses will only be created on 3.10>= so we should be safe by requiring end_* attributes. |
Yeah, it will be a requirement very soon actually. |
I've removed the "?" from the end attributes for pattern nodes. The draft PR branch now compiles, but I've clearly made a mistake somewhere as it segfaults when trying to compile test_patma.py. |
My proposed MatchConstant node name was bothering me, as most constants are actually matched by equality in MatchValue, the same way variables are. So I'm switching my proposed name for that node to be MatchSingleton, since that better reflects the key difference with MatchValue (i.e. comparing by identity rather than by value, and specifically comparing with the builtin singletons). |
Segfaults are fixed (I had a couple of cases where I wasn't checking for NULL, and another where I was looking up MatchMapping attributes on a MatchClass node). test_patma passes except for test cases 240, 241, and 242, which look like genuine regressions - the logic to reject illegal MatchValue nodes is missing from the code generator side, and these particular values make it through the parser OK. I'll need to add back some of the code I took out so these still get rejected. |
To get the "0+0" matching case rejected again I had to flesh out a bit more of the AST validator. This initial piece of the validator replaces the previously pattern-matching-specific AST optimisation code that refused to fold BinOp expressions that weren't complex numbers, allowing the compiler to reject them. I also added two new tests (240b and 241b) to ensure that 0+0 & f"" were also rejected as mapping keys. 241b passes with the current PR, as f"" gets rejected by both the check in the AST validator *and* by the "constant or attribute lookup" restriction in the compiler, so the latter check covers for the leaky validator. 240b is disabled for now, as it won't pass until the AST validator checks all the subexpressions being used as keys in a mapping pattern (because it gets constant folded to "0", the check in the compiler accepts it). That leaves fixing the unparser tests as the key thing to do before I create the main PR. Before that, though, I'll look at potentially simplifying the MatchMapping code by changing the signature as Brandt suggested. |
ncoghlan@54940a3 implements Brandt's suggestion of disallowing NULL keys in MatchMapping and instead passing in an explicit "rest" identifier to capture the extra keys. This did require moving to a multi-branch pattern definition in the grammar, similar to class patterns, in order to enforce the separating comma when both ordinary key:pattern pairs and a double-star target are present. I think it's a nice ergonomic improvement on the AST node itself though, as it makes it trivial to check if a mapping patterns captures extra keys or not (vs the relatively non-obvious check to see if the last key is NULL/None). That just leaves updating the unparser to handle the new node types. Capturing the latest node definitions from the draft PR: pattern = MatchAlways
| MatchValue(expr value)
| MatchSingleton(constant value)
| MatchSequence(pattern* patterns)
| MatchMapping(expr* keys, pattern* patterns, identifier? rest)
| MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns)
|
Setting clear timing expectations, given beta is so close: I'm not expecting to have time to finish the unparser work over the weekend, but I do expect to have time to finish it on Monday evening my time (UTC+10). |
I can probably find time to take a pass at the unparser, if you want.
That's why I suggested "identifier*"! Each identifier in the list would be one part of the dotted name, so we wouldn't need to worry about having arbitrary expressions: Foo() -> MatchClass(["Foo"], [], [], []) Not a big deal, though. |
Interesting idea - I had missed that you were suggested "identifier*" for the cls node. It's simple enough to check for Name-or-Attribute in the AST validator though, and sticking with a standard expr_ty node means getting a lot of standard handling in the rest of the compiler and the unparser. My commitments today finished earlier than I expected, so I'm about to tackle the unparser now. |
Working on the unparser picked up an inconsistency I had previously missed: aside from named expressions, "target" attributes in the AST are generally full expression subnodes, rather than identifier attributes. Identifier attributes on nodes are typically called "name", including in the originally implemented MatchAs node. Accordingly, I've switched both MatchStar and MatchAs over to having "name" attributes instead of "target" attributes. |
PR against the main repo is now up: #25585 A couple of things in the unparser tripped me up (ast_unparse.c only handles annotation expressions, the way ast._Unparser.visit() is defined means NodeVisitor.generic_visit can't be used), so the PR also includes some clarifying comments for future editors. |
PR is green now, and I don't have any further changes I want to make, so Brandt, please feel free to merge if you're happy with the proposed AST nodes, and the way I've gone about implementing them. If you'd like some adjustments, it probably makes the most sense to just amend the PR directly - otherwise the 24-hour turnaround on comments due to the time zone difference will potentially give us trouble with the beta deadline. |
Sounds good. I'll probably be able to make time to review it today, or tomorrow at the latest. |
After the next docs build, the documentation for the explicit AST nodes should be available https://docs.python.org/dev/library/ast.html#pattern-matching |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: