Custom match protocol, revisited #121
Another example: a 2D spatial matcher.

from dataclasses import dataclass
from math import sqrt

@dataclass
class Point:
    x: int
    y: int

class InCircle:
    @staticmethod
    def __match__(target, center, radius):
        # Note use of nested match statement and guard
        match target:
            case Point(x, y) if (dist2 := (x - center.x)**2 + (y - center.y)**2) < radius**2:
                return (sqrt(dist2),)  # Return distance from center as a tuple (or could be a proxy)
            case _:
                return None

match pt:
    case InCircle[chicago, 2.0](dist):
        print(f"{dist} miles from Chicago")
    case InCircle[new_york, 2.0](dist):
        print(f"{dist} miles from New York")
|
Before derailing this issue, would you be interested in collecting other alternatives here, or did you write this to discuss the alternative you put in the description? |
A few more examples:

class EqualTo:
    """Matcher class that matches values by equality."""
    @staticmethod
    def __match__(target, expected):
        return target if target == expected else None

match x:
    # Value being tested can be an expression.
    case EqualTo[2 * 5]:
        print("x is equal to 2 * 5")

class RegExMatchGroupProxy:
    """A match proxy that permits extraction of the regex group value, start and end."""
    __match_args__ = ['value', 'start', 'end']

    def __init__(self, match, group):
        self.match = match
        self.group = group
    @property
    def value(self):
        return self.match.group(self.group)

    @property
    def start(self):
        return self.match.start(self.group)

    @property
    def end(self):
        return self.match.end(self.group)
import re

class RegExGroup:
    """Matcher class that tests for the existence of a group on a regex match,
    and allows extraction of properties of the group."""
    @staticmethod
    def __match__(target, group_name):
        if isinstance(target, re.Match) and target.group(group_name):
            return RegExMatchGroupProxy(target, group_name)
        else:
            return None

TOKEN_RE = re.compile(r"(?P<digits>\d+)|(?P<letters>[a-z]+)")
match TOKEN_RE.match(input):
    case RegExGroup["letters"](value, start, end):
        print(f"found a sequence of letters: {value}")
    case RegExGroup["digits"](value, start, end):
        print(f"found a sequence of digits: {value}")

Note to self: the hacker / language syntax geek side of me really likes this. |
Sure, I don't want to exclude other ideas or suggestions. The only danger is that folks will pile on with creativity and things will get so noisy as to make a discussion difficult (which has happened before). But let's not worry about that until it becomes a problem. |
A few more notes on this. A custom matcher invocation has the following lifecycle: the pattern parameters in square brackets are evaluated as ordinary expressions; __match__ is called with the match subject followed by those parameters; if __match__ returns None the pattern fails; otherwise the VM matches the sub-patterns in parentheses against the returned object (a tuple, or a proxy).
Note that this lifecycle definition is independent of whatever syntax we choose to represent parameters. However, even fixing on this particular lifecycle eliminates a large number of alternate designs, some of which were discussed in earlier threads. Putting aside syntax, the key features of this proposal are: the matcher can be parameterized with values supplied at the match site, the matcher is responsible only for the top-level decision and for producing the object to deconstruct, and matching of sub-patterns stays in the VM.
It shares with earlier proposals the convention of returning None to signal failure, and a tuple or proxy containing the values to be deconstructed.
|
BTW did you review Ivan's original proposal for __match__? |
I read it a while back, but forgot the details. I just re-read it. Ivan's proposed match protocol is quite a bit different from what I outline here. The basic difference has to do with division of responsibility. In Ivan's doc, the custom match method is responsible for the entire match operation, including matching of sub-patterns. I avoided that design because, in our previous discussions, there was a concern that this put too much machinery into the custom match method instead of into the VM. This meant that the implementer of a custom match method would have to deal with a lot of complexity, and also it might be slow. In my proposal the custom match method is only responsible for the decision at one level of the nested hierarchy of patterns - a level representing a reference to a type name. Thus, if you have the pattern A(B, C), the custom matcher for A only decides whether the target matches A and what object to deconstruct; the sub-patterns B and C are matched by the VM against that result.

The main difference between what was in the PEP and this proposal is the ability to pass in parameters to the matcher - something that I believe Tobias wanted. Other than the ability to pass in parameters, it's not that different from what we had before. It's just a refinement of ideas that the rest of you came up with. Without the ability to dynamically parameterize the match operation, the behavior of a custom matcher is quite constrained. A custom matcher, as defined in the earlier PEP, really has two jobs: to decide whether the match should succeed at all, and to determine the set of matchable attributes. However, that first step is so constrained that it might as well not be customized at all. By this I mean that without extra information from the match site, the yes/no decision amounts to little more than a type test.

Adding additional parameters which can be specified at the match site opens up a huge space of possibilities - perhaps too much. This is what we need to discuss. What I have discovered is that it allows a large space of interesting use cases. However, the downside is that it may be too flexible - it allows clever users to implement strange behaviors that are highly un-match-like. Potentially, any arbitrary computation could be dressed up as a pattern. Similarly, a class with a custom match method could take over logic that would otherwise have to be put in a guard condition. All of the examples I posted in this thread could have been done with guard conditions instead of a custom match. Whether or not that is a good thing depends on whether the match class makes sense as a unitary element, rather than having the guard condition be separately visible and explicit.

So even though this proposal offers much less flexibility and variability than Ivan's proposal, I imagine there are some who would judge this as having far too much of those qualities. I'm OK if this idea is rejected, but I would like it to be discussed. I don't know if I mentioned this, but I don't have an agenda with respect to my participation in authoring this PEP - my primary agenda is to help other people with their agendas. Many of the arguments that I have made in the other threads are merely my attempt to clarify things that other people have said, to try and help get to a consensus.
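To make the division of responsibility concrete, here is a minimal sketch of who does what when a pattern A(B, C) is matched under this scheme (the function and parameter names are purely illustrative, not part of the proposal):

def match_A_of_B_C(subject, A, match_B, match_C):
    # A supplies __match__ and answers only the top-level question
    # "does this subject match A?", returning a proxy/tuple or None.
    proxy = A.__match__(subject)
    if proxy is None:
        return None          # "not an A" - the whole pattern fails
    # The VM (modelled by this function), not A, recurses into the
    # sub-patterns B and C, matching them against the returned values.
    if match_B(proxy[0]) and match_C(proxy[1]):
        return proxy
    return None
|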
My issue with your latest proposal is that it is a bit heavy on syntax in the pattern, but especially heavy in what you have to do in the implementation. Look e.g. at the regex match example. |
(There’s also a bug in @viridia's RegexMatch example, |
As far as I can tell, we have the following reasons to desire customizing the matching process:
1. Customizing the check that decides whether an object matches a given class pattern at all (the instance check).
2. Customizing how a matched object is deconstructed, i.e. which positional and keyword values are exposed to the sub-patterns.
3. Parametrizing the matcher with extra values supplied at the match site.
I'll ignore item 3 for now; @viridia seems to have focused mostly on that, and I think his ideas can be used even in alternate approaches. I want to work out from the other two, trying to get to a decent UX (i.e., the life of the person customizing the matcher should be easy, not just the life of the person writing the case clauses).

With the current state of affairs after #115, item 1 is possible but cumbersome (overriding isinstance needs a metaclass, although a library helper could improve that). Item 2 can sometimes be worked around by adding a new property to a class we own. None of those workarounds will help us if we're matching strings, or objects built by 3rd party libraries; in that case, we're forced to write our new "constructor" and use that one in the pattern. That's the same thing the examples above have been doing: to get an "email-like string" behaviour Tobias added a new class just for matching, and the EqualTo, InCircle and RegExGroup examples in this thread do the same.

What strikes me most about these classes is that they are never instantiated. They have a single method, which is a staticmethod. That looks like a convoluted way of writing a function. I think the right way of doing this shouldn't be looking up a special __match__ method on a class; the name in the pattern should just be a callable that is handed the subject and returns the deconstructed values (or None).

The previous version of the "destructured" object was a proxy, which will normally be hard to build: you need to define a new class, forward all the methods to the original except the new ones; it's quite a lot of work. Or you can not make it behave like a proxy, but still you need to set up a lot of properties. Most of the proxies I saw in previous discussions here ended up being simpler Python types (like a tuple), and then this relied on the "single argument match with self" magic, but once you wanted to write anything richer than that you were back to building a full proxy.
Let me show how this "protocol" (which is no longer a protocol, because we no longer have special methods) would work in the previous examples (I'm not covering parametrization; I'll discuss that in a later comment):

The email matcher:

def Email(obj):
    if isinstance(obj, str) and obj.count("@") == 1:
        return obj.split("@")

match user.address:
    case Email("postmaster"): ...  # Note that we use a single argument here
    case Email(user, "gmail.com"): ...

A polar matcher, supporting kwargs:

import cmath, math

def Polar(obj):
    if isinstance(obj, complex):
        rho, theta = cmath.polar(obj)
        return {"rho": rho, "theta": theta}

match c:
    case Polar(theta=t) if t < math.pi/6: ...

Let's say Event has start and end attributes that are datetimes, and we want to bind them:

from datetime import datetime

def ExtractDate(obj):
    if isinstance(obj, datetime):
        return (obj,)

match e:
    case Event(start=ExtractDate(s), end=ExtractDate(e)):
        duration = e - s  # should be a timedelta

That makes the "match type and get object" functionality no longer magic, and anybody can get it. So, to summarize, this would be my proposed change with respect to the current PEP semantics: when matching a pattern F(...) where F turns out not to be a type, call F(subject); if that returns None the match fails, otherwise the sub-patterns are matched against the returned value (a sequence for positional sub-patterns, a mapping for keyword ones).
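A minimal sketch of that dispatch rule (the helper name is made up; it just models what the interpreter would do for the constructor part of a class pattern under this change):

def match_class_pattern(subject, matcher):
    # Returns the object the sub-patterns should be matched against,
    # or None if the pattern fails.
    if isinstance(matcher, type):
        # A real class: keep the current PEP 622 behaviour
        # (isinstance check, then attribute extraction from the subject).
        return subject if isinstance(subject, matcher) else None
    # Any other callable (Email, Polar, ExtractDate above): just call it.
    # None means "no match"; any other value is what gets deconstructed.
    return matcher(subject)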
|
@dmoisset
I remember one of the authors mentioned this principle (that a class pattern should mirror the class constructor) and others agreed, in some threads of this tracker. |
Thanks @dmoisset for a detailed and thoughtful response. There are two issues I am concerned about.

I did actually consider using a callable instead of a type. However, in Python the line between type and callable is sometimes fuzzy, and I'm not certain whether it is always possible to distinguish between them. Perhaps someone with more experience in Python types can offer some guidance. The second, more serious, problem is that the test for whether a name is a callable or a type can only be made at runtime, but unfortunately we need to know at compile time when compiling the match statement. The reason is that when compiling a pattern, the compiler transforms the symbols in parentheses into a sequence of tests and branches ("does the tuple have length 3?", "does the first element match this pattern?" and so on). We can't decide after the fact that, no, we really want to evaluate these arguments like a normal function call, not without generating a lot of extra code.

Recall in the other thread how we talked about the fact that the syntax inside of a pattern describes an operation that is the inverse of what it would be outside of a pattern - deconstruction instead of construction. However, in order to pass in parameters to a custom matcher, we need to reverse this reversal - that is, we need to pass in those parameters as if we were calling an ordinary function. This leads to a syntactical paradox: the sub-patterns in parentheses are deconstruction targets that are never evaluated, while the matcher parameters are ordinary expressions that must be evaluated before the matcher runs.
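To make the two roles concrete, using the bracketed syntax and the names from the examples earlier in this thread:

match pt:
    # Sub-patterns: x and y are not evaluated; they are names the match
    # will bind by deconstructing pt.
    case Point(x, y): ...
    # Matcher parameters: chicago and 2.0 are evaluated like ordinary call
    # arguments before the matcher runs - hence the separate bracketed list.
    case InCircle[chicago, 2.0](dist): ...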
The problem is, they can't both look like function calls, otherwise we have no way to distinguish them, which means we can't mix them together in pattern context. (Well, that's not entirely true - we could distinguish them by position, say by declaring that the first N arguments are parameters and the rest are sub-patterns, but that seems fragile.) Thus, we need some syntactical hint that these are 'normal' function arguments that are evaluated in the normal way, rather than patterns to be matched. I thought of a couple of different approaches, such as having a single argument list with some separator symbol between the parameters and the sub-patterns, versus the two separate lists (square brackets plus parentheses) used in my examples. That being said, I don't really care about the syntax, that is just bikeshedding. The main decision point, however, is that there is some syntactic hint that the compiler can use to tell whether an argument list represents a set of normal function call arguments or a list of subpatterns. If we have some sort of syntactic hint, then sure, we can use either types or callables here.

Using a proxy object as a match result is more complex to code, as you mentioned. The main advantages are performance-related: a proxy can defer expensive computations (like the start() and end() calls in the regex example) until a sub-pattern actually asks for them, and the matcher doesn't have to allocate and populate a tuple or dict full of values that no sub-pattern ends up using.
That being said, I can't say whether performance considerations are important here, or how much. But this was discussed thoroughly in #8. There is one other benefit of using a proxy which hasn't been discussed yet, which is the ability of static type checkers to look at the return type of the custom match function and use it to infer the available properties for the match statement.

I also wanted to respond to @gvanrossum's comment about the length of the examples (I think your spell checker replaced the word 'regex' with 'regency', which confused me for a while). I deliberately wrote that example in a lengthy way to highlight some features of match proxies, specifically the use of lazily-evaluated properties: avoiding the calls to start() and end() unless specifically asked for. I could have written that example much more succinctly if I didn't care about that point. Specifically, I would probably just have returned the result as a namedtuple or something. |
Hi @viridia, I think we're talking about orthogonal problems. I made a list of "requirements" for our customized matching and was addressing the first two items, while you're addressing the third. So when you say that we can't "decide after the fact that, no, we really want to evaluate these arguments like a normal function call, not without generating a lot of extra code", that is about problem 3, and I was never suggesting to solve that problem with this proposal; I think you may have misread my proposal or expected something different from it. So no, what I propose actually could be implemented without changing code generation in the compiler at all, but instead adding some extra logic to the runtime step that checks the class and deconstructs the subject.

For parametrizing, which implies evaluating expressions and passing those as parameters to the matcher together with the matched object, I generally like your square brackets idea; I'd do it slightly differently (I'll post that later) but it's more an implementation detail. But note that the square bracket idea could work in your model (classes with a __match__ method) just as well as in mine.

Replying to something else in your post ("the test for whether a name is a callable or a type can only be made at runtime"):
All you need should be https://docs.python.org/3/c-api/type.html#c.PyType_Check . And we're already calling it in python/cpython@master...brandtbucher:patma#diff-7f17c8d8448b7b6f90549035d2147a9fR979 , so distinguishing a type from any other callable at runtime is neither hard nor new.

Regarding the performance, I've seen some concern about that, and my first idea was to return another function instead of an object (again, this is NOT your alternative proposal for parametrized patterns, which also uses a higher-order function, because I haven't talked about parametrized patterns yet). Essentially the deconstruction could be lazy: a "lazy dictionary" is a function taking a string (or an int, if you want to support positional args) and returning a value. And for the simple cases you don't even need to write your own function; my Email and Polar matchers above just return a plain list or dict.

Another note is that my proposal as is could be made lazy: you can use a proxy object, but instead of getting attributes you use __getitem__:

import numpy as np

def SquareMatrix(obj):
    class detproxy:
        def __getitem__(self, key):
            if key == "det":
                return np.linalg.det(obj)
            raise KeyError(key)

    match obj:
        case np.array(shape=[w, h]) if w == h:
            return detproxy()
match m:
    case SquareMatrix(det=0): ...     # singular matrix
    case SquareMatrix(): ...          # non-singular square matrix
    case np.array(shape=[_, _]): ...  # rectangular matrix

This example is not as nice as the previous one, but still as readable as (or more readable than) other implementations of proxy objects I've seen here, and it works for the few cases where you don't want to do an expensive computation (computing a determinant) unless actually required. So for easy cases the matcher code is easy, and for hard cases the code is not that complicated.

Performance is tricky in any case, because (independently of what protocol we're using) we may end up with code like this:

match m:
    case SquareMatrix(det=0): ...
    case SquareMatrix(det=x): ...

which most likely will compute the determinant twice. I don't have a solution for that, and at this point of particularity I'm happy to throw in the towel and let the application programmer sort it out manually.
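A sketch of what "sorting it out manually" could look like: collapse the two case clauses into one, bind the determinant once, and branch on the bound value with an ordinary if (reusing the hypothetical SquareMatrix and np.array patterns above):

match m:
    case SquareMatrix(det=d):
        # determinant computed exactly once; branch on the bound value
        if d == 0:
            ...   # singular matrix
        else:
            ...   # non-singular square matrix
    case np.array(shape=[_, _]):
        ...       # rectangular matrix
|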
Now let me add my proposal for parametrization (which is more or less independent of how we do proxies or where we put the deconstructor): I think square brackets are a cool solution. I would implement it differently from your proposal, putting most of the burden in the library instead of the interpreter, like this: when the constructor part of a pattern is written as a subscript, C[arg1, ..., argN], the interpreter just evaluates that subscript as an ordinary expression and uses whatever it returns as the matcher for the pattern.

That's all you need on the interpreter side. Now, if you want to be able to write something like the InRange[0, 6] pattern below, a small library helper does the rest:

from patma import parametrized_matcher

@parametrized_matcher("lo hi")
def InRange(lo, hi, target):
    if lo <= target < hi:
        return target

match x:
    case InRange[0, 6]:
        print("In the range [0, 6)!")

Note that in the end, the code written by the matcher author and the matcher user is essentially the same as in your original proposal; it's just that the change to the interpreter is much smaller.
A problem with this particular duality between types and callables is that the behaviour of the match is not obvious without proper context about what object you are matching against, and this hurts readability and pushes the logic into the C++ territory of "you need to look at the implementation to know what the semantics will be". For instance, if I read:

match m:
    case SquareMatrix(det=0): ...     # singular matrix
    case SquareMatrix(): ...          # non-singular square matrix
    case np.array(shape=[_, _]): ...  # rectangular matrix

without knowing whether SquareMatrix is a type or some other callable, I cannot tell what the first two cases will actually do. |
As a side note to the semantics I've defined: it would allow you to do

def fun(param):
    g = globals()
    ctx = {"param": param}
    match color:
        case g["BLACK"]: ...     # Take that, load semantics!
        case ctx["param"]: ...   # loading a local!

I found both slightly better than creating an ad-hoc class. |
I think this will add more layers of confusion, because this syntax collides with ordinary subscription - generics like List[int] already give the C[...] form a very different meaning. |
Regarding comments by @pablogsal and @thautwarm about this breaking "constructor symmetry": yes, it does. Many times during the discussion of this PEP there has been interest from the authors in making class patterns do "other stuff" unrelated to class construction (see #30, #12, #8, and some initial opposition to #115). I understand @viridia opened this to brainstorm around that idea, and I'm following along, but there's definitely no consensus that this is actually a desirable feature. |
Perhaps if I just followed regular naming conventions and my proposal that a custom matcher is a function, not a class, I could have written:

match m:
    case square_matrix(det=0): ...     # singular matrix
    case square_matrix(): ...          # non-singular square matrix
    case np.array(shape=[_, _]): ...   # rectangular matrix

And that makes it clearer that this is a custom matcher. For simple matchers like my Email example above, the lowercase name comes naturally anyway. But again, this is all under the assumption that having this kind of thing is a good idea (if you've checked #115, this is NOT going to be in PEP 622, so it's a more open discussion). |
I agree and I certainly don't want to discourage brainstorming. My comments are centred on the design domain that involves the relationship between syntax and semantics, not on the "feature" side (making class patterns do "other stuff" is a gigantic domain for a new feature). Once you start to formalize the proposed semantics into syntax, the tension between the addition and the rest of the language starts to manifest, and I think it is useful to have that present even while brainstorming, as it allows the discussion to focus.
It really does not, because for what it's worth naming conventions are not enforced: plenty of callables have class-like names and plenty of classes have function-like names, so a lowercase name tells the reader very little about what the pattern will do.
The "effort" and the cognitive load is key here. Having syntactic coherence is not only "to look good", is the fundamental piece that reduces cognitive load. For instance, if a user understands the syntax and semantics for list comprehension, once it understands dict/set literals, jumping into set/dict comprehension is almost immediate (some people would even say "intuitive"). There is a great synergy into the semantics and the syntax. In your example with the To highlight what I mean that can happen when there is dissonance between syntax and semantics, creating toons of cognitive load, consider for instance that this in C++ this is not ambiguous: int main() {
MyClass myclass(Object());
} but is almost certain that a person that has learned the syntax and semantics of classes will think that is creating a What I am trying to convey here is the importance to not create a "syntactic buble" in which users need to introduce a new element in the "mental stack" when they are reading match constructs.
Absolutely, I recognize that we are just brainstorming but I think having these things present is important in order to have more scoped discussions that could eventually be considered in the future without going back to previous design discussions, including the ones that happened outside the scope of this PEP. |
As I already told Talin in person, I'm going to have to ignore this issue (too much stuff going on already), and I don't think we should propose this as a "follow-on" PEP in the same release cycle as PEP 622. I think people should have had a chance to use match statements as defined by PEP 622 before we can seriously work on a customization protocol. |
Thanks @dmoisset, I believe I understand what you are saying now, and in general I have no problem with it. However, I want to shift gears for a moment - I thought about creating a new thread for this, but decided that I didn't want to create further distractions - and discuss a very different approach to a custom matching protocol (again, just brainstorming).

In formulating this proposal, I have been assuming that there is a desire for a more powerful customization mechanism than the __match__ protocol in the earlier drafts of the PEP. So what I want to discuss is a custom matching protocol that is less powerful than the previous __match__ design. Earlier, I said that a custom matcher really has two jobs: to decide whether the match should succeed at all, and to determine the set of matchable attributes.
However, it's not clear that there's much value in customizing the first step without additional context information, and once you add that additional information you get a wild and zany world of possibilities. So what about defining a custom match protocol that only allows you to customize the second part? In other words, by the time we get to our custom matcher, the type test has already been performed, and the only thing you are doing is tweaking the set of matchable names. In fact, this is exactly what __match_args__ already does for positional patterns: a static, declarative tweak to deconstruction, with no say in whether the match succeeds. One can imagine an analogous property for keyword patterns: a mapping from matchable names to the attributes (or computed values) behind them.

It seems to me that there are two main use cases: exposing an existing attribute under a different name, and exposing a value that is computed rather than stored as an attribute.
For the first case, the representation seems rather obvious: a mapping from the name used in the pattern to the name of the underlying attribute. The second case could be handled by allowing the values of the map to be functions. In other words, for each named attribute we are matching, we look up the name in the map; if the entry is a string we fetch that attribute, and if it is a function we call it with the subject to produce the value to match against.

One critique of using a static data structure like this is that you might want to return a different set of matchable properties based on the type of the subject. I don't know of any actual use cases for this, but the idea has been brought up. However, it seems to me that if this is indeed a problem, then the mapping itself could be computed dynamically (exposed as a property or classmethod, say) rather than being a static class attribute.
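A sketch of what such a property could look like; the dunder name __match_map__ and the lookup helper are purely assumptions for illustration, not part of any proposal:

class Employee:
    def __init__(self, name, salary):
        self.name = name
        self._salary = salary

    # Maps the names usable in keyword patterns either to an attribute name
    # (use case 1, a simple rename) or to a callable that computes the value
    # on demand (use case 2).
    __match_map__ = {
        "salary": "_salary",
        "monthly": lambda self: self._salary // 12,
    }

def lookup_match_name(subject, name):
    # Roughly what the VM would do for a keyword pattern like
    # Employee(monthly=m): fetch the attribute, or call the function.
    entry = type(subject).__match_map__[name]
    return entry(subject) if callable(entry) else getattr(subject, entry)
|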
Since I have been quite involved in the original discussion on custom matching, let me add my perspective here as well. Overall, I quite side with @viridia and even think that the current discussion is heading in a direction where it solves the wrong problem: we do not necessarily need something with the full power of expressions to replace constructor names, or even callables in general. The actual problem is rather: how do we express shape and structure? The idea of general callables, say, is also missing the main point of this. Just to stick to a very trivial example that I brought up before: the idea of specifying structure is that the same notation can go both ways, as in, e.g.:

def multiply(x, y):
    match (x, y):
        case Polar(r1, phi1), Polar(r2, phi2):
            return Polar(r1 * r2, phi1 + phi2)

Anyway, as Guido already hinted at: this is probably not the best time and place for this brainstorming session and discussion. We have just decided to focus the PEP more on the essentials. Having a discussion on the possible shortcomings of our proposal, and on the ways we might want to extend it at the next possible occasion, might send out the wrong signal. It could make the PEP feel a bit like a Trojan horse... |
Another use case to consider (or reject) was written up by Yury: https://mail.python.org/archives/list/python-dev@python.org/message/SCJ6H6KAE2WNHIVOEQ7M5YCZMU4HCYN6/ |
I wanted to brainstorm a bit about ideas for a custom matching protocol that would go in a follow-up PEP (note the new issue label).
Some thoughts as to what I would like to see: something that builds on, but is more flexible than, the earlier __match__ protocol. Here's a straw-man proposal based on our earlier discussions:
Custom Matching Protocol
Constructor patterns now accept two parameter lists: a list of pattern parameters, enclosed in square brackets, and a list of sub-patterns, enclosed in parentheses.
Either of these parameter lists can be omitted, but one of them must be present in order for the compiler to recognize the pattern as a constructor pattern. So the following formats are valid: C[params](subpatterns), C[params], and C(subpatterns).
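For instance, reusing the InCircle and Point matchers that appear elsewhere in this thread (illustrative only):

match pt:
    case InCircle[chicago, 2.0](dist): ...   # both lists present
    case InCircle[chicago, 2.0]: ...         # pattern parameters only
    case Point(x, y): ...                    # sub-patterns only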
The pattern parameters (in square brackets) are values meant to modify the behavior of the type being matched against. (Another name for the two parameter lists might be pre-match and post-match, in that the first set is evaluated before the constructor pattern matcher executes, and the second set is evaluated afterward.)
Unlike earlier proposals where the pattern parameters were used as constructor parameters to construct a new pattern instance, in this proposal the pattern parameters are passed into __match__ directly. In other words, the pattern parameters are applied to the __match__ method just like a regular function call, the only difference being the presence of an initial parameter: the target object to be matched. This saves the cost of allocating a new class instance for every match. It also allows smart type checkers to ensure that the pattern parameters match the signature of the __match__ method.

Another example:
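A sketch, reusing the InCircle matcher that appears elsewhere in this thread (the translation the VM would perform is an assumption on my part):

def run_case(pt, chicago):
    # Roughly what the VM does for:  case InCircle[chicago, 2.0](dist): ...
    result = InCircle.__match__(pt, chicago, 2.0)   # bracket args passed straight through
    if result is None:
        return False              # the matcher said "no match"
    (dist,) = result              # sub-patterns are matched against the result
    print(f"{dist} miles from Chicago")
    return True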
Alternative Proposal: Instead of having all the parameters together, we could make __match__ return a function which accepts the target object. Although this has a bit more overhead due to the need to allocate a closure object, it makes it much simpler for a type-checker to validate that the pattern params match the signature of the __match__ method, since it no longer has to deal with the extra parameter. The downside is that every class that implements __match__ has to pay the cost of creating the closure, regardless of whether it uses pattern parameters or not.
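A sketch of what the closure-returning form could look like for the InCircle example (the exact signature is an assumption):

from math import sqrt

class InCircle:
    @staticmethod
    def __match__(center, radius):
        # Only the bracketed pattern parameters arrive here; the target is
        # supplied later, when the VM calls the returned closure.
        def matcher(target):
            match target:
                case Point(x, y) if (d2 := (x - center.x)**2 + (y - center.y)**2) < radius**2:
                    return (sqrt(d2),)
            return None
        return matcher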
Note that __match__ does not have any access to the set of sub-patterns. There are several reasons for this: the sub-patterns are not matched until after the __match__ function has run, and it's not clear what the __match__ method could do even if it had access to the set of sub-patterns; it can't really invoke them directly without help from the VM.