This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Custom match protocol, revisited #121

Open
viridia opened this issue Jul 2, 2020 · 24 comments
Labels: postponed (Deferred for now, possibly adopted in the future), son-of-pepzilla (Discussions for a near-term follow-on PEP)

Comments

@viridia
Collaborator

viridia commented Jul 2, 2020

I wanted to brainstorm a bit about ideas for a custom matching protocol that would go in a follow-up PEP (note the new issue label).

Some thoughts as to what I would like to see:

  • It should be more powerful / extensible than the current __match__ protocol.
  • It would be nice to support either parameterization (Parametrization #24), or match signatures (Defining a custom match protocol (__match__) #8) in such a way as to allow "interesting" match predicates to be created.
  • It shouldn't slow down matches for types that don't opt-in to the new protocol. (Although it may require extra match code to be generated, or extra constant data structures to be created by the compiler.)

Here's a straw-man proposal based on our earlier discussions:

Custom Matching Protocol

Constructor patterns now accept two parameter lists:

  • A set of pattern parameters, in square brackets.
  • A set of sub-patterns to match against properties of the object, in regular parens.

Either of these parameter lists can be omitted, but one of them must be present in order for the compiler to recognize the pattern as a constructor pattern. So the following formats are valid:

TypeName(a0, a1, a2)
TypeName[p1, p2]
TypeName[p1, p2](a0, a1, a2)

TypeName[p1, p2] is equivalent to TypeName[p1, p2]()
TypeName(a0, a1, a2) is equivalent to TypeName[](a0, a1, a2)

The pattern parameters (in square brackets) are values meant to modify the behavior of the type being matched against. (Another name for the two parameter lists might be pre-match and post-match, in that the first set is evaluated before the constructor pattern matcher executes, and the second set are evaluated afterward.)

Unlike earlier proposals where the pattern parameters were used as constructor parameters to construct a new pattern instance, in this proposal the pattern parameters are passed into __match__ directly:

def __match__(target, *pattern_args, **pattern_kwargs):
    # Etc.

In other words, the pattern parameters are applied to the __match__ method just like a regular function call, the only difference being the initial parameter, which is the target object to be matched. This saves the cost of allocating a new class instance for every match. It also allows smart type checkers to ensure that the pattern parameters match the signature of the __match__ method.

Another example:

class InRange:
    @classmethod
    def __match__(cls, target, lo, hi):
        return target if isinstance(target, int) and target >= lo and target < hi else None

match x:
  case InRange[0, 6]:
    print("In the range [0, 6)!")
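
Spelling out the assumed desugaring (a sketch, not real syntax): under this proposal, `case InRange[0, 6]` would translate into a direct call of `__match__` with the target first and the pattern parameters after:

```python
# Sketch of the assumed desugaring of `case InRange[0, 6]`: the pattern
# params are passed to __match__ after the target, like a normal call.
class InRange:
    @classmethod
    def __match__(cls, target, lo, hi):
        return target if isinstance(target, int) and lo <= target < hi else None

print(InRange.__match__(4, 0, 6))  # match succeeds: returns 4
print(InRange.__match__(9, 0, 6))  # match fails: returns None
```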

Alternative Proposal: Instead of having all the parameters together, we could make __match__ return a function which accepted the target object:

class InRange:
    @classmethod
    def __match__(cls, lo, hi):
        def test(target):
            return target if isinstance(target, int) and target >= lo and target < hi else None
        return test

Although this has a bit more overhead due to the need to allocate a closure object, it makes it much simpler for a type-checker to validate that the pattern params match the signature of the __match__ method, since it no longer has to deal with the extra parameter. The downside is that every class that implements __match__ has to pay the cost of creating the closure, regardless of whether they use pattern parameters or not.
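
Under the alternative, the assumed desugaring would call `__match__` once with the pattern parameters and then apply the returned closure to the target; a minimal simulation:

```python
# Simulating the alternative protocol: __match__ takes only the pattern
# parameters and returns a closure that tests the target.
class InRange:
    @classmethod
    def __match__(cls, lo, hi):
        def test(target):
            return target if isinstance(target, int) and lo <= target < hi else None
        return test

matcher = InRange.__match__(0, 6)  # closure built once per pattern evaluation
print(matcher(3))  # 3
print(matcher(6))  # None (the upper bound is exclusive)
```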

Note that __match__ does not have any access to the set of sub-patterns. There are several reasons for this:

  • These sub-patterns are typically evaluated after the __match__ function has run.
  • These parameters don't exist as data, but rather as code generated by the compiler. There's no realistic way to introspect them unless we have the compiler build an entirely parallel data structure representing these sub-patterns.
  • If such a data structure did exist, it would be quite complex, and it would be easy to get a key error or some other exception by inspecting it carelessly.
  • It's not really clear what the __match__ method could do even if it had access to the set of sub-patterns; it can't really invoke them directly without help from the VM.
@viridia added labels: open (Still under discussion), son-of-pepzilla (Discussions for a near-term follow-on PEP) on Jul 2, 2020
@viridia
Collaborator Author

viridia commented Jul 2, 2020

Another example: a 2D spatial matcher.

from dataclasses import dataclass
from math import sqrt

@dataclass
class Point:
    x: int
    y: int

class InCircle:
    @staticmethod
    def __match__(target, center, radius):
        # Note use of nested match statement and guard
        match target:
            case Point(x, y) if (dist2 := (x - center.x)**2 + (y - center.y)**2) < radius**2:
                return (sqrt(dist2),)  # Return distance from center as a tuple (or could be a proxy)
            case _:
                return None

match pt:
    case InCircle[chicago, 2.0](dist):
        print(f"{dist} miles from Chicago")
    case InCircle[new_york, 2.0](dist):
        print(f"{dist} miles from New York")

@dmoisset
Collaborator

dmoisset commented Jul 3, 2020

Before derailing this issue, would you be interested in collecting other alternatives here, or did you write this to discuss the alternative you put in the description?

@viridia
Collaborator Author

viridia commented Jul 3, 2020

A few more examples:

class EqualTo:
    """Matcher class that matches values by equality."""
    @staticmethod
    def __match__(target, expected):
        return target if target == expected else None

match x:
    # Value being tested can be an expression.
    case EqualTo[2 * 5]:
        print("x is equal to 2 * 5")

class RegExMatchGroupProxy:
    """A match proxy that permits extraction of the regex group value, start and end."""
    __match_args__ = ['value', 'start', 'end']

    def __init__(self, match, group):
        self.match = match
        self.group = group

    @property
    def value(self):
        return self.match.group(self.group)

    @property
    def start(self):
        return self.match.start(self.group)

    @property
    def end(self):
        return self.match.end(self.group)

class RegExGroup:
    """Matcher class that tests for the existence of a group on a regex match,
    and allows extraction of properties of the group."""
    @staticmethod
    def __match__(target, group_name):
        if isinstance(target, re.Match) and target.group(group_name):
            return RegExMatchGroupProxy(target, group_name)
        else:
            return None

TOKEN_RE = re.compile(r"(?P<digits>\d+)|(?P<letters>[a-zA-Z]+)")
match TOKEN_RE.match(input):
    case RegExGroup["letters"](value, start, end):
        print(f"found a sequence of letters: {value}")
    case RegExGroup["digits"](value, start, end):
        print(f"found a sequence of digits: {value}")

Note to self: the hacker / language syntax geek side of me really likes this.

@viridia
Collaborator Author

viridia commented Jul 3, 2020

Before derailing this issue, would you be interested in collecting other alternatives here, or did you write this to discuss the alternative you put in the description?

Sure, I don't want to exclude other ideas or suggestions. The only danger is that folks will pile on with creativity and things will get so noisy as to make a discussion difficult (which has happened before). But let's not worry about that until it becomes a problem.

@viridia
Collaborator Author

viridia commented Jul 4, 2020

A few more notes on this. A custom matcher invocation has the following lifecycle:

  • pre-match parameter evaluation
  • invoke the __match__ method:
    • 'the test' - determine whether the match succeeds or fails.
    • preparing the match result
  • post-match, recursively applying the match result to sub-patterns.

Note that this lifecycle definition is independent of whatever syntax we choose to represent parameters. However, even fixing on this particular lifecycle eliminates a large number of alternate designs, some of which were discussed in earlier threads.
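
The lifecycle above can be sketched as a small driver (names are hypothetical; sub-pattern handling is reduced to a callback, since in the real design it would be compiler-generated code):

```python
# Hypothetical driver for the proposed lifecycle. `pattern_args` are the
# pre-match parameters (already evaluated); `apply_subpatterns` stands in
# for the compiler-generated sub-pattern code that runs post-match.
def run_custom_match(cls, target, pattern_args, apply_subpatterns):
    result = cls.__match__(target, *pattern_args)  # 'the test' + result prep
    if result is None:
        return False                      # match failed; try the next case
    return apply_subpatterns(result)      # recursively match sub-patterns

class Always:
    @staticmethod
    def __match__(target):
        return target                     # trivially succeeds

print(run_custom_match(Always, 42, (), lambda r: r == 42))  # True
```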

Putting aside syntax, the key features of this proposal are:

  • The ability to pass in positional and keyword parameters to the custom matcher, distinct from the list of sub-patterns.
  • The special dunder method for custom matching has an ordinary function signature which matches the shape of the pattern parameters - i.e. we don't just pass the whole parameter list in a single 'context' parameter. (The advantage of this is that custom matchers which don't support any additional parameters pay nothing for this capability.)
  • Sub-pattern evaluation is strictly ordered after the custom match method, and the match method has no way to access them.

It shares these characteristics with earlier proposals:

  • There is a single, special dunder method named __match__ which is an attribute on the type that handles custom matching. (It can either be a classmethod or a staticmethod.)
  • This special method returns either a match result, or None if the match failed.
  • The return result is either the original target object or a match proxy object.

@gvanrossum
Owner

BTW did you review Ivan's original proposal for __match__ in #19? It's either in his draft linked from there or in the discussion there -- IIRC there was quite a bit of discussion, not least about who would need a customization protocol.

@viridia
Collaborator Author

viridia commented Jul 4, 2020

I read it a while back, but forgot the details. I just re-read it.

Ivan's proposed match protocol is quite a bit different than what I outline here. The basic difference has to do with division of responsibility. In Ivan's doc, the custom match method is responsible for the entire match operation, including matching of sub-patterns. I avoided that design because, in our previous discussions, there was a concern that this put too much machinery into the custom match method instead of being in the VM. This meant that the implementer of a custom match method would have to deal with a lot of complexity, and also it might be slow.

In my proposal the custom match method is only responsible for the decision at one level of the nested hierarchy of patterns - a level representing a reference to a type name. Thus, if you have a pattern A(B, C), the __match__ method for A is only responsible for deciding if A matches, and it knows nothing about B or C. This is similar to what we came up with for __match__ in the PEP (before we decided to defer it).

The main difference between what was in the PEP and this proposal is the ability to pass in parameters to the matcher - something that I believe Tobias wanted. Other than the ability to pass in parameters, it's not that different from what we had before. It's just a refinement of ideas that the rest of you came up with.

Without the ability to dynamically parameterize the match operation, the behavior of __match__ is always fixed, independent of context. Thus, you can make a custom matcher MatchTrue, or MatchFalse, because these are constants. But you can't have a matcher which decides at runtime whether it is going to match true or false.

A custom matcher, as defined in the earlier PEP, really has two jobs: to decide whether the match should succeed at all, and to determine the set of matchable attributes. However, that first step is so constrained that it might as well not be customized at all. By this I mean that the __match__ method has so little information to work with, it's hard to imagine any other behavior than the default, which is calling isinstance() on the target object. So the only wiggle room left for customization is in the second step, which is generating the match proxy. Part of the reason why the __match__ method was 'ditchable' IMHO is because the set of use cases allowed by this protocol is so narrow that it doesn't apply to very many real-world situations.

Adding additional parameters which can be specified at the match site opens up a huge space of possibilities - perhaps too much. This is what we need to discuss. What I have discovered is that it allows a large space of interesting use cases. However, the downside is that it may be too flexible - it allows clever users to implement strange behaviors that are highly un-match-like. Potentially, any if statement could be re-written as a custom matcher, and this is not necessarily a good thing.

Similarly, a class with a custom match method could take over logic that would otherwise have to be put in a guard condition. All of the examples I posted in this thread could have been done with guard conditions instead of a custom match. Whether or not that is a good thing depends on whether the match class makes sense as a unitary element, rather than having the guard condition be separately visible and explicit.

So even though this proposal offers much less flexibility and variability than Ivan's proposal, I imagine there are some who would judge this as having far too much of those qualities.

I'm OK if this idea is rejected, but I would like it to be discussed. I don't know if I mentioned this, but I don't have an agenda with respect to my participation in authoring this PEP - my primary agenda is to help other people with their agendas. Many of the arguments that I have made in the other threads are merely my attempt to clarify things that other people have said, to try and help get to a consensus.

@gvanrossum
Owner

gvanrossum commented Jul 4, 2020

My issue with your latest proposal is that it is a bit heavy on syntax in the pattern, but especially heavy in what you have to do in the implementation. Look e.g. at the regex match example.

@gvanrossum
Owner

gvanrossum commented Jul 4, 2020

(There’s also a bug in @viridia's RegexMatch example, __match_args__ should be in the RegExGroup class.)

@dmoisset
Collaborator

dmoisset commented Jul 5, 2020

As far as I can tell, we have the following reasons to desire customizing the matching process:

  1. We'd like to have matching conditions that are different from "obj is an instance of class X", for example "this string is an email address".
  2. We'd like to extract things that aren't attributes. The polar coordinates of a complex, the user/server part of an email string, or a distance to a reference point as @viridia says above.
  3. We'd like to parametrize the matcher with values evaluated when describing the pattern.

I'll ignore item 3 for now, @viridia seems to have focused mostly on that and I think his ideas can be used even in alternate approaches.

I want to work out from the other 2 trying to get to a decent UX (i.e, the life of the person customizing the matcher should be easy, not just the life of the person writing the match statement).

With the current state of affairs after #115, item 1 is possible but cumbersome (overriding isinstance needs a metaclass, although a library helper could improve that). Item 2 can sometimes be worked around by adding a new property to a class we own. None of those workarounds will help us if we're matching strings, or objects built by 3rd party libraries; in that case, we're forced to write our own "constructor" and use that one in the pattern. That's what the examples above have been doing: to get an "email-like string" behaviour Tobias added a new Email class unrelated to string, same with Polar and complex, and @viridia's InCircle which is unrelated to Point2D.

What strikes me most about these classes is that they are never instantiated. They have a single method, which is a staticmethod. That looks like a convoluted way of writing a function.

I think the right way of doing this shouldn't be looking up a __match__ method in the pattern class. Instead, if it's a class we go for the current semantics, and if it's not, it should be a callable. That callable returns either "None" to indicate a failed match, or a "destructured version" of the object. This is very similar to the "partial" version of F# active patterns that @Tobias-Kohn pointed me at recently: a single function returning either None or the destructured object. I'll call this "a destructure function"

The previous version of a "destructured" object was a proxy, which will normally be hard to build: you need to define a new class and forward all the methods to the original except the new ones; it's quite a lot of work. Or you can choose not to make it behave like a proxy, but you still need to set up a lot of properties. Most of the proxies I saw in previous discussions here ended up being simpler python types (like a tuple), and then this relied on the "single argument match with self" magic, but once you wanted to write Polar so you can match a complex with Polar(abs=x, rho=math.pi/2) you are in for trouble. I propose that a "destructured version" is one of these (we don't need to support all of these; these are options in the design menu):

  • A tuple. if we do that, we can easily create custom matchers with positional-only args
  • A dict. Easier to build than an object with certain attributes; lends itself perfectly to keyword arg matching
  • Allow the destructure function to return either of the above, so these patterns would allow positional or keyword args (but not both) depending on how it was built.
  • Use dicts and rely on the fact that dicts are ordered. That would allow mixed positional and keyword args

Let me show how this "protocol" (which is no longer a protocol, because we no longer have special methods) would work in the previous examples (I'm not covering parametrization, I'll discuss that in a later comment):

The email matcher:

def Email(obj):
    if isinstance(obj, str) and obj.count("@") == 1:
        return obj.split("@")

match user.address:
    case Email("postmaster"): ...  # Note that we use a single argument here
    case Email(user, "gmail.com"): ...

A polar matcher, supporting kwargs

import cmath

def Polar(obj):
    if isinstance(obj, complex):
        rho, theta = cmath.polar(obj)
        return {"rho": rho, "theta": theta}

match c:
    case Polar(theta=t) if t < math.pi/6: ...

Let's say datetime implements __match_args__ and I would like instead to have the magic "extract self" semantics that used to be the default and isn't any longer. I can define:

def ExtractDate(obj):
    if isinstance(obj, datetime):
        return (obj,)

match e:
    case Event(start=ExtractDate(s), end=ExtractDate(e)):
        duration = e-s  # should be a timedelta

That makes the "match type and get object" functionality no longer magic and anybody can get it.

So, to summarize, this would be my proposed change wrt the current PEP semantics: when matching a pattern Foo(a1, a2, ..., k1=ka1, k2=ka2, ...) against object obj the semantics are:

  1. get Foo (same as now)
  2. check if Foo is a type; if that's the case do exactly what's being done now
  3. otherwise let destructure = exp(obj)
  4. if destructure is None, match fails
  5. match every i-th positional parameter with destructure[i]
  6. match ka1 with destructure[k1], ka2 with destructure[k2], etc.
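
Steps 3-6 can be simulated in today's Python. In the sketch below, CAPTURE is a hypothetical stand-in for a capture pattern, and literal patterns must compare equal:

```python
CAPTURE = object()  # hypothetical marker standing in for a capture pattern

def try_destructure(matcher, obj, pos=(), kw=None):
    # Step 3: call the destructure function on the target object
    d = matcher(obj)
    if d is None:                 # Step 4: None means the match fails
        return None
    bound = []
    # Step 5: match each i-th positional pattern against destructure[i]
    for i, pat in enumerate(pos):
        if pat is CAPTURE:
            bound.append(d[i])
        elif d[i] != pat:
            return None
    # Step 6: match each keyword pattern against destructure[key]
    for key, pat in (kw or {}).items():
        if pat is CAPTURE:
            bound.append(d[key])
        elif d[key] != pat:
            return None
    return bound

def Email(obj):
    if isinstance(obj, str) and obj.count("@") == 1:
        return obj.split("@")

# simulating: case Email(user, "gmail.com"):
print(try_destructure(Email, "ada@gmail.com", pos=(CAPTURE, "gmail.com")))  # ['ada']
print(try_destructure(Email, "not-an-email", pos=(CAPTURE, "gmail.com")))   # None
```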

@thautwarm

@dmoisset
I'm glad to see you are taking this feature into consideration; however, it seems that your proposed design breaks the feeling that "destructuring should look like an inverse of construction":

otherwise let destructure = exp(obj)

I remember that one of the authors mentioned this principle, and others agreed, in some threads of this tracker.

@viridia
Collaborator Author

viridia commented Jul 5, 2020

Thanks @dmoisset for a detailed and thoughtful response.

There are two issues I am concerned about.

I did actually consider using a callable instead of a type. However, in Python the line between type and callable is sometimes fuzzy, and I'm not certain whether it is always possible to distinguish between them. Perhaps someone with more experience in Python types can offer some guidance.

The second, more serious, problem is that the test for whether a name is a callable or a type can only be made at runtime, but unfortunately we need to know at compile time when compiling the match statement. The reason is that when compiling a pattern, the compiler transforms the symbols in parenthesis into a sequence of tests and branches ("does the tuple have length 3?", "does the first element match this pattern?" and so on). We can't decide after the fact that, no, we really want to evaluate these arguments like a normal function call, not without generating a lot of extra code.

Recall in the other thread how we talked about the fact that the syntax inside of a pattern describes an operation that is the inverse of what it would be outside of a pattern - deconstruction instead of construction. However, in order to pass in parameters to a custom matcher, we need to reverse this reversal - that is, we need to pass in those parameters as if we were calling an ordinary function. This leads to a syntactical paradox:

  • We've established as a design principle that class deconstructors should "look like" function calls.
  • Regular function calls should also look like function calls.

The problem is, they can't both look like function calls, otherwise we have no way to distinguish them, which means we can't mix them together in pattern context. (Well, that's not entirely true - we could distinguish them by position, such as Type(params)(subpatterns), but this means that in a lot of cases you end up with empty parens, i.e. Type(params)(), and it was already decided in another thread that this was ugly, so I avoided that choice.)

Thus, we need some syntactical hint that these are 'normal' function arguments that are evaluated in the normal way, rather than patterns to be matched. I thought of a couple of different approaches, such as having a single argument list with some separator symbol, e.g. (param, param @ pattern, pattern). Eventually I settled on using square brackets, based on an earlier comment that @Tobias-Kohn made about an analogy with the use of square brackets to indicate generic type parameters (which is, admittedly, a stretch). A better reason is that parameterized matchers are rare and should visually stand out so that one doesn't easily confuse them with subpatterns.

That being said, I don't really care about the syntax, that is just bikeshedding. The main decision point, however, is that there is some syntactic hint that the compiler can use to tell whether an argument list represents a set of normal function call arguments or a list of subpatterns.

If we have some sort of syntactic hint, then sure, we can use either types or callables here.

Using a proxy object as a match result is more complex to code, as you mentioned. The main advantages are performance-related:

  • For the vast majority of matchers, you can simply return the target object without having to construct a new dictionary.
  • Dictionaries don't support lazy evaluation, so it would require all matchable properties - including expensive ones - to be eagerly evaluated.

That being said, I can't say whether performance considerations are important here, or how much. But this was discussed thoroughly in #8 .

There is one other benefit of using a proxy which hasn't been discussed yet, which is the ability of static type checkers to look at the return type of the custom match function and use it to infer the available properties for the match statement.

I also wanted to respond to @gvanrossum's comment about the length of the examples (I think your spell checker replaced the word 'regex' with 'regency' which confused me for a while). I deliberately wrote that example in a lengthy way to highlight some features of match proxies, specifically the use of lazily-evaluated properties: avoiding the calls to start() and end() unless specifically asked for. I could have written that example much more succinctly if I didn't care about that point. Specifically, I would probably just have returned the result as a namedtuple or something.

@dmoisset
Collaborator

dmoisset commented Jul 5, 2020

Hi @viridia, I think we're talking about orthogonal problems. I made a list of "requirements" for our customized matching and was addressing the first 2 items, while you're addressing the third. So when you say "decide after the fact that, no, we really want to evaluate these arguments like a normal function call, not without generating a lot of extra code", that is about problem 3, and I was never suggesting to solve that problem with this proposal; I think you may have misread my proposal or expected something different from it. So no, what I propose could actually be implemented without changing code generation in the compiler at all, just by adding some extra logic to the MATCH_CLASS VM opcode.

For parametrizing, which implies evaluating expression and passing those as parameters to the matcher together with the matched object, I generally like your square brackets idea; I'd do it slightly differently (I'll post that later) but it's more an implementation detail. But note that the square bracket idea could work in your model (__match__ is a method looked up in classes that returns a proxy) or in mine (any non-type function is a matcher).

Replying at something else in your post:

However, in Python the line between type and callable is sometimes fuzzy

All you need should be https://docs.python.org/3/c-api/type.html#c.PyType_Check. And we're already calling it in python/cpython@master...brandtbucher:patma#diff-7f17c8d8448b7b6f90549035d2147a9fR979, so all my new code would go inside that if.

Regarding the performance, I've seen some concern about that, and my first idea was to return another function instead of an object (again, this is NOT your alternative proposal for parametrized patterns, which also uses a higher-order function, because I haven't talked about parametrized patterns yet). Essentially the deconstruction could be lazy: a "lazy dictionary" is a function taking a string (or an int if you want to support positional args) and returning a value. And for the simple cases you don't even need to write your own function; my Polar version could just return {"rho": rho, "theta": theta}.__getitem__. But after writing it, I noticed that this concern is somewhat theoretical and complicated the proposal, so I went for the simpler one. We'll have to see what people actually need in order to decide.

Another note is that my proposal as is could be made lazy, you can use a proxy object but instead of getting attributes you use __getitem__ on it. For example if you wanted a SquareMatrix pattern with a determinant kwarg that's expensive to compute, you could do in my current version:

def SquareMatrix(obj):
    class detproxy:
        def __getitem__(self, key):
            if key == "det": return numpy.linalg.det(obj)
            raise KeyError
    match obj:
        case numpy.array(shape=[w,h]) if w==h:
            return detproxy()

match m:
    case SquareMatrix(det=0): ... # singular matrix
    case SquareMatrix(): ... # non-singular square matrix
    case np.array(shape=[_,_]): ... # rectangular matrix

This example is not as nice as the previous, but still as readable or more as other implementations of proxy objects I've seen here, and can work for the few cases where you don't want to do an expensive computation (computing a determinant) unless actually required. So for easy cases, the matcher code is easy, and for hard cases, the code is not that complicated.
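
The lazy-__getitem__ idea can also be shown without numpy: anything supporting subscription works as the destructured result, so expensive values are only computed when a pattern actually asks for them (a sketch with hypothetical names):

```python
class LazyItems:
    """Destructured result whose values are zero-arg callables,
    computed only when a pattern subscripts them."""
    def __init__(self, thunks):
        self.thunks = thunks          # key -> zero-arg callable
    def __getitem__(self, key):
        return self.thunks[key]()     # evaluated on demand

calls = []
def expensive_det():
    calls.append(1)                   # track that the computation ran
    return 0.0

proxy = LazyItems({"det": expensive_det})
print(len(calls))      # 0: nothing computed yet
print(proxy["det"])    # 0.0: computed on first access
print(len(calls))      # 1
```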

Performance is tricky in any case, because (independently of what protocol we're using) we may end up with code like this:

match m:
    case SquareMatrix(det=0): ... 
    case SquareMatrix(det=x): ... 

Which most likely will compute the determinant twice. I don't have a solution for that, and at this point of particularity, I'm happy to throw the towel and let the application programmer sort it out manually.

@dmoisset
Collaborator

dmoisset commented Jul 5, 2020

Now let me add my proposal for parametrization (which is more or less independent from how we do proxys or where do we put the deconstructor): I think square brackets are a cool solution. I would implement it differently than your proposal, putting most of the burden in library instead of the interpreter, like this:

  • we extend the syntax of class patterns to accept on the left: name ( "." NAME | "[" expression "]")+ (plus the normal parenthesized args/kwargs)
  • the semantics of that is evaluating whatever is on the left as if it was an expression (it's the same as now, but we now accept more expressions, like "Foo[3]" and "Bar[4].Baz[5][6]")
  • once the left side is evaluated, the normal matching procedure proceeds

That's all you need on the interpreter side. Now, if you want to be able to write case InRange[0, 10](int()), what you need is essentially to construct a python object InRange that makes InRange[0,10] evaluate to the matcher you want. You could have a library decorator that allowed you to do this:

from patma import parametrized_matcher

@parametrized_matcher("lo hi")
def InRange(lo, hi, target):
    if lo <= target < hi:
        return target

match x:
  case InRange[0, 6]:
    print("In the range [0, 6)!")

Note that in the end, the code written by the matcher author and the matcher user are essentially the same as your original proposal, just the change to the interpreter is much smaller.
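
One way the hypothetical patma.parametrized_matcher could be written today (simplified: the string argument naming the parameters is dropped, and subscription builds the matcher closure directly):

```python
# A possible implementation of the hypothetical parametrized_matcher
# decorator: Foo[args] evaluates, via __getitem__, to a one-argument
# matcher closed over the pattern parameters.
class parametrized_matcher:
    def __init__(self, func):
        self.func = func
    def __getitem__(self, params):
        if not isinstance(params, tuple):
            params = (params,)        # InRange[3] and InRange[0, 6] both work
        def matcher(target):
            return self.func(*params, target)
        return matcher

@parametrized_matcher
def InRange(lo, hi, target):
    if isinstance(target, int) and lo <= target < hi:
        return target

matcher = InRange[0, 6]
print(matcher(4))  # 4
print(matcher(9))  # None
```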

@pablogsal

pablogsal commented Jul 5, 2020

A problem with this particular duality between Types/Callables is that the behaviour of the match is not obvious without proper context about what object you are matching, and this hurts readability and pushes the logic into the C++ territory of "you need to look at the implementation to know what the semantics will be". For instance, if I read:

match m:
    case SquareMatrix(det=0): ... # singular matrix
    case SquareMatrix(): ... # non-singular square matrix
    case np.array(shape=[_,_]): ... # rectangular matrix

without knowing if SquareMatrix is a type or a callable I don't know what will happen: will this do the "reverse constructor" or will it call something on "m" as is? The logic for functions also seems confusing to me because the semantics imply that the function SquareMatrix will receive the object "m" and then return something that will match against what is between the parentheses, but this is in direct contradiction with what the syntax implies: that SquareMatrix will be called with something involving det=0. I think this also adds to the initial surprise of "something that seems like a call is not a call" in the reverse constructor case, which here becomes "something that seems like a call may or may not be a call, and you cannot know until runtime... and even if you know, it is not the call you think it is".

@dmoisset
Collaborator

dmoisset commented Jul 5, 2020

As a side note to the semantics I've defined: it would allow you to do

def fun(param):
    g = globals()
    ctx = {"param": param}
    match color:
        case g["BLACK"]: ...  # Take that, load semantics!
        case ctx["param"]: ... # loading a local!

I found both slightly better than creating an ad-hoc class

@pablogsal

pablogsal commented Jul 5, 2020

  • the semantics of that is evaluating whatever is on the left as if it was an expression (it's the same as now, but we now accept more expressions, like "Foo[3]" and "Bar[4].Baz[5][6]")

I think this will add more layers of confusion because this syntax collides with type.__getitem__ for typing. The more we overload existing syntax in the match grammar subtree, the more cognitive load you add to readers and given that these patterns can be recursive and that you also have Load/Store differences, the last thing you want is to overload grammar rules that have different meaning outside the match rules.

@dmoisset
Copy link
Collaborator

dmoisset commented Jul 5, 2020

Regarding comments by @pablogsal and @thautwarm about this breaking "constructor symmetry": yes, it does. Many times during the discussion of this PEP there has been interest from the authors in making class patterns do "other stuff" unrelated to class construction (see #30, #12, #8, and some initial opposition to #115). I understand @viridia opened this to brainstorm around that idea, and I'm following along, but there is definitely no consensus that this is actually a desirable feature.

@dmoisset
Copy link
Collaborator

dmoisset commented Jul 5, 2020

A problem with this particular duality between types and callables is that the behaviour of the match is not obvious without proper context about the object you are matching, and this hurts readability.

Perhaps if I just followed regular conventions and my proposal that a custom matcher is a function, not a class, I could have written:

match m:
    case square_matrix(det=0): ... # singular matrix
    case square_matrix(): ... # non-singular square matrix
    case np.array(shape=[_,_]): ... # rectangular matrix

And that makes it more clear that this is a custom matcher. For simple patterns like my email example from earlier, you'd go to the definition and find it more or less connected to what happens. (Yes, you need to understand a bit about how the parts connect to each other, but that's the same for almost every Python statement/protocol. No one was born knowing what with does; the important thing is that it should be learnable.)

But again, this is all under the assumption that having this kind of thing is a good idea (if you've checked #115, this is NOT going to be in PEP 622, so it's a more open discussion)

@pablogsal
Copy link

pablogsal commented Jul 5, 2020

I understand @viridia opened this to brainstorm around that idea, and I'm following along, but definitely there's no consensus that this is actually a desirable feature.

I agree, and I certainly don't want to discourage brainstorming. My comments are centred on the design domain that involves the relationship between syntax and semantics, not on the "feature" side (making class patterns do "other stuff" is a gigantic domain for a new feature). Once you start to formalize the proposed semantics into syntax, the tension between the addition and the rest of the language starts to manifest, and I think it is useful to keep that tension in view even while brainstorming, as it allows the discussion to focus.

And that makes it more clear that this is a custom matcher.

It really does not, because for all we know square_matrix could be a class (you can't rely on the name for this reasoning, and it will certainly fail in static analyzers that do not do type propagation).

but that's the same of mostly every Python statement/protocol.

The "effort" and the cognitive load are key here. Having syntactic coherence is not only about "looking good"; it is the fundamental piece that reduces cognitive load. For instance, if a user understands the syntax and semantics of list comprehensions, then once they understand dict/set literals, jumping to set/dict comprehensions is almost immediate (some people would even say "intuitive"). There is a great synergy between the semantics and the syntax. In your example with the with statement, the overlapping syntactic part does not contradict previous knowledge: what looks like a function call is a function call, and what looks like an expression is an expression. The new syntactic part, as x, is even consistent with other targets in the language (such as the ones in for loops).

To highlight what I mean can happen when there is dissonance between syntax and semantics, creating tons of cognitive load, consider for instance that in C++ this is not ambiguous:

int main() {
    // This does NOT construct a MyClass object: it declares a function
    // named "myclass" taking a function returning Object and returning
    // MyClass (the "most vexing parse").
    MyClass myclass(Object());
}

but it is almost certain that a person who has learned the syntax and semantics of classes will think that it is creating a MyClass object that requires an Object in its constructor, when it is not. This syntactic clash was so bad that another way to construct objects (using curly braces) had to be introduced to reduce cognitive load (among many other things, not exclusively to solve this problem).

What I am trying to convey here is the importance of not creating a "syntactic bubble" in which users need to push a new element onto their "mental stack" when reading match constructs.

But again, this is all under the assumption that having this kind of this is a good idea

Absolutely, I recognize that we are just brainstorming but I think having these things present is important in order to have more scoped discussions that could eventually be considered in the future without going back to previous design discussions, including the ones that happened outside the scope of this PEP.

@gvanrossum
Copy link
Owner

As I already told Talin in person, I'm going to have to ignore this issue (too much stuff going on already), and I don't think we should propose this as a "follow-on" PEP in the same release cycle as PEP 622. I think people should have had a chance to use match statements as defined by PEP 622 before we can seriously work on a customization protocol.

@viridia
Copy link
Collaborator Author

viridia commented Jul 6, 2020

Thanks @dmoisset , I believe I understand what you are saying now, and in general I have no problem with it.

However, I want to shift gears for a moment - I thought about creating a new thread for this, but decided that I didn't want to create further distractions - and discuss a very different approach to custom matching protocol (again, just brainstorming).

In formulating this proposal, I have been assuming that there is a desire for a more powerful customization mechanism than the __match__ method that was originally in the PEP and was subsequently removed. But that is not necessarily true: just because you can do something doesn't mean that you should. One of my concerns about the above proposal is that the space of possibilities it opens up is vast, and there is some benefit to avoiding cleverness for its own sake.

So what I want to discuss is a custom matching protocol that is less powerful than the previous __match__ proposal. One that only supports a tightly-defined set of use cases.

Earlier, I said that the purpose of the __match__ method was twofold:

  • To decide if the subject of the match has the correct type.
  • To define the set of matchable attributes on the result.

However, it's not clear that there's much value in customizing the first step without additional context information, and once you add that additional information you get a wild and zany world of possibilities. So what about defining a custom match protocol that only allows you to customize the second part? In other words, by the time we get to our custom matcher, the type test has already been performed, and the only thing you are doing is tweaking the set of matchable names.

In fact, this is exactly what __match_args__ does for positional args: it defines which properties can be matched by position, but takes no role in deciding whether the type of the subject passes or fails the match criteria.

One can imagine an analogous property, __match_kwargs__, which does exactly the same thing that __match_args__ does, except for properties that are accessed by name rather than by position. Obviously, this would be a collection type such as a set or dict, but what would it actually do?

It seems to me that there are two main use cases:

  • Redirecting a named property to some other named property that already exists on the subject.
  • Synthesizing a new property which doesn't exist on the subject.

For the first case, the representation seems rather obvious: __match_kwargs__ is a dict whose keys are matchable property names, and whose values are the names of attributes on the subject that those keys map to.

The second case could be handled by allowing the values of the map to be functions. In other words, for each named attribute we are matching, we look up the name in __match_kwargs__; if the value is a string, we call getattr() on the subject with that name, and if it is a function, we call that function with the subject as its only parameter.
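A rough sketch of that lookup rule, purely to pin down the proposed semantics: __match_kwargs__ is the hypothetical protocol under discussion, and resolve_match_attr, Vec, etc. are invented names; nothing here exists in CPython.

```python
# Hypothetical: resolve a named pattern attribute against the proposed
# __match_kwargs__ map. A str value redirects to another attribute;
# a callable value synthesizes a property from the subject.
def resolve_match_attr(subject, name):
    kwargs_map = getattr(type(subject), "__match_kwargs__", {})
    target = kwargs_map.get(name, name)
    if callable(target):
        return target(subject)       # synthesized property
    return getattr(subject, target)  # possibly redirected attribute

class Vec:
    __match_kwargs__ = {
        "magnitude": lambda v: (v.x ** 2 + v.y ** 2) ** 0.5,  # synthesized
        "first": "x",                                          # redirected
    }

    def __init__(self, x, y):
        self.x = x
        self.y = y

v = Vec(3, 4)
print(resolve_match_attr(v, "magnitude"))  # -> 5.0
print(resolve_match_attr(v, "first"))      # -> 3
print(resolve_match_attr(v, "y"))          # -> 4 (plain getattr fallback)
```

The fallback for names absent from the map preserves today's behaviour, so classes that don't define __match_kwargs__ match exactly as before.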

One critique of using a static data structure like this is that you might want to return a different set of matchable properties based on the type of the subject. I don't know of any actual use cases for this, but the idea has been brought up. However, it seems to me that if this is indeed a problem, then __match_args__ already suffers from it; and one would expect that if you are matching against a known name X, then X would define the set of properties you are interested in, and that set would be relatively fixed and static. My proposal does allow the values of those properties to be customized based on the type of the subject, but it doesn't allow properties to be dynamically added or removed on that basis.

@Tobias-Kohn
Copy link
Collaborator

Since I have been quite involved in the original discussion on custom __match__ protocols and parametrization, I would like to briefly add my take here.

Overall, I quite side with @viridia and even think that the current discussion is heading in a direction where it solves the wrong problem: we do not necessarily need something with the full power of expressions to replace constructor names or even callables in general. The actual problem is rather: how do we express shape and structure?

The idea of general callables, say, is also missing the main point of this. Just to stick to a very trivial example that I brought up before: the idea of specifying structure is that the same notation can go both ways as in, e.g.:

def multiply(x, y):
    match (x, y):
        case Polar(r1, phi1), Polar(r2, phi2):
            return Polar(r1 * r2, phi1 + phi2)

Anyway, as Guido already hinted at: this is probably not the best time and place for this brainstorming session and discussion. We have just decided to focus the PEP more on the essentials. Having a discussion about the possible shortcomings of our proposal, and about how we want to extend it at the next possible occasion, might send out the wrong signal. It could make the PEP feel a bit like a Trojan horse...

@gvanrossum
Copy link
Owner

Another use case to consider (or reject) was written up by Yury: https://mail.python.org/archives/list/python-dev@python.org/message/SCJ6H6KAE2WNHIVOEQ7M5YCZMU4HCYN6/

@gvanrossum gvanrossum added postponed Deferred for now, possibly adopted in the future and removed open Still under discussion labels Oct 20, 2020