This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Accepting AND Patterns: AND patterns give us program composability #97

Closed
thautwarm opened this issue Jun 25, 2020 · 10 comments
Labels
rejected A rejected idea

Comments

@thautwarm

This is a record of my conversation with Guido (his status shows him as busy, so I won't mention him again).
His attitude was fairly positive, but adding AND patterns is considered a "two-way door" decision.

I present some concerns here, to stress that AND patterns will be practical for many common use cases.

A Case Study

Suppose I have a pattern 'Urlsplit' from one library, implemented with 'urllib.parse.urlsplit',
and a pattern 'Urlopen' from another library, implemented with 'urllib.request.urlopen'.

Suppose I want to match a URL, and one case is that
it is hosted on "docs.python.org" but the response code is 404.

match url:
    case Urlsplit(netloc=n) and Urlopen(code=c) if c == 404 and n == "docs.python.org":
        ...

By providing AND patterns, we can allow users to reuse the existing patterns 'Urlsplit' and 'Urlopen' together.

Abstractly speaking, we can use a pattern to extract information instead of only destructuring data and getting components. In this situation, pattern matching is like viewing data from a particular point of view.

AND patterns allow us to combine many points of view, and hence contribute to code reuse.

@Tobias-Kohn
Collaborator

I am afraid this is a very nice example of what we explicitly did not intend pattern matching for. I would claim that this kind of code is much more clearly written along the lines of:

if Urlsplit(url) == "docs.python.org" and Urlopen(url) == 404:
    ...

Depending on what you want to express here, you might even use guards:

match url:
    case Urlsplit(netloc="docs.python.org") if Urlopen(url) == 404:
        ...

First, if you can come up with a good use case for an AND-pattern, I would be delighted to hear of it. However, in order to convince me, your example would probably need more than one case clause to begin with, or two different ways to destructure the data. The point is: you need a use case where pattern matching is obviously the best choice to express it.

It is very natural to want to have AND-patterns, given that we already have OR-patterns. And some languages even introduce NOT-patterns to be complete. But this quickly ends up being a prime example of overengineering: adding a feature because it looks nice and because we can, rather than driven by actual demand or necessity.

@thautwarm
Author

NOT-patterns are considered harmful because they encourage writing extremely slow code.

if Urlsplit(url) == "docs.python.org" and Urlopen(url) == 404:
    ...

This does not suffice, because Urlsplit will not just return one section. I guess you might want to write something like this:

if urlsplit(url).netloc == "docs.python.org" and urlopen(url).code == 404:
    ...

  1. In the example above, I have to write url multiple times; the redundant characters take up a notable part of the if statement.

  2. If I import the patterns Urlsplit and Urlopen, in practice I would rather not mix them with the functions urlsplit and urlopen. This is also redundant, and it makes the code feel poorly structured and poorly decoupled.

  3. In wider scope, there're TOO MANY examples that AND patterns can be used. What if I want to match something with both a sequence pattern and a mapping pattern? What if I want to match the match result of a regular expression as well as something computed from the target string?

When you have a few alternative cases,
some of the case clauses need to

  1. verify the target from separate perspectives,
  2. bind the computed values produced by those verifier functions/operations, and
  3. use the bound values in the body.

AND patterns always help a lot in such a scenario.

@Tobias-Kohn
Collaborator

In wider scope, there're TOO MANY examples that AND patterns can be used.

Well, then it should not be a problem for you to give two explicit examples, would it?

... because Urlsplit will not just return one section

Let's keep in mind that this is just a hypothetical function/class. Without exact specification, there is little point in arguing about it. But it highlights exactly my primary critique here: we are discussing some hypothetical scenario, for which I cannot see a real use case. Please provide something more tangible as a basis for discussion.

@thautwarm
Author

To rule out a trivial transformation from match/case to if,
it helps to consider one datum that requires two factors, where a factor is

  • a variable introduction, or
  • a verification of data

The following template may help build some intuition.

match a:
    case Tool1(o1, o2) && Tool2(o3, o4) && ...:
        # do stuff with variables bound from o1, o2, o3, o4
    ...

I give three examples below.

Example 1: Slightly more complex than the original one

match url:
    case Urlsplit(netloc="docs.python.org", path=Re("/dis/.*")) && ReadPage("English", text):
        # do something with text
    case Urlopen(code=200, header=header) && ... :
        ...

Note that the library authors already provide patterns like Urlsplit, Re and Status;
that does not mean you also have equivalent functions.

Even if you are lucky enough that all of those authors implemented __call__ methods providing what their __match__ methods provide, you would have to write:

if ((split := Urlsplit(url))  # parenthesized so that only this call is bound to split
    and split.netloc == "docs.python.org"
    and Re(split.path) == "/dis/.*"
    and (read := ReadPage(url))
    and read[0] == "English"):

    text = read[1]

You might still prefer this; we should ask more people in the community.

Example 2: Data Query

Suppose the government wants to pick out specific people from a database.

  • Case 1: Query the databases of the medical system and the education system, get certain results and check them,
  • Case 2: Query the databases of the human resources system and ...
  • ...

match person_data:
    case MedicalSys(history=h, expense=e) && EducationSys(university=u, degree=d) if some_check_for(h, e, u, d):
        print("this person is put into group 1")

    case HRSystem(employments=e, ...) ...:
        print("this person is put into group 2")

Example 3: High Level Applications of Machine Learning

match dataset:
    case LinearModel(result1, model_params) && TreeModel(result2, depth=depth) if depth < 20:
        # do something with 'result1', 'model_params', 'result2' and 'depth'
    case BayesModel(..., ) ... :
        ...            

@brandtbucher
Collaborator

I'm sorry, but you've made me like this feature much less now.

We explicitly state in the opening paragraphs of the PEP that we are creating pattern matching specifically for expressive destructuring and handling of heterogeneous data. What you've shown is just obfuscated control-flow logic.

@thautwarm
Author

That is still okay. After staying up late I realized something:
AND patterns can be implemented with custom patterns, so it's totally okay to reject this.

@brandtbucher brandtbucher added the rejected A rejected idea label Jun 26, 2020
@Tobias-Kohn
Collaborator

To briefly add to @brandtbucher's excellent reply: I think your examples show quite clearly that you are talking about things like database lookups and requests, and not about destructuring. At its core, pattern matching should ideally expose information already present "inside" the object, and it should not have side effects. We cannot (and don't want to) enforce either of these principles, of course, as there are valid use cases to violate one or the other, but they still apply as our guideline.

In your machine learning example, for instance, LinearModel, TreeModel and BayesModel should actually be thin data classes that merely represent the outcome of some previous computation. I.e., the first line should be something like match getModelFromData(dataset):. And then case LinearModel() and TreeModel(): really does not make sense anymore, as you are practically saying that the resulting object must have the structure of two different classes simultaneously—which is better expressed by introducing a LinearTreeModel() in the first place.

Yes, on the surface pattern matching shares a couple of ideas with machine learning, database queries, but also regular expressions. Indeed, pattern matching as such is an important theme of computer science. However, the syntactic support we are introducing here is really about matching the existing structure of objects, not discovering new structure in data (as is the case with regular expressions and machine learning).

@viridia
Collaborator

viridia commented Jun 26, 2020

I want to mention that there is a hack which can be used to extend matching to any number of AND-like terms. I don't recommend this hack - it feels like an anti-pattern - but I wanted to mention it anyway.

Basically what you do is construct a custom tuple as the match expression:

match (x, x):
  case (SomeCondition(value1), OtherCondition(value2)):
    ...  # etc.

This can be extended to do a simultaneous matching on any number of arbitrary terms. So if you really want to have multiple match patterns applied to the same value, you can do it - because there is an implicit AND joining all of the tuple elements (the same logic holds true for mappings and other composite data structures).

In fact, any arbitrary if-elif-elif-else chain can be converted into a match statement via the following pattern:

match (a, b, c, d, e):
  case [1, _, _, _, _]:
    ...  # a == 1
  case [_, 2, _, _, _]:
    ...  # b == 2
  case [_, _, 3, _, _]:
    ...  # c == 3
  case [_, _, _, 4, _]:
    ...  # d == 4
  case [_, _, _, _, 5]:
    ...  # e == 5

This looks ugly, but it is actually less inefficient than you might think - because Brandt's reference implementation inlines all of the tests and ignores the wildcards, each case is ignoring all of the other sequence elements, other than the specific one being tested. There is still the overhead of constructing the tuple, however.

Now, obviously a real if-elif chain would be much cleaner and easier to understand.

Bonus points for anyone who can come up with a clever and memorable name for this anti-pattern :)

@thautwarm
Author

@viridia

match x:
    case And([P1(x), P2()]):
        ...

@Tobias-Kohn

I think your examples show quite clearly that you are talking about things like database lookups and requests, and not about destructuring

The related terms are Active Patterns and View Patterns; they treat extracting abstract, high-level structure from a datum as a variant of destructuring.

A datum has different structures from different points of view (called Projections). Separating those destructurings into different packages keeps each package's usage clean and contributes to modularization.

If we don't want to support this kind of destructuring, we might drop the __match__ protocol for this reason:

Making __match_args__ a map from private attributes to public properties or attributes would cover most use cases of destructuring under a given point of view. Dropping __match__ would also make pattern matching faster, simpler, and "safer" (if you think the patterns I have described here are harmful).

If the __match__ protocol is not dropped, I don't think we can prevent people from implementing active/view/AND patterns with the infrastructure introduced by PEP 622.

People who have such needs, or who want to provide higher-level interfaces, will still try to implement these advanced patterns, which have been gaining popularity rapidly in recent years.

for instance, LinearModel, TreeModel and BayesModel should actually be thin data classes

Again, if we restrict things to data classes and builtin classes, there is no need for __match__.

We just cannot stop people if we provide such a modern, cutting-edge __match__ protocol.

Besides, predictions from traditional ML models are cheap, which will encourage high-level ML users to use pattern matching.

And then case LinearModel() and TreeModel(): really does not make sense anymore, as you are practically saying that the resulting object must have the structure of two different classes simultaneously —which is better expressed by introducing a LinearTreeModel() in the first place.

Once you say this, you reject program composability.

A user will not expect to implement specific and trivial ensemble models themselves, and the library providers would say "I've implemented everything orthogonally, what's the matter?".

@gvanrossum
Owner

Taine, this comes across as gibberish. Please stop.


5 participants