Accepting AND Patterns: AND patterns give us program composability #97

thautwarm · 2020-06-25T03:17:20Z

This tracks my communication with Guido (his status is registered as busy, so I won't mention him again).
His attitude is more of a positive one, but adding AND patterns is considered a "two-way door" thing.

I present some concerns here, to stress that AND patterns will be practical for many common use cases.

A Case Study

Suppose I have a pattern 'Urlsplit' from one library, implemented by 'urllib.parse.urlsplit',
and a pattern 'Urlopen' from another library, implemented by 'urllib.request.urlopen'.

Assume that I want to match a URL, one case is:
it's hosted in "docs.python.org" but the response code is 404.

match url:
    case Urlsplit(netloc=n) and Urlopen(code=c) if c == 404 and n == "docs.python.org":
        ...

By providing AND patterns, we can allow users to reuse the existing patterns 'Urlsplit' and 'Urlopen' together.

Abstractly speaking, we can use a pattern to extract information instead of only destructuring data and getting components. In this situation, pattern matching is like seeing data from a point of view.

AND patterns allow us to combine many points of view, and hence contribute to code reuse.


Tobias-Kohn · 2020-06-25T20:37:22Z

I am afraid this is a very nice example of what we explicitly did not intend pattern matching for. I would claim that this kind of code is much more clearly written along the lines of:

if Urlsplit(url) == "docs.python.org" and Urlopen(url) == 404:
    ...

Depending on what you want to express here, you might even use guards:

match url:
    case Urlsplit(netloc="docs.python.org") if Urlopen(url) == 404:
        ...

First, if you can come up with a good use case for an AND-pattern, I would be delighted to hear of it. However, in order to convince me, your example would probably need more than one case clause to begin with, or two different ways to destructure the data. The point is: you need a use case where pattern matching is obviously the best choice to express it.

It is very natural to want to have AND-patterns, given that we already have OR-patterns. And some languages even introduce NOT-patterns to be complete. But this quickly ends up being a prime example of overengineering: adding a feature because it looks nice and because we can, rather than driven by actual demand or necessity.

thautwarm · 2020-06-25T21:17:39Z

NOT-patterns are considered harmful because they encourage a way to write extremely slow code.

if Urlsplit(url) == "docs.python.org" and Urlopen(url) == 404:
    ...

This does not suffice, because Urlsplit will not return just one section. I guess you might want to write something like this:

if urlsplit(url).netloc == "docs.python.org" and urlopen(url).code == 404:
    ...

In the above example, I have to write url multiple times; the redundant characters take up a notable part of the if statement.
If I import the patterns Urlsplit and Urlopen, in practice I might not want to mix them with the functions urlsplit and urlopen. This is also redundant, and makes the code feel poorly structured and poorly decoupled.
In a wider scope, there are TOO MANY examples where AND patterns can be used. What if you match something with both a sequence pattern and a mapping pattern? What if you want to match the match result of a regular expression as well as something computed from the target string?

When you have a few alternative cases,
some of the case clauses need to

verify the target from separate perspectives,
bind the computed values produced by those verifier functions/operations, and
use the bound values in the body.

AND patterns always help a lot in such a scenario.

Tobias-Kohn · 2020-06-25T21:28:16Z

> In wider scope, there're TOO MANY examples that AND patterns can be used.

Well, then it should not be a problem for you to give two explicit examples, would it?

> ... because Urlsplit will not just return one section

Let's keep in mind that this is just a hypothetical function/class. Without exact specification, there is little point in arguing about it. But it highlights exactly my primary critique here: we are discussing some hypothetical scenario, for which I cannot see a real use case. Please provide something more tangible as a basis for discussion.

thautwarm · 2020-06-25T23:49:54Z

To prevent a trivial transformation from match/case to if,
it helps to have one datum that requires 2 factors, where a factor is

a variable introduction, or
a verification of data.

A template like this may give you some intuition.

match a:
    case Tool1(o1, o2) && Tool2(o3, o4) && ...:
        # do stuff with variables bound from o1, o2, o3, o4
    ...

I'd give 3 examples below.

Example 1: Slightly more complex than the original one

match url:
    case Urlsplit(netloc="docs.python.org", path=Re("/dis/.*")) && ReadPage("English", text):
        # do something with text
    case Urlopen(code=200, header=header) && ... :
        ...

Note that the library authors here provide patterns like Urlsplit, Re and Status;
don't assume you also have equivalent functions.

Even if you're lucky and all of those authors implemented __call__ methods providing what their __match__ methods provide, you would have to write:

if ((split := Urlsplit(url))
    and split.netloc == "docs.python.org"
    and Re(split.path) == "/dis/.*"
    and (read := ReadPage(url))
    and read[0] == "English"):

    text = read[1]

You might still prefer this; we should ask more people in the community.

Example 2: Data Query

Suppose the government wants to filter specific people out of a database.

Case 1: Query the databases of medical system and education system, get certain results and check,
Case 2: Query the databases of human resources system and ...
...

match person_data:
    case MedicalSys(history=h, expense=e) && EducationSys(university=u, degree=d) if some_check_for(h, e, u, d):
        print("this person is put into group 1")

    case HRSystem(employments=e, ...) ...:
        print("this person is put into group 2")

Example 3: High Level Applications of Machine Learning

match dataset:
    case LinearModel(result1, model_params) && TreeModel(result2, depth=depth) if depth < 20:
        # do something with 'result1', 'model_params', 'result2' and 'depth'
    case BayesModel(..., ) ... :
        ...

brandtbucher · 2020-06-26T00:01:55Z

I'm sorry, but you've made me like this feature much less now.

We explicitly state in the opening paragraphs of the PEP that we are creating pattern matching specifically for expressive destructuring and handling of heterogeneous data. What you've shown is just obfuscated control-flow logic.

thautwarm · 2020-06-26T00:04:34Z

That is still okay; after staying up late I found I got smarter.
AND patterns can be implemented by custom patterns. So it's totally okay to reject it.

Tobias-Kohn · 2020-06-26T07:30:56Z

To briefly add to @brandtbucher's excellent reply: I think your examples show quite clearly that you are talking about things like database lookups and requests, and not about destructuring. At its core, pattern matching should ideally expose information already present "inside" the object, and it should not have side effects. We cannot (and don't want to) enforce either of these principles, of course, as there are valid use cases to violate one or the other, but they still apply as our guideline.

In your machine learning example, for instance, LinearModel, TreeModel and BayesModel should actually be thin data classes that merely represent the outcome of some previous computation. I.e., the first line should be something like match getModelFromData(dataset):. And then case LinearModel() and TreeModel(): really does not make sense anymore, as you are practically saying that the resulting object must have the structure of two different classes simultaneously—which is better expressed by introducing a LinearTreeModel() in the first place.

Yes, on the surface pattern matching shares a couple of ideas with machine learning, database queries, but also regular expressions. Indeed, pattern matching as such is an important theme of computer science. However, the syntactic support we are introducing here is really about matching the existing structure of objects, not discovering new structure in data (as is the case with regular expressions and machine learning).

viridia · 2020-06-26T07:53:45Z

I want to mention that there is a hack which can be used to extend matching to any number of AND-like terms. I don't recommend this hack - it feels like an anti-pattern - but I wanted to mention it anyway.

Basically what you do is construct a custom tuple as the match expression:

match (x, x):
  case (SomeCondition(value1), OtherCondition(value2)):
    # etc...

This can be extended to do a simultaneous matching on any number of arbitrary terms. So if you really want to have multiple match patterns applied to the same value, you can do it - because there is an implicit AND joining all of the tuple elements (the same logic holds true for mappings and other composite data structures).

In fact, any arbitrary if-elif-elif-else chain can be converted into a match statement via the following pattern:

match (a, b, c, d, e):
  case [1, _, _, _, _]:
    # a == 1
  case [_, 2, _, _, _]:
    # b == 2
  case [_, _, 3, _, _]:
    # c == 3
  case [_, _, _, 4, _]:
    # d == 4
  case [_, _, _, _, 5]:
    # e == 5

This looks ugly, but it is actually less inefficient than you might think - because Brandt's reference implementation inlines all of the tests and ignores the wildcards, each case is ignoring all of the other sequence elements, other than the specific one being tested. There is still the overhead of constructing the tuple, however.

Now, obviously a real if-elif chain would be much cleaner and easier to understand.

Bonus points for anyone who can come up with a clever and memorable name for this anti-pattern :)

thautwarm · 2020-06-26T08:28:20Z

@viridia

match x:
    case And([P1(x), P2()]):
        ...

@Tobias-Kohn

> I think your examples show quite clearly that you are talking about things like database lookups and requests, and not about destructuring

The related terms are Active Patterns and View Patterns; they regard extracting abstract or high-level structures from a datum as a variant of destructuring.

A datum has different structures under different points of view (called Projections), and separating all those destructurings into different packages makes each package's usage clean and contributes to modularization.

If we don't want to support this kind of destructuring, we might drop the __match__ protocol for this reason:

Making __match_args__ a map from private attributes to public properties or attributes will suffice for most use cases of destructuring under a certain viewpoint. Dropping __match__ will also make pattern matching faster, simpler, and "safer" (if you think the patterns I talked about here are harmful).

If the __match__ protocol is not dropped, I don't think we can prevent people from implementing active/view/AND patterns with the infrastructure introduced by PEP 622.

People who have such demands, or who want to provide higher-level interfaces, will still try to implement those advanced patterns, which have been gaining popularity rapidly in recent years.

> for instance, LinearModel, TreeModel and BayesModel should actually be thin data classes

Again, if we restrict things to data classes and builtin classes, there is no need for __match__.

We just cannot stop people if we provide such a modern and cutting-edge __match__ protocol.

Besides, predictions of traditional ML models are cheap; this will encourage high-level ML users to use pattern matching.

> And then case LinearModel() and TreeModel(): really does not make sense anymore, as you are practically saying that the resulting object must have the structure of two different classes simultaneously, which is better expressed by introducing a LinearTreeModel() in the first place.

Once you say this, you reject program composability.

A user will not expect to implement specific and trivial ensemble models themselves, and the library providers would say "I've implemented everything orthogonally, what's the matter?".

gvanrossum · 2020-06-26T22:42:06Z

Taine, this comes across as gibberish. Please stop.

brandtbucher added the rejected A rejected idea label Jun 26, 2020

stereobutter mentioned this issue Jun 26, 2020

Revisit load vs. store #90

Open

gvanrossum closed this as completed Jun 26, 2020

gvanrossum mentioned this issue Jul 1, 2020

Proposing a simpler match protocol: ditch it #115

Open
