Skip to content
This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

What should the pattern [x, x] mean? #26

Open
gvanrossum opened this issue May 18, 2020 · 11 comments
Open

What should the pattern [x, x] mean? #26

gvanrossum opened this issue May 18, 2020 · 11 comments
Labels
fully pepped Issues that have been fully documented in the PEP postponed Deferred for now, possibly adopted in the future

Comments

@gvanrossum
Copy link
Owner

There's disagreement over whether this checks for a tuple of two equal items, blindly assigns x the second item, or gets rejected statically. Please discuss.

@gvanrossum gvanrossum changed the title What does the pattern (x, x) mean? What should the pattern (x, x) mean? May 18, 2020
@viridia
Copy link
Collaborator

viridia commented May 18, 2020

Having it check for a tuple of two equal items seems like the kind of thing that Haskell would do. Whether this is an argument in favor or against I am not sure. Mathematically / algebraically it makes perfect sense; but it also engenders the kind of cleverness that is one of my major gripes about Haskell (not that I am fluent in Haskell by any means).

@Tobias-Kohn
Copy link
Collaborator

The two alternatives that I see here are either to reject it completely, or to just bind x to the last value encountered. I would be in favour of rejecting it—if the compiler can do it in a sensible way.

  1. Both alternatives have some precedent in Python. Rejecting it is similar to how parameter names must be unique. Bindings it to the last value assigned to x copies the behaviour of tuple assignment.
  2. Having a mixed load/store semantics would mean that the semantics of x depended on its context. Although it is only a subtle difference, I think that context-sensitive semantics quickly introduce hard-to-understand corner cases. Just to bring up one silly example here: if the second x is just a value we compare against, then the following would be legal, too: (x, x | None).
  3. In issue Reject float constant patterns? #16 we briefly discussed the question whether 3 and 3.0 should be accepted as equal. What happens in this case? Does (3, 3.0) fit the pattern (x, x) whereas (3.0, 3) does not? Or do we use standard equal semantics in this case, i.e. so that (x, x) really is the same as (x, y) if x == y?
  4. Could you then do a pattern like (x, x, _) | (x, _, x) to mean that either the second or the third element must be the same as the first one? And if yes, which xs have load and which have store semantics in such patterns?
  5. Another case I have brought up before is Foo(Bar(x), x). From a performance point of view, it would make sense to check Foo's second argument first (think, e.g., of Foo(Bar(x), 3), where we could reject many values without having to invoke Bar), before calling Bar(x), which, however, would go against the usual left-to-right evaluation order. Now, we might have to discuss how the compiler is to treat this, anyway, and whether we stick to strict left-to-right order, but probably something to consider nonetheless.

@Tobias-Kohn
Copy link
Collaborator

Let me briefly pick up two things from @viridia's comment—probably in the spirit of @viridia's comment.

Having it check for a tuple of two equal items seems like the kind of thing that Haskell would do.

Haskell (and other languages) are much stricter in everything and in some sense about as much of contrast to Python as possible. Most of all, these languages allow for very thorough static analysis (that is basically what these languages are built for), so that they can get more context on what x might or might not be in the first place. I think we should think first and foremost of what makes sense for Python developers.

Mathematically / algebraically it makes perfect sense

We should be very careful about mixing mathematical notation/reasoning and programming in such scenarios. The problem ist that x*x = n also makes perfect sense from a mathematical point of view, where we basically assign the square root of n to x; but the concept of variable differs significantly between maths and programming (this is even true for functional languages which claim to be "very mathematical").

@ilevkivskyi
Copy link
Collaborator

I have seen in few issues an argument that more people in Python are familiar with C-like languages than functional languages, and therefore we should do something like in the former. However, I think for many people Python may be the first language they learn (and I think it will be even more so in few years, taking into account current trends).

My intuition about coinciding names was based on what would an inexperienced person think when they see:

match obj
    as Foo(x, x):
        ...

I think this would read like "Does obj look like Foo(x, x) for some x?"

IOW this is different from sequence unpacking: when I see x, y = obj, I just read "take first element from obj and put it into x, take second element and put it into y". While for pattern matching assignment looks more like a convenient "side effect". After all it is called "pattern matching", not "object unpacking", even though in simple cases these two are the same.

Even in the case where a variable appears once, I see matching against Foo(1, x) as this kind of a dialog with interpreter: "Does this look like Foo(1, something)?" -- "Yes, it looks like this, and something is 1".

I don't think this is important, but may be occasionally useful (like similar feature for regexes). Finally, I am against blindly assigning the second value, if we are not supporting this now, it is better to raise an exception.

@viridia
Copy link
Collaborator

viridia commented May 21, 2020

Throwing an exception is a two-way door: we can always change it to one of the other alternatives later.

Blindly assigning it, or using it as a comparison value, are both one-way doors.

@gvanrossum
Copy link
Owner Author

Yeah, let's recognize this and raise. The traditional "out" in languages like C is to declare it undefined, but since Python's standard is so weak, "whatever the current interpreter does" ends up becoming the standard, and docs stating it's really undefined don't change that.

@gvanrossum gvanrossum added the postponed Deferred for now, possibly adopted in the future label May 21, 2020
@viridia viridia added the fully pepped Issues that have been fully documented in the PEP label Jun 1, 2020
@brandtbucher
Copy link
Collaborator

brandtbucher commented Jun 2, 2020

I just realized, while fiddling around with the implementation, that the desired check is trivially spelled with the current spec by mixing loads and stores:

>>> def check(stuff):
...     match stuff:
...         case [x, .x, .x]:
...             print(f"Three of a kind: {x}!")
...         case [x, .x, _] | [x, _, .x] | [_, x, .x]:
...             print(f"Pair: {x}!")
...         case [_, _, _]:
...             print("Junk!")
...         case _:
...             raise TypeError("Nice try!")
... 
>>> check([1, 2, 3])
Junk!
>>> check([1, 2, 2])
Pair: 2!
>>> check([3, 1, 3])
Pair: 3!
>>> check([3, 3, 3])
Three of a kind: 3!
>>> check([3, 3, 3, 4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in check
TypeError: Nice try!

So I think we made the right choice here, either way. Do we want to point this out in the PEP, or let the community "discover" it themselves? 🙂

@brandtbucher
Copy link
Collaborator

brandtbucher commented Jun 2, 2020

Also, off-topic... but while looking at the example above, I wonder if there is some way we could omit the match stuff: line and unindent. So case blocks starting at function-scope just use the named arguments by default, as a form of multi-dispatch.

It's only a quarter-baked idea though... I should probably go to sleep!

@Tobias-Kohn
Copy link
Collaborator

The proposal to omit the match stuff: seems to remove a single line at the expense of much added complexity to the grammar and compiler. We would also have to make sure that we do not inadvertedly open the door for some strange corner cases.

On the flip side, multi dispatch has been on the wish list of a couple of people for quite some time. But they would then probably want to go at least one step further and do proper multi dispatch, arriving at something like:

def fact:
    case 0 | 1:
        return 1
    case int(n) if n > 0:
        return n * fact(n-1)

And now are back discussing some fundamental issues: clearly, calling fact(1.5) has no implementation. Is this now a case of TypeError because of incompatible arguments, or do we just silently return None since we decided that pattern matching does not need to be exhaustive?

Having multi-dispatch sure would be a great thing, but I think it is rather an entirely new discussion and not something that we should just bring in as an aside to pattern matching.

@gvanrossum
Copy link
Owner Author

Please create a fresh issue and copy those thoughts there.

@gvanrossum
Copy link
Owner Author

FWIW I am not delighted by patterns like [x, .x], but I am a big fan of specifying left-to-right execution where possible (e.g. Python requires that in a+b, a is evaluated before b), and so it seems necessary that this should work. It just will be pretty mysterious for those new to the fine points of patterns (since I am not expecting any educators to lead with .x).

@gvanrossum gvanrossum changed the title What should the pattern (x, x) mean? What should the pattern [x, x] mean? Jun 3, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
fully pepped Issues that have been fully documented in the PEP postponed Deferred for now, possibly adopted in the future
Projects
None yet
Development

No branches or pull requests

5 participants