Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence #88904

Closed
PierreQuentel mannequin opened this issue Jul 26, 2021 · 12 comments
Closed
Assignees
Labels
3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@PierreQuentel
Copy link
Mannequin

PierreQuentel mannequin commented Jul 26, 2021

BPO 44741
Nosy @gvanrossum, @stevendaprano, @markshannon, @PierreQuentel, @pablogsal, @brandtbucher

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/brandtbucher'
closed_at = <Date 2021-07-26.19:17:38.892>
created_at = <Date 2021-07-26.14:23:39.497>
labels = ['interpreter-core', 'type-bug', 'invalid', '3.10', '3.11']
title = 'Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence'
updated_at = <Date 2021-07-29.07:03:04.699>
user = 'https://github.com/PierreQuentel'

bugs.python.org fields:

activity = <Date 2021-07-29.07:03:04.699>
actor = 'quentel'
assignee = 'brandtbucher'
closed = True
closed_date = <Date 2021-07-26.19:17:38.892>
closer = 'brandtbucher'
components = ['Interpreter Core']
creation = <Date 2021-07-26.14:23:39.497>
creator = 'quentel'
dependencies = []
files = []
hgrepos = []
issue_num = 44741
keywords = []
message_count = 12.0
messages = ['398226', '398244', '398245', '398246', '398248', '398266', '398268', '398271', '398280', '398301', '398309', '398461']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'steven.daprano', 'Mark.Shannon', 'quentel', 'pablogsal', 'brandtbucher']
pr_nums = []
priority = 'high'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue44741'
versions = ['Python 3.10', 'Python 3.11']

@PierreQuentel
Copy link
Mannequin Author

PierreQuentel mannequin commented Jul 26, 2021

This code

match range(42):
    case [x, *w, y]:
        z = 0

sets w to a list with 40 items : the length of the subject, minus the number of non-star subpatterns.

But this code (adapted from test_patma_186) enters an infinite loop

    import collections.abc

    class Seq(collections.abc.Sequence):
        def __getitem__(self, i):
            print('get item', i)
            return i
        def __len__(self):
            return 42
match Seq():
    case [x, *w, y]:
        z = 0

__getitem__ gets called forever, instead of stopping when the expected number of items in w is reached.

@PierreQuentel PierreQuentel mannequin added type-crash A hard crash of the interpreter, possibly with a core dump 3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jul 26, 2021
@brandtbucher
Copy link
Member

(Raising the priority to "high" because any decision on this should ideally be made before the 3.10 RCs.)

Hm, interesting. This is because we use UNPACK_EX for these patterns, so the destructuring is basically the same as if you had written:

[x, *w, y] = Seq()

...which also hangs.

We actually have three implementations for destructuring sequences:

  • Using UNPACK_EX, when there is a starred name.
  • Using BINARY_SUBSCR, when there is a starred wildcard.
  • Using UNPACK_SEQUENCE, when there is no star.

When using your Seq class:

  • The first fails by falling into an infinite loop.
  • The second works as expected.
  • The third fails with a ValueError at runtime (for a length mismatch).

*If* we decide that this is a big enough problem to fix, then I think the easiest way of doing it is to use the BINARY_SUBSCR implementation for all three paths (we'll just also need to use BUILD_LIST / LIST_APPEND to actually collect the starred items). It will simplify the pattern compiler a bit, but I imagine it will also come with a performance penalty as well.

In my opinion, I don't think we should rewrite everything to support Seq (though my mind isn't made up entirely). Sequences are supposed to be sized and iterable, but Seq doesn't really support the iteration protocol correctly (it expects the iterating code to stop once the length is exhausted, rather than raising StopIteration).

I'm curious to hear opinions on whether we want to actually fix this, though. It seems that it will always be possible to write classes like Seq the hack our pattern-matching implementation with dubious sequences and mappings, so it really comes down to: is supporting classes like Seq worth potentially slowing down all other sequence patterns?

@pablogsal
Copy link
Member

I don't think there is anything technically wrong here. The Seq class is relying on the legacy version of the iterator protocol, which specifies that you need to raise IndexError there to stop the iteration.

For instance, this version works:

import collections.abc

class Seq(collections.abc.Sequence):
    def __getitem__(self, i):
        if i >= len(self):
            raise IndexError
        return i
    def __len__(self):
        return 42

match Seq():
case [x, *w, y]:
z = 0
print(z)

Also, one could argue that Seq has exactly the same problem all over the language. For instance

> list(Seq())

will hang

> [x,*y] = Seq()

will hang

and so on and so forth. So, if I am a user and I am trying to **predict** how this would behave based on these two experiments my conclusion would be:

"Anything that tries to iterate on this thing will hang"

I think that whatever we do, we should make it *predictable* and consistency across the language, and IMHO only fixing it in pattern matching doesn't make it predictable. Being said that, I don't think these use cases justifies a major overhaul of how this behaves everywhere.

@pablogsal
Copy link
Member

In general I would propose to close this as "not a bug"

@brandtbucher
Copy link
Member

I agree, Pablo.

The only thing that makes me pause is that, depending on the pattern, *sometimes* we use iteration and *sometimes* we use indexing. That might be sort of surprising to classes like Seq if you change patterns or major Python versions and this thing starts hanging. Hopefully, though, the reason will be relatively easy to debug and fix.

Closing as "not a bug".

@brandtbucher brandtbucher added invalid type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 26, 2021
@brandtbucher brandtbucher added invalid type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 26, 2021
@stevendaprano
Copy link
Member

How is this not a regression? And a very serious one by the sound of it. All the examples that are said to go into an infinite loop work fine in 3.9.

(Obviously I can't check the match statement example in 3.9.)

If the Sequence Protocol really has been dropped from Python's execution model, shouldn't there be a PEP for such a major change in behaviour? Legacy or not, it is still a part of the language.

At the very least, the examples shouldn't swallow the IndexError and go into an infinite loop.

Brandt said:

the destructuring is basically the same as if you had written:

[x, *w, y] = Seq()

...which also hangs.

It works fine in 3.9, with or without inheriting from the Sequence ABC.

# Python 3.9
>>> class A:
...     def __len__(self):
...             return 5
...     def __getitem__(self, i):
...             print(i)
...             if i < 5:
...                     return i
...             raise IndexError
... 
>>> [x, *y, z] = A()
0
1
2
3
4
5
>>> x, y, z
(0, [1, 2, 3], 4)

Likewise for list(A()), etc.

The __len__ method isn't needed for iteration to work correctly. I just included it to match Pierre's example.

Inheriting from collections.abc.Sequence does not change the behaviour.

Still in 3.9:

>>> list(A())
0
1
2
3
4
5
[0, 1, 2, 3, 4]

I think that closing this is as Not A Bug was premature. If this isn't a bug, then I have no idea what counts as a bug any longer :-(

@brandtbucher
Copy link
Member

All the examples that are said to go into an infinite loop work fine in 3.9. [...] At the very least, the examples shouldn't swallow the IndexError and go into an infinite loop.

Note that the original example posted did not include a "raise IndexError" branch in its __getitem__. That was class I was referring to, since the lack of an IndexError causes the iteration to continue forever.

As far as I know, nothing has changed about the iteration protocol in 3.10.

@stevendaprano
Copy link
Member

Ah, I missed that and totally misinterrpreted other comments.
Confirmation bias at work: I saw the code I expected to see, not the
code that was actually there :-(

All good, I agree that it is a bug in the class, not in the language.
Sorry for the noise.

@PierreQuentel
Copy link
Mannequin Author

PierreQuentel mannequin commented Jul 27, 2021

Thanks for the explanations, but I feel unconfortable with the fact that variable-length sequence patterns are implemented the same as unpacking. (sorry if this has been discussed before, I can't find references to the discussions that lead to the current version of the PEP).

There are obvious similarities between

[x, *y, z] = A

and

match A:
    case [x, *y, z]:
        print('ok')

but a big difference, put forward in PEP-634 : the classes supported for pattern matching are limited (unpacking a generator expression is possible, but they are not supported as subjects for sequence patterns), and the PEP explicitely refers to them having a len().

It seems to me that this implies that the algorithms should be different:

  • for unpacking, the iterable must be completely consumed before binding the names on the left-hand side. If it is infinite, unpacking fails

  • for variable-length sequence pattern matching (this is how I understand the last paragraph about them in PEP-634):
    . get the subject length
    . iterate one by one before the star pattern
    . iterate (len(subject) - number of non-star patterns) times for the star pattern
    . iterate one by one after the star pattern

In the second case, even if the subject never raises StopIteration, the match succeeds.

Does this make sense ?

@gvanrossum
Copy link
Member

Given that the class used to demonstrate this behavior has a bug (len is inconsistent with __getitem__), I don't think it matters much -- if doing it your way is (as I suspect) slower than the current implementation for other (well-behaved) sequences, we should not change the implementation.

@PierreQuentel
Copy link
Mannequin Author

PierreQuentel mannequin commented Jul 27, 2021

Oh, I did not invent this class, it is in the test script for pattern matching :

def test_patma_186(self):

With this class, [x, *_, y] matches, but not [x, *w, y] : this is what made me create this issue. Maybe it would be a good idea to change this class in test_patma.py ?

OTOH, if the current implementation remains the same, why does the PEP insist on subjects having a len() ? Could sequence patterns match a wider range of subjects that can be unpacked ?

@PierreQuentel
Copy link
Mannequin Author

PierreQuentel mannequin commented Jul 29, 2021

I found why len() is required, it's to avoid trying to match the subject (thus consuming a part of it) if its length is less than the number of non-star patterns, as explained in the PEP.

My mistake, sorry.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants