Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence #88904
This code

sets w to a list with 40 items: the length of the subject minus the number of non-star subpatterns. But this code (adapted from test_patma_186) enters an infinite loop:

```python
import collections.abc

class Seq(collections.abc.Sequence):
    def __getitem__(self, i):
        print('get item', i)
        return i
    def __len__(self):
        return 42
```

`__getitem__` gets called forever, instead of stopping once w holds the expected number of items.
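Since the infinite loop itself can't be run safely, here is a bounded variant. This is a sketch: `BoundedSeq` and its 100-call `MAX_CALLS` limit are hypothetical, added only so the demo terminates. `collections.abc.Sequence` provides a mixin `__iter__` that keeps calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised. `__len__` plays no part in that loop, so star unpacking runs straight past index 41.

```python
import collections.abc


class BoundedSeq(collections.abc.Sequence):
    """Like the Seq class above, but gives up after a fixed number of calls.

    MAX_CALLS is a hypothetical safety limit so the demo terminates;
    without it, unpacking would keep calling __getitem__ forever.
    """

    MAX_CALLS = 100

    def __init__(self):
        self.requested = []  # every index __getitem__ was asked for

    def __getitem__(self, i):
        self.requested.append(i)
        if len(self.requested) > self.MAX_CALLS:
            # Not an IndexError, so the Sequence mixin's __iter__ does not
            # treat it as "end of sequence"; it propagates to the caller.
            raise RuntimeError("still iterating past len()")
        return i

    def __len__(self):
        return 42


s = BoundedSeq()
try:
    [x, *w, y] = s  # same destructuring as [x, *w, y] = Seq()
except RuntimeError as exc:
    print(exc)
print(len(s), max(s.requested))  # 42 100
```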
(Raising the priority to "high" because any decision on this should ideally be made before the 3.10 RCs.)

Hm, interesting. This is because we use `UNPACK_EX` for these patterns, so the destructuring is basically the same as if you had written:

```python
[x, *w, y] = Seq()
```

...which also hangs. We actually have three implementations for destructuring sequences:
When using your Seq class:
*If* we decide that this is a big enough problem to fix, then I think the easiest way of doing it is to use the `BINARY_SUBSCR` implementation for all three paths (we'll also need to use `BUILD_LIST` / `LIST_APPEND` to actually collect the starred items). It will simplify the pattern compiler a bit, but I imagine it will also come with a performance penalty.

In my opinion, we shouldn't rewrite everything to support Seq (though my mind isn't made up entirely). Sequences are supposed to be sized and iterable, but Seq doesn't really support the iteration protocol correctly: it expects the iterating code to stop once the length is exhausted, rather than raising `StopIteration`.

I'm curious to hear opinions on whether we want to actually fix this, though. It will always be possible to write classes like Seq that hack our pattern-matching implementation with dubious sequences and mappings, so it really comes down to this: is supporting classes like Seq worth potentially slowing down all other sequence patterns?
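The "subscript everything" alternative can be sketched in pure Python to show why it would terminate for Seq: every index is derived from `len()`, and the starred items are appended to a fresh list one at a time. `destructure_by_index` is a hypothetical helper, not what the CPython compiler emits; it only mirrors the shape of the proposal.

```python
import collections.abc


def destructure_by_index(subject, n_before, n_after):
    """Hypothetical helper: split subject like [*before, *star, *after].

    Uses only len() and integer subscripts, never iteration. Fixed items
    are fetched by index and the starred items are collected into a new
    list one by one. Returns None if the subject is too short.
    """
    n = len(subject)
    if n < n_before + n_after:
        return None
    before = [subject[i] for i in range(n_before)]
    star = []
    for i in range(n_before, n - n_after):
        star.append(subject[i])  # analogous to collecting with LIST_APPEND
    after = [subject[i] for i in range(n - n_after, n)]
    return before, star, after


class Seq(collections.abc.Sequence):
    # The original Seq, minus the print: it never raises IndexError.
    def __getitem__(self, i):
        return i

    def __len__(self):
        return 42


before, star, after = destructure_by_index(Seq(), 1, 1)
print(before, len(star), after)  # [0] 40 [41]
```

Every element, starred or not, goes through a separate `subject[i]` call here, which is one plausible source of the performance penalty mentioned above.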
I don't think there is anything technically wrong here. The Seq class relies on the legacy version of the iterator protocol, which specifies that `__getitem__` must raise `IndexError` to stop the iteration. For instance, this version works:

```python
import collections.abc

class Seq(collections.abc.Sequence):
    def __getitem__(self, i):
        if i >= len(self):
            raise IndexError
        return i
    def __len__(self):
        return 42

match Seq():
    case [x, *w, y]:
        print(x, len(w), y)  # 0 40 41
```

Also, one could argue that Seq has exactly the same problem all over the language. For instance
will hang
will hang, and so on and so forth. So, if I am a user trying to **predict** how this would behave based on these two experiments, my conclusion would be: "Anything that tries to iterate on this thing will hang." I think that whatever we do, we should make it *predictable* and consistent across the language, and IMHO only fixing it in pattern matching doesn't make it predictable. That being said, I don't think these use cases justify a major overhaul of how this behaves everywhere.
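Pablo's point can be checked without hanging the interpreter: `itertools.islice` stops pulling items after a fixed count, which reveals that plain iteration over the buggy Seq keeps producing values after `len(s)` is exhausted. A `for` loop, `list()`, or star unpacking pulls from the same kind of iterator, just without the cap.

```python
import collections.abc
from itertools import islice


class Seq(collections.abc.Sequence):
    # Same as the original report, without the print.
    def __getitem__(self, i):
        return i

    def __len__(self):
        return 42


s = Seq()
# islice stops after 50 items, so this terminates even though the
# underlying iterator never would.
first_50 = list(islice(s, 50))
print(len(s), len(first_50), first_50[-1])  # 42 50 49
```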
In general, I would propose closing this as "not a bug".
I agree, Pablo. The only thing that makes me pause is that, depending on the pattern, *sometimes* we use iteration and *sometimes* we use indexing. That might be surprising to authors of classes like Seq if they change patterns or switch major Python versions and this thing starts hanging. Hopefully, though, the reason will be relatively easy to debug and fix. Closing as "not a bug".
How is this not a regression? And a very serious one by the sound of it. All the examples that are said to go into an infinite loop work fine in 3.9. (Obviously I can't check the match statement example in 3.9.) If the Sequence Protocol really has been dropped from Python's execution model, shouldn't there be a PEP for such a major change in behaviour? Legacy or not, it is still a part of the language. At the very least, the examples shouldn't swallow the IndexError and go into an infinite loop. Brandt said:
It works fine in 3.9, with or without inheriting from the Sequence ABC.

```python
# Python 3.9
>>> class A:
...     def __len__(self):
...         return 5
...     def __getitem__(self, i):
...         print(i)
...         if i < 5:
...             return i
...         raise IndexError
...
>>> [x, *y, z] = A()
0
1
2
3
4
5
>>> x, y, z
(0, [1, 2, 3], 4)
```

Likewise for `list(A())`, etc. The `__len__` method isn't needed for iteration to work correctly; I just included it to match Pierre's example. Inheriting from `collections.abc.Sequence` does not change the behaviour. Still in 3.9:

```python
>>> list(A())
0
1
2
3
4
5
[0, 1, 2, 3, 4]
```

I think that closing this as Not A Bug was premature. If this isn't a bug, then I have no idea what counts as a bug any longer :-(
Note that the original example posted did not include a `raise IndexError` branch in its `__getitem__`. That was the class I was referring to, since the lack of an `IndexError` causes the iteration to continue forever. As far as I know, nothing has changed about the iteration protocol in 3.10.
Ah, I missed that and totally misinterpreted other comments. All good: I agree that it is a bug in the class, not in the language.
Thanks for the explanations, but I feel uncomfortable with the fact that variable-length sequence patterns are implemented the same way as unpacking (sorry if this has been discussed before; I can't find references to the discussions that led to the current version of the PEP). There are obvious similarities between
and
but there is a big difference, put forward in PEP 634: the classes supported for pattern matching are limited (unpacking a generator expression is possible, but generators are not supported as subjects for sequence patterns), and the PEP explicitly refers to subjects having a `len()`. It seems to me that this implies that the algorithms should be different:
In the second case, even if the subject never raises `StopIteration`, the match succeeds. Does this make sense?
Given that the class used to demonstrate this behavior has a bug (`__len__` is inconsistent with `__getitem__`), I don't think it matters much: if doing it your way is (as I suspect) slower than the current implementation for other (well-behaved) sequences, we should not change the implementation.
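If a class genuinely cannot make `__getitem__` raise `IndexError`, one possible workaround (a sketch, not something the participants proposed) is to override `__iter__` so iteration is bounded by `__len__`. `LenBoundedSeq` is a hypothetical name.

```python
import collections.abc


class LenBoundedSeq(collections.abc.Sequence):
    """Hypothetical variant of Seq whose iteration is bounded by __len__.

    __getitem__ still never raises IndexError, but __iter__ is overridden
    so anything built on iter() stops after len(self) items.
    """

    def __getitem__(self, i):
        return i

    def __len__(self):
        return 42

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]


[x, *w, y] = LenBoundedSeq()
print(x, len(w), y)  # 0 40 41
```

This only aligns iteration with `__len__`; out-of-range indexing such as `LenBoundedSeq()[100]` still returns a value, so raising `IndexError` in `__getitem__`, as in the corrected Seq earlier in the thread, remains the more thorough fix.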
Oh, I did not invent this class; it is in the test script for pattern matching: cpython/Lib/test/test_patma.py, line 1932 in 6948964.

With this class, `[x, *_, y]` matches, but `[x, *w, y]` does not: this is what made me create this issue. Maybe it would be a good idea to change this class in test_patma.py? OTOH, if the current implementation remains the same, why does the PEP insist on subjects having a `len()`? Could sequence patterns match a wider range of subjects that can be unpacked?
I found why `len()` is required: it avoids trying to match the subject (and thus consuming part of it) if its length is less than the number of non-star subpatterns, as explained in the PEP. My mistake, sorry.
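The length precondition can be written out as a plain predicate. `sequence_pattern_could_match` is a hypothetical name; CPython performs equivalent checks in the bytecode it generates for a sequence pattern rather than calling a function like this. The key property is that only `len()` is consulted, so nothing is read from the subject until the length test has passed.

```python
import collections.abc


def sequence_pattern_could_match(subject, n_fixed, has_star):
    """Hypothetical helper mirroring the PEP 634 preconditions for a
    sequence pattern with n_fixed non-star subpatterns.

    Only isinstance() and len() are used, so no item is fetched from
    the subject before the length test passes.
    """
    if not isinstance(subject, collections.abc.Sequence):
        return False
    if isinstance(subject, (str, bytes, bytearray)):
        return False
    if has_star:
        return len(subject) >= n_fixed
    return len(subject) == n_fixed


print(sequence_pattern_could_match(range(42), 2, True))   # True
print(sequence_pattern_could_match([1], 2, True))         # False
print(sequence_pattern_could_match([1, 2, 3], 2, False))  # False
print(sequence_pattern_could_match("ab", 2, False))       # False
```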