-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple separators in Stream.readuntil #81322
Comments
Text-based protocols sometimes allow a choice of newline separator - I work with one that allows either \r or \n. Unfortunately that doesn't work with StreamReader.readuntil, which only accepts a single separator, so I've had to do some hacky things to obtain lines without having to From discussion in bpo-32052, it sounded like extending StreamReader.readuntil to support a tuple of separators would be feasible. |
Would you make a PR? |
I wasn't aware of that deprecation - it doesn't seem to be mentioned at https://docs.python.org/3.8/library/asyncio-stream.html. What is the replacement? |
Docs will be updated soon. The idea is: StreamReader and StreamWriter are merged into just Stream, open_connection() is replaced with connect() etc. |
Ok. Does the new Stream still have a similar interface for readuntil i.e. is this still a relevant request against the new API? I'm happy to let deprecated APIs stay as-is. |
Yes, Stream supports all StreamReader and StreamWriter methods |
Ok, I've changed the issue title to refer to Stream. Since this would be a new feature, I assume it's off the table for 3.8, but I'll see if I get time to implement a PR in time for 3.9 (and get someone at work to sign off on the contributor agreement, which might be the harder part). Thanks for the quick and helpful responses. |
I finally have permission from my employer to sign the contributors agreement, so I'll take a stab at this when I have some free time (unless nobody else gets to it first). |
please do |
I've submitted a PR: #16429 |
I already mentioned that in the PR, but we'll have to hit a pause on this. We decided to revert the latest streams implementation from 3.8, see https://bugs.python.org/issue38242. The upshot is that what we have in 3.9 should be more amenable for things like this one. |
So this is stuck because of the new Stream implementation which was supposed to be implemented in 3.8 but today is 3.12. Unless we have a plan to reinvent the streams, this would have to be added to the existing |
Worth to ressurect the initiative? ... reimplementing custom reader when multiple separators required ... is tough |
The PR looks reasonable. Maybe someone can champion it and make it up to date for release in 3.13? I'm willing to release. Assuming the OP (@bmerry) has moved on, it would be easiest for someone else to check out their branch into their own clone of the cpython repo and merge or rebase main on it, etc. Alternatively, @bmerry, if you're still around, please have at it! |
I'm still around - I stopped trying to keep the PR up to date with main because it wasn't clear if it would ever reach the front of the priority queue for the async devs. It sounds like @gvanrossum is happy to review an updated version now, so I'm happy to dust it off and update it when I find some free time. |
Yup let’s do that! Feature freeze is late May, but don’t wait till the last minute. |
@gvanrossum I didn't get any merge conflicts when I merged in |
Thanks again! There's one detail in a different repo that you may want to get ahead of. There's a declaration for this method in typeshed; ideally we should update this conditionally with a check for |
Thanks for the reminder - I'd meant to ask about typeshed but forgot. I'll sort out a PR there when I get a chance. |
Looking at the typeshed entry for # Can be any buffer that supports len(); consider changing to a Protocol if PEP 688 is accepted
async def readuntil(self, separator: bytes | bytearray | memoryview = b"\n") -> bytes: ... On the other hand, in this PR the new use case is detected like this: if isinstance(separator, bytes):
separator = [separator]
else:
# Makes sure shortest matches wins, and supports arbitrary iterables
separator = sorted(separator, key=len) So I think this PR will actually break the existing use case of passing a bytearray (or memoryview), because it would be incorrectly interpreted as an iterable of separators rather than a single separator. @gvanrossum any thoughts on how to better decide whether we've been passed a single separator or an iterable of them? Perhaps instead of |
PR python#16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this can introduce ambiguities in deciding whether the argument is an iterator of separators or a singleton separator. In python#16429, only 'bytes' was considered a singleton, but this will break code that passes other buffer object types. The Python library docs don't indicate what separator types were permitted in Python <=3.12, but comments in typeshed indicate that it would work with types that implement the buffer protocol and provide a len(). To keep those cases working the way they did before, I've changed the detection logic to consider any instance of collections.abc.Buffer as a singleton separator. There may still be corner cases where this doesn't do what the user wants e.g. a numpy array of byte strings will implement the buffer protocol and hence be treated as a singleton; but at least those corner cases should behave the same in 3.13 as they did in 3.12. Relates to python#81322.
I've opened #117653 to use |
Well, my intuition would be to only allow tuples -- this follows the lead of e.g. >>> "abc".startswith("a")
True
>>> "abc".startswith(("a", "b"))
True
>>> "abc".startswith(["a", "b"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
"abc".startswith(["a", "b"])
~~~~~~~~~~~~~~~~^^^^^^^^^^^^
TypeError: startswith first arg must be str or a tuple of str, not list
>>> |
I can see the appeal in that. It's also consistent with If someone does pass a list, they're probably going to get a cryptic error message; so it would be nice if we could detect incorrect usage. So maybe if it's not a tuple AND it's not a collections.abc.Buffer, raise a helpful TypeError? |
Maybe, though I worry about overthinking this. If they pass something incorrect they already get a cryptic error message (e.g. what if they pass a string?). Do we care? Probably not, as long as it doesn't silently do the wrong thing. |
I've created a patch to accept only tuples in #117723. |
I've opened python/typeshed#11755 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: