Adding bytes.frombuffer(byteslike) constructor #73364
Comments
# Summary

## 1. Making bytes from a slice of a bytearray easy and efficient

bs = bytes(memoryview(bytelike)[start:end]) works fine on CPython, but it causes problems on PyPy. Since the memoryview is not closed explicitly, the underlying buffer stays exported, and later operations on the source object can raise an exception.

## 2. The bytes(x) constructor is too overloaded

It has undocumented corner cases. See PEP-467 and bpo-29159.

# ML threads

https://mail.python.org/pipermail/python-dev/2016-October/146668.html

+1 from: Nathaniel Smith, Alexander Belopolsky, Yury Selivanov

Nick proposed putting it in a separate module instead of adding it as a builtin method.

# First draft patch

bytes-frombuffer.patch is the first draft patch. It implements frombuffer only for bytes:

    frombuffer(byteslike, length=-1, offset=0) method of builtins.type instance
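To make the proposed signature concrete, here is a minimal pure-Python sketch of what frombuffer(byteslike, length=-1, offset=0) might do. It is not the patch itself: the module-level function, the reading of length=-1 as "up to the end", and the restriction to one-dimensional byte buffers are all assumptions made for illustration.

```python
def frombuffer(byteslike, length=-1, offset=0):
    """Hypothetical stand-in for the proposed bytes.frombuffer().

    Assumption: a negative length means "everything from offset to the end".
    Assumption: byteslike exposes a one-dimensional buffer of single bytes.
    """
    with memoryview(byteslike) as view:
        stop = len(view) if length < 0 else offset + length
        # Release the sliced view explicitly as well, so nothing depends on
        # reference counting to drop the export.
        with view[offset:stop] as window:
            return bytes(window)


buf = bytearray(b"hello world")
head = frombuffer(buf, 5)          # first five bytes
tail = frombuffer(buf, offset=6)   # from offset 6 to the end
buf += b"!"                        # resizing works: no view is left exported
```

Because both views are released before the function returns, the caller can resize the source object immediately afterwards.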
|
I've added a couple of review comments. Also, it looks like you can count Antoine Pitrou as +1 too. Two questions:
|
Count me as -1 too. This is just a two-liner:

    with memoryview(bytelike) as m:
        bs = bytes(m[start:end])

In most cases, when all content is used, the bytes constructor works fine:

    bs = bytes(bytelike)

This works not just with bytes, but with bytearray and most other bytes-like arrays. With frombuffer() you need to add a new class method to all these classes.

Adding a new method to a builtin type has a high bar. I doubt that there are enough use cases in which bytes.frombuffer() has an advantage.

The signature of bytes.frombuffer() looks questionable. Why length and offset instead of start and stop indices as in slices? Why is length first and offset last? This contradicts the interface of Python 2 buffer(), socket.sendfile(), os.sendfile(), etc.

There is also a problem with the returned type for subclasses (this is always a problem for alternate constructors). Should B.frombuffer(), where B is a bytes subclass, return an instance of bytes or B? If it always returns a bytes object, we need to use a constructor for subclasses; if it returns an instance of a subclass, how can it be implemented efficiently? |
Which virtually no one follows :(
Any protocol parsing code has a lot of slicing.
I propose to make both arguments keyword-only.
Good point. How do we usually solve this in CPython? |
Sad. But adding bytes.frombuffer() wouldn't make it magically used. If you are
How much code do you expect to update with bytes.frombuffer()? And why not use the
It is deemed that returning an instance of a subclass is more preferable. |
I'm -1 if the intention is ease of use and efficiency. I think a new API is usually added because of a functional defect, not a performance defect. We already have a way to do this, though its performance seems less than ideal according to INADA's mail. I think we should first check whether memoryview can be optimized to better fit this case; creating a memoryview is not cheap enough here. As for ease of use, when a user is considering such low-level details, it's reasonable to expect them to know memoryview and that it needs to be released. But if this API is added to simplify bytes(), I think it makes sense, though that is not only about adding a frombuffer(). |
Do you +1 when adding it to stdlib (say "bufferlib")?
Actually, it's 5 calls + 2 temporary memoryviews.
It can very easily become a bottleneck when writing a protocol parser such as an HTTP parser.
Closing the memoryview is not a strict requirement in some cases.
Yes. See msg284813. But if PEP-467 is accepted, bytes() constructor is simple enough. |
Isn't the proposed workaround also relying on CPython reference counting to immediately deallocate the sliced view? It fails if I keep a reference to the sliced view:

    byteslike = bytearray(b'abc')

    with memoryview(byteslike) as m1:
        m2 = m1[1:]
        bs = bytes(m2)

    >>> byteslike += b'd'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BufferError: Existing exports of data: object cannot be re-sized

It seems to me that it should be written as follows:

    with memoryview(byteslike) as m1:
        with m1[1:] as m2:
            bs = bytes(m2)

    >>> byteslike += b'd'
    >>> byteslike
    bytearray(b'abcd')

The memoryview constructor could take start, stop, and step keyword-only arguments to avoid having to immediately slice a new view. |
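Until memoryview accepts such arguments, the nested pattern can be wrapped once in a helper. The following is a sketch only; sliced_view is a hypothetical name, not an existing API. It opens the full view, slices it, and releases both views in the right order on exit.

```python
from contextlib import contextmanager


@contextmanager
def sliced_view(obj, start=None, stop=None, step=None):
    """Hypothetical helper: yield obj[start:stop:step] as a memoryview."""
    with memoryview(obj) as whole:
        # The inner view is released first, then the outer one.
        with whole[start:stop:step] as part:
            yield part


byteslike = bytearray(b'abc')
with sliced_view(byteslike, 1) as m:
    bs = bytes(m)

# No export is left behind, so the bytearray can be resized again.
byteslike += b'd'
```

The caller still has to avoid keeping references to further slices taken from the yielded view, since those would hold their own exports.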
You're right! How difficult it is to work with memoryview!
Maybe memoryview.to_bytes() is a better place to add such options. memoryview(x) can accept multi-dimensional arrays, and itemsize can be >=1. |
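The point about itemsize can be seen with the standard array module. This is a small illustration, not part of any proposal: slicing a memoryview counts items, not bytes, so a byte-oriented offset/length API would have to define which unit it means.

```python
from array import array

# 'H' is an unsigned short; its size in bytes is platform dependent,
# so the code reads it from the view instead of hard-coding it.
values = array("H", [1, 2, 3])

with memoryview(values) as view:
    item_size = view.itemsize   # bytes per item
    item_count = len(view)      # number of items, not bytes
    byte_count = view.nbytes    # total size in bytes
    with view[1:] as tail:      # skips one item, i.e. item_size bytes
        tail_bytes = bytes(tail)
```

After this runs, tail_bytes holds the raw bytes of the last two items, so its length is twice the item size.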
The complexity you're hitting here is the main reason I'm a fan of creating a dedicated library for dealing with these problems, rather than trying to handle them directly on the builtins. Given a bufferlib module, for example, you could have a higher level API like:

    def snapshot(source, *, result_type=bytes):
        ...

That handled any source object that supported the buffer protocol, and any target type that accepted a memoryview instance as input to the constructor.

    # Default bytes snapshot
    data = snapshot(original)
    # Mutable snapshot without copying and promptly releasing the view
    data = snapshot(original, result_type=bytearray)

The start/stop/step or length+offset question could be handled at that level by allowing both, but also offering lower level APIs with less argument processing overhead:

    def snapshot_slice(source, start, stop, step=1, *, result_type=bytes):
        ...
    def snapshot_at(source, *, offset=0, count=None, result_type=bytes):
        ...
|
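One possible way to fill in those stubs is sketched below. The bufferlib module does not exist; the function bodies are assumptions that follow the signatures given in the comment and reuse the nested release pattern discussed earlier in the thread.

```python
def snapshot(source, *, result_type=bytes):
    """Copy the whole buffer of source into a new result_type object."""
    with memoryview(source) as view:
        return result_type(view)


def snapshot_slice(source, start, stop, step=1, *, result_type=bytes):
    """Copy source[start:stop:step], releasing both views promptly."""
    with memoryview(source) as view:
        with view[start:stop:step] as window:
            return result_type(window)


def snapshot_at(source, *, offset=0, count=None, result_type=bytes):
    """Copy count items starting at offset (to the end if count is None)."""
    stop = None if count is None else offset + count
    with memoryview(source) as view:
        with view[offset:stop] as window:
            return result_type(window)


original = bytearray(b"abcdef")
whole = snapshot(original)
mutable = snapshot(original, result_type=bytearray)
middle = snapshot_slice(original, 1, 4)
chunk = snapshot_at(original, offset=2, count=3)
original += b"g"   # all views were released, so resizing succeeds
```

Keeping the slicing variants as separate functions, as the comment suggests, avoids having one entry point that must decide between start/stop and offset/count semantics on every call.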
FYI, Tornado 4.5b switched buffer implementation from deque-of-bytes This is a common pitfall when implementing a buffer. |
I am warmer to this feature now (+0). Not because it would allow writing some code more concisely, but because it would be in line with other type-strict alternate constructors, like int.fromnumber() and int.parse(). But the questions about parameters and subclasses still stand. |