Buffer protocol types #593

Closed
srittau opened this issue Nov 20, 2018 · 21 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@srittau
Collaborator

srittau commented Nov 20, 2018

We have had several typeshed issues and pull requests lately that try to work around the fact that there is no way to express that a method receives any object following the buffer protocol. The typing documentation mentions ByteString (corrected from BytesType), which is an alias for Union[bytes, memoryview, bytearray] (and that bytes can be used as an alias in argument types), but this is missing other types such as array.array or user-defined objects. As this is a C API protocol, just defining such a protocol in typeshed is not possible.
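
To make the gap concrete, here is a minimal sketch of what "following the buffer protocol" means at runtime: an object qualifies if memoryview() can wrap it. The helper name supports_buffer is hypothetical and is not part of typing or typeshed.

```python
import array


def supports_buffer(obj: object) -> bool:
    """Return True if obj exports a buffer (hypothetical helper)."""
    try:
        # memoryview() requests a buffer through the C-level protocol
        # and raises TypeError for objects that do not support it.
        with memoryview(obj):
            return True
    except TypeError:
        return False


candidates = [b"abc", bytearray(b"abc"), array.array("B", [1, 2, 3]), "abc", [1, 2, 3]]
results = [supports_buffer(c) for c in candidates]
```

The bytes, bytearray, and array.array values pass the check, while str and list do not, which is the distinction that Union[bytes, memoryview, bytearray] cannot capture.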

@JelleZijlstra
Member

Since the buffer protocol is a standard Python feature and we need to be able to use it in type annotations, it makes sense to me to add typing.Buffer or similar. Perhaps we can make concrete buffer classes inherit from Buffer, so we'd do something like

# typing.pyi
class Buffer: ...  # empty at the Python level

# builtins.pyi
class bytes(Buffer, Sequence[int]): ...

We'd have to add Buffer also to typing_extensions.

(Side note: I think you mean ByteString (https://docs.python.org/3/library/typing.html#typing.ByteString), not BytesType.)

@ilevkivskyi
Member

I like what @JelleZijlstra proposes (obviously the type should be abstract).

@JelleZijlstra
Member

Actually, it's a bit more complicated, since some buffers are writable and others aren't (see the types in python/typeshed#2610). This is controlled by whether the type responds to requests with PyBUF_WRITABLE (https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE) set. So here's a revised proposal:

# typing.pyi
class ReadableBuffer: ...  # abstract, no Python attributes; corresponds to C types that expose buffers without PyBUF_WRITABLE set
class WriteableBuffer(ReadableBuffer): ...  # same; corresponds to C types that expose buffers with PyBUF_WRITABLE set

# builtins.pyi
class bytes(ReadableBuffer, Sequence[int]): ...
class bytearray(WriteableBuffer, Sequence[int]): ...

There are a number of other flags controlling format, dimensions, etc., but I'm not sure those could be easily expressed in the type system. Perhaps we could implement the format flag by making Buffer generic over a typevar that is restricted to certain types, but Python types don't map cleanly to C types, so that doesn't seem like it would work well.
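
As a rough illustration of the writable/read-only split and the format flag discussed above, the sketch below inspects buffers from Python through memoryview. The helper name buffer_traits is an assumption made for this example; only memoryview.readonly and memoryview.format are real attributes.

```python
import array


def buffer_traits(obj):
    """Return (readonly, format) for a buffer-protocol object (hypothetical helper)."""
    with memoryview(obj) as view:
        # readonly reflects whether a writable buffer was exported;
        # format is the struct-style code describing one element.
        return (view.readonly, view.format)


bytes_traits = buffer_traits(b"abc")
bytearray_traits = buffer_traits(bytearray(b"abc"))
double_array_traits = buffer_traits(array.array("d", [1.0, 2.0]))
```

Here bytes reports a read-only buffer of unsigned bytes, bytearray a writable one, and the array of doubles a writable buffer with format "d", which hints at why a single Python-level typevar does not map neatly onto the C element types.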

@srittau
Collaborator Author

srittau commented Mar 30, 2019

python/typeshed#2895 is one example where this could be useful.

@ilevkivskyi
Member

ilevkivskyi commented Apr 2, 2019

python/typeshed#2895 is one example where this could be useful.

It is fine to experiment with such things defined locally (with an underscore, like _Reader); we can put something in typing later, when we have more experience.

@ilevkivskyi
Member

Oh sorry, this is the wrong issue, disregard my last comment.

@srittau
Collaborator Author

srittau commented Aug 7, 2019

Is this something that could be considered? What steps are necessary to continue?

@ilevkivskyi
Member

@srittau If it is not too hard maybe you can directly make a PoC PR to typeshed, so that we can discuss the details (IIUC you want this to be a stub-only feature).

cc @gvanrossum

@christopher-hesse

I believe there is an open Python issue about this: https://bugs.python.org/issue27501

@rwarren

rwarren commented Mar 28, 2020

Until a proper type for the buffer protocol is available, would it make sense to at least partially fix this (in places like zlib) with "better-than-just-bytes" coverage workarounds? For example:

Union[bytes, bytearray, memoryview]

It seems like that would cover the vast majority of use cases.

An example of something missing from that type definition that works, at least, for zlib.compress is an array.array of bytes. I can't seem to figure out how to force it to be an array of bytes from a typing perspective, though.

Also - typing.ByteString (mentioned in the original post) doesn't seem appropriate in all cases since it looks like it is Sequence[int] in there. A sequence of ints is not accepted by zlib.decompress, for example, although my dusty memory uncertainly thinks that a sequence of ints was supposed to be legit for a true buffer protocol (I'm really not sure, though).
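
The two observations above can be checked with a short sketch. The payload contents and variable names are made up for illustration; the zlib and array calls are standard library functions.

```python
import array
import zlib

# An array.array of unsigned bytes is accepted as a bytes-like object.
payload = array.array("B", b"hello buffer protocol")
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)

# A plain sequence of ints does not export a buffer, so it is rejected.
try:
    zlib.decompress(list(compressed))
    list_accepted = True
except TypeError:
    list_accepted = False
```

This suggests that Sequence[int] is both too broad (lists are rejected) and too narrow (array.array is accepted) as a stand-in for the buffer protocol.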

bmerry added a commit to bmerry/typeshed that referenced this issue Jun 15, 2020
Since typing doesn't yet have a way to express buffer protocol objects
(python/typing#593), various interfaces have ended up with a mish-mash
of options: some list just bytes (or just bytearray, when writable),
some include mmap, some include memoryview, I think none of them include
array.array even though it's explicitly mentioned as bytes-like, etc. I
ran into problems because RawIOBase.readinto didn't allow for
memoryview.

To allow for some uniformity until the fundamental issue is resolved,
I've introduced _typeshed.ReadableBuffer and _typeshed.WriteableBuffer,
and applied them in stdlib/3/io.pyi as an example. If these get rolled
out in more places, it will mean that we have only one place where they
have to get tweaked in future, or swapped out for a public protocol.

This unfortunately does have the potential to break code that inherits
from RawIOBase/BufferedIOBase and overrides these methods, because the
base method is now more general and so the override now needs to accept
these types as well (which is why I've also updated gzip and lzma).
However, it should be a reasonably easy fix, and will make the
downstream annotations more correct.

I'm not 100% happy with the names: bytes-like is slightly stricter than
just buffer protocol (it must be able to export a C-contiguous buffer),
but in practice I'd be surprised if there are types for which there is a
difference at static analysis time (e.g. not every memoryview instance
is bytes-like, but that's a property of instances, not types).
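
A minimal sketch of how a subclass might follow the widened readinto signature described in this commit message. The ZeroStream class is hypothetical. Since _typeshed exists only for type checkers and cannot be imported at runtime, the import is guarded by TYPE_CHECKING and the annotation is written as a string.

```python
import array
import io
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Stub-only module: visible to type checkers, absent at runtime.
    from _typeshed import WriteableBuffer


class ZeroStream(io.RawIOBase):
    """Hypothetical raw stream that fills the caller's buffer with zero bytes."""

    def readable(self) -> bool:
        return True

    def readinto(self, b: "WriteableBuffer") -> int:
        # Cast to unsigned bytes so any writable buffer (bytearray,
        # array.array, writable memoryview, ...) is handled uniformly.
        with memoryview(b) as view, view.cast("B") as byte_view:
            n = len(byte_view)
            byte_view[:] = bytes(n)
            return n


byte_buf = bytearray(b"abcd")
bytes_read = ZeroStream().readinto(byte_buf)

int_buf = array.array("i", [1, 2])
ints_read = ZeroStream().readinto(int_buf)
```

Going through memoryview rather than assuming bytearray is what lets the override accept the more general parameter type.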
srittau pushed a commit to python/typeshed that referenced this issue Jun 19, 2020
vishalkuo pushed a commit to vishalkuo/typeshed that referenced this issue Jun 26, 2020
@covert-encryption

As mentioned in #997, it would also be useful to be able to specify a length for any buffer type, in particular where a fixed-length string is expected.

@itaisteinherz

Bump :)

@ilevkivskyi @srittau What's the process for getting this accepted? Does this require a new PEP? I'd be open to working on this, but I'm not sure where to start.

@jakirkham

A related question is how this would be handled in Python given the move to builtins for type hints (like with PEP 585).

@itaisteinherz

By the way, I just noticed that @JelleZijlstra's suggestion has been implemented:

https://github.com/python/typeshed/blob/494481a0aed2ef0e00bbe190476ace0b8261bce6/stdlib/_typeshed/__init__.pyi#L185-L191

I suppose that means those should be moved here in order to consider this issue resolved?

@JelleZijlstra
Member

I think it doesn't necessarily require a PEP: we could just add the types to typing.pyi and typing_extensions.pyi (as I suggested in #593 (comment) a long time ago). The process could be similar to what we just did with reveal_type(): a typing-sig discussion, followed by direct implementation in CPython.

@jakirkham

By the way, I just noticed that @JelleZijlstra's suggestion has been implemented:

https://github.com/python/typeshed/blob/494481a0aed2ef0e00bbe190476ace0b8261bce6/stdlib/_typeshed/__init__.pyi#L185-L191

I suppose that means those should be moved here in order to consider this issue resolved?

This doesn't seem quite right either, as memoryview and mmap are being treated as writeable. However, they may or may not be. For example, a memoryview of a bytes object or an mmap of a read-only file is not writeable.
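
A short sketch of the point that writability is a property of the instance rather than of the type. As an assumption for the example, an anonymous mapping opened with ACCESS_READ stands in for an mmap of a read-only file.

```python
import mmap

# Same type (memoryview), different writability depending on the source.
view_of_bytes = memoryview(b"abc")
view_of_bytearray = memoryview(bytearray(b"abc"))
view_flags = (view_of_bytes.readonly, view_of_bytearray.readonly)

# Same type (mmap), different writability depending on the access mode.
with mmap.mmap(-1, 16, access=mmap.ACCESS_READ) as ro_map:
    with memoryview(ro_map) as view:
        ro_map_readonly = view.readonly

with mmap.mmap(-1, 16) as rw_map:
    with memoryview(rw_map) as view:
        rw_map_readonly = view.readonly
```

A static annotation keyed on the class alone cannot tell these cases apart, so treating memoryview and mmap as always writable accepts some calls that fail at runtime.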

@JelleZijlstra
Member

I am preparing a PEP to support checking the buffer protocol not only in the type system, but also at runtime. A first draft is at https://github.com/JelleZijlstra/peps/blob/bufferpep/pep-9999.rst. Any early feedback is welcome.

@srittau
Collaborator Author

srittau commented Apr 22, 2022

@JelleZijlstra LGTM so far, although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense.

@rgommers

although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense.

Readonly is only one of a number of attributes that are important for determining whether a buffer can be used. Some libraries can only deal with contiguous buffers, or native endianness, or aligned data. It looks to me like the PEP does the right thing here - best to support either all attributes or none, but not make readonly more important than other attributes.
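
For reference, a minimal sketch of some of the other attributes mentioned here, as seen from Python. The helper name layout is hypothetical; the memoryview attributes it reads are real.

```python
import array


def layout(obj):
    """Summarize a buffer's element format and memory layout (hypothetical helper)."""
    with memoryview(obj) as view:
        return {
            "format": view.format,
            "itemsize": view.itemsize,
            "ndim": view.ndim,
            "c_contiguous": view.c_contiguous,
        }


dense = layout(array.array("d", [1.0, 2.0]))
# Slicing with a step yields a strided view that is not C-contiguous.
strided = layout(memoryview(b"abcdef")[::2])
```

Both objects support the buffer protocol, yet a consumer that requires contiguous data could only use the first, which is the kind of per-instance detail a single Buffer type leaves unexpressed.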

@JelleZijlstra
Member

This is now PEP 688: https://peps.python.org/pep-0688/.
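
A hedged sketch of how the result can be used: PEP 688 adds collections.abc.Buffer in Python 3.12, which works both as an annotation and in isinstance() checks. The fallback branch for older interpreters and the helper names byte_length and is_buffer are assumptions made for this example, not part of the PEP.

```python
import array

try:
    from collections.abc import Buffer  # available from Python 3.12 (PEP 688)
except ImportError:
    Buffer = None  # assumption: on older versions, probe with memoryview instead


def byte_length(data: "Buffer") -> int:
    """Return the size in bytes of any buffer-protocol object."""
    with memoryview(data) as view:
        return view.nbytes


def is_buffer(obj: object) -> bool:
    """Runtime check for buffer support (hypothetical helper)."""
    if Buffer is not None:
        return isinstance(obj, Buffer)
    try:
        memoryview(obj).release()
    except TypeError:
        return False
    return True


sizes = [byte_length(b"abcd"), byte_length(array.array("d", [1.0, 2.0]))]
checks = [is_buffer(b"abcd"), is_buffer(array.array("B")), is_buffer("abcd")]
```

On interpreters older than 3.12, typing_extensions provides a Buffer backport for annotations, as suggested earlier in this thread.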

@JelleZijlstra
Member

Fixed by PEP 688.
