Description
Feature or enhancement
Proposal:
This is a WIP draft.
Rationale
At present, the only standard way to wrap a bytes-like buffer in a file-like object is io.BytesIO.
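However, BytesIO copies its input into a private buffer. CPython can skip the copy for an immutable bytes object until the BytesIO is modified, but bytearray, memoryview and array.array inputs are always copied.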
For some use cases this is perfectly fine. In other cases, though, you already have the data in memory (e.g. bytes, bytearray, memoryview, or array.array) and just need a thin file-like reader on top of it. Making another copy then costs extra memory and time, which adds up for large buffers.
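I think a MemoryviewReader class could fill this gap. It would act like a lightweight, read-only BytesIO that wraps any object supporting the buffer protocol without copying it up front. Only the slices returned by read() and readline() are copied into new bytes objects.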
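Wrapping "any buffer" needs one extra step. For buffers whose items are wider than one byte, such as array.array("i"), a plain memoryview measures, indexes and slices in elements rather than bytes. Casting the view to the unsigned-byte format "B" gives a flat byte view without copying. This sketch assumes the source is C-contiguous, since memoryview.cast() raises TypeError otherwise:

```python
from array import array

nums = array("i", [1, 2, 3])        # item size is platform-dependent (often 4 bytes)

elements = memoryview(nums)         # length and indexing are per element
raw = memoryview(nums).cast("B")    # reinterpret as unsigned bytes, no copy

print(len(elements))                                # 3
print(len(raw) == len(nums) * nums.itemsize)        # True
print(raw.obj is nums)                              # True: still backed by the array
```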
That said, this may be too niche, or there may be drawbacks I haven't considered. I could be wrong.
Notes on Implementation
The following is just a proof of concept in pure Python to illustrate the idea. If there is interest, a C implementation might be worthwhile, much as Python 2 offered both StringIO and cStringIO, with the latter being noticeably faster.
import io
from typing import Self  # requires Python 3.11+


class MemoryviewReader:
    """
    File-like reader over any bytes-like buffer, using memoryview to avoid a copy.

    Basic read:
    >>> r = MemoryviewReader(b"hello world")
    >>> r.read(5), r.read()
    (b'hello', b' world')

    Read lines:
    >>> r = MemoryviewReader(b"a\\nb\\nc")
    >>> r.readline(99), r.readline(1), r.readline(), r.readline(), r.readline(99)
    (b'a\\n', b'b', b'\\n', b'c', b'')

    Iterate lines:
    >>> list(MemoryviewReader(b"x\\ny"))
    [b'x\\n', b'y']

    Seek/tell:
    >>> r = MemoryviewReader(memoryview(b"abcdef"))
    >>> r.read(3), r.tell()
    (b'abc', 3)
    >>> r.seek(0), r.read(2)
    (0, b'ab')
    >>> r.seek(2, io.SEEK_CUR), r.read(2)
    (4, b'ef')
    >>> r.seek(-3, io.SEEK_END), r.read(99)
    (3, b'def')

    Errors:
    >>> r.seek(0, 99)
    Traceback (most recent call last):
    ...
    ValueError: Invalid whence
    >>> r.seek(-10, io.SEEK_SET)
    Traceback (most recent call last):
    ...
    ValueError: Seek out of range
    >>> r.seek(10, io.SEEK_SET)
    Traceback (most recent call last):
    ...
    ValueError: Seek out of range
    """  # noqa: D301

    __slots__ = ("_mv", "_pos", "size")

    def __init__(self, buf: bytes | bytearray | memoryview):
        # Flatten to a 1-D view of unsigned bytes (no copy), so that len(),
        # indexing and slicing count bytes even for e.g. array.array("i").
        self._mv = memoryview(buf).cast("B")
        self._pos = 0
        self.size = len(self._mv)

    def read(self, amount: int = -1, /) -> bytes:
        # Negative amount means "read to the end", as with BytesIO.
        if amount < 0:
            end = self.size
        else:
            end = min(self._pos + amount, self.size)
        start = self._pos
        self._pos = end
        return self._mv[start:end].tobytes()

    def readline(self, amount: int = -1, /) -> bytes:
        if self._pos >= self.size:
            return b""
        if amount < 0:
            end = self.size
        else:
            end = min(self._pos + amount, self.size)
        start = self._pos
        i = start
        # Scan for b"\n" (byte value 10), stopping after it or at the limit.
        while i < end:
            if self._mv[i] == 10:  # ord("\n")
                i += 1
                break
            i += 1
        self._pos = i
        return self._mv[start:i].tobytes()

    def __iter__(self) -> Self:
        return self

    def __next__(self) -> bytes:
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self._pos + offset
        elif whence == io.SEEK_END:
            new = self.size + offset
        else:
            raise ValueError("Invalid whence")
        # Unlike BytesIO, seeking past the end is rejected.
        if not (0 <= new <= self.size):
            raise ValueError("Seek out of range")
        self._pos = new
        return self._pos

    def tell(self) -> int:
        return self._pos


if __name__ == "__main__":
    import doctest

    doctest.testmod()
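The byte-by-byte loop in readline runs Python bytecode for every byte. One possible way to keep the newline search in C without copying, relying on the re module's acceptance of bytes-like objects (memoryview included), is Pattern.search(string, pos, endpos). A standalone sketch of just that step; readline_end is a hypothetical helper name, not part of the proposal:

```python
import re

_NEWLINE = re.compile(rb"\n")


def readline_end(mv: memoryview, start: int, end: int) -> int:
    """Return the index just past the first b"\\n" in mv[start:end],
    or end if there is none (hypothetical helper)."""
    match = _NEWLINE.search(mv, start, end)
    return match.end() if match else end


mv = memoryview(b"a\nb\nc")
print(readline_end(mv, 0, len(mv)))   # 2
print(readline_end(mv, 2, 3))         # 3 (limit reached before a newline)
print(readline_end(mv, 4, len(mv)))   # 5 (no trailing newline)
```

Inside readline, the while loop could then become `i = readline_end(self._mv, start, end)`; only the returned slice is still copied by tobytes().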
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere