
Add file-like reader/(maybe writer) class for memoryview #138293

@abebus

Description


Feature or enhancement

Proposal:

This is a WIP draft.

Rationale

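At present, the only standard-library way to wrap a bytes-like buffer in a file-like object is io.BytesIO.
However, BytesIO copies its input. CPython skips the copy only for an initial bytes object, which it shares copy-on-write; a bytearray, memoryview, array.array or mmap is always copied.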

For some use cases this is perfectly fine, but in other cases you already have the data in memory (e.g. bytes, bytearray, memoryview, or array.array) and just need a thin file-like reader on top of it. There, the extra copy costs time and memory proportional to the buffer size and buys nothing.
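The copy is easy to observe. This small sketch uses only documented io and memoryview behaviour: it mutates a bytearray after wrapping it and compares what BytesIO and a plain memoryview report.

```python
import io

data = bytearray(b"hello")
bio = io.BytesIO(data)   # BytesIO stores its own copy of the bytearray
view = memoryview(data)  # a memoryview shares the bytearray's memory

data[0] = ord("j")       # in-place change; no resize, so the live view is fine

print(bio.read())        # b'hello': the copy does not see the change
print(view.tobytes())    # b'jello': the view does
```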

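I think a MemoryviewReader class could fill this gap. It would act like a lightweight, read-only BytesIO that wraps any object supporting the buffer protocol without copying it; only the slices returned by read() and readline() would be copied into new bytes objects.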
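For buffers whose items are wider than one byte (such as array.array), a reader like this would presumably need to address raw bytes rather than items. A sketch of how a memoryview can do that without copying: `cast("B")` reinterprets the same memory as unsigned bytes, and slicing the result creates another view, not a copy.

```python
import array

arr = array.array("H", [1, 2, 3])  # three unsigned shorts
raw = memoryview(arr)              # len() counts items, not bytes
flat = raw.cast("B")               # same memory, viewed as unsigned bytes

print(len(raw), len(flat))         # 3 and 3 * arr.itemsize

# Bytes of arr[1]; slicing a memoryview does not copy.
chunk = flat[arr.itemsize:2 * arr.itemsize]

arr[1] = 500                       # item assignment is allowed while views exist
print(chunk.tobytes() == arr[1:2].tobytes())  # True: the view sees the new value
```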

That said, maybe this is too niche, or maybe there are drawbacks I haven’t considered — I could be wrong.

Notes on Implementation

The following is just a proof of concept in pure Python to illustrate the idea. If there is interest, a C implementation might be worthwhile: Python 2 shipped both StringIO and cStringIO, and the C version was noticeably faster.
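Short of writing C, one possible route (an assumption, not part of this proposal) is to implement only readinto() on an io.RawIOBase subclass and let CPython's C-implemented io.BufferedReader and io.TextIOWrapper supply buffering, readline() and iteration. MemoryviewRawIO is a hypothetical name. BufferedReader copies chunks into its own buffer, so reads through it are not zero-copy; whether this is faster than a pure-Python reader would need measuring.

```python
import io


class MemoryviewRawIO(io.RawIOBase):
    """Hypothetical raw reader over a shared byte view (sketch only)."""

    def __init__(self, buf) -> None:
        super().__init__()
        self._mv = memoryview(buf).cast("B")  # shares memory, no copy
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        if self.closed:
            raise ValueError("I/O operation on closed file")
        # View the caller's buffer as bytes and release the views on exit.
        with memoryview(b) as target, target.cast("B") as out:
            n = min(len(out), len(self._mv) - self._pos)
            if n <= 0:
                return 0  # at or past the end: signals EOF
            out[:n] = self._mv[self._pos:self._pos + n]  # one copy per call
        self._pos += n
        return n

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self._pos + offset
        elif whence == io.SEEK_END:
            new = len(self._mv) + offset
        else:
            raise ValueError("invalid whence")
        if new < 0:
            raise ValueError("negative seek position")
        self._pos = new  # seeking past the end is allowed, as with BytesIO
        return new

    def close(self) -> None:
        if not self.closed:
            self._mv.release()  # lets a wrapped bytearray be resized again
        super().close()


text = io.TextIOWrapper(io.BufferedReader(MemoryviewRawIO(b"x\ny\n")), encoding="utf-8")
print(list(text))  # ['x\n', 'y\n']
```

The design choice here is to implement the single primitive that the io stack expects and inherit everything else, instead of reimplementing read(), readline() and iteration by hand.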

import io
from typing import Self

class MemoryviewReader:
    """
    File-like reader over internal buffer of bytes or bytearray using memoryview.

    Basic read:
    >>> r = MemoryviewReader(b"hello world")
    >>> r.read(5), r.read()
    (b'hello', b' world')

    Read lines:
    >>> r = MemoryviewReader(b"a\\nb\\nc")
    >>> r.readline(99), r.readline(1), r.readline(), r.readline(), r.readline(99)
    (b'a\\n', b'b', b'\\n', b'c', b'')

    Iterate lines:
    >>> list(MemoryviewReader(b"x\\ny"))
    [b'x\\n', b'y']

    Seek/tell:
    >>> r = MemoryviewReader(memoryview(b"abcdef"))
    >>> r.read(3), r.tell()
    (b'abc', 3)
    >>> r.seek(0), r.read(2)
    (0, b'ab')
    >>> r.seek(2, io.SEEK_CUR), r.read(2)
    (4, b'ef')
    >>> r.seek(-3, io.SEEK_END), r.read(99)
    (3, b'def')

    Errors:
    >>> r.seek(0, 99)
    Traceback (most recent call last):
    ...
    ValueError: Invalid whence
    >>> r.seek(-10, io.SEEK_SET)
    Traceback (most recent call last):
    ...
    ValueError: Seek out of range
    >>> r.seek(10, io.SEEK_SET)
    Traceback (most recent call last):
    ...
    ValueError: Seek out of range
    """  # noqa: D301

    __slots__ = ("_mv", "_pos", "size")

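    def __init__(self, buf: bytes | bytearray | memoryview):
        # View the buffer as flat unsigned bytes so that len(), indexing and
        # slicing count bytes even for typed buffers such as array.array("i").
        # cast() raises TypeError if the buffer is not C-contiguous.
        self._mv = memoryview(buf).cast("B")
        self._pos = 0
        self.size = len(self._mv)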

    def read(self, amount: int = -1, /) -> bytes:
        if amount < 0:
            end = self.size
        else:
            end = self._pos + amount
            if end > self.size:
                end = self.size
        start = self._pos
        self._pos = end
        return self._mv[start:end].tobytes()

    def readline(self, amount: int = -1, /) -> bytes:
        if self._pos >= self.size:
            return b""
        if amount < 0:
            end = self.size
        else:
            end = self._pos + amount
            if end > self.size:
                end = self.size
        start = self._pos
        i = start
        while i < end:
            if self._mv[i] == 10:  # ord('\n')
                i += 1
                break
            i += 1
        self._pos = i
        return self._mv[start:i].tobytes()

    def __iter__(self) -> Self:
        return self

    def __next__(self) -> bytes:
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self._pos + offset
        elif whence == io.SEEK_END:
            new = self.size + offset
        else:
            raise ValueError("Invalid whence")
        if not (0 <= new <= self.size):
            raise ValueError("Seek out of range")
        self._pos = new
        return self._pos

    def tell(self) -> int:
        return self._pos
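The readline() loop above compares one byte per Python-level iteration. A possible speed-up, offered only as a sketch, relies on re accepting bytes-like objects (the mmap documentation relies on the same thing) and on Pattern.search() taking pos and endpos arguments, so the newline scan runs in C over the shared buffer without slicing it. find_line_end is a hypothetical helper name.

```python
import re

# Matches a single newline byte.
_NEWLINE = re.compile(rb"\n")


def find_line_end(mv: memoryview, start: int, end: int) -> int:
    """Return the index just past the first newline in mv[start:end], or end.

    Pattern.search() with pos/endpos scans the buffer directly, so mv is
    neither sliced nor converted to bytes.
    """
    match = _NEWLINE.search(mv, start, end)
    return match.end() if match else end


mv = memoryview(b"a\nb\nc")
print(find_line_end(mv, 0, 5))  # 2: the line is mv[0:2] == b"a\n"
print(find_line_end(mv, 2, 3))  # 3: no newline within the limit
```

Inside readline(), the while loop could then become `i = find_line_end(self._mv, start, end)`, keeping the same semantics for the size limit.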

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

Metadata
Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)