array.array initializer does not properly handle memoryviews (and other buffer-protocol objects) #101071

@rhpvorderman

Description

Bug report

The array.array docs state for the initializer:

If given a list or string, the initializer is passed to the new array’s fromlist(), frombytes(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method.

code:

import array
start = array.array("Q", [1, 2, 3])
untyped_buf = memoryview(start).cast("B")  # Unsigned bytes is the default buffer type IIRC
result = array.array("Q", untyped_buf)

print(result)
# expected: array('Q', [1, 2, 3])
# actual:   array('Q', [1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0])

What happens is that the memoryview is not recognized as a buffer and passed to the frombytes() method. Instead it is treated as a generic iterable, so every byte becomes a separate array item.
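As a possible workaround until the initializer handles this case, the bytes can be copied explicitly. This is a minimal sketch: it assumes the buffer is contiguous and its length is a multiple of the item size, and it relies only on frombytes() accepting bytes-like objects.

```python
import array

start = array.array("Q", [1, 2, 3])
untyped_buf = memoryview(start).cast("B")

# Create an empty array and copy the raw bytes in one step,
# bypassing the per-item iteration done by the initializer.
result = array.array("Q")
result.frombytes(untyped_buf)

print(result == start)
```

Passing bytes(untyped_buf) to the constructor should also take the frombytes() path, at the cost of one extra copy of the data.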

I discovered this bug while working on a C extension in which I expose almost 7_000_000 64-bit integers to Python using PyMemoryView_FromMemory. Since this is roughly 56 MB, it should be a breeze to memcpy it into an array.array. Instead I get 56 million unsigned byte values through a Python iterator, which causes a massive bug in my application while also being incredibly slow.

Looking at the code, the array module checks for array, bytes, and bytearray before delegating to the frombytes() method. I think it would be much more appropriate to use PyObject_CheckBuffer: this would handle those types appropriately while also catching memoryview.
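The proposed dispatch can be sketched in pure Python. This is only an illustration of the idea, not the C implementation: array_from_initializer is a hypothetical helper name, and wrapping the object in memoryview() stands in for the C-level buffer check. Strings and arrays are passed through unchanged so that whatever the constructor currently does with them is preserved.

```python
import array


def array_from_initializer(typecode, initializer=()):
    """Hypothetical helper: build an array, copying buffer objects as raw bytes."""
    # Leave str and array initializers to the existing constructor logic.
    if isinstance(initializer, (str, array.array)):
        return array.array(typecode, initializer)
    try:
        # memoryview() succeeds only for objects supporting the buffer protocol.
        view = memoryview(initializer)
    except TypeError:
        # Not a buffer: fall back to the normal list/iterable handling.
        return array.array(typecode, initializer)
    result = array.array(typecode)
    with view:
        # Raises ValueError if the byte length is not a multiple of the item size.
        result.frombytes(view)
    return result


start = array.array("Q", [1, 2, 3])
from_view = array_from_initializer("Q", memoryview(start).cast("B"))
from_list = array_from_initializer("Q", [1, 2, 3])
print(from_view == start, from_list == start)
```

One design question this sketch leaves open is how an array initializer with a different typecode should be treated, since a plain buffer check would reinterpret its bytes rather than convert its items.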

Metadata

Labels: extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)
