ENH: always make ndarrays from msgpack writable #12359

Addresses the case where 'compress' was not None. The old implementation would decompress the data and then call np.frombuffer on a bytes object. Because a bytes object is not a mutable buffer, the resulting ndarray had writeable=False. The new implementation ensures that pandas is the only owner of this new buffer and then sets it to mutable without copying it. This means that we don't need to copy the incoming data AND we can mutate it later. If we are not the only owner of this data then we just copy it with np.fromstring.
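The read-only behavior described above can be reproduced outside of pandas. The following is a minimal sketch, assuming zlib compression and an int64 payload chosen only for illustration. It uses a `bytearray` copy as a stand-in for a mutable buffer; this is not the zero-copy approach taken in this change, it only shows why the type of the underlying buffer decides whether the array is writable.

```python
import zlib

import numpy as np

# Illustrative payload; the dtype and values are assumptions for this sketch.
values = np.arange(5, dtype="int64")
compressed = zlib.compress(values.tobytes())

# Decompression returns an immutable bytes object.
raw = zlib.decompress(compressed)

# frombuffer shares memory with the bytes object, so the array is read-only.
readonly = np.frombuffer(raw, dtype="int64")

# A bytearray is a mutable buffer, so frombuffer yields a writable array.
# Note that bytearray(raw) copies the data, unlike the change described here.
writable = np.frombuffer(bytearray(raw), dtype="int64")
writable[0] = 42
```

Assigning into `readonly` would raise a `ValueError` because its `writeable` flag is False, while `writable` accepts the assignment.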

In many C API functions for Python the empty string and single characters are memoized to return preallocated bytes objects. This could be a problem for the implementation of pandas.io.packers.unconvert, so we are adding a test for this explicitly. Currently neither zlib nor blosc hits this case because they use PyBytes_FromStringAndSize(NULL, n), which does not pull from the shared pool of bytes objects; however, this test will guard us against changes to this implementation detail in the future.

Pulls the logic that does the buffer management into a new private function named `_move_into_mutable_buffer`. The function acts like a move constructor from `bytes` to `memoryview`. We now only call `np.frombuffer` on the resulting `memoryview` object. This accomplishes the original task of getting a mutable array from a `bytes` object without copying, with the added safety of making the `base` field of the array that is returned to the user a mutable `memoryview`. This eliminates the risk of users obtaining the original `bytes` object, which now points to a mutable buffer, and violating the immutability contract of `bytes` objects.

By writing our move function in C we can hide the original bytes object from the user while still ensuring that its lifetime is managed correctly. This implementation is designed to make it impossible to get access to the invalid bytes object from pure Python.

This function temporarily replaces an attribute on an object for the duration of a context manager. This is useful for patching methods to inject assertions or mock out expensive operations.
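A helper of this kind could look roughly like the sketch below. The name `patch_attr`, its signature, and the `service` object are hypothetical and chosen for illustration; they are not taken from the actual commit.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patch_attr(obj, name, replacement):
    """Temporarily set obj.<name> to replacement (hypothetical helper)."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        # Restore the original attribute even if the body raised.
        setattr(obj, name, original)


# Illustrative object standing in for a module with an expensive function.
service = SimpleNamespace(load=lambda: "expensive result")

calls = []


def fake_load():
    # Record the call so a test can assert that the patched path was used.
    calls.append("load")
    return "cheap result"


with patch_attr(service, "load", fake_load):
    inside = service.load()

outside = service.load()
```

Restoring in a `finally` block means the original attribute comes back even when an assertion inside the `with` body fails.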

Commits on Mar 17, 2016