ENH: always make ndarrays from msgpack writable #12359

Closed

Commits on Mar 17, 2016

  1. ENH: always make ndarrays from msgpack writable

    Addresses the case where 'compress' was not None. The old implementation
    would decompress the data and then call np.frombuffer on a bytes
    object. Because a bytes object is not a mutable buffer, the resulting
    ndarray had writeable=False. The new implementation ensures that
    pandas is the only owner of this new buffer and then sets it to mutable
    without copying it. This means that we don't need to copy the
    data coming in AND we can mutate it later. If we are not the only owner
    of this data then we just copy it with np.fromstring.
    Joe Jevnik committed Mar 17, 2016
    9cd9d80
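
The read-only behaviour described in the first commit can be reproduced with plain NumPy. The sketch below is only an illustration: the variable names are made up, and the bytearray copy is a stand-in that shows a mutable buffer yielding a writable array. It is not the no-copy technique the commit implements.

```python
import zlib

import numpy as np

values = np.arange(5, dtype="int64")
compressed = zlib.compress(values.tobytes())

# zlib.decompress returns an immutable bytes object, so an array built
# directly on top of it is read-only.
raw = zlib.decompress(compressed)
readonly = np.frombuffer(raw, dtype="int64")
print(readonly.flags.writeable)  # False

# A bytearray is a mutable buffer, so the resulting array is writable.
# Note that bytearray(raw) copies the data, which the commit avoids.
writable = np.frombuffer(bytearray(raw), dtype="int64")
print(writable.flags.writeable)  # True
writable[0] = 42
```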
  2. TST: adds test for memoized empty string and chars

    In many C API functions for Python the empty string and single characters
    are memoized to return preallocated bytes objects. This could be a
    potential problem with the implementation of pandas.io.packers.unconvert
    so we are adding a test for this explicitly.
    
    Currently neither zlib nor blosc hit this case because they use
    PyBytes_FromStringAndSize(NULL, n) which does not pull from the shared
    pool of bytes objects; however, this test will guard us against changes
    to this implementation detail in the future.
    Joe Jevnik committed Mar 17, 2016
    830abba
  3. ENH: updates unconvert no-copy code to be safer

    Pulls the logic that does the buffer management into a new private
    function named `_move_into_mutable_buffer`. The function acts like a
    move constructor from `bytes` to `memoryview`. We now only call
    `np.frombuffer` on the resulting `memoryview` object. This accomplishes
    the original task of getting a mutable array from a `bytes` object
    without copying with the added safety of making the `base` field of the
    array that is returned to the user a mutable `memoryview`. This
    eliminates the risk of users obtaining the original `bytes` object which
    is now pointing to a mutable buffer and violating the immutability
    contract of `bytes` objects.
    Joe Jevnik committed Mar 17, 2016
    330dd76
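
As a rough illustration of why a mutable `memoryview` is a suitable base for `np.frombuffer`, the sketch below uses a bytearray as a stand-in for the buffer that `_move_into_mutable_buffer` would hand back. The names `payload`, `view`, and `arr` are hypothetical, and nothing here reproduces the actual move logic.

```python
import numpy as np

# Stand-in for decompressed data that lives in a mutable buffer.
payload = bytearray(np.arange(4, dtype="uint8").tobytes())

view = memoryview(payload)
print(view.readonly)  # False

# The array is a view over the same memory, so no copy is made and
# writes through the array are visible in the underlying buffer.
arr = np.frombuffer(view, dtype="uint8")
arr[0] = 255
print(payload[0])  # 255

# By contrast, a memoryview over a bytes object is read-only.
frozen = memoryview(bytes(payload))
print(frozen.readonly)  # True
```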
  4. ENH: reimplement _move_into_mutable_buffer in C.

    By writing our move function in C we can hide the original bytes object
    from the user while still ensuring that the lifetime is managed
    correctly. This implementation is designed to make it impossible to get
    access to the invalid bytes object from pure Python.
    Joe Jevnik committed Mar 17, 2016
    84d7275
  5. TST: adds pandas.util.testing.patch

    This function temporarily replaces an attribute on an object for the
    duration of some context manager. This is useful for patching methods to
    inject assertions or mock out expensive operations.
    Joe Jevnik committed Mar 17, 2016
    e896603
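
The idea behind the last commit, temporarily swapping an attribute inside a context manager, can be sketched in a few lines. The helper name `patch_attr` and the `Service` class are hypothetical and are not the signature of `pandas.util.testing.patch`. This is only one plausible shape for such a helper.

```python
from contextlib import contextmanager


@contextmanager
def patch_attr(obj, name, replacement):
    """Temporarily replace obj.<name>, restoring it on exit."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield replacement
    finally:
        # Restore the original even if the body raises.
        setattr(obj, name, original)


class Service:
    def expensive(self):
        return "real"


# Mock out the expensive method only for the duration of the block.
with patch_attr(Service, "expensive", lambda self: "mocked"):
    inside = Service().expensive()

after = Service().expensive()
```

The `try`/`finally` matters here: a failing assertion inside the `with` block should not leave the patched attribute in place for later tests.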