Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays #57043

Closed
skrah mannequin opened this issue Aug 24, 2011 · 41 comments
Closed

memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays #57043

skrah mannequin opened this issue Aug 24, 2011 · 41 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@skrah
Copy link
Mannequin

skrah mannequin commented Aug 24, 2011

BPO 12834
Nosy @loewis, @birkenfeld, @ncoghlan, @abalkin, @pitrou, @vstinner, @tiran, @benjaminp, @skrah, @meadori
Files
  • issue12834.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-10-14.15:02:51.625>
    created_at = <Date 2011-08-24.18:45:04.391>
    labels = ['interpreter-core', 'type-bug']
    title = 'memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays'
    updated_at = <Date 2015-01-31.15:17:31.451>
    user = 'https://github.com/skrah'

    bugs.python.org fields:

    activity = <Date 2015-01-31.15:17:31.451>
    actor = 'skrah'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-10-14.15:02:51.625>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2011-08-24.18:45:04.391>
    creator = 'skrah'
    dependencies = []
    files = ['26512']
    hgrepos = []
    issue_num = 12834
    keywords = ['patch']
    message_count = 41.0
    messages = ['142894', '142895', '142899', '151358', '151958', '154241', '165788', '165828', '165839', '165916', '165917', '165919', '166012', '166400', '166550', '166562', '166564', '166638', '166639', '166644', '166645', '166712', '166728', '166754', '166764', '166840', '166842', '166891', '166894', '166917', '167761', '167762', '167764', '167765', '169125', '169126', '169127', '169128', '229295', '235113', '235114']
    nosy_count = 11.0
    nosy_names = ['loewis', 'georg.brandl', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'christian.heimes', 'benjamin.peterson', 'skrah', 'meador.inge', 'python-dev']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12834'
    versions = ['Python 2.7']

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 24, 2011

    memoryview.tobytes() converts a non-contiguous array to
    a contiguous representation. This result is not right:

    >>> from numpy import *
    >>> x = array([1,2,3,4,5], dtype="B")
    >>> y = x[::-1]
    >>> y
    array([5, 4, 3, 2, 1], dtype=uint8)
    >>> m = memoryview(y)
    >>> m.tobytes()
    '\x04\x03\x02\x01\x05'
    >>>

    @skrah skrah mannequin self-assigned this Aug 24, 2011
    @skrah skrah mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Aug 24, 2011
    @pitrou
    Copy link
    Member

    pitrou commented Aug 24, 2011

    That's rather funky. What should the right result be?

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 24, 2011

    Antoine Pitrou <report@bugs.python.org> wrote:

    That's rather funky. What should the right result be?

    Basically [5, 4, 3, 2, 1] as bytes:

    '\x05\x04\x03\x02\x01'

    Looks like an off-by-one error.

    I was a bit surprised that tobytes() automatically converts anything
    to a C-contiguous array. The result can be completely disconnected
    from the raw memory.

    [The bug also exists for forward strides.]

    array([1, 3, 5], dtype=uint64)
    >>> m = memoryview(y)
    >>> m.tobytes()
    '\x03\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'
    >>>

    @ncoghlan
    Copy link
    Contributor

    Added 10181 as a dependency - as noted in my review comments on that issue, I think this becomes fairly trivial to fix (and test) given Stefan's other improvements.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jan 25, 2012

    I removed the dependency since PyBuffer_ToContiguous() still needs to
    be fixed in abstract.c while memoryview.tobytes() now works in the
    PEP-3118 repo.

    @skrah skrah mannequin changed the title memoryview.tobytes() incorrect for non-contiguous arrays PyBuffer_ToContiguous() incorrect for non-contiguous arrays Jan 25, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 25, 2012

    New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':

    @tiran
    Copy link
    Member

    tiran commented Jul 18, 2012

    I think that I run into the same bug today. I've developing a PEP-3118 buffer interface for my wrapper of FreeImage. It returns the data as non-contiguous 3d array with the dimension height, width, color.

    I've created a small test image with 5x7 RGB pixels. The first line black, second white, third grey values and the rest are red, green blue and cyan, magenta yellow in little endian (BGR order).

    NumPy handles the buffer correctly:
    >>> arr = numpy.asarray(img)
    >>> print(arr)

    [[[ 0 0 0]
    [ 0 0 0]
    [ 0 0 0]
    [ 0 0 0]
    [ 0 0 0]]

    [[255 255 255]
    [255 255 255]
    [255 255 255]
    [255 255 255]
    [255 255 255]]

    [[ 80 80 80]
    [112 112 112]
    [160 160 160]
    [192 192 192]
    [240 240 240]]

    [[ 0 0 255]
    [ 0 255 0]
    [255 0 0]
    [ 0 0 255]
    [ 0 255 0]]

    [[255 0 0]
    [ 0 0 255]
    [ 0 255 0]
    [255 0 0]
    [ 0 0 255]]

    [[ 0 255 0]
    [255 0 0]
    [ 0 0 255]
    [ 0 255 0]
    [255 0 0]]

    [[255 255 0]
    [255 0 255]
    [ 0 255 255]
    [255 255 0]
    [255 0 255]]]

    but memoryview.tobytes() seems to have an off-by-one error:

    >>> m = memoryview(img)
    >>> data = m.tobytes()
    >>> len(data) ==  5 * 7 * 3
    True
    >>> for i in range(7):
    ...     print(" ".join("%3i" % ord(v) for v in data[i * 5 * 3:(i + 1) * 5 * 3]))
      0   0   0   0   0   0   0   0   0   0   0   0   0   0 255
    255 255 255 255 255 255 255 255 255 255 255 255 255 255  80
     80  80 112 112 112 160 160 160 192 192 192 240 240 240   0
      0 255   0 255   0 255   0   0   0   0 255   0 255   0 255
      0   0   0   0 255   0 255   0 255   0   0   0   0 255   0
    255   0 255   0   0   0   0 255   0 255   0 255   0   0 255
    255   0 255   0 255   0 255 255 255 255   0 255   0 255   0

    As you can see the first byte is missing.

    Python 2.7.3 and Python 3.2.3 with numpy 1.6.1 and https://bitbucket.org/tiran/smc.freeimage/changeset/3134ecee2984 on 64bit Ubuntu.

    @tiran
    Copy link
    Member

    tiran commented Jul 19, 2012

    It looks like Stefan has fixed the issue in Python 3.3 a while ago. tobytes() returns the correct values with a fresh build of Python 3.3.

    $ PYTHONPATH="." /home/heimes/dev/python/py3k/python smc/freeimage/tests/test_image.py
    test_newbuffer (__main__.TestImageNewBuffer) ... 

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
    80 80 80 112 112 112 160 160 160 192 192 192 240 240 240
    0 0 255 0 255 0 255 0 0 0 0 255 0 255 0
    255 0 0 0 0 255 0 255 0 255 0 0 0 0 255
    0 255 0 255 0 0 0 0 255 0 255 0 255 0 0
    255 255 0 255 0 255 0 255 255 255 255 0 255 0 255

    However it's still broken in 3.2 (also up to date hg pull).

    $ PYTHONPATH="." /home/heimes/dev/python/3.2/python smc/freeimage/tests/test_image.py
    test_newbuffer (__main__.TestImageNewBuffer) ... 

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 255
    255 255 255 255 255 255 255 255 255 255 255 255 255 255 80
    80 80 112 112 112 160 160 160 192 192 192 240 240 240 0
    0 255 0 255 0 255 0 0 0 0 255 0 255 0 255
    0 0 0 0 255 0 255 0 255 0 0 0 0 255 0
    255 0 255 0 0 0 0 255 0 255 0 255 0 0 255
    255 0 255 0 255 0 255 255 255 255 0 255 0 255 0

    Stefan, could you please port your fix to Python 3.2 and 3.3? Thanks!

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 19, 2012

    In Python 3.3 memoryobject.c is a complete rewrite. Porting the fix
    separately would be quite a bit of work.

    PyBuffer_ToContiguous(), which causes the problem in 2.7/3.2 and is
    still broken in 3.3, could be fixed by using the recursive copy_buffer()
    function from the new memoryobject.c.

    I don't know if I can fix it before the 3.3 release. When are the
    next 2.7/3.2 releases?

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 20, 2012

    The fix would require all of these functions from memoryview.c (3.3):

    last_dim_is_contiguous
    cmp_structure
    copy_base
    copy_rec
    copy_buffer

    How to avoid code duplication? I could move them into abstract.c,
    but conceptually they're really just low level buffer interface
    functions. Also, they make a lot of assumptions (ndim >= 1,
    PyBUF_FULL) that are a little dangerous for a general interface.

    @tiran
    Copy link
    Member

    tiran commented Jul 20, 2012

    You could move PyBuffer_ToContiguous() from abstract.c to memoryview.c.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 20, 2012

    You could move PyBuffer_ToContiguous() from abstract.c to memoryview.c.

    For 3.3 that would be ideal, yes. I asked a while ago on python-dev
    whether to backport the memoryview rewrite. The general mood was
    against it.

    So, for 2.7/3.2 I could add all these functions to abstract.c.
    But an additional problem is that the whole test infrastructure of
    Lib/test/test_buffer.py and Modules/_testbuffer.c would be missing.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 21, 2012

    There is an additional problem with PyBuffer_ToContiguous():

    Suppose 'view' is multi-dimensional, C-contiguous and initialized
    according to PyBUF_ND, i.e. view->shape != NULL but view->strides == NULL.

    Now if PyBuffer_ToContiguous() is called with 'F', PyBuffer_IsContiguous()
    returns false and view->strides will be accessed.

    This means that incomplete buffer information will have to be
    reconstructed like it is done in the 3.3 memoryview.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 25, 2012

    Here's a patch for 3.3, which consists mostly of tests. A couple of points:

    o I removed the len > view->len check in PyBuffer_ToContiguous(), since
    the function is not documented and silently accepting output buffers
    that are too large seems unusual to me.

    o Removed the comment in bytesobject.c "Better way to get to internal buffer?".
    IMO the answer is no: There isn't any better way that works in full generality.

    o Removed the "need to check for overflow" comment: ndim is now officially
    limited to 64.

    I think this can go into 3.3: It's easy to see that for contiguous buffers
    PyBuffer_ToContiguous() behaves in the same manner as before (except for
    the len issue).

    For memoryview, buffer_to_contiguous(x, y 'C') is literally the same
    code as buffer_to_c_contiguous().

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 27, 2012

    Any objections to committing this before beta2? What about the
    len > view->len change: Does that look reasonable?

    @ncoghlan
    Copy link
    Contributor

    Georg, need a call on how close you are to cutting beta 2 and whether you want this to wait until rc1.

    @ncoghlan
    Copy link
    Contributor

    Summary of my review for Georg's benefit: I had one minor formatting nit with the patch (which Stefan can easily fix before committing), but it otherwise looked fine to me.

    I also agree that the old implicit truncation was unusable in practice, as the *actual* length is not returned from the function, and in fact cannot be returned because the return type is int rather than Py_ssize_t. A length mismatch therefore needs to be an error.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 28, 2012

    Thanks, Nick! I'll move the function declaration back to abstract.h.

    Waiting for Georg's input. -- It seems to me that bpo-14330 is a blocker
    that will only be fixed on Monday.

    @birkenfeld
    Copy link
    Member

    With the beta delayed as you say, I'm okay with this going in now.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 28, 2012

    New changeset 8fbbc7c8748e by Stefan Krah in branch 'default':
    Issue bpo-12834: Fix PyBuffer_ToContiguous() for non-contiguous arrays.
    http://hg.python.org/cpython/rev/8fbbc7c8748e

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 28, 2012

    All right, 3.3 is fixed. Re-targeting for 3.2 and 2.7.

    @skrah skrah mannequin removed the release-blocker label Jul 28, 2012
    @ncoghlan
    Copy link
    Contributor

    Was the point that memoryview.tobytes() has a known data corruption bug in 3.2 and 2.7 raised in the previous discussion? I'm pretty sure I had forgotten about it, and I don't remember it coming up in the thread.

    The trickiest aspect of a backport of the new implementation is that we need to preserve the C ABI - extensions compiled against any maintenance release should work with all maintenance releases in that series.

    The new APIs aren't a major problem - just sprinkle a few underscores around to mark them as private on the older versions (I've certainly done that before when a bug fix genuinely needed something that qualified as a new feature: implemented a private version to use in fixing the bug on the maintenance branch, then promote that to a public API on trunk)

    @ncoghlan ncoghlan changed the title PyBuffer_ToContiguous() incorrect for non-contiguous arrays memorview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays Jul 29, 2012
    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 29, 2012

    Christian's posts and my initial report were about memoryview.tobytes(). It's
    good that you changed the title: memoryview.tobytes() is more meaningful than
    the slightly obscure PyBuffer_ToContiguous().

    BTW, I'm sure that PyBuffer_FromContiguous() and PyObject_CopyData() have the
    same problem, but they aren't used in the Python source tree and they are
    undocumented.

    @ncoghlan ncoghlan changed the title memorview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays Jul 29, 2012
    @tiran
    Copy link
    Member

    tiran commented Jul 29, 2012

    Thanks Stefan and Nick!

    I tried to find the off-by-one bug myself but gave up quickly. Stefan's rewrite is a better approach.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 29, 2012

    Nick, are you talking about a complete backport or just about pulling
    in all functions needed by PyBuffer_ToContiguous()?

    @ncoghlan
    Copy link
    Contributor

    I suspect the need to preserve the C ABI would make a complete backport a challenge. I'm still tempted though, mainly to give third parties a robust core implementation of the buffer API to test against.

    This is especially so with Python 2.7 still having a couple of years of full maintenance left - that's a long time to leave it with a known broken memoryview implementation. I'm less worried about 3.2, since the upgrade path to 3.3 is easier in that case, but even that version is likely to see widespread use for a long time.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 30, 2012

    This is especially so with Python 2.7 still having a couple of years
    of full maintenance left - that's a long time to leave it with a known
    broken memoryview implementation. I'm less worried about 3.2, since
    the upgrade path to 3.3 is easier in that case, but even that version
    is likely to see widespread use for a long time.

    Well, there's a reason we don't backport features to bugfix branches,
    especially when we're talking about a complete rewrite of the
    implementation. So I really don't agree this should be backported.

    @tiran
    Copy link
    Member

    tiran commented Jul 30, 2012

    Right now major parts of the buffer API are broken for non-trivial buffer definitions. IMHO a backport of the fix doesn't count as a new feature although it needs some new internal functions.

    I don't quite understand why Nick thinks that ABI compatibility is a challenge. The structs, typedefs and function definitions aren't modified. The new functions aren't visible because they can be implemented as static functions if PyBuffer_ToContiguous() is moved to memoryview.c. That won't break the ABI eiter.

    If we want to keep the function in its old place then we can prefix the new functions with _Py and include them in a private header file. That would export new function.

    @ncoghlan
    Copy link
    Contributor

    I was musing about backporting *all* the memoryview fixes, including the buffer lifecycle ones. Antoine and Stefan rightly pointed out that was a bad idea, and not necessary to address this problem.

    So +1 for backporting just this specific fix, with the necessary infrastructure to make it work.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jul 30, 2012

    Right now major parts of the buffer API are broken for non-trivial
    buffer definitions. IMHO a backport of the fix doesn't count as a new
    feature although it needs some new internal functions.

    This particular bug fix for PyBuffer_ToContiguous() (which would automatically
    fix memoryview.tobytes() in 2.7 and 3.2) is easy to backport.

    For non-trivial buffers it's likely though that other problems
    will show up in versions 2.7 and 3.2.

    Backporting the *complete* rewrite would be relatively easy, too.
    But that would change the layout of the MemoryView object etc.

    Fixing all of bpo-10181 without the drastic changes would be very time
    consuming (if at all possible).

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Aug 9, 2012

    Removing 3.1, since its addition was apparently done by mistake.

    Again, I wonder why this is marked release-critical. It's not a regression from previous versions, is it? Please use release-critical only if not making a release at a certain point in the future is better than making the release (despite all the advantages that the release otherwise might have). If you merely think that the issue is "really important" and "should not be forgotten", use "critical" or "high".

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Aug 9, 2012

    Since the next release of 3.2 is the *last* release of 3.2, yet it will remain supported by distros well beyond that, yes, I think this should block the final 3.2 release.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Aug 9, 2012

    Since the next release of 3.2 is the *last* release of 3.2, yet it
    will remain supported by distros well beyond that, yes, I think this
    should block the final 3.2 release.

    But the same will be true for any other bug that 3.2 has. If they
    don't get fixed now, they will remain unfixed. Should we therefore
    delay the last release of 3.2 until all bugs in it are fixed?

    The consequence instead should be that people experiencing this
    bug will have to move to Python 3.3 to get it fixed. Since it only
    affects NumPy users (AFAICT), and then only those who use tobytes,
    I wouldn't consider this bug even critical.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 9, 2012

    Removing 3.1, since its addition was apparently done by mistake.

    I'm unable to set 2.7 and 3.2 in my browser without also setting
    3.1 (using the Shift key to mark multiple fields).

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Aug 25, 2012

    The last bugfix release of 3.2 will happen "shortly" after the release of 3.3, so in a not-too-far future (compared to the age of this issue, which just had its first birthday yesterday).

    So if this issue should really block 3.2 (which I still think it should not), I'd urge possible contributors to start working on it now - especially if this is going to be a large patch. If such patch introduces new bugs, it won't be possible to ever fix them (unless they are security-critical). Also note that this is almost the only release blocker for the 3.2 release, next to a easy-to-implement request to update the expat code.

    @ncoghlan
    Copy link
    Contributor

    Despite my earlier comments, I'm now inclined to agree with Martin here - upgrading to 3.3 fixes so many other problems with memoryview, that's a more compelling solution.

    And, of course, using NumPy instead always remains an option for more robust buffer API support in older versions.

    @ncoghlan
    Copy link
    Contributor

    (Not saying this shouldn't be fixed, just saying it's not a disaster if it isn't)

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Aug 25, 2012

    I agree that for 3.2 this isn't so important given that non-contiguous
    arrays have multiple issues there.

    Christian, does a fix for 3.2 benefit FreeImage? Don't you run into
    other problems with memoryview?

    If it helps, I can try to write a patch for 3.2.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Oct 14, 2014

    2.7 is the only remaining candidate for the fix. I'm not going to
    work on it: somehow seems too risky for 2.7 at this stage.

    @skrah skrah mannequin removed their assignment Oct 14, 2014
    @pitrou pitrou closed this as completed Oct 14, 2014
    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jan 31, 2015

    Fixed for 2.7 in bpo-22668.

    @skrah
    Copy link
    Mannequin Author

    skrah mannequin commented Jan 31, 2015

    Sorry, bpo-23349.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants