memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays #57043
memoryview.tobytes() converts a non-contiguous array to
>>> from numpy import *
>>> x = array([1,2,3,4,5], dtype="B")
>>> y = x[::-1]
>>> y
array([5, 4, 3, 2, 1], dtype=uint8)
>>> m = memoryview(y)
>>> m.tobytes()
'\x04\x03\x02\x01\x05'
>>> |
That's rather funky. What should the right result be? |
Antoine Pitrou <report@bugs.python.org> wrote:
Basically [5, 4, 3, 2, 1] as bytes: '\x05\x04\x03\x02\x01'
Looks like an off-by-one error. I was a bit surprised that tobytes() automatically converts anything
[The bug also exists for forward strides.]
array([1, 3, 5], dtype=uint64)
>>> m = memoryview(y)
>>> m.tobytes()
'\x03\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'
>>> |
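The expected result described above can be checked without NumPy. The following is a minimal sketch, assuming Python 3.3 or later (where strided slicing of a one-dimensional memoryview is supported); the variable names `rev`, `fwd` and `expected` are only illustrative and do not come from the report.

```python
import struct
from array import array

# Reverse view over five unsigned bytes; the negative stride makes it
# non-contiguous, so tobytes() has to copy item by item in logical order.
rev = memoryview(bytes([1, 2, 3, 4, 5]))[::-1]
print(rev.contiguous)   # False
print(rev.tobytes())    # b'\x05\x04\x03\x02\x01'

# Forward stride over unsigned 64-bit items: every second element.
fwd = memoryview(array("Q", [1, 2, 3, 4, 5]))[::2]

# The logical content is [1, 3, 5], packed in native byte order.
expected = struct.pack("3Q", 1, 3, 5)
print(fwd.tobytes() == expected)
```

On the affected versions the same logical views came back rotated by one item, which is the symptom shown in the two sessions above.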
Added 10181 as a dependency - as noted in my review comments on that issue, I think this becomes fairly trivial to fix (and test) given Stefan's other improvements. |
I removed the dependency since PyBuffer_ToContiguous() still needs to |
New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':
|
I think I ran into the same bug today. I've been developing a PEP 3118 buffer interface for my wrapper of FreeImage. It returns the data as a non-contiguous 3-D array with the dimensions height, width, color. I've created a small test image with 5x7 RGB pixels: the first line is black, the second white, the third grey values, and the rest are red, green, blue and cyan, magenta, yellow in little endian (BGR order). NumPy handles the buffer correctly:
>>> arr = numpy.asarray(img)
>>> print(arr)
[[[ 0 0 0]
[[255 255 255]
[[ 80 80 80]
[[ 0 0 255]
[[255 0 0]
[[ 0 255 0]
[[255 255 0]
but memoryview.tobytes() seems to have an off-by-one error:
>>> m = memoryview(img)
>>> data = m.tobytes()
>>> len(data) == 5 * 7 * 3
True
>>> for i in range(7):
... print(" ".join("%3i" % ord(v) for v in data[i * 5 * 3:(i + 1) * 5 * 3]))
0 0 0 0 0 0 0 0 0 0 0 0 0 0 255
255 255 255 255 255 255 255 255 255 255 255 255 255 255 80
80 80 112 112 112 160 160 160 192 192 192 240 240 240 0
0 255 0 255 0 255 0 0 0 0 255 0 255 0 255
0 0 0 0 255 0 255 0 255 0 0 0 0 255 0
255 0 255 0 0 0 0 255 0 255 0 255 0 0 255
255 0 255 0 255 0 255 255 255 255 0 255 0 255 0
As you can see, the first byte is missing. Python 2.7.3 and Python 3.2.3 with numpy 1.6.1 and https://bitbucket.org/tiran/smc.freeimage/changeset/3134ecee2984 on 64bit Ubuntu. |
It looks like Stefan fixed the issue in Python 3.3 a while ago. tobytes() returns the correct values with a fresh build of Python 3.3.
$ PYTHONPATH="." /home/heimes/dev/python/py3k/python smc/freeimage/tests/test_image.py
test_newbuffer (__main__.TestImageNewBuffer) ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
However, it's still broken in 3.2 (also up to date with hg pull).
$ PYTHONPATH="." /home/heimes/dev/python/3.2/python smc/freeimage/tests/test_image.py
test_newbuffer (__main__.TestImageNewBuffer) ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 255
Stefan, could you please port your fix to Python 3.2 and 3.3? Thanks! |
In Python 3.3 memoryobject.c is a complete rewrite. Porting the fix PyBuffer_ToContiguous(), which causes the problem in 2.7/3.2 and is I don't know if I can fix it before the 3.3 release. When are the |
The fix would require all of these functions from memoryview.c (3.3):
last_dim_is_contiguous
How to avoid code duplication? I could move them into abstract.c, |
You could move PyBuffer_ToContiguous() from abstract.c to memoryview.c. |
For 3.3 that would be ideal, yes. I asked a while ago on python-dev So, for 2.7/3.2 I could add all these functions to abstract.c. |
There is an additional problem with PyBuffer_ToContiguous():
Suppose 'view' is multi-dimensional, C-contiguous and initialized
Now if PyBuffer_ToContiguous() is called with 'F', PyBuffer_IsContiguous()
This means that incomplete buffer information will have to be |
Here's a patch for 3.3, which consists mostly of tests. A couple of points:
o I removed the len > view->len check in PyBuffer_ToContiguous(), since
o Removed the comment in bytesobject.c "Better way to get to internal buffer?".
o Removed the "need to check for overflow" comment: ndim is now officially
I think this can go into 3.3: It's easy to see that for contiguous buffers
For memoryview, buffer_to_contiguous(x, y, 'C') is literally the same |
Any objections to committing this before beta2? What about the |
Georg, need a call on how close you are to cutting beta 2 and whether you want this to wait until rc1. |
Summary of my review for Georg's benefit: I had one minor formatting nit with the patch (which Stefan can easily fix before committing), but it otherwise looked fine to me. I also agree that the old implicit truncation was unusable in practice, as the *actual* length is not returned from the function, and in fact cannot be returned because the return type is int rather than Py_ssize_t. A length mismatch therefore needs to be an error. |
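The behaviour under discussion (copy a strided buffer into contiguous memory in 'C' or 'F' order, and treat a length mismatch as an error rather than truncating) can be sketched in pure Python. This is only an illustrative model, not the C code from the patch: the function name `to_contiguous` and its parameters (`raw`, `shape`, `strides`, `itemsize`, `offset`, `expected_len`) are hypothetical and chosen for this example.

```python
from itertools import product


def to_contiguous(raw, shape, strides, itemsize, offset=0, order="C",
                  expected_len=None):
    """Copy a strided N-d view of `raw` into a new contiguous bytes object.

    `strides` and `offset` are in bytes; `offset` locates the first
    logical item, which matters when a stride is negative.
    """
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")

    nbytes = itemsize
    for n in shape:
        nbytes *= n
    # A mismatch is reported as an error instead of silently truncating.
    if expected_len is not None and expected_len != nbytes:
        raise ValueError("length mismatch: expected %d, buffer has %d"
                         % (expected_len, nbytes))

    out = bytearray()
    # product() varies its last range fastest, which is C order. For F
    # order, iterate over the reversed axes and flip each index back.
    axes = shape if order == "C" else shape[::-1]
    for idx in product(*(range(n) for n in axes)):
        if order == "F":
            idx = idx[::-1]
        start = offset + sum(i * s for i, s in zip(idx, strides))
        out += raw[start:start + itemsize]
    return bytes(out)


# Reversed 1-D view of five bytes: start at the last byte, stride -1.
reversed_copy = to_contiguous(bytes([1, 2, 3, 4, 5]), (5,), (-1,), 1, offset=4)

# A 2x3 C-contiguous block copied out in C order and in Fortran order.
block = bytes(range(6))
c_copy = to_contiguous(block, (2, 3), (3, 1), 1, order="C")
f_copy = to_contiguous(block, (2, 3), (3, 1), 1, order="F")
```

Walking the indices explicitly is slow, but it makes the logical order unambiguous, which is why a model like this can be handy as a reference when testing a faster implementation.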
Thanks, Nick! I'll move the function declaration back to abstract.h. Waiting for Georg's input. -- It seems to me that bpo-14330 is a blocker |
With the beta delayed as you say, I'm okay with this going in now. |
New changeset 8fbbc7c8748e by Stefan Krah in branch 'default': |
All right, 3.3 is fixed. Re-targeting for 3.2 and 2.7. |
Was the point that memoryview.tobytes() has a known data corruption bug in 3.2 and 2.7 raised in the previous discussion? I'm pretty sure I had forgotten about it, and I don't remember it coming up in the thread. The trickiest aspect of a backport of the new implementation is that we need to preserve the C ABI - extensions compiled against any maintenance release should work with all maintenance releases in that series. The new APIs aren't a major problem - just sprinkle a few underscores around to mark them as private on the older versions (I've certainly done that before when a bug fix genuinely needed something that qualified as a new feature: implement a private version to use in fixing the bug on the maintenance branch, then promote that to a public API on trunk). |
Christian's posts and my initial report were about memoryview.tobytes(). It's BTW, I'm sure that PyBuffer_FromContiguous() and PyObject_CopyData() have the |
Thanks Stefan and Nick! I tried to find the off-by-one bug myself but gave up quickly. Stefan's rewrite is a better approach. |
Nick, are you talking about a complete backport or just about pulling |
I suspect the need to preserve the C ABI would make a complete backport a challenge. I'm still tempted though, mainly to give third parties a robust core implementation of the buffer API to test against. This is especially so with Python 2.7 still having a couple of years of full maintenance left - that's a long time to leave it with a known broken memoryview implementation. I'm less worried about 3.2, since the upgrade path to 3.3 is easier in that case, but even that version is likely to see widespread use for a long time. |
Well, there's a reason we don't backport features to bugfix branches, |
Right now major parts of the buffer API are broken for non-trivial buffer definitions. IMHO a backport of the fix doesn't count as a new feature, although it needs some new internal functions. I don't quite understand why Nick thinks that ABI compatibility is a challenge. The structs, typedefs and function definitions aren't modified. The new functions aren't visible because they can be implemented as static functions if PyBuffer_ToContiguous() is moved to memoryview.c. That won't break the ABI either. If we want to keep the function in its old place, then we can prefix the new functions with _Py and include them in a private header file. That would export new functions. |
I was musing about backporting *all* the memoryview fixes, including the buffer lifecycle ones. Antoine and Stefan rightly pointed out that was a bad idea, and not necessary to address this problem. So +1 for backporting just this specific fix, with the necessary infrastructure to make it work. |
This particular bug fix for PyBuffer_ToContiguous() (which would automatically For non-trivial buffers it's likely though that other problems Backporting the *complete* rewrite would be relatively easy, too. Fixing all of bpo-10181 without the drastic changes would be very time |
Removing 3.1, since its addition was apparently done by mistake. Again, I wonder why this is marked release-critical. It's not a regression from previous versions, is it? Please use release-critical only if not making a release at a certain point in the future is better than making the release (despite all the advantages that the release otherwise might have). If you merely think that the issue is "really important" and "should not be forgotten", use "critical" or "high". |
Since the next release of 3.2 is the *last* release of 3.2, yet it will remain supported by distros well beyond that, yes, I think this should block the final 3.2 release. |
But the same will be true for any other bug that 3.2 has. If they The consequence instead should be that people experiencing this |
I'm unable to set 2.7 and 3.2 in my browser without also setting |
The last bugfix release of 3.2 will happen "shortly" after the release of 3.3, so in the not-too-distant future (compared to the age of this issue, which just had its first birthday yesterday). So if this issue should really block 3.2 (which I still think it should not), I'd urge possible contributors to start working on it now - especially if this is going to be a large patch. If such a patch introduces new bugs, it won't be possible to ever fix them (unless they are security-critical). Also note that this is almost the only release blocker for the 3.2 release, next to an easy-to-implement request to update the expat code. |
Despite my earlier comments, I'm now inclined to agree with Martin here - upgrading to 3.3 fixes so many other problems with memoryview, that's a more compelling solution. And, of course, using NumPy instead always remains an option for more robust buffer API support in older versions. |
(Not saying this shouldn't be fixed, just saying it's not a disaster if it isn't) |
I agree that for 3.2 this isn't so important given that non-contiguous Christian, does a fix for 3.2 benefit FreeImage? Don't you run into If it helps, I can try to write a patch for 3.2. |
2.7 is the only remaining candidate for the fix. I'm not going to |
Fixed for 2.7 in bpo-22668. |
Sorry, bpo-23349. |