ENH: Add support for PEP 3118 buffers with offsets (non-contiguous) #7467

abalkin · 2016-03-27T04:44:15Z

Added support for PEP 3118 PIL-style buffers in multiarray
constructor.

Closes #5412.

abalkin · 2016-03-27T05:22:56Z

The ndim > 2 case does not quite work yet and I need to figure out where to get the test cases for it.

Added support for PEP 3118 PIL-style buffers in multiarray constructor.

abalkin · 2016-03-27T20:30:10Z

I fixed the ndim > 2 case, added a test and squashed the commits. This is ready for review. Note that an untested case is with multiple nonnegative suboffsets.

abalkin · 2016-03-27T20:37:51Z

numpy/core/src/multiarray/ctors.c

+            goto fail;
+         if (PyArray_SetBaseObject((PyArrayObject *)r, memoryview) < 0) {
+            Py_DECREF(r);
+            goto fail;


PyArray_NewFromDescr steals a reference to descr, so the goto's above should not got to

fail: Py_XDECREF(descr); Py_XDECREF(descr);

…a is copied.

njsmith · 2016-03-27T21:29:33Z

Can you add some text summarizing what "PEP 3118 PIL-style buffers" are and why they need special handling? (Is this python imaging library PIL or something else?) I'm sure I could figure it out with some research but we reviewers are lazy folk ;-)

abalkin · 2016-03-27T22:45:07Z

"PEP 3118 PIL-style buffers" are an integral part of PEP 3118. Here is the description from the PEP:

"""
The PIL uses a more opaque memory representation. Sometimes an image is contained in a contiguous segment of memory, but sometimes it is contained in an array of pointers to the contiguous segments (usually lines) of the image. The PIL is where the idea of multiple buffer segments in the original buffer interface came from. ... The PIL memory model is sometimes used in C-code where a 2-d array can then be accessed using double pointer indirection: e.g. image[i][j] .
"""

Where would you like me to add this text (other than here)?

abalkin · 2016-03-27T22:51:06Z

BTW, I cannot figure out what Travis CI is unhappy about.

jaimefrio · 2016-03-27T23:36:22Z

numpy/core/src/multiarray/ctors.c

+    }
+    /* compute blocksize */
+    blocksize = itemsize;
+    for (ax = last_axis + 1; ax < ndim; ++ax)


If all suboffsets[ax] < 0 then last_axis never gets initialized when we get here. Even if that is not really possible (is it not?) the compiler does not seem able to figure it out, and that's what's making Travis unhappy. I guess the right value to initialize it at the beginning of the function is -1?

suboffsets cannot be all negative because this would mean no dereferencing and the PEP requires that suboffsets pointer be NULL in this case. If not for that requirement, the right initial value would indeed be -1, but I don't want to introduce confusion between real -1 and -1 that means ndim - 1, so I just initialize last_axis to 0.

jaimefrio · 2016-03-28T19:05:58Z

numpy/core/src/multiarray/ctors.c

+_copy_from_pil_style(char* dest, int ndim, char *buf, npy_intp *shape,
+                     npy_intp *strides, npy_intp *suboffsets, npy_intp itemsize)
+{
+    int ax, last_axis = 0;


It may be good to add a comment like /* setting last_axis to anything to silence compiler warnings */ so that no one ends up spending long hours trying to figure out why 0 and not the seemingly more reasonable -1.

I added a comment. Please note that I plan to squash the commits that start with "FIX:" after the review.

seberg · 2016-03-28T19:59:02Z

These types of buffers will be copied, while typically we use a view into the buffer as far as I understand. Just for consideration, I am not quite sure yet whether I think there is anything to worry about at all.

abalkin · 2016-03-28T20:12:45Z

@seberg - yes, I thought about this. Maybe we should only allow np.array(pil_buffer) but not np.asarray(pil_buffer). (Require ENSURECOPY flag?)

While it is not possible to create an array view of a PIL buffer, it is possible to create an array of views into its components in many cases, but I don't think this is what the default constructor should do.

seberg · 2016-03-28T20:19:25Z

Yeah, that kind of special array of arrays should not happen, it would be a different thing/function.
Require the copy flag seems a nice idea, if just to be on the safe side for the moment and not lock us in.
In principle a copy is fine in asarray, it just bothers me a bit, if users might start expecting one and suddenly get the other, i.e. in case PIL or someone switches between the two based on the image format.

njsmith · 2016-03-28T20:23:40Z

It's already that case that asarray's semantics are "maybe copy, maybe not, depending on exactly what the input object is". I'm not too worried about what happens if PIL or someone changes format -- the exact same thing happens if they switch to implementing __array__ or whatever.

seberg · 2016-03-28T20:26:24Z

It is not that they change, but that the same type might sometimes give a copy and sometimes a view. Of course anyone could make such an object easily already. But yes, I am not bought that we should take the safe but maybe uncomfortable side here. Just unsure.

njsmith · 2016-03-28T22:46:44Z

I'm not 100% sure myself, but I feel like most code that calls asarray would be very annoyed if it suddenly had to jump through hoops like

if isinstance(a, PIL.Image):
    a = np.array(a)
else:
    a = np.asarray(a)

And in general code that calls asarray already has to work correctly regardless of whether of it gets back a view.

abalkin · 2016-03-28T23:18:31Z

I tend to agree with @njsmith, having np.asarray fail where np.array succeeds will be pretty annoying.

BTW, does anyone know whether PIL.Image actually supports PEP 3118 buffer protocol? My personal needs are with PyQ, a package which provides Python embedded in kdb+.

charris · 2016-03-29T01:56:04Z

What of pillow?

abalkin · 2016-03-29T02:32:39Z

@charris - I've installed pillow and apparently it does not support PEP 3118.

>>> from PIL import Image
>>> im = Image.open("lena.ppm")
>>> memoryview(im)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: memoryview: a bytes-like object is required, not 'PpmImageFile'

I'll investigate how it manages to be compatible with numpy. It probably supports one of the older protocols:

>>> numpy.asarray(im)
array([[[226, 137, 125],
        [226, 137, 125],
        [223, 137, 133],
        ...

abalkin · 2016-03-29T02:38:57Z

It looks like pillow supports array interface version 3 (whatever it is) but not PEP 3118.

>>> im.__array_interface__['version']
3

Isn't it ironic given that PEP cites PIL as an application?

charris · 2016-04-01T15:42:19Z

Isn't it ironic given that PEP cites PIL as an application?

The PEP came too late, PIL died around 2009 ;) Hence Pillow. The two are incompatible, so remove PIL before installing Pillow. You might open an issue on the Pillow site, I really don't think numpy should special case it, rather the other way around. Note that scikit-image supports PIL images among other things. I'm not clear on exactly what functionality you need, but perhaps that will help.

charris · 2016-04-01T15:47:42Z

I'm inclined to close this unless someone wants to keep it open.

seberg · 2016-04-01T19:12:29Z

Well, PIL style buffers can be used by other people as well, such as possibly DyND. Since as far as I understand all that "PIL style" means is the use of suboffsets, so pointers to chunks.
If there is no actual use case, I am fine with waiting for someone who cares to come around. I suspect that this might be fixing a bug with what currently would happen? In that case I would be glad if that part (which would just create an error) would not get lost.

njsmith · 2016-04-01T21:46:19Z

The original poster's use case is PyQ upthread. Given that there are users and it's in a standard (PEP 3118), supporting it seems reasonable, even if the name is out of date. Though I guess one might ask if pyq would be better off using a part of PEP 3118 that other people actually picked up on :-)

Is there an easy way to delegate the unpacking to something like memoryview or bytes? I assume something inside python knows about this buffer format if they ship a module for testing it...?

seberg · 2016-04-01T22:13:16Z

Maybe stupid question, but by any chance, PyQ does not implement their GetBuffer in such a way that using PyObject_GetBuffer with the STRIDED flag and only then calling PyMemoryView_FromBuffer instead of calling PyMemoryView_FromObject might already help? Of course it seems unlikely that someone who supports suboffsets would do the copying to strided for us when a strided buffer is requested, instead of just erroring out.
Anyway, wanted to note it, since I think that would be the correct fix if we do not want to do the automatic copy. Of course we could also implement an memoryview.as_strided_buffer() or similar functions and push this problem upstream. But that would mean users would have to go through memoryview to get their numpy array.

abalkin · 2016-04-04T19:17:39Z

I'm inclined to close this unless someone wants to keep it open.

@charris - what is your argument for not supporting PEP 3118 in numpy? The behavior shown in #5412 is a bug that potentially can lead to silent data corruption.

abalkin · 2016-04-04T19:28:57Z

@seberg - PyQ is Python running inside kdb+ and uses native objects as data buffers. In kdb+, vector objects are extremely simple: an 8-byte header (with reference count, type code etc.), followed by a 64-bit integer length, followed by a contiguous array of elements.

There is no support for arrays with ndim>1, but you can have vectors of vectors where each element is a pointer to a similar structure. These vectors of vectors are flexible enough to emulate arrays with an arbitrary ndim (and more).

abalkin · 2016-04-04T19:39:34Z

one might ask if pyq would be better off using a part of PEP 3118 that other people actually picked up on

@njsmith - it does and it works fine for ndim <= 1 (scalars and 1d vectors). However, the layout of matrices and higher rank tensors is an array of pointers to arrays (of pointers ...). It would be quite awkward to have memoryview(x) to have all x data copied.

abalkin · 2017-07-27T01:14:06Z

Ping. Is there anything I can do to move this forward?

njsmith · 2017-07-27T01:27:44Z

This adds quite a bit of non-trivial C code. What do you think of an alternative implementation strategy: if the buffer has suboffsets, create a bytearray from it (letting the bytearray code worry about dealing with the complicated bits), and then use the bytearray as our backing buffer for the array?

abalkin · 2017-07-31T13:12:22Z

@njsmith - what do you mean by bytearray? Whatever it is, is it currently capable of initializing itself from a buffer with suboffsets?

njsmith · 2017-07-31T18:13:39Z

I mean the built-in type bytearray. (It was added in 2.6.) I haven't checked that it can initialize from a buffer with suboffsets because I don't have a convenient way to make one of those, but it seems likely if memoryview can then bytearray can. And it definitely has contiguous storage internally and is a buffer provider.

mattip · 2019-04-30T23:27:33Z

Using bytearray seems to work. Perhaps better would be to use PyBuffer_ToContiguous rather than rolling our own verison.

>>> import _testbuffer, numpy
>>> nd = _testbuffer.ndarray(list(range(6)), shape=[2,3],
...                           format='i', flags=_testbuffer.ND_PIL)
>>> numpy.array(bytearray(nd)).view('int32')
array([0, 1, 2, 3, 4, 5], dtype=int32)

SemMulder · 2020-05-24T21:24:05Z

I'm interested in moving this forward :).

Perhaps better would be to use PyBuffer_ToContiguous rather than rolling our own verison.

This should make this PR quite doable I think.

To summarize my understanding of the way we could implement this: If a buffer with suboffsets is encountered during numpy.array (i.e. in _array_from_buffer_3118) we should allocate a contiguous buffer and use PyBuffer_ToContiguous to copy the data into this buffer.

For numpy.asarray I personally agree with what was said before (i.e. that we should do the same: make a copy).

Before I proceed, are there any objections to me opening a new PR?

charris · 2020-12-16T16:40:19Z

@abalkin @seberg Is this still of interest?

seberg · 2020-12-16T16:49:40Z

I think it is still of interest, but maybe we should close the PR anyway, unless we want to pick it up in the foreseeable future.

The use of PyBuffer_ToContiguous seems like a good improvement that we should do (and simplifies it a lot). There might be subtleties around the request of a "writeable" result which is currently not taken care of, but also not tricky at all.
(Mainly, we have a "request writeable" flag. I am not quite sure where it is used, but I believe the intention is that in-place modification is guaranteed to work, which would not be the case here. That should be very simple to do though)

EDIT: About my "copy-flag" comments above. Lets not worry about it. I think the only reasonable solution for that is to add the copy=NoCopy idea.

charris · 2020-12-16T16:55:38Z

We could open an issue. This PR really needs a rebase to continue.

abalkin · 2020-12-16T17:17:46Z

I'll see if I can rebase this with a small effort.

seberg · 2021-12-15T18:20:05Z

Considering that this is stalled so long and nobody ever really asked for it. Going to close this. However, we should make sure to error out gracefully for these buffers.

If anyone finds that this is important, please feel free to resurface it, but the PR here probably needs a pretty big rebase.

charris added 00 - Bug component: numpy._core labels Mar 27, 2016

BUG: Fixes issue numpy#5412.

247dc33

Added support for PEP 3118 PIL-style buffers in multiarray constructor.

abalkin force-pushed the issue-5412 branch from 94c30e1 to 247dc33 Compare March 27, 2016 20:23

abalkin reviewed Mar 27, 2016
View reviewed changes

FIX: PyArray_NewFromDescr steals descr, no need to set base since dat…

ec9e11a

…a is copied.

jaimefrio reviewed Mar 27, 2016
View reviewed changes

FIX: Initialize last_axis to make compiler happy.

76689cf

jaimefrio reviewed Mar 28, 2016
View reviewed changes

FIX: Added a comment explaining initialization for last_axis.

730aa71

charris changed the title ~~BUG: Fixes issue #5412.~~ ENH: Add support for PEP 3118 PIL-style buffers Apr 16, 2016

charris mentioned this pull request Jul 4, 2016

ENH: properly account for trailing padding in PEP3118 #7798

Closed

mattip changed the title ~~ENH: Add support for PEP 3118 PIL-style buffers~~ ENH: Add support for PEP 3118 buffers with offsets (non-contiguous) Apr 30, 2019

HonzaCustomInk mentioned this pull request Sep 13, 2019

Pillow to NumPy without copying data python-pillow/Pillow#4064

Closed

Base automatically changed from master to main March 4, 2021 02:03

github-actions bot added the 01 - Enhancement label Mar 4, 2021

InessaPawson added the triage review Issue/PR to be discussed at the next triage meeting label Dec 13, 2021

seberg closed this Dec 15, 2021

InessaPawson removed the triage review Issue/PR to be discussed at the next triage meeting label Dec 15, 2021

seberg mentioned this pull request Dec 15, 2021

BUG: Reject buffers with suboffsets #20592

Merged

ENH: Add support for PEP 3118 buffers with offsets (non-contiguous) #7467

ENH: Add support for PEP 3118 buffers with offsets (non-contiguous) #7467

Conversation

abalkin commented Mar 27, 2016 • edited by charris Loading

abalkin commented Mar 27, 2016

abalkin commented Mar 27, 2016

abalkin Mar 27, 2016

Choose a reason for hiding this comment

njsmith commented Mar 27, 2016

abalkin commented Mar 27, 2016

abalkin commented Mar 27, 2016

jaimefrio Mar 27, 2016

Choose a reason for hiding this comment

abalkin Mar 28, 2016

Choose a reason for hiding this comment

jaimefrio Mar 28, 2016

Choose a reason for hiding this comment

abalkin Mar 28, 2016

Choose a reason for hiding this comment

seberg commented Mar 28, 2016

abalkin commented Mar 28, 2016

seberg commented Mar 28, 2016

njsmith commented Mar 28, 2016

seberg commented Mar 28, 2016

njsmith commented Mar 28, 2016

abalkin commented Mar 28, 2016

charris commented Mar 29, 2016

abalkin commented Mar 29, 2016

abalkin commented Mar 29, 2016

charris commented Apr 1, 2016

charris commented Apr 1, 2016

seberg commented Apr 1, 2016

njsmith commented Apr 1, 2016

seberg commented Apr 1, 2016

abalkin commented Apr 4, 2016

abalkin commented Apr 4, 2016

abalkin commented Apr 4, 2016

abalkin commented Jul 27, 2017

njsmith commented Jul 27, 2017

abalkin commented Jul 31, 2017

njsmith commented Jul 31, 2017

mattip commented Apr 30, 2019

SemMulder commented May 24, 2020

charris commented Dec 16, 2020

seberg commented Dec 16, 2020 • edited Loading

charris commented Dec 16, 2020

abalkin commented Dec 16, 2020

seberg commented Dec 15, 2021

abalkin commented Mar 27, 2016 •

edited by charris

Loading

seberg commented Dec 16, 2020 •

edited

Loading