Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add support for PEP 3118 buffers with offsets (non-contiguous) #7467

Closed
wants to merge 4 commits into from

Conversation

abalkin
Copy link
Contributor

@abalkin abalkin commented Mar 27, 2016

Added support for PEP 3118 PIL-style buffers in multiarray
constructor.

Closes #5412.

@abalkin
Copy link
Contributor Author

abalkin commented Mar 27, 2016

The ndim > 2 case does not quite work yet and I need to figure out where to get the test cases for it.

Added support for PEP 3118 PIL-style buffers in multiarray
constructor.
@abalkin
Copy link
Contributor Author

abalkin commented Mar 27, 2016

I fixed the ndim > 2 case, added a test and squashed the commits. This is ready for review. Note that an untested case is with multiple nonnegative suboffsets.

goto fail;
if (PyArray_SetBaseObject((PyArrayObject *)r, memoryview) < 0) {
Py_DECREF(r);
goto fail;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArray_NewFromDescr steals a reference to descr, so the goto's above should not got to

 fail:
     Py_XDECREF(descr);          Py_XDECREF(descr);

@njsmith
Copy link
Member

njsmith commented Mar 27, 2016

Can you add some text summarizing what "PEP 3118 PIL-style buffers" are and why they need special handling? (Is this python imaging library PIL or something else?) I'm sure I could figure it out with some research but we reviewers are lazy folk ;-)

@abalkin
Copy link
Contributor Author

abalkin commented Mar 27, 2016

"PEP 3118 PIL-style buffers" are an integral part of PEP 3118. Here is the description from the PEP:

"""
The PIL uses a more opaque memory representation. Sometimes an image is contained in a contiguous segment of memory, but sometimes it is contained in an array of pointers to the contiguous segments (usually lines) of the image. The PIL is where the idea of multiple buffer segments in the original buffer interface came from. ... The PIL memory model is sometimes used in C-code where a 2-d array can then be accessed using double pointer indirection: e.g. image[i][j] .
"""

Where would you like me to add this text (other than here)?

@abalkin
Copy link
Contributor Author

abalkin commented Mar 27, 2016

BTW, I cannot figure out what Travis CI is unhappy about.

}
/* compute blocksize */
blocksize = itemsize;
for (ax = last_axis + 1; ax < ndim; ++ax)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all suboffsets[ax] < 0 then last_axis never gets initialized when we get here. Even if that is not really possible (is it not?) the compiler does not seem able to figure it out, and that's what's making Travis unhappy. I guess the right value to initialize it at the beginning of the function is -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suboffsets cannot be all negative because this would mean no dereferencing and the PEP requires that suboffsets pointer be NULL in this case. If not for that requirement, the right initial value would indeed be -1, but I don't want to introduce confusion between real -1 and -1 that means ndim - 1, so I just initialize last_axis to 0.

_copy_from_pil_style(char* dest, int ndim, char *buf, npy_intp *shape,
npy_intp *strides, npy_intp *suboffsets, npy_intp itemsize)
{
int ax, last_axis = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to add a comment like /* setting last_axis to anything to silence compiler warnings */ so that no one ends up spending long hours trying to figure out why 0 and not the seemingly more reasonable -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. Please note that I plan to squash the commits that start with "FIX:" after the review.

@seberg
Copy link
Member

seberg commented Mar 28, 2016

These types of buffers will be copied, while typically we use a view into the buffer as far as I understand. Just for consideration, I am not quite sure yet whether I think there is anything to worry about at all.

@abalkin
Copy link
Contributor Author

abalkin commented Mar 28, 2016

@seberg - yes, I thought about this. Maybe we should only allow np.array(pil_buffer) but not np.asarray(pil_buffer). (Require ENSURECOPY flag?)

While it is not possible to create an array view of a PIL buffer, it is possible to create an array of views into its components in many cases, but I don't think this is what the default constructor should do.

@seberg
Copy link
Member

seberg commented Mar 28, 2016

Yeah, that kind of special array of arrays should not happen, it would be a different thing/function.
Require the copy flag seems a nice idea, if just to be on the safe side for the moment and not lock us in.
In principle a copy is fine in asarray, it just bothers me a bit, if users might start expecting one and suddenly get the other, i.e. in case PIL or someone switches between the two based on the image format.

@njsmith
Copy link
Member

njsmith commented Mar 28, 2016

It's already that case that asarray's semantics are "maybe copy, maybe not, depending on exactly what the input object is". I'm not too worried about what happens if PIL or someone changes format -- the exact same thing happens if they switch to implementing __array__ or whatever.

@seberg
Copy link
Member

seberg commented Mar 28, 2016

It is not that they change, but that the same type might sometimes give a copy and sometimes a view. Of course anyone could make such an object easily already. But yes, I am not bought that we should take the safe but maybe uncomfortable side here. Just unsure.

@njsmith
Copy link
Member

njsmith commented Mar 28, 2016

I'm not 100% sure myself, but I feel like most code that calls asarray would be very annoyed if it suddenly had to jump through hoops like

if isinstance(a, PIL.Image):
    a = np.array(a)
else:
    a = np.asarray(a)

And in general code that calls asarray already has to work correctly regardless of whether of it gets back a view.

@abalkin
Copy link
Contributor Author

abalkin commented Mar 28, 2016

I tend to agree with @njsmith, having np.asarray fail where np.array succeeds will be pretty annoying.

BTW, does anyone know whether PIL.Image actually supports PEP 3118 buffer protocol? My personal needs are with PyQ, a package which provides Python embedded in kdb+.

@charris
Copy link
Member

charris commented Mar 29, 2016

What of pillow?

@abalkin
Copy link
Contributor Author

abalkin commented Mar 29, 2016

@charris - I've installed pillow and apparently it does not support PEP 3118.

>>> from PIL import Image
>>> im = Image.open("lena.ppm")
>>> memoryview(im)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: memoryview: a bytes-like object is required, not 'PpmImageFile'

I'll investigate how it manages to be compatible with numpy. It probably supports one of the older protocols:

>>> numpy.asarray(im)
array([[[226, 137, 125],
        [226, 137, 125],
        [223, 137, 133],
        ...

@abalkin
Copy link
Contributor Author

abalkin commented Mar 29, 2016

It looks like pillow supports array interface version 3 (whatever it is) but not PEP 3118.

>>> im.__array_interface__['version']
3

Isn't it ironic given that PEP cites PIL as an application?

@charris
Copy link
Member

charris commented Apr 1, 2016

Isn't it ironic given that PEP cites PIL as an application?

The PEP came too late, PIL died around 2009 ;) Hence Pillow. The two are incompatible, so remove PIL before installing Pillow. You might open an issue on the Pillow site, I really don't think numpy should special case it, rather the other way around. Note that scikit-image supports PIL images among other things. I'm not clear on exactly what functionality you need, but perhaps that will help.

@charris
Copy link
Member

charris commented Apr 1, 2016

I'm inclined to close this unless someone wants to keep it open.

@seberg
Copy link
Member

seberg commented Apr 1, 2016

Well, PIL style buffers can be used by other people as well, such as possibly DyND. Since as far as I understand all that "PIL style" means is the use of suboffsets, so pointers to chunks.
If there is no actual use case, I am fine with waiting for someone who cares to come around. I suspect that this might be fixing a bug with what currently would happen? In that case I would be glad if that part (which would just create an error) would not get lost.

@njsmith
Copy link
Member

njsmith commented Apr 1, 2016

The original poster's use case is PyQ upthread. Given that there are users and it's in a standard (PEP 3118), supporting it seems reasonable, even if the name is out of date. Though I guess one might ask if pyq would be better off using a part of PEP 3118 that other people actually picked up on :-)

Is there an easy way to delegate the unpacking to something like memoryview or bytes? I assume something inside python knows about this buffer format if they ship a module for testing it...?

@seberg
Copy link
Member

seberg commented Apr 1, 2016

Maybe stupid question, but by any chance, PyQ does not implement their GetBuffer in such a way that using PyObject_GetBuffer with the STRIDED flag and only then calling PyMemoryView_FromBuffer instead of calling PyMemoryView_FromObject might already help? Of course it seems unlikely that someone who supports suboffsets would do the copying to strided for us when a strided buffer is requested, instead of just erroring out.
Anyway, wanted to note it, since I think that would be the correct fix if we do not want to do the automatic copy. Of course we could also implement an memoryview.as_strided_buffer() or similar functions and push this problem upstream. But that would mean users would have to go through memoryview to get their numpy array.

@abalkin
Copy link
Contributor Author

abalkin commented Apr 4, 2016

I'm inclined to close this unless someone wants to keep it open.

@charris - what is your argument for not supporting PEP 3118 in numpy? The behavior shown in #5412 is a bug that potentially can lead to silent data corruption.

@abalkin
Copy link
Contributor Author

abalkin commented Apr 4, 2016

@seberg - PyQ is Python running inside kdb+ and uses native objects as data buffers. In kdb+, vector objects are extremely simple: an 8-byte header (with reference count, type code etc.), followed by a 64-bit integer length, followed by a contiguous array of elements.

There is no support for arrays with ndim>1, but you can have vectors of vectors where each element is a pointer to a similar structure. These vectors of vectors are flexible enough to emulate arrays with an arbitrary ndim (and more).

@abalkin
Copy link
Contributor Author

abalkin commented Apr 4, 2016

one might ask if pyq would be better off using a part of PEP 3118 that other people actually picked up on

@njsmith - it does and it works fine for ndim <= 1 (scalars and 1d vectors). However, the layout of matrices and higher rank tensors is an array of pointers to arrays (of pointers ...). It would be quite awkward to have memoryview(x) to have all x data copied.

@charris charris changed the title BUG: Fixes issue #5412. ENH: Add support for PEP 3118 PIL-style buffers Apr 16, 2016
@abalkin
Copy link
Contributor Author

abalkin commented Jul 27, 2017

Ping. Is there anything I can do to move this forward?

@njsmith
Copy link
Member

njsmith commented Jul 27, 2017

This adds quite a bit of non-trivial C code. What do you think of an alternative implementation strategy: if the buffer has suboffsets, create a bytearray from it (letting the bytearray code worry about dealing with the complicated bits), and then use the bytearray as our backing buffer for the array?

@abalkin
Copy link
Contributor Author

abalkin commented Jul 31, 2017

@njsmith - what do you mean by bytearray? Whatever it is, is it currently capable of initializing itself from a buffer with suboffsets?

@njsmith
Copy link
Member

njsmith commented Jul 31, 2017

I mean the built-in type bytearray. (It was added in 2.6.) I haven't checked that it can initialize from a buffer with suboffsets because I don't have a convenient way to make one of those, but it seems likely if memoryview can then bytearray can. And it definitely has contiguous storage internally and is a buffer provider.

@mattip
Copy link
Member

mattip commented Apr 30, 2019

Using bytearray seems to work. Perhaps better would be to use PyBuffer_ToContiguous rather than rolling our own verison.

>>> import _testbuffer, numpy
>>> nd = _testbuffer.ndarray(list(range(6)), shape=[2,3],
...                           format='i', flags=_testbuffer.ND_PIL)
>>> numpy.array(bytearray(nd)).view('int32')
array([0, 1, 2, 3, 4, 5], dtype=int32)

@mattip mattip changed the title ENH: Add support for PEP 3118 PIL-style buffers ENH: Add support for PEP 3118 buffers with offsets (non-contiguous) Apr 30, 2019
@SemMulder
Copy link

I'm interested in moving this forward :).

Perhaps better would be to use PyBuffer_ToContiguous rather than rolling our own verison.

This should make this PR quite doable I think.

To summarize my understanding of the way we could implement this: If a buffer with suboffsets is encountered during numpy.array (i.e. in _array_from_buffer_3118) we should allocate a contiguous buffer and use PyBuffer_ToContiguous to copy the data into this buffer.

For numpy.asarray I personally agree with what was said before (i.e. that we should do the same: make a copy).

Before I proceed, are there any objections to me opening a new PR?

@charris
Copy link
Member

charris commented Dec 16, 2020

@abalkin @seberg Is this still of interest?

@seberg
Copy link
Member

seberg commented Dec 16, 2020

I think it is still of interest, but maybe we should close the PR anyway, unless we want to pick it up in the foreseeable future.

The use of PyBuffer_ToContiguous seems like a good improvement that we should do (and simplifies it a lot). There might be subtleties around the request of a "writeable" result which is currently not taken care of, but also not tricky at all.
(Mainly, we have a "request writeable" flag. I am not quite sure where it is used, but I believe the intention is that in-place modification is guaranteed to work, which would not be the case here. That should be very simple to do though)

EDIT: About my "copy-flag" comments above. Lets not worry about it. I think the only reasonable solution for that is to add the copy=NoCopy idea.

@charris
Copy link
Member

charris commented Dec 16, 2020

We could open an issue. This PR really needs a rebase to continue.

@abalkin
Copy link
Contributor Author

abalkin commented Dec 16, 2020

I'll see if I can rebase this with a small effort.

Base automatically changed from master to main March 4, 2021 02:03
@InessaPawson InessaPawson added the triage review Issue/PR to be discussed at the next triage meeting label Dec 13, 2021
@seberg
Copy link
Member

seberg commented Dec 15, 2021

Considering that this is stalled so long and nobody ever really asked for it. Going to close this. However, we should make sure to error out gracefully for these buffers.

If anyone finds that this is important, please feel free to resurface it, but the PR here probably needs a pretty big rebase.

@seberg seberg closed this Dec 15, 2021
@InessaPawson InessaPawson removed the triage review Issue/PR to be discussed at the next triage meeting label Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: NumPy could accept PIL-style buffers (if it copies the data!)
8 participants