New functionality to go to and from numpy arrays. #316

erdmann · 2022-03-27T20:38:45Z

Ok, here's my stab at robust and flexible numpy interoperability.

This PR contains the following:

a definition of Image.fromarray that uses __array_interface__ or, if that's missing, __array__ to ingest numpy-like objects into pyvips. This includes some heuristics that try to set the interpretation to something sensible when the user passes 'auto' as the interpretation argument (the default; it can be easily set to any libvips interpretation by the user as desired.) This means that we can also ingest lots of foreign array-like objects (PIL images, e.g.).
a definition of Image.__array__ to enable pyvips.Image objects to be consumed by numpy. This enables explicit conversion like np.asarray(image) or np.array(image), but also implicit conversion by numpy functions, such as np.diff(image, axis=1). This also means that images can be plt.imshowed and that kind of stuff since functions in other scientific libraries often accept anything that can act like a numpy array.
I use pytorch a lot, and I've grown quite fond of the ability to do stuff like arr = mytensor.op1().op2().numpy(), where the method chain ends with an explicit conversion to a numpy array. I have been using this for a couple years on my private pyvips.Image subclass and found it to be really handy and compact, so I added Image.numpy() too as a convenience function.
A boatload of new tests for conversion to and from numpy (only if it's installed) as well as conversions to/from pytorch and from PIL (also only if they're installed). The tests illustrate and exercise all of the new capabilities.

Some points to note:

There is no way to convert from an 'int64' or 'uint64' numpy array because as far as I know that's not a libvips format. I think I asked this as a question on an issue a long time ago, but please correct me if I'm wrong.
I'm currently not defining conversions from numpy arrays of bools, because this has led to some bugs in my past use. Specifically, the issue is that libvips treats boolean operation results as uchar, and further treats False as 0 and True as 255. But Python treats True as 1 instead. This can cause confusion, so here I'm following the "explicit is better than implicit" philosophy by forcing the user to manually convert numpy bool arrays to 'uint8' prior to giving them to pyvips. I'm torn, though, because I could also see that less suffering would be caused by hard-coding that numpy bool to pyvips turns True to uchar(255). It's already the case that pyvips uchar images get converted to numpy bool arrays with no surprises when the user manually specifies the dtype on Image.__array__.
Single-band images map to 2D numpy arrays shape (H, W) rather than (H, W, 1) to save endless hours of irritation. I retain single-row or single-column axes so that we get (1,W,C) or (1,W) for multi- or single-band one-row images, respectively, or (H,1,C) or(H,1) for multi- or single-band one-column images, respectively. The only exception is a single-row, single-column, single-band image, which gets converted to a 0-dimensional numpy array. All the above ensures that round trips from pyvips to numpy and back will preserve width, height, and bands.
This doesn't make any attempt to implement the so-called buffer protocol, so we can't directly offer up our own __array_interface__ yet. A consequence is that PIL can't directly ingest pyvips Images without the user explicitly converting the image to a to numpy array first.
Related to this, it would be pretty sweet to somehow manage to share memory between a numpy array and a pyvips Image, like how pytorch can make a zero-copy tensor from a numpy array. Being able to Image.copy_memory() and then diddle with the memory directly from numpy or to have a numpy array that gets zero-copy viewed by pyvips as an Image for saving or other operations would be super powerful. Is there a chance that such things would be feasible in the future? (I realize there is complexity associated with avoiding double-freeing memory, etc., but note that __array_interface__ allows offering memory as read-only if needed.)

Sorry for the wall of text; adding this bit of functionality was more subtle than I expected.

jcupitt · 2022-03-27T21:56:17Z

Oh, great! I'll have a read.

Related to this, it would be pretty sweet to somehow manage to share memory between a numpy array and a pyvips Image,

pyvips does this already I think --- these are zero copy and use the buffer protocol:

# numpy array to vips image
def numpy2vips(a):
    height, width, bands = a.shape
    linear = a.reshape(width * height * bands)
    vi = pyvips.Image.new_from_memory(linear.data, width, height, bands,
                                      dtype_to_format[str(a.dtype)])
    return vi


# vips image to numpy array
def vips2numpy(vi):
    return np.ndarray(buffer=vi.write_to_memory(),
                      dtype=format_to_dtype[vi.format],
                      shape=[vi.height, vi.width, vi.bands])

erdmann · 2022-03-28T05:13:52Z

My initial investigation suggested that pyvips was making a copy rather than sharing, but I think that's because I just found a bug in which things get out of sync. Check out this, in which things go very pear-shaped:

In [1]: import pyvips, torch, numpy as np

In [2]: z = np.zeros((1, 1))

In [3]: t = torch.as_tensor(z)

In [4]: p = pyvips.Image.new_from_memory(z.data, 1, 1, 1, 'double')

In [5]: p(0, 0)  
Out[5]: [0.0]

In [6]: z[0, 0] = 1

In [7]: p(0, 0) # from here is looks like pyvips copies
Out[7]: [0.0]

In [8]: t[0, 0]
Out[8]: tensor(1., dtype=torch.float64)

In [9]: wat = p.crop(0,0,1,1)  # an identity crop

In [10]: wat(0, 0) # now it sees it!?
Out[10]: [1.0]

In [11]: p(0, 0) # getpoint can't see it
Out[11]: [0.0]

In [12]: p.avg() # but .avg does!
Out[12]: 1.0

In [13]: p.getpoint(0, 0) # still no
Out[13]: [0.0]

In [14]: wat.getpoint(0, 0) #  
Out[14]: [1.0]

In [15]: q = p.copy()

In [16]: q(0, 0)
Out[16]: [1.0]

In [17]: z[0, 0] = 2

In [18]: p.avg() 
Out[18]: 1.0

In [19]: q.avg()
Out[19]: 2.0

In [20]: q(0, 0)
Out[20]: [1.0]

In [21]: p.max()  # .max sees the latest
Out[21]: 2.0

In [22]: p.avg() # but .avg doesn't !?
Out[22]: 1.0

In [23]: p(0, 0) # so, getpoint, avg, and max have 3 different versions!
Out[23]: [0.0]

In [24]: p.width, p.height, p.bands # verify that it really is a single-pixel image.
Out[24]: (1, 1, 1)

This doesn't use any of my new PR, so I'm going to file this as a separate bug.

erdmann · 2022-03-28T05:28:52Z

Further experimentation showed that this strangeness was caused by operation caching, so that doing pyvips.cache_set_max(0) makes things make sense again. What's the recommended way to flush the cache without turning it off?

jcupitt · 2022-03-28T07:18:49Z

Ah right, yes, that's a mixture of pixel buffer caching and operation caching.

You can flush pixel caches with vips_image_invalidate_all():

https://www.libvips.org/API/current/VipsImage.html#vips-image-invalidate-all

You call that on an image, and all caches on that image and on any downstream images are dropped. It also knocks any operations which depend on that image (directly or indirectly) out of the operation cache.

It's not in pyvips (I've never needed it), so I suppose we should add image.invalidate().

erdmann · 2022-03-28T07:35:04Z

I chewed on the bool question a bit more, and I changed my mind. So, bool arrays are now accepted and converted so that they follow the pyvips convention that True becomes 255. This means that round trips from a boolean calculation in pyvips to an array in numpy and then back to pyvips should preserve the 255. I modified the tests to reflect this.

I also decided that by default the interpretation shouldn't be changed from whatever libvips would normally do on new_from_memory, so I've changed the default to None in which case interpretation isn't set at all. The interpretation argument to fromarray is still there as a convenience, and 'auto' is still an option to pick up the heuristics I've found the most useful in the past. I think the fact that libvips does different things depending on the interpretation is something that trips up new users, so I also like that the presence of this argument makes them aware of interpretation as something to pay attention to. The other option is just to omit an interpretation argument entirely.

erdmann · 2022-03-28T08:11:01Z

It's not in pyvips (I've never needed it), so I suppose we should add image.invalidate().

Having an .invalidate() method would be very helpful, yes!

jcupitt · 2022-03-28T10:47:22Z

OK, I'll make a PR for invalidate.

useful if you're modifying image memory arrays outside libvips, eg. with numpy see #316

jcupitt

Looks great! I left some small comments.

jcupitt · 2022-03-28T11:23:50Z

pyvips/vimage.py

+    elif format in ['complex', 'dpcomplex']:
+        interp = 'fourier'
+
+    return interp


libvips has something like this too:

https://www.libvips.org/API/current/libvips-header.html#vips-image-guess-interpretation

It looks at the current interpretation and tries to sanity check it. If it looks impossible, it attempts to guess a sane interpretation. If you set interpretation to "error", you should trigger this sanitiser (I think).

Maybe we can combine the two sets of rules? You can see the one libvips uses here:

https://github.com/libvips/libvips/blob/master/libvips/iofuncs/header.c#L555-L721

pyvips/vimage.py

* add invalidate() method useful if you're modifying image memory arrays outside libvips, eg. with numpy see #316 * add rob's invalidate test

…nto numpy_interface

erdmann · 2022-03-29T11:01:02Z

Ok, here's a summary of the latest changes:

moved fromarray into new_from_array, so new_from_array now accepts lists (for backward compatibility) and numpy-ish objects
new_from_array delegates to a new method new_from_list when given list inputs
renamed toarray() to tolist() as discussed and rewrote for efficiency and flexibility using the struct library, which is a in the Python standard library (so, we avoid any numpy dependency but still get speed and flexibility). It works for complex types now too.
Added some tests for tolist()
Changed the documentation style as suggested.

Additional notes:

I'm not sure what you have in mind when you suggested combining the vips interp-guesser with the one I coded here. I often (ab)use complex dtypes in pyvips as vectors because it enables atan2 functionality indirectly, so in that case I'm using complex values but don't want interpretation to be 'fourier'. Anyway, I'm happy to patch it up as you prefer.
I note that for complex types, tolist now works to make a list of complex values (since complex is a built-in Python type), but we can't round-trip because new_from_list doesn't accept complex entries. More broadly, the question is whether it might be good to find the minimum dtype compatible with the list of entries, similar to numpy (but not exactly), so that the resulting image we get when creating with a list of lists has exactly the minimum format that will hold it without loss of data. Thus, if everything fits in uchar, we give a uchar-format image. However, I don't know whether this conflicts with existing usage in the sense of using this functionality to make convolution kernels. Also, if we don't just assume float/double, what is the right thing to do if user specifies non-default values for scale and/or offset?

jcupitt · 2022-04-05T14:16:59Z

This all looks great Rob. Let's merge and do any polishing in further PRs once we've had time to kick the tyres a bit.

Nice job!

and fix some small issues with the numpy integration PR see #316

jcupitt · 2022-04-05T15:20:43Z

I coaxed it through flake8, and it found a couple of small issues which I think I've fixed too.

This patch pushes the new pyvips interpretation guess rules down into libvips. See libvips/pyvips#316

jcupitt · 2022-04-05T17:36:20Z

I've made the libvips interpretation guesser match your improved behaviour: libvips/libvips@57cd942 -- we should probably have the same behaviour everywhere, rather than just in pyvips.

So for libvips >= 8.13 we could call into libvips for _guess_interpretation, and for older libvipses we should stick with your rules. I'll make a PR.

jcupitt · 2022-04-05T17:41:42Z

Slight tangent: I noticed the numpy and torch tests fail unless you have pretty recent versions. The ones that ship with ubuntu 21.10, for example, are too old.

Perhaps either test_deps should have the required numpy / pytorch versions, or the tests should be more forgiving.

erdmann · 2022-04-05T17:42:01Z

My implementation didn't do anything to ensure against stuff like specifying 'rgb' with single-band images while yours does, so as you mentioned mine could use another round of improvements too. I think at the moment the default behavior, in which None is the interpretation, just falls back to the libvips behavior.

erdmann · 2022-04-05T17:45:13Z

Slight tangent: I noticed the numpy and torch tests fail unless you have pretty recent versions. The ones that ship with ubuntu 21.10, for example, are too old.

Ah, interesting! I confess I only tested it with my setup. Is the test at fault or the code it's testing? I'll rework if you can let me know how it's failing.

jcupitt · 2022-04-05T17:59:21Z

Sorry, brainfart, it was PIL and torch that failed.

The torch test fails on torch.asarray (that's new for torch 1.10 I think), and PIL fails in new_from_array on strides = a['strides'](stride is new for v9, I think).

jcupitt · 2022-04-05T18:01:42Z

I like the way the libvips guesser leaves interpretation alone unless it's obviously crazy.

Perhaps we could add _interpretation_is_sane() to pyvips. Your auto value could be swapped for VIPS_INTERPRETATION_ERROR, or "error", since an error value is always wrong.

jcupitt · 2022-04-06T11:09:02Z

I noticed the docs need revising too, we have this section:

https://github.com/libvips/pyvips/blob/master/doc/intro.rst#numpy-and-pil

Which obviously needs reworking. Would you like to have a stab, or shall I? There's a PIL/numpy example in pyvips/examples too which should be redone.

erdmann · 2022-04-06T14:03:30Z

I'll have a go.

New functionality to go to and from numpy arrays.

49d24f8

Accept bool arrays; change default interpretation.

39c80e0

jcupitt added a commit that referenced this pull request Mar 28, 2022

add invalidate() method

248553b

useful if you're modifying image memory arrays outside libvips, eg. with numpy see #316

jcupitt mentioned this pull request Mar 28, 2022

add invalidate() method #317

Merged

jcupitt reviewed Mar 28, 2022

View reviewed changes

jcupitt added a commit that referenced this pull request Mar 28, 2022

add invalidate() method (#317)

c048aaa

* add invalidate() method useful if you're modifying image memory arrays outside libvips, eg. with numpy see #316 * add rob's invalidate test

jcupitt and others added 5 commits March 29, 2022 10:48

Merge branch 'master' into numpy_interface

569f78a

New functionality to go to and from numpy arrays.

9c7084a

Accept bool arrays; change default interpretation.

745ede4

Join fromarray & new_from_array; add tolist; tests

1ebf4df

Merge branch 'numpy_interface' of https://github.com/erdmann/pyvips i…

2a10079

…nto numpy_interface

erdmann added 2 commits March 29, 2022 13:15

minor formatting

28f7727

Fixed up botched merge; fixed docs, formatting

347549c

jcupitt merged commit 0409cce into libvips:master Apr 5, 2022

erdmann deleted the numpy_interface branch April 5, 2022 15:02

erdmann mentioned this pull request Apr 5, 2022

getattr(pyvips.Image, anything) returns True when it should raise AttributeError #314

Closed

jcupitt added a commit that referenced this pull request Apr 5, 2022

get pyvips through flake8 again

326a798

and fix some small issues with the numpy integration PR see #316

jcupitt added a commit to libvips/libvips that referenced this pull request Apr 5, 2022

revise interpretation guesser

57cd942

This patch pushes the new pyvips interpretation guess rules down into libvips. See libvips/pyvips#316

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New functionality to go to and from numpy arrays. #316

New functionality to go to and from numpy arrays. #316

erdmann commented Mar 27, 2022

jcupitt commented Mar 27, 2022 •

edited

Loading

erdmann commented Mar 28, 2022

erdmann commented Mar 28, 2022

jcupitt commented Mar 28, 2022

erdmann commented Mar 28, 2022

erdmann commented Mar 28, 2022

jcupitt commented Mar 28, 2022

jcupitt left a comment

jcupitt Mar 28, 2022

erdmann commented Mar 29, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

erdmann commented Apr 5, 2022

erdmann commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 6, 2022

erdmann commented Apr 6, 2022

New functionality to go to and from numpy arrays. #316

New functionality to go to and from numpy arrays. #316

Conversation

erdmann commented Mar 27, 2022

jcupitt commented Mar 27, 2022 • edited Loading

erdmann commented Mar 28, 2022

erdmann commented Mar 28, 2022

jcupitt commented Mar 28, 2022

erdmann commented Mar 28, 2022

erdmann commented Mar 28, 2022

jcupitt commented Mar 28, 2022

jcupitt left a comment

Choose a reason for hiding this comment

jcupitt Mar 28, 2022

Choose a reason for hiding this comment

erdmann commented Mar 29, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

erdmann commented Apr 5, 2022

erdmann commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 5, 2022

jcupitt commented Apr 6, 2022

erdmann commented Apr 6, 2022

jcupitt commented Mar 27, 2022 •

edited

Loading