
Implement gufunc(pyfunc, ...) similar to numpy.vectorize but without the vectorize part #10526

Open
magonser opened this issue Feb 5, 2018 · 8 comments

magonser commented Feb 5, 2018

This issue was suggested in dask/dask#3109

My current understanding is that numpy.vectorize provides a way to

  1. provide a Python function,
  2. assign it a signature with information about core dimensions,
  3. bind input data to it, where two things happen:
    • __array_ufunc__ is called, if present, and/or
    • the loop dimensions (according to the signature) of the input arrays are broadcast against each other,
  4. iteratively call the Python function over all loop-dimension entries.

I would like to suggest implementing an additional wrapper, just like numpy.vectorize, which does all of the steps above except step 4. That is, a Python function could be wrapped as a gufunc and given a signature, and when input data is bound, the same step 3 is applied.
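
For illustration, here is a minimal sketch of what numpy.vectorize with a signature does today, and (in the comments) of the kind of wrapper being proposed; the np.gufunc name is purely hypothetical:

```python
import numpy as np

# Steps 1-4 as numpy.vectorize does them today: the inner function sees one
# core-dimension slice at a time, and step 4 loops it over every entry of the
# broadcast loop dimensions.
inner = lambda v: v.sum()                        # operates on a single (n,) vector
vec = np.vectorize(inner, signature="(n)->()")
a = np.random.rand(3, 4, 5)                      # loop dims (3, 4), core dim n=5
print(vec(a).shape)                              # (3, 4) -- inner is called 12 times

# The proposal: the same signature handling and input binding (steps 1-3), but
# the wrapped function is assumed to be vectorized already and is called once:
#   gf = np.gufunc(lambda x: x.sum(axis=-1), signature="(n)->()")   # hypothetical API
#   gf(a)                                        # single call, no Python-level loop
```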

The benefit is that the same data-binding methodology and interface are used if the user already provides a vectorized implementation of the Python function. This becomes especially important for interoperability with other libraries, e.g. dask.

hpaulj commented Feb 12, 2018

np.vectorize is Python code, so you should be able to customize your own version. A working example would be easier to understand.

np.vectorize creates a callable object. The init setup doesn't do much; most of the work is done when called with the arguments. That includes broadcasting (based on the signature and array dimensions), the ndindex iteration, and massaging the results to fit the signature. I don't see how you can separate out step 4.

In step 3, I don't see any use or reference to __array_ufunc__.
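
To make that concrete, here is a minimal hand-rolled sketch of such a customized version (a hypothetical gufunc_like helper, single input, one group of core dimensions): it does the setup of steps 1-3 but skips the ndindex loop of step 4.

```python
import numpy as np

# Hypothetical helper: mimic the setup np.vectorize does, but hand the whole
# array to the user's (already vectorized) function instead of looping with
# np.ndindex over the loop dimensions.
def gufunc_like(pyfunc, core_ndim=1):
    def wrapper(x):
        x = np.asanyarray(x)
        loop_shape = x.shape[:x.ndim - core_ndim]      # everything except the core dims
        out = pyfunc(x)                                # single call -- no step 4 loop
        # sanity check: the leading output dimensions should match the loop shape
        assert np.shape(out)[:len(loop_shape)] == loop_shape
        return out
    return wrapper

row_norm = gufunc_like(lambda x: np.sqrt((x ** 2).sum(axis=-1)))
print(row_norm(np.ones((2, 3, 4))).shape)              # (2, 3)
```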

shoyer commented Feb 27, 2018

It seems like there are at least two options here:

  1. A standard way to write "duck ufuncs" that aren't actual ufuncs, but which could safely make use of __array_ufunc__. This is a little tricky, because not every part of the ufunc API would necessarily make sense for duck ufuncs.
  2. An interface for turning "duck ufuncs" into actual ufuncs.

I think option (2) would be preferred, but it's not clear to me that it's possible to do with the current ufunc API (admittedly, I don't understand it well).
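
As a rough, hypothetical sketch of option (1), with the caveat that existing __array_ufunc__ implementations may refuse a first argument that is not a real np.ufunc (which is exactly the tricky part):

```python
import numpy as np

# Hypothetical "duck ufunc": offer the inputs a chance to take over via
# __array_ufunc__ before falling back to the plain Python implementation.
class DuckUfunc:
    def __init__(self, pyfunc, signature):
        self.pyfunc = pyfunc
        self.signature = signature

    def __call__(self, *inputs, **kwargs):
        for x in inputs:
            handler = getattr(type(x), "__array_ufunc__", None)
            if handler is not None and not isinstance(x, np.ndarray):
                # pass self where a real np.ufunc would normally go
                result = handler(x, self, "__call__", *inputs, **kwargs)
                if result is not NotImplemented:
                    return result
        # fallback: assume pyfunc is already vectorized over the loop dimensions
        return self.pyfunc(*inputs, **kwargs)
```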

mrocklin commented Jul 8, 2018

As a concrete example, consider Scikit-Learn's Estimator.predict method. This typically takes an (n, m)-shaped array and produces an n-shaped array, but it generally just broadcasts along the first dimension, so its signature is probably something like (m)->().

It would be useful for Scikit-Learn to have some mechanism to say "this function can be broadcast in the following way" and to defer to objects that can do that sort of broadcasting (using the __array_ufunc__ protocol) when they are provided as inputs.

Concretely, it would be nice for downstream projects to be able to say the following:

```python
class Estimator:
    @numpy.broadcastable('(m)->()')
    def predict(self, X):
        ...
        return y
```

Then, if a user provides something like a dask array:

```python
estimator = Estimator(...)
estimator.fit(...)

estimator.predict(my_dask_array)
```

Ideally, the decorated predict method would then go through the normal checks for __array_ufunc__ and hand control over to the dask array object.
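
For what it's worth, a decorator along these lines could already dispatch on the input type without new NumPy machinery; this is only a hypothetical sketch, assumes dask.array.apply_gufunc is available, and uses a crude duck-type check rather than the __array_ufunc__ protocol itself:

```python
import numpy as np

def broadcastable(signature):
    """Hypothetical decorator: defer to dask for dask inputs, call directly otherwise."""
    def decorate(func):
        def wrapper(self, X):
            if hasattr(X, "dask"):                      # crude check for a dask array
                import dask.array as da
                # apply_gufunc expects a function that is already vectorized
                # over the loop dimensions, which predict is
                return da.apply_gufunc(lambda x: func(self, x), signature, X)
            return func(self, np.asarray(X))            # plain arrays: call as-is
        return wrapper
    return decorate
```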

mattip commented Jul 8, 2018

See a possible implementation in PR #11061, which was rejected for matmul.

@eric-wieser

> but it generally just broadcasts along the first dimension

A signature of (m)->() broadcasts along all but the last dimension; in einsum notation, it is `...j->...`. It sounds like predict doesn't actually broadcast, and is either `ij->i` or `j->`, with no higher-dimensional versions.
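
To make the distinction concrete, a (m)->() signature loops over every leading axis, not just the first:

```python
import numpy as np

# A (m)->() signature treats *all* leading axes as loop dimensions.
f = np.vectorize(lambda row: row.sum(), signature="(m)->()")
x = np.ones((2, 3, 4))
print(f(x).shape)    # (2, 3): every axis except the last (core) one is looped over
```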

mrocklin commented Jul 10, 2018 via email

mhvk commented Jul 10, 2018

We do have np.frompyfunc, which creates a proper ufunc from a Python function (with object dtype), so I guess ideally one would have an equivalent that creates a gufunc. But this is a bit tricky to implement, since it would need the ndarray implementation to actually extract sub-arrays on each iteration instead of just passing on pointers and strides.
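
For reference, the scalar case that np.frompyfunc already covers:

```python
import numpy as np

# np.frompyfunc builds a real (object-dtype) ufunc from a Python scalar function,
# so it participates in __array_ufunc__ dispatch; the missing piece is a gufunc
# equivalent that would receive core-dimension sub-arrays instead of scalars.
py_add = np.frompyfunc(lambda a, b: a + b, 2, 1)    # nin=2, nout=1
print(py_add(np.arange(3), 10))                     # [10 11 12] with object dtype
print(isinstance(py_add, np.ufunc))                 # True
```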

mattip commented Jul 10, 2018

If we design such a wrapper, it should handle these better than the current gufunc mechanism does:

  • memory overlap requirements and temporary output buffer allocation
  • specifying requirements for contiguous memory layout
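
For example, real gufuncs already buffer when input and output memory overlap; a Python-level wrapper would need the same care (illustrative sketch using np.shares_memory):

```python
import numpy as np

# Overlapping input/output memory is the kind of case a wrapper must handle:
# here the input view b aliases the output array a.
a = np.arange(6.0)
b = a[::-1]                        # reversed view sharing memory with a
print(np.shares_memory(a, b))      # True -> naive in-place writes would corrupt b
np.add(a, b, out=a)                # NumPy detects the overlap and buffers internally
print(a)                           # [5. 5. 5. 5. 5. 5.]
```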
