Support duck arrays in numba gufuncs #2979

Open
mrocklin opened this issue May 18, 2018 · 7 comments
@mrocklin

There might be some room for improvement in handling numpy-like arrays in numba-generated gufuncs.

I was quite happy to see this work well:

In [1]: from numba import vectorize, float64
   ...: 
   ...: @vectorize([float64(float64, float64)])
   ...: def f(x, y):
   ...:     return x + y
   ...: 

In [2]: import dask.array as da

In [3]: x = da.arange(10, chunks=(5,))

In [4]: f(x, x)  # hooray!
Out[4]: dask.array<f, shape=(10,), dtype=float64, chunksize=(5,)>

Unfortunately this didn't fare as well. Am I using vectorize incorrectly or is this a feature request?

In [5]: @vectorize
   ...: def f(x, y):
   ...:     return x + y
   ...: 

In [6]: f(x, x)  # oh no!
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-4438cca8adef> in <module>()
----> 1 f(x, x)  # oh no!

~/Software/anaconda/lib/python3.6/site-packages/numba/npyufunc/dufunc.py in _compile_for_args(self, *args, **kws)
    162                 argtys.append(numpy_support.map_arrayscalar_type(arg))
    163             else:
--> 164                 argty = typeof(arg)
    165                 if isinstance(argty, types.Array):
    166                     argty = argty.dtype

~/Software/anaconda/lib/python3.6/site-packages/numba/typing/typeof.py in typeof(val, purpose)
     29     if ty is None:
     30         msg = "cannot determine Numba type of %r" % (type(val),)
---> 31         raise ValueError(msg)
     32     return ty
     33 

ValueError: cannot determine Numba type of <class 'dask.array.core.Array'>
@seibert
Contributor

seibert commented May 18, 2018

It is a feature request, or rather a side effect of our current vectorize implementation that we need to fix.

In the former case, you are making an actual NumPy ufunc object with kernels compiled at declaration time. NumPy's dispatcher is used, and it is presumably making the call to __array__ before handing the data off to the Numba-generated ufunc kernel.

In the latter case, we are using a special Numba-internal ufunc dispatcher which detects the incoming types and generates a new kernel based on the types it detects. Numba currently isn't looking for the __array__ attribute in the typeof() function, but it could. I don't think this would be too tricky to do.

@mrocklin
Author

> and it is presumably making the call to __array__ before handing the data off to the Numba-generated ufunc kernel

So, I suspect that it is more likely calling our __array_ufunc__ method, which is performing automatic parallelism across outer dimensions. This numba gufunc is working directly on the dask array in a scalable manner.
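
The difference between the two hooks can be seen with NumPy alone. The following is a minimal sketch, not taken from dask or Numba: the classes `OnlyArray` and `WithUfunc` are made up for illustration, and `np.add` stands in for an eagerly compiled ufunc. An object that only defines `__array__` is converted to an ndarray before the kernel runs, while an object that defines `__array_ufunc__` receives the ufunc call itself and decides what to return.

```python
import numpy as np


class OnlyArray:
    """Hypothetical container that exposes only __array__."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float64)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this to coerce the object into an ndarray.
        return self.data if dtype is None else self.data.astype(dtype)


class WithUfunc(OnlyArray):
    """Hypothetical container that also implements __array_ufunc__."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Unwrap our own type, apply the ufunc, and re-wrap the result.
        raw = [x.data if isinstance(x, OnlyArray) else x for x in inputs]
        return WithUfunc(ufunc(*raw, **kwargs))


a = np.add(OnlyArray([1, 2]), OnlyArray([3, 4]))  # coerced via __array__
b = np.add(WithUfunc([1, 2]), WithUfunc([3, 4]))  # dispatched via __array_ufunc__
print(type(a).__name__, type(b).__name__)
```

In the first call the container type is lost and a plain ndarray comes back; in the second the container stays in control, which is the hook a library like dask.array can use to apply the ufunc chunk by chunk.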

@mrocklin
Author

Just to put a bit of motivation behind this feature request, I'd like to write a blogpost that shows that numba and dask.array are able to inter-operate over the gufunc protocol without explicit coordination.

sklam added a commit to sklam/numba that referenced this issue Jun 1, 2018
@sklam
Member

sklam commented Jun 1, 2018

See branch https://github.com/sklam/numba/tree/misc/iss2979 for minimal changes to make the dufunc accept the __array_ufunc__ protocol.

@sklam
Member

sklam commented Jun 1, 2018

So the problem is that dufunc needs to check for the __array_ufunc__ protocol when it fails to dispatch and compile.

And to clarify, gufunc just works. This issue is just about the dufunc object.
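
The fallback described above could look roughly like the sketch below. This is not Numba's code and not the change on the linked branch: `LazyUfuncSketch`, its `_compile_for_args` stand-in, and the `Wrapped` duck type are all hypothetical, and `np.add` plays the role of the compiled kernel. The only point is the control flow: try to type the arguments, and if that fails, give any argument that defines `__array_ufunc__` a chance to handle the call.

```python
import numpy as np


class LazyUfuncSketch:
    """Hypothetical stand-in for a lazily compiling ufunc dispatcher."""

    def __init__(self, kernel):
        self._kernel = kernel  # stands in for the compiled kernel

    def _compile_for_args(self, *args):
        # Stand-in for type inference: only ndarrays and scalars are "typeable".
        for arg in args:
            if not isinstance(arg, (np.ndarray, np.generic, int, float)):
                raise ValueError("cannot determine type of %r" % (type(arg),))

    def __call__(self, *args):
        try:
            self._compile_for_args(*args)
        except ValueError:
            # Typing failed: defer to the first argument that implements
            # the __array_ufunc__ protocol and does not decline.
            for arg in args:
                override = getattr(type(arg), "__array_ufunc__", None)
                if override is not None:
                    result = override(arg, self, "__call__", *args)
                    if result is not NotImplemented:
                        return result
            raise  # nobody handled it, so surface the original error
        return self._kernel(*args)


class Wrapped:
    """Made-up duck array used only to exercise the fallback."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raw = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*raw, **kwargs))


f = LazyUfuncSketch(np.add)
out = f(Wrapped([1.0, 2.0]), Wrapped([3.0, 4.0]))
print(type(out).__name__, out.data)
```

Note that the dispatcher passes itself as the `ufunc` argument, so when the duck type calls it back with unwrapped ndarrays, the normal typed path is taken.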

@mrocklin
Author

mrocklin commented Jun 4, 2018

I've added a small comment to the commit at sklam@34f9d48

@sklam what additional work needs to be done here? Presumably there are tests and documentation to write, etc. If someone wanted to take on this work, what should they do?

@sklam
Member

sklam commented Jun 4, 2018

@mrocklin, yes, your comment in sklam@34f9d48 is exactly right. I have only done the minimum needed to see how to make it work and what code needs to change. If someone wanted to take on the work, they would need to:

  • Implement the ufunc protocol properly, as per your comment. They can continue from where I made the changes.
  • Add tests (presumably without using dask.array; using some made-up type with __array_ufunc__).
  • Add docs.
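
A test along the lines of the second bullet might be shaped like the sketch below. It is only an outline under stated assumptions: `RecordingArray` is a made-up type, and `np.add` is used as a placeholder for the function under test so the example runs without Numba. In a real test the placeholder would presumably be a function decorated with `@vectorize` without signatures.

```python
import unittest

import numpy as np


class RecordingArray:
    """Made-up duck array that records every __array_ufunc__ call."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float64)
        self.calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        self.calls.append((ufunc, method))
        raw = [x.data if isinstance(x, RecordingArray) else x for x in inputs]
        return RecordingArray(getattr(ufunc, method)(*raw, **kwargs))


class TestDuckDispatch(unittest.TestCase):
    # Placeholder: swap in the lazily compiled ufunc under test.
    ufunc = staticmethod(np.add)

    def test_protocol_is_used(self):
        x = RecordingArray([1, 2, 3])
        out = self.ufunc(x, x)
        # The duck type must stay in control and see the ufunc it was given.
        self.assertIsInstance(out, RecordingArray)
        self.assertEqual(x.calls, [(self.ufunc, "__call__")])
        np.testing.assert_array_equal(out.data, [2, 4, 6])
```

Keeping the duck type independent of dask.array means the test has no extra dependency and checks only the protocol itself.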
