Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP: Array function protocol #11189

Merged
merged 10 commits into from Jun 1, 2018

Conversation

Projects
None yet
7 participants
@mrocklin
Copy link
Contributor

commented May 29, 2018

Written with Stephan Hoyer
after discussion with Stefan Van der Walt, Charles Harris, Matti Picus, Jaime Frio and Nathaniel Smith

Add first draft of NEP 0016: array function protocol
Written with Stephan Hoyer after discussion with Stefan Van der Walt,
Charles Harris, Matti Picus, and Jaime Frio

@mrocklin mrocklin changed the title Add first draft of NEP 0016: array function protocol NEP: Array function protocol May 29, 2018

@shoyer
Copy link
Member

left a comment

Can we save NEP-16 for Nathaniel's An abstract base class for identifying "duck arrays" in #10706 ? That would make this NEP-17.

...
}
*Note from MR: would it make sense instead for them to return the

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

delete? :)

This comment has been minimized.

Copy link
@mrocklin

mrocklin Jun 1, 2018

Author Contributor

Done

NEP: Dispatch Mechanism for NumPy's high level API
==================================================

:Author: Authors: Stephan Hoyer <shoyer@google.com> and Matthew Rocklin <mrocklin@gmail.com>

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

I think the standard is one line/:Author: entry per author

This comment has been minimized.

Copy link
@mrocklin

mrocklin Jun 1, 2018

Author Contributor

Done

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 1, 2018

Member

I pushed a few commits to your branch already, you probably want to pull them before making further edits :)

@mrocklin

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2018

success, value = do_array_function_dance(
func=broadcast_to,
relevant_arguments=[array],
array, shape, subok=subok) # *args, **kwargs

This comment has been minimized.

Copy link
@mattip

mattip May 29, 2018

Member

This does not match the function signature (self, func, overload_types, *args, **kwargs) and mixes positional and keyword arguments. In particular, I don't understand what is input as the overload_types positional argument.


As mentioned above, if this means that some functions that we overload
with ``__array_function__`` should switch to a new protocol instead,
that is explicitly OK for as long as ``__array_ufunc__`` retains its

This comment has been minimized.

Copy link
@mattip

mattip May 29, 2018

Member

array_ufunc__ is experimental?

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi May 29, 2018

Contributor

Yes. In particular, it might support gufuncs as well as ufuncs in the future.

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

I think this is probably a typo and should be __array_function__ instead


.. code-block:: python
def __array_function__(self, func, types, *args, **kwargs)

This comment has been minimized.

Copy link
@eric-wieser

eric-wieser May 29, 2018

Member

What if {'func', 'types'} & kwargs.keys() is non-empty?

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

Agreed, I think we need to switch this to not use * and **` unpacking.

@mattip

This comment has been minimized.

Copy link
Member

commented May 29, 2018

This should be merged as nep-0018 once the initial comments are handled. Deeper discussion, as I understand the process, should take place on the mailing list

- The ``.T`` method works using Python's method dispatch
- The ``np.sum`` function explicitly checks for a ``.sum`` method on
the argument

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi May 29, 2018

Contributor

Might be worth noting here that

  • np.sum falls back to __array_ufunc__ with method='reduce' if the object does not have a sum method.
  • ufunc.reduce uses __array_ufunc__ with method='reduce' by default.

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

np.mean() might be a better example, I'm pretty sure it doesn't do the __array_ufunc__ fallback.


- NumPy will gather implementations of ``__array_functions__`` from all
specified inputs and call them in order: subclasses before
superclasses, and otherwise left to right.

This comment has been minimized.

Copy link
@eric-wieser

eric-wieser May 29, 2018

Member

Can we clarify the wording here to make it clear what func(base, derived) does, given base.__array_function__ is derived.__array_function__? (There was a cpython issue about this too, which I think @shoyer found)

This comment has been minimized.

Copy link
@shoyer
`NEP-13 <http://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`__).
In particular:

- NumPy will gather implementations of ``__array_functions__`` from all

This comment has been minimized.

Copy link
@eric-wieser

eric-wieser May 29, 2018

Member

Is the function name really sometimes plural?

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

No, this is a mistake

.. code:: python
def broadcast_to(array, shape, subok=False):
success, value = do_array_function_dance(

This comment has been minimized.

Copy link
@eric-wieser

eric-wieser May 29, 2018

Member

Would it be better to handle this as a decorator? Perhaps we could use

ba = inspect.signature(func).bind(*args, **kwargs)
ba.apply_defaults()
array_function = ...
array_function(func, *ba.args, **bar.kwargs)

This comment has been minimized.

Copy link
@shoyer

shoyer May 29, 2018

Member

Yes, if we only supported Python 3. But in Python 2, we would lose introspection of function arguments.

Nonetheless this should indeed be mentioned.

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

This example really does not look nice!! If this cannot be done more elegantly in python2, then I think we should aim for python3 only - this process is unlikely to be that fast and even if it were, skipping one release would be well worth having a much simpler interface. In particular, in python3 one could use a decorator that looks at annotations (which we'd like anyway!) - much more elegant.

Various alternatives to this proposal were discussed in a few Github issues:

1. `pydata/sparse #1 <https://github.com/pydata/sparse/issues/1>`_
2. `numpy/numpy #11128 <https://github.com/numpy/numpy/issues/11129>`_

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi May 30, 2018

Contributor

Typo? Text says 11128 but issue is 11129.

@@ -213,7 +215,8 @@ might be affected by this change:
success, value = do_array_function_dance(
func=broadcast_to,
relevant_arguments=[array],
array, shape, subok=subok) # *args, **kwargs
args=(array,),
kwargs=dict(shape=kwargs, subok=subok))

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi May 30, 2018

Contributor

Typo? Maybe should be shape=shape

func=broadcast_to,
relevant_arguments=[array],
args=(array,),
kwargs=dict(shape=kwargs, subok=subok))

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi May 30, 2018

Contributor

Still a typo. shape=shape?

@mhvk
Copy link
Contributor

left a comment

The nature of types is really rather unclear. Should it perhaps be a list of types the ndarray implementation does not know what to do with? (This would include ndarray subclasses if the current function would do asarray)? More comments on mailing list.

2. Are all arguments of a type that we know how to handle?

If these conditions hold, ``__array_function__`` should return
implementation for ``func(*args, **kwargs)``. Otherwise, it should

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

Given the example below I think you mean "the result from calling its implementation of ...".

methods appropriately until one succeeds.

This is one additional function of moderate complexity.
2. Calling this function within all relevant Numpy functions.

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

insert empty line before this one.

type of ``func``, or about which of ``args`` and ``kwargs`` may contain
an objects implementing the array API. As a convenience for
``__array_function__`` implementors of the NumPy API, ``types`` contains
a list of all types that NumPy checks for potential overloads.

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

It is unclear what this means. A list of types of all arguments checked? A list of types of arguments that defined `__array_function``? More in the general comments.

``__array_function__`` attribute on those inputs, and call those
methods appropriately until one succeeds.

This is one additional function of moderate complexity.

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

Maybe comment here that speed will be of the essence.

This comment has been minimized.

Copy link
@mhvk

mhvk May 31, 2018

Contributor

In particular, the speed for regular arrays should not be impacted much even for small arrays.


Still be determined: what guarantees can we offer for ``types``? Should
we promise that types are unique, and appear in the order in which they
are checked?

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

Another question is whether it should even be here. Since a relevant_arguments is being considered, perhaps a relevant_types in kwargs?

.. code:: python
def broadcast_to(array, shape, subok=False):
success, value = do_array_function_dance(

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

This example really does not look nice!! If this cannot be done more elegantly in python2, then I think we should aim for python3 only - this process is unlikely to be that fast and even if it were, skipping one release would be well worth having a much simpler interface. In particular, in python3 one could use a decorator that looks at annotations (which we'd like anyway!) - much more elegant.

if success:
return value
... # continue with the definition of broadcast_to

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

I actually think this is slightly the wrong model: ndarray should define __array_function__ itself and that should call the actual definition of broadcast_to (but at lowest priority). See more general comment.

signature for functions in the past, including functions like
``numpy.sum`` which support overloads.

For adding new keyword arguments that do not change default behavior, we

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

I think this is too restraining. I think it would be up to implementations to check that there are no excess arguments.

(Autograd, Tangent), higher order array factorizations (TensorLy), etc.
that add additional functionality on top of the Numpy API.

We would like to be able to use these libraries together, for example we

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

Add:

Finally, some ``ndarray`` subclasses add new behaviour (e.g., MaskedArray and astropy's Quantity),
which gives different meaning to functions. Indeed, MaskedArray reimplements some of the core
numpy functions, and Quantity has long-standing issues about compatibility with stacking functions.

This comment has been minimized.

Copy link
@mrocklin

mrocklin Jun 1, 2018

Author Contributor

I'm not sure how best to weave this text into the NEP. To me this seems like orthogonal functionality.

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 1, 2018

Contributor

My text may have been confusing, but I think __array_function__ is definitely very relevant for subclasses.

For instance, currently ma.extras is full of re-definitions of functions like stack - i.e., it follows the make-your-own-namespace "solution" that this NEP would like to address. This could be completely replaced by the __array_function__ proposed here!

Similarly, for Quantity we are currently relying on overrides or hope that a function uses ufuncs and does not do asarray - and we have no good solution for functions like concatenate (not being willing to provide our own namespace...). With __array_function__, this would be solved. For instance, for concatenate we can ensure that all inputs are converted to the same unit before doing the actual concatenation.

In other words, I probably should have written what looks much like this NEP for MaskedArray and Quantity alone - hence the request to mention them!

Aside: For both the above, I see possible implementations very much in terms of "1. prepare input arrays; 2. call original function on those; 3. set some properties on outputs" - step (2) is most logically implemented with a super() call, which is why I strongly suggest there be an ndarray.__array_function__.

casting all arguments to numpy arrays and re-calling the ufunc, but the
heterogeneous function signatures supported by ``__array_function__``
make it impossible to implement this generic fallback behavior for
``__array_function__``.

This comment has been minimized.

Copy link
@mhvk

mhvk May 30, 2018

Contributor

This could be solved by all functions also gaining a coerce (or so) keyword argument, somewhat equivalent to the current subok, but always False by default.

Alternatively, as I'd like an ndarray.__array_function__ implementation anyway (for super calls), that implementation could take a coerce keyword argument (now default True) or re-use subok (then default False).

@mhvk

This comment has been minimized.

Copy link
Contributor

commented May 30, 2018

I cannot find this mentioned on the mailing list, so then here:

  1. I'm rather unclear about the use of types. It can help me decide what to do, but I would still have to find the argument in question. If we pass anything at all, should it just be the argument itself that does not get immediately recognized by ndarray? Or, better still, perhaps a tuple of all arguments that were inspected?
  2. For subclasses, it would be very handy to have ndarray.__array_function__, so one can call super after changing arguments. (For __array_ufunc__, there was lots of question about whether this was useful, but it really is!!).
  3. This ndarray.__array_function__ might also help solve the problem of cases where coercion is fine: it could have an extra keyword argument (say coerce) that would call the function with coercion in place.
  4. Indeed, the ndarray.__array_function__ could just be used inside the "dance" function, and then the actual implementation of a given function would just be a separate, private one.
@shoyer

This comment has been minimized.

Copy link
Member

commented May 30, 2018

@mattip

This comment has been minimized.

Copy link
Member

commented May 30, 2018

The authors should send it out to the mailing list.

Still needed for merge (as per Nep Workflow, the merge should be sooner rather than later):

  • rebase against master to remove the failing test
  • rename the file to doc/neps/nep-0018-array-function-protocol.rst
@mhvk

This comment has been minimized.

Copy link
Contributor

commented May 31, 2018

Another general comment, since this is still not on the mailing list, yet I'm thinking about it now: Since speed for normal operation should be impacted as minimally as possible, there should be obvious ways to ensure no type checking dance is done. Some possible solutions (which I think should be in the NEP, even if as discounted options):

  • Two namespaces, one for the undecorated base functions, and one for the decorated ones. The idea would be that if one knows one is dealing with arrays only, one would do import numpy.array_only as np. The latter namespace would be the one automatically used for ndarray.__array_function__.
  • Decorator automatic insertion of a array_only=np._NoValue (or coerce and perhaps subok=... if not present) in the function signature, so that users who know that they have arrays only could pass array_only=True (name to be decided). This would be most useful if there were also some type of configuration parameter that could set the default of array_only.
@mrocklin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2018

rebase against master to remove the failing test
rename the file to doc/neps/nep-0018-array-function-protocol.rst

Done! My apologies for the delays on this @mattip .

@mattip mattip merged commit 1c50f24 into numpy:master Jun 1, 2018

5 of 6 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
lgtm analysis: C/C++ No alert changes
Details
lgtm analysis: JavaScript No alert changes
Details
lgtm analysis: Python No alert changes
Details
@mattip

This comment has been minimized.

Copy link
Member

commented Jun 1, 2018

Thanks @mrocklin

@shoyer

This comment has been minimized.

Copy link
Member

commented Jun 2, 2018

I've now posted the NEP to the mailing list for discussion
https://mail.python.org/pipermail/numpy-discussion/2018-June/078127.html

@mhvk with regards to ndarray.__array_function__ in particular, I don't have strong feelings here and am happy to differ to your judgment (you know I don't like subclassing). Certainly there is virtue in using the same approach as __array_ufunc__. I'll add a brief note on this in the next revision.

With regards to namespaces, please note the section on "Separate namespace" that is already in the NEP text.

@mhvk
Copy link
Contributor

left a comment

Two smaller comments; more general ones on the mailing list.

``__array_function__``.

This protocol is intended to be a catch-all for NumPy functionality that
is not covered by existing protocols, like reductions (like ``np.sum``)

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 3, 2018

Contributor

This particular reduction is an example of a ufunc - maybe stick to the more complicated example of np.mean?

would only include these as keyword arguments when they have changed
from default values. This is similar to `what NumPy already has
done <https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/fromnumeric.py#L1865-L1867>`__,
e.g., for the optional ``keepdims`` argument in ``sum``:

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 3, 2018

Contributor

Might be worth mentioning that the dance decorator could do this part as well!

.. code:: python

def broadcast_to(array, shape, subok=False):
success, value = do_array_function_dance(

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 3, 2018

Contributor

It's a mini-thing, but one might as well have the dance return either a result or NotImplemented.

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jul 23, 2018

Contributor

I’d be in favor of this. No need to complicate things with multiple out args, remember the order and so on.

@mattip

This comment has been minimized.

Copy link
Member

commented Jul 23, 2018

Are we ready to start the approval process for NEP 18?

@shoyer

This comment has been minimized.

Copy link
Member

commented Jul 23, 2018

@mattip almost -- I'd like to add short paragraphs on class methods and dtypes to "Alternatives" to make it clear that we aren't aren't ruling out such options in the future.

@okuta

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2018

Hello, I am a CuPy developer.
This feature is important for NumPy compatible library.
I create a PR for CuPy in experimental cupy/cupy#1650 . Please use it if you want.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.