DOC: revision of NEP-18 (`__array_function__`) #11303
I would particularly like to highlight a draft implementation (in pure Python for now). Hopefully this will be useful for driving the discussion forward: https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006

TODOs before merging:
- [ ] review from mrocklin

Other TODOs:
- [ ] decide if we want to change what is passed on to `__array_function__` implementations: should we include `overloaded_args` and/or `relevant_args` as well as or instead of `types`?
- [ ] add some discussion about static typing / PEP-484?
Two changes, mainly typos.

One thing (I'll repeat this on the mailing list): it'd be nice to have some form of `super` if we do decide to implement `np.NotImplementedButCoercible`. Probably another helper function that calls the implementation for a certain class, without doing the dance, but raising `TypeError` and following `np.NotImplementedButCoercible` rules? This would address @mhvk's concerns. The reason I think this is acceptable is that `__array_function__` is a protocol, not a finished method to be used as-is. Something like the following:

    def do_partial_dance(func, cls, args, kwargs):
        if hasattr(cls, '__array_function__'):
            retval = cls.__array_function__(...)
            if retval is np.NotImplementedButCoercible:
                return func(*args, **kwargs)
            if retval is NotImplemented:
                raise TypeError(...)
            return retval
        return func(*args, **kwargs)

Then, something inheriting from `ndarray` would just do `do_partial_dance(func, ndarray, ...)` instead of `super(cls, self).__array_function__(...)`.

Also, I fully support making `__array_function__` a class method as suggested, which would make the above possible. Also, we can simply remove `ndarray.__array_function__`.
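
To make the three outcomes of that helper concrete, here is a minimal runnable sketch. It is not NumPy code: `NotImplementedButCoercible` is a local stand-in for the proposed sentinel, `total`, `Base` and `Coercible` are made-up names, and passing `(cls,)` as the `types` argument is an assumption, since the call was left elided above.

```python
class _Coercible:
    """Hypothetical stand-in for the proposed np.NotImplementedButCoercible."""


NotImplementedButCoercible = _Coercible()


def do_partial_dance(func, cls, args, kwargs):
    """Ask one class for its override of func, skipping the full dispatch."""
    if hasattr(cls, "__array_function__"):
        # Assumption: __array_function__ is a class method and is told which
        # types it is being asked about; here that is just (cls,).
        retval = cls.__array_function__(func, (cls,), args, kwargs)
        if retval is NotImplementedButCoercible:
            # The class declines but allows falling back to the default.
            return func(*args, **kwargs)
        if retval is NotImplemented:
            raise TypeError(f"{cls.__name__} cannot handle {func.__name__}")
        return retval
    # Classes without the protocol always use the default implementation.
    return func(*args, **kwargs)


def total(values):
    """Toy default implementation standing in for a NumPy function."""
    return sum(values)


class Base:
    @classmethod
    def __array_function__(cls, func, types, args, kwargs):
        return "Base override" if func is total else NotImplemented


class Coercible:
    @classmethod
    def __array_function__(cls, func, types, args, kwargs):
        return NotImplementedButCoercible
```

With these toy classes, `do_partial_dance(total, Base, ([1, 2],), {})` returns the override, the `Coercible` class falls through to `total` itself, and asking `Base` about a function it does not know raises `TypeError`.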
or universal functions (like ``np.exp``). The semantics are very similar
to ``__array_ufunc__``, except the operation is specified by an
arbitrary callable object rather than a ufunc instance and method.
is not covered by the ``__array_func__`` protocol for universal functions
Typo: `__array_func__` -> `__array_ufunc__`
# overloaded function again.
return func(*args, **kwargs)

To avoid recursion, the dispatch rules for ``__array_function__`` need also
"To avoid recursion" -> "To avoid infinite recursion"
Mostly points of clarification, though not all minor.

And in particular I would like to discard the idea of `ndarray` returning `NotImplementedButCoercible` up front. There is no need to break expectations for `super` chains like that.
function call that will be checked for an ``__array_function__``
implementation.
- The tuple ``args`` and dict ``**kwargs`` are directly passed on from the
- ``types`` is a list of argument types from the original NumPy
The name now has changed from `types` to `possibly_overloaded` - should add that I think this is a terrible name! Just `overloaded` would be better.

(Separate: still think an `OrderedDict` would be better - but have to look at implementation.)

Oops, switched back to `types`. But maybe `overloaded_types` would be better/more descriptive.
@@ -145,7 +149,9 @@ This will require two changes within the Numpy codebase:
1. A function to inspect available inputs, look for the
``__array_function__`` attribute on those inputs, and call those
methods appropriately until one succeeds. This needs to be fast in the
common all-NumPy case.
common all-NumPy case, and have acceptable performance (no worse than
I'd argue that "It should also be possible to skip the test altogether".

I'm not sure I agree here. This isn't the case for `__array_ufunc__`.

It would be if adding `subok=False` didn't cause a bigger delay than checking for `__array_ufunc__`. But perhaps this also suggests that the current phrasing of "it should be fast" is good enough.
functions to define how that function operates on them. This will allow
using NumPy as a high level API for efficient multi-dimensional array
operations, even with array implementations that differ greatly from
``numpy.ndarray``.

Detailed description
--------------------
Could we include the mention of astropy as well? See comments on original NEP.
`current behavior <https://bugs.python.org/issue30140>`_ of Python.
- Implementations of ``__array_function__`` indicate that they can
handle the operation by returning any value other than
``NotImplemented``.
- If all ``__array_function__`` methods return ``NotImplemented``,
NumPy will raise ``TypeError``.

One deviation from the current behavior of ``__array_ufunc__`` is that NumPy
Hmm, that sounds like something we should do in `__array_ufunc__` as well. See #11306.

Agreed.

If this is desirable, then I'd argue that passing `self` is undesirable, as the function should never have a reason to know which object it was called on.
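
For readers following along, the dispatch rules quoted in this hunk (any value other than `NotImplemented` counts as handled; `TypeError` if everyone declines) can be sketched in pure Python roughly as follows. This is an illustration, not the notebook's code: the `(handled, result)` return convention, the subclass-first ordering detail and the toy classes `A` and `B` are assumptions.

```python
def try_array_function_override(func, relevant_args, args, kwargs):
    """Return (True, result) if an override handled func, else (False, None)."""
    overloaded_types = []
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if arg_type in overloaded_types:
            continue  # only one representative per type is needed
        if not hasattr(arg_type, "__array_function__"):
            continue  # plain objects (including None) do not take part
        # Put subclasses ahead of their superclasses so they are asked first.
        index = len(overloaded_args)
        for i, old_arg in enumerate(overloaded_args):
            if issubclass(arg_type, type(old_arg)):
                index = i
                break
        overloaded_types.append(arg_type)
        overloaded_args.insert(index, arg)

    if not overloaded_args:
        return False, None

    for arg in overloaded_args:
        result = arg.__array_function__(func, overloaded_types, args, kwargs)
        if result is not NotImplemented:
            return True, result

    raise TypeError(f"no implementation found for {func!r} on {overloaded_types}")


calls = []  # records the order in which overrides are consulted


class A:
    def __array_function__(self, func, types, args, kwargs):
        calls.append("A")
        return NotImplemented


class B(A):
    def __array_function__(self, func, types, args, kwargs):
        calls.append("B")
        return "handled by B"
```

Calling `try_array_function_override(len, [A(), B()], (), {})` consults `B` before `A` even though `A` appears first, because `B` is the subclass.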
return func(*args, **kwargs)

To avoid recursion, the dispatch rules for ``__array_function__`` need also
the same special case they have for ``__array_ufunc__``: any arguments with an
OR the non-wrapped function can be called. That would give a substantial speed advantage. (For `__array_ufunc__`, part of #8892)
``inspect`` module. NumPy won't be supporting Python 2 for
`very much longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_, but a bigger issue is
performance: ``inspect`` is written in pure Python, so our prototype
decorator pretty slow, adding about 15 microseonds of overhead.
I don't think the wrapper would be an issue if we use annotations, since then the wrapper can do all the work while wrapping the function, creating the relevant `try_array_function_override`. It also has the benefit of generically not having to duplicate names in the wrapper.
If we want to do this, we should consider exposing the helper function
``do_array_function_dance()`` above as a public API.
If we want to do this, we should expose the helper function
``try_array_function_override()`` as a public API.
More importantly, the wrapper should be exposed, and the annotation API agreed upon.
Alternatively, a separate namespace, e.g., ``numpy.array_only``, could be
created for a non-overloaded version of NumPy's high level API, for cases
where performance with NumPy arrays is a critical concern. This has most
of the same downsides as the separate namespace.
I don't think the downsides are quite the same because if you use the wrappers, the separate "namespace" can be created automatically.
This approach also suggests an intriguing possibility: default implementations
of ``__array_function__`` and ``__array_ufunc__`` on ``numpy.ndarray`` could
change to simply return ``NotImplementedButCoercible``, i.e.,
I would find it extremely unexpected behaviour for `ndarray` itself to return this: the semantics of `__array_function__` is that it provides an answer if it can, and in this case `ndarray` of course can provide an answer. To me, this strongly suggests this would be a design mistake.

Instead, please do discuss the alternative of `ndarray.__array_function__` simply calling the unwrapped function - that would also remove the need to treat its `__array_function__` differently, and has the benefit of actually doing what it is supposed to do.
special cases for ``numpy.ndarray`` in the NumPy's dispatch logic. It would
break `some existing implementations <https://mail.python.org/pipermail/numpy-discussion/2018-June/078175.html>`_
of ``__array_ufunc__`` on ndarray subclasses (e.g., in astropy), but this
would be acceptable given the still experimental status of``__array_ufunc__``.
I might as well object loudly now, and be quite clear that it will need a much more convincing case than this to override my objections.
@mhvk Could you look at the text of my review for a possible alternative? Maybe you'll find it useful. In particular, I think the mistake you may be making is to treat |
@hameerabbasi - I fear I don't like the suggestion much at all. Having had to work around the idiotic behaviour of Just as one concrete example, consider the case of two subclasses, my |
Thinking a bit more about |
p.s. On the notebook: I'm not sure I actually understand why the inspect and apply defaults are needed, but I haven't got the time right now to investigate. |
I'm going to remove the suggestion that |
This is needed for handling arguments that might be passed either by position or by keyword, e.g.,
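
A small sketch of the point about positional versus keyword arguments, using only the standard-library `inspect` module. The toy `broadcast_to` mirrors the signature shown later in this thread, and `relevant_args` is a made-up helper name; the signature object is built once up front so that only the binding happens per call.

```python
import inspect


def broadcast_to(array, shape, subok=False):
    """Toy stand-in with the same signature as np.broadcast_to."""
    return (array, shape, subok)


# Built once, when the wrapper is created, rather than on every call.
_SIGNATURE = inspect.signature(broadcast_to)


def relevant_args(*args, **kwargs):
    """Find the `array` argument however the caller chose to pass it."""
    bound = _SIGNATURE.bind(*args, **kwargs)
    bound.apply_defaults()  # fills in subok=False when it was omitted
    return [bound.arguments["array"]]
```

Both `relevant_args([1], (2,))` and `relevant_args(shape=(2,), array=[1])` find the same argument, which is what the dispatch code needs before it can look for `__array_function__`.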
Ah, I see. But then the code could be made much more efficient if the wrapper created a function with the right signature and made the args from that, i.e., the wrapped function would look like
and for something like
I.e., your wrapped-function creator should do as much as possible of the work up-front, so you don't lose time in the actual call.
I think this would require using some sort of dynamic code generation, perhaps inspired by or using the |
Yes, that is what I'm thinking of. I thought in fact that was what was typically done, but looking at the one piece of code that I know that does wrapping and inspecting of arguments ( Anyway, perhaps for the NEP all we need to note is that working with the decorator could be just as fast, and thus worries about performance should not prevent us from going for the most readable way to tell which arguments would be looked at - if we can do the type annotation and use it for the wrapping, we get two benefits in one go! |
Just to be clear, you're suggesting that we might be able to write overloads like:
The I agree that this would be pretty awesome! There are some potential issues (e.g., Python 3 only, potentially slow imports), but it should definitely be mentioned. You'll note that I added the fully-qualified name |
Yes, that's what I would hope; nice to see it written out! I'd bet one could get rid of that fully-qualified name... |
array libraries (e.g., scipy.sparse) really don't want implicit conversions
to NumPy arrays, and often avoid implementing ``__array__`` for exactly this
reason. Implicit conversions can result in silent bugs and performance
degradation.
So what would happen if the policy were "if everyone returned `NotImplemented` then do what we used to do, and try coercing with `numpy.asarray`, if that fails, then we fail as we did before"?

Yes, we could do this. I think this would be a slightly inferior version of @mhvk's proposal to only try coercing classes that return `NotImplemented` if they implement `__array__`. The problem is that classes like `scipy.sparse.matrix` technically can be converted with `np.asarray()`, but you end up with a 0-dimensional array containing a sparse matrix (scipy/scipy#4239). (That's probably a bug that could be fixed by defining an `__array__` method that raises `TypeError`.)
The main reason why I don't like returning NotImplemented to mean "try to coerce this argument instead" (at least if it defines For example, dask.array would certainly want to implement Maybe there are other ways to achieve this goal. In principle we could even remove In the long term, my preference would be to stop using |
Still to do after my latest edits:
|
👍 to (for now at least) not coercing anything that returns |
@shoyer - on the Should Overall, I think there's little point in making it a class method beyond a sense of purity. And one has to balance that against confusion for people who are used to regular operator overloading or to |
I agree with all of this.
How about

    @np.dispatch_on
    def concatenate(arrs, axis=0, out=None):
        # yield every operand that can be overloaded on
        # probably faster to return a list, but this is more readable
        for arr in arrs:
            yield arr
        yield out

    @np.concatenate.register(np.ndarray)
    def concatenate(arrs, axis=0, out=None):
        ...  # the current implementation of concatenate

This eliminates the need for Python 3 support, or to write a typing parser
I like @eric-wieser's logic, with one difference: The |
@eric-wieser - that seems super-beautiful (and can still be auto-generated using annotations if someone wanted to do that). |
There's still the question of how to expose the other available types to the implementation. Maybe as simple as a |
@eric-wieser - maybe still just @shoyer - it would seem nicest if the types were ordered in the way they are going to be tried (i.e., the sub-class put before the class). |
I really like the explicit decorator solution! It's low-boilerplate but still fully explicit, with a clean separation of code for generating dispatch arguments and implementation. I've updated my notebook with a draft implementation, which I've called

I'll note one important difference from what @eric-wieser wrote: I'm only decorating the implementation function, and don't pass the type

    def _concatenate_dispatcher(arrs, axis=0, out=None):
        for arr in arrs:
            yield arr
        yield out

    @dispatch_with(_concatenate_dispatcher)
    def concatenate(arrs, axis=0, out=None):
        ...  # the current implementation of concatenate

I like the idea of a dispatch mechanism with a

One of the major advantages of using protocols rather than multiple dispatch is that we don't need to figure this all out ahead of time, and we can push this choice on implementations instead. The protocol is the simplest interface, but it doesn't require what implementers can do.

Instead, I think I'll update the NEP to recommend that implementers of

    import numpy as np

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            if not all(issubclass(t, MyArray) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    HANDLED_FUNCTIONS = {}

    def implements(numpy_function):
        def decorator(func):
            HANDLED_FUNCTIONS[numpy_function] = func
            return func
        return decorator

    @implements(np.concatenate)
    def concatenate(arrays, axis=0, out=None):
        ...  # implementation of concatenate for MyArray objects

    @implements(np.broadcast_to)
    def broadcast_to(array, shape, subok=False):
        ...  # implementation of broadcast_to for MyArray objects
In principle, I agree. In practice, I'm not sure it actually matters. If a class defines override behavior differently based upon in the order in which overrides were checked, it probably doesn't have a well-defined type casting hierarchy. The only reason why I'm not doing this in my current implementation is that |
I'm getting really excited about the explicit decorator for internal use in NumPy. Although there are still a few edge cases where we would need to explicitly call

Here's why: with the explicit decorator solution, we don't need to awkwardly reassign parsed function arguments into

This means that if an array implementation doesn't need to or can't support some optional keyword arguments, e.g., the

For example, you could write a simplified version of

    @implements(np.sum)
    def sum(array, axis=None):
        ...  # implementation of sum for MyArray objects

This would just work, as long as nobody explicitly passes in

This also automatically gives us a safe expansion path for new keyword arguments. If NumPy adds a new optional keyword argument (e.g.,

Finally, if we make a point to write our dispatching functions as accepting

    def _sum_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None,
                        **ignored_kwargs):
        yield a
        yield out

    @dispatch_with(_sum_dispatcher)
    def sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
        ...  # current definition of np.sum

Adding the unused
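
As a rough illustration of the keyword-argument behaviour described in this comment, here is a simplified, NumPy-free sketch. `dispatch_with`, `implements` and `HANDLED_FUNCTIONS` follow the names used in this thread, but the bodies are my own guess at a minimal version (no subclass ordering, no de-duplication of types), and `array_sum` and the list-backed `MyArray` are toy stand-ins.

```python
import functools


def dispatch_with(dispatcher):
    """Simplified sketch: forward args and kwargs to overrides unchanged."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            overloaded = [arg for arg in dispatcher(*args, **kwargs)
                          if hasattr(type(arg), "__array_function__")]
            types = [type(arg) for arg in overloaded]
            for arg in overloaded:
                result = arg.__array_function__(public_api, types, args, kwargs)
                if result is not NotImplemented:
                    return result
            if overloaded:
                raise TypeError(
                    f"no implementation found for {implementation.__name__}")
            return implementation(*args, **kwargs)
        return public_api
    return decorator


def _sum_dispatcher(a, axis=None, dtype=None, out=None, **ignored_kwargs):
    yield a
    yield out


@dispatch_with(_sum_dispatcher)
def array_sum(a, axis=None, dtype=None, out=None):
    return sum(a)  # toy default implementation for plain sequences


HANDLED_FUNCTIONS = {}


def implements(api_function):
    def decorator(func):
        HANDLED_FUNCTIONS[api_function] = func
        return func
    return decorator


class MyArray:
    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(array_sum)
def my_array_sum(array, axis=None):
    return sum(array.data)  # deliberately supports only `axis`
```

Because the caller's arguments are passed through untouched, `array_sum(MyArray([1, 2]))` and `array_sum(MyArray([1, 2]), axis=0)` both work, while `array_sum(MyArray([1, 2]), dtype=float)` fails with the ordinary `TypeError` for an unexpected keyword argument raised by `my_array_sum`.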
Could you add this to the notebook? I am curious what stack track you get with the exception on python3 |
How do we want to handle default argument values in the dispatcher? Do we set them all to |
@eric-wieser If you see @shoyer's example, the idea is to use the same signature as the function. If you mean the implementation, for that, we do things exactly as before, with the exception of the added decorator. Edit: The idea is to not pass them in at all (via |
@hameerabbasi: My point is that |
Ah. To maintain compatibility with older code, Numpy will only add arguments at the end, so we can add |
Would the decorator solution introduce challenges when trying to pickle numpy functions? |
It looks like

    import functools
    import pickle

    def f():
        pass

    print(pickle.dumps(f))

    def good_wrapper(func):
        @functools.wraps(func)
        def new_func(*args, **kwargs):
            return func(*args, **kwargs)
        return new_func

    @good_wrapper
    def f():
        pass

    print(pickle.dumps(f))

    def bad_wrapper(func):
        def new_func(*args, **kwargs):
            return func(*args, **kwargs)
        return new_func

    @bad_wrapper
    def f():
        pass

    print(pickle.dumps(f))

Results in:
|
It really doesn't matter what the default argument values in the dispatcher are, as long as they aren't objects that implement |
My inclination is that we would only want to include
|
Done. From the notebook, here's what a traceback looks like by default:
We probably should catch such a |
Thanks. The re-raised (?) error should also mention the offending function ( |
OK, I have a draft version of exception wrapping in the notebook, e.g., the error message is now: In practice, for a function like I'm not sure this is necessary in practice because the traceback will include the file/location of the offending class and numpy function, but easier to understand error messages are always a win. If we do this, it might not be a bad idea for |
@shoyer - yes, I definitely feel things here should go to |
@shoyer - it might make sense to merge the changes you've made so far and start a new iteration... |
I haven't had the time to rewrite things yet, but I added a big "Warning" to the in-progress section so I think we can merge this in its current, updated draft state. @mrocklin any objections?

+1 from me. Thanks for leading this over the last few weeks @shoyer
@mhvk @hameerabbasi @ngoldbaum @mattip please leave a note if there's anything I missed from our previous discussion that you would like to see addressed in the NEP itself. Detailed discussion should of course be saved for the mailing list.