Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add __array_ufunc__ #8247

Merged
merged 43 commits into from Apr 27, 2017
Merged

ENH: Add __array_ufunc__ #8247

merged 43 commits into from Apr 27, 2017

Conversation

charris
Copy link
Member

@charris charris commented Nov 6, 2016

This reverts commit bac094c and enables
__numpy_ufunc__ for development in the NumPy 1.13.0 development cycle.

EDIT: Note that the name has been changed to __array_ufunc__ together with various changes to the function signature and implementation.

@njsmith
Copy link
Member

njsmith commented Nov 6, 2016

I'm all for getting this into 1.13, but I don't get why we would want to
enable it on master before it's ready. All that does is put master into an
unreleasable state? Generally the rule is things should live on branches
until they're ready, and this is part of how we got into the mess with
scipy jumping the gun...

On Nov 6, 2016 9:58 AM, "Charles Harris" notifications@github.com wrote:

This reverts commit bac094c
bac094c
and enables

numpy_ufunc for development in the NumPy 1.13.0 development cycle.

You can view, comment on, or merge this pull request online at:

#8247
Commit Summary

  • ENH: Revert "Temporarily disable numpy_ufunc"

File Changes

Patch Links:


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#8247, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAlOaEd4slFcqly26_j-yqed1ikcnhwQks5q7hU5gaJpZM4KqoDm
.

@charris
Copy link
Member Author

charris commented Nov 6, 2016

My thought is to have it there so we can work on it and have it tested against. If things breaks, tests will break and will need fixing anyway. I think trying to work this out in one big commit would be difficult and I would like to get an early start. However, there are a couple of things we might want to put in here before merging this:

  • Change the name, mostly on account of scipy tests enabling themselves.
  • Remove the index.

Other changes, such as adding an override, have less effect downstream and can come in later. Currently priority still plays a role and will need a deprecation cycle. I suppose we need to finalize what the function looks like, then argue about the binops.

@charris charris changed the title ENH: Revert "Temporarily disable __numpy_ufunc__" WIP, ENH: Revert "Temporarily disable __numpy_ufunc__" Nov 6, 2016
@rgommers
Copy link
Member

rgommers commented Nov 7, 2016

I would agree with @njsmith in principle, but with @charris in practice. If we don't re-enable it on master, there's a good chance it will never make it in. However, at the moment it's very hard to judge how far off we still are. Would be good to have a concise summary of the open points and a plan about who/how to approach those in broad lines before hitting the green button here.

@pv
Copy link
Member

pv commented Nov 7, 2016 via email

@charris
Copy link
Member Author

charris commented Nov 7, 2016

@pv I think the sticking point was the last. My own preference would be to have an opt out mechanism with the usual NotImplemented return. The reason is twofold, first, at the present time, many Objects opt out by using __array_priority__ in combination with an __rop__ as we have recommended , second, there is a desire to deprecate __array_priority__ as it doesn't really work for its original intended purpose, the second is that the Python way is to rely on NotImplemented and write an __rop__ . Note that the core problem is greedy arrays and how to keep them in check. Now it is possible to get around the lack of an __rop__ call using __numpy_ufunc__ (possibly one of the few cases where an index would be helpful), but it requires a fair amount to new code and knowledge of the what ufuncs are used for which __op__. The easy way out would be to change the meaning of __array_priority__ >= 1000 to opt out and otherwise ignore it at some future date, the other option is to add an opt out attribute. I'm in favor of the either of the last two options.

@pv
Copy link
Member

pv commented Nov 8, 2016

Yes, that is the point where we failed to find consensus on what is the optimum. It's quite possible that if we start again, the issue will not be clarified, so the decision needs to be made without convergence (and with an escape route in mind). I think there were some responses to your specific point above in the old huge discussion (rare use case to handle numpy stuff without knowing about numpy, provide dummy implementation in numpy?), plus alternative proposals for the opt-out.

@charris
Copy link
Member Author

charris commented Nov 9, 2016

I've left the NEP alone as an historical document, even though names and arguments have changed.

@mhvk
Copy link
Contributor

mhvk commented Nov 11, 2016

Agreed with @pv's summary. And that the remaining unresolved issue was with treating items that become object arrays. If I recall correctly, plans are now afoot to disallow automatic creation of object arrays with np.array, so this would be a good time to do the same with __numpy_ufunc__. With that, it should be possible to get rid of __array_priority__ much more easily.

@charris
Copy link
Member Author

charris commented Nov 12, 2016

I've removed the index. For out always tuple, does that mean the the **kwds will always contain an out entry, possibly empty? I note that the index will currently contain positions in the out argument, so I assume that having __array_ufunc__ is also called if one of an out arguments implements it.

Note on documentation, there looks to be very little any apart from the NEP, which is now outdated.

EDIT: Looks like the tuple needs to contain exactly as many entries, some possibly None, as the function has, so I guess this means always having out in the keyword dictionary. Is there any interest in deprecating non-keyword out? I note that having one of each for two output functions is deprecated and should probably go to an error soon.

@charris charris force-pushed the enable-numpy_ufunc branch 2 times, most recently from 50b585f to 07b0e5c Compare November 12, 2016 21:27
@charris
Copy link
Member Author

charris commented Nov 13, 2016

It looks to me like the nin argument to PyUFunc_CheckOverride can be removed if the out argument is always a tuple keyword argument. IFAICT, the only need for it now is for the three psuedo ufuncs used for matrix multiplication, but if the required arguments, and only the required arguments, are passed in the args tuple, the length of that tuple fills the need. @pv Comment?

@charris
Copy link
Member Author

charris commented Nov 14, 2016

OK, there are complications with __array__ufunc__ in general that I think we should deal with to keep it simple on the user end. Take for instance the following for the reduce, accumulate, and reduceat methods:

    static char *reduce_kwlist[] = {
            "array", "axis", "dtype", "out", "keepdims", NULL};
    static char *accumulate_kwlist[] = {
            "array", "axis", "dtype", "out", "keepdims", NULL};
    static char *reduceat_kwlist[] = {
            "array", "indices", "axis", "dtype", "out", NULL};

Note that all of these can be passed by position or by keyword, there is no simple division between the two except that everything after the last positional argument must be passed by keyword. ISTM that from the user end that we should have a definite division into required (positional) and optional (keyword) arguments. That is not so difficult to achieve, but is likely to add call overhead. I think that the simplest way to implement this is to use PyArg_ParseTupleAndKeywords with all object output types and then construct the args tuple and kwargs dictionary from them. We could possibly save some time by only doing this if one of the inputs implements __array_ufunc__, although even in that case we need to determine the inputs. Also note how far along the out argument appears in the given examples.

Thoughts?

@mhvk
Copy link
Contributor

mhvk commented Nov 14, 2016

I agree that all optional arguments should always end up in kwargs. I'm not sure why you say one needs to have an out even if there are no actual output arrays passed in. The advantage of not doing that is that the user can hide possibly expensive checks for any special cases behind a simply if kwargs: ....

One question: can even an array passed in via where have __array_ufunc__? (out arrays already do -- and should. I cannot immediately think why it would be useful for where [except trickery where it is an array that adapts itself based on the inputs] but it is a form of consistency to allow any of the array inputs to override)

@njsmith
Copy link
Member

njsmith commented Nov 14, 2016

It would be useful to allow where= to be a sparse array.

On Nov 14, 2016 2:09 PM, "Marten van Kerkwijk" notifications@github.com
wrote:

I agree that all optional arguments should always end up in kwargs. I'm
not sure why you say one needs to have an out even if there are no actual
output arrays passed in. The advantage of not doing that is that the user
can hide possibly expensive checks for any special cases behind a simply if
kwargs: ....

One question: can even an array passed in via where have array_ufunc?
(out arrays already do -- and should. I cannot immediately think why it
would be useful for where [except trickery where it is an array that
adapts itself based on the inputs] but it is a form of consistency to allow
any of the array inputs to override)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#8247 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAlOaOmaG1Xmnj9Qs1d2dlR0sQTBQgwsks5q-NvXgaJpZM4KqoDm
.

@charris
Copy link
Member Author

charris commented Nov 14, 2016

@mhvk I didn't say there needs to be an out argument, I'm thinking of parsing out all the arguments, but in that case the ones not found will be NULL and can be ignored in constructing the kwrds dictionary.

@eric-wieser
Copy link
Member

we should have a definite division into required (positional) and optional (keyword) arguments

What exactly do you mean by that? Are you suggesting position-only arguments, keyword-only arguments, or both? Is there any precedent elsewhere in numpy for keyword-only arguments (when not combined with vararg functions)?

@shoyer
Copy link
Member

shoyer commented Nov 15, 2016

What exactly do you mean by that? Are you suggesting position-only arguments, keyword-only arguments, or both? Is there any precedent elsewhere in numpy for keyword-only arguments (when not combined with vararg functions)?

This is for the arguments passed into __array_ufunc__, which is only called by NumPy, not users.

@charris
Copy link
Member Author

charris commented Nov 15, 2016

Checking, I see that the override function already takes care of the argument normalization.

@charris
Copy link
Member Author

charris commented Nov 15, 2016

@pv The outer ufunc method just reshapes the input arrays and calls ufunc_generic_call, passing on the keywords untouched. The upshot is that PyUFunc_CheckOverride is called twice, but in only the second call are the arguments properly normalized. I propose to drop the first call. If not dropped, the arguments should at least be normalized. Thoughts?

EDIT: OK, I think we need to preserve the call as the array reshape might fail for some inputs, so the arguments need a proper normalization.

@pv
Copy link
Member

pv commented Nov 15, 2016 via email

@charris
Copy link
Member Author

charris commented Nov 15, 2016

Hmm, there is no error checking in the normalization functions, not even if there are sufficient positional arguments. All of the python C-API functions may error, even PyTuple_GetSlice where it is undocumented.

EDIT: And the positional arguments could be passed as keywords. Should we error on that?

@charris
Copy link
Member Author

charris commented Nov 15, 2016

The default values of optional arguments are also not set, should we handle those?

@mhvk
Copy link
Contributor

mhvk commented Nov 15, 2016

The default values of optional arguments are also not set, should we handle those?

I'd say we should not set anything unless it is explicitly passed in, if only to ensure that if kwargs is empty, no special stuff needs to be done.

@charris charris merged commit e332ba4 into numpy:master Apr 27, 2017
@charris charris deleted the enable-numpy_ufunc branch April 27, 2017 20:02
@charris
Copy link
Member Author

charris commented Apr 27, 2017

Merged. Thanks to all for pulling this off, and a special thanks to Marten for sheparding the process to completion.

Further modifications and enhancements should go through the normal PR process.

@shoyer
Copy link
Member

shoyer commented Apr 27, 2017

Thanks @charris.

Just to be 100% clear: I think we've very close, but before releasing NumPy we need to change the ufunc override rules so that encountering any argument with __array_ufunc__ = None disables overrides entirely.

@mrocklin
Copy link
Contributor

I am very glad to see this. Thank you for pushing this through.

I'm now looking forward to the 1.13 release more than any other release that I can recall :)

@eric-wieser
Copy link
Member

Do we want to get MaskedArray using __array_ufunc__ before we release 1.13, as a testing ground?

@charris
Copy link
Member Author

charris commented Apr 27, 2017

@eric-wieser I think it is best to leave applications to future releases as they are likely to take a fair amount of time to settle as people experiment and gain experience.

@shoyer
Copy link
Member

shoyer commented Apr 27, 2017

Do we want to get MaskedArray using array_ufunc before we release 1.13, as a testing ground?

I think this is a definitely worth trying. Certainly implementing NDArrayOperatorsMixin turned up enough design design issues on its own.

@mhvk
Copy link
Contributor

mhvk commented Apr 27, 2017

🎆 🎆 🎆

I'd actually not make further changes right now -- we do note that the thing is provisional after all.

Definitely, redoing MaskedArray using it would be great, but surely for 1.14!

@shoyer
Copy link
Member

shoyer commented Apr 27, 2017

To be clear, I think it's worth trying to redo MaskedArray with it, not necessarily worth merging the attempt :).

@eric-wieser
Copy link
Member

eric-wieser commented Apr 27, 2017

@shoyer: Was the use of the word redo intentional? It's tempting to make a new np.ma2 or np.masked module which discards the hardmask and sharedmask behaviour...

@shoyer
Copy link
Member

shoyer commented Apr 27, 2017

@eric-wieser ask @mhvk, he used "redo" first :)

@mhvk
Copy link
Contributor

mhvk commented Apr 27, 2017

I'm definitely tempted to redo -- that way we also don't have to worry about, e.g., introducing views of masks instead of copies. And the code could become substantially more agnostic about what the items being masked actually are. Anyway, for another issue/PR...

@shoyer
Copy link
Member

shoyer commented Apr 28, 2017

See #9014 for a follow-up PR, changing the behavior of __array_ufunc__ = None and adding recommendations to the NEP.

@njsmith
Copy link
Member

njsmith commented Apr 29, 2017

If re-doing np.ma with more sensible semantics, then it should probably at least start as a third-party library, to allow a little more room for experimentation, independent release cycle, etc.? We could still link to it from the docs or even a DeprecationWarning on np.ma.

It definitely would be an interesting experiment.

@charris charris changed the title ENH: Add __array_ufunc__ ENH: Add __array_ufunc__ May 9, 2017
mherkazandjian pushed a commit to mherkazandjian/numpy that referenced this pull request May 30, 2017
* ENH: NDArrayOperatorsMixin calls ufuncs directly, like ndarray

Per our discussion in
numpy#8247 (comment)

* add back in accidentally dropped __repr__
@mrocklin
Copy link
Contributor

mrocklin commented Jun 7, 2017

I have a naive implementation of this up for dask.array here: dask/dask#2438

In [1]: import dask.array as da, numpy as np

In [2]: x = np.arange(24).reshape((4, 6))

In [3]: np.sum(np.sin(x), axis=0)
Out[3]:
array([-1.56697566,  2.06850183,  3.80220828,  2.04018197, -1.59757823,
       -3.76653238])

In [4]: d = da.arange(24, chunks=(4,)).reshape((4, 6))

In [5]: np.sum(np.sin(d), axis=0)
Out[5]: dask.array<sum-aggregate, shape=(6,), dtype=float64, chunksize=(6,)>

In [6]: _.compute()
Out[6]:
array([-1.56697566,  2.06850183,  3.80220828,  2.04018197, -1.59757823,
       -3.76653238])

@charris
Copy link
Member Author

charris commented Jun 7, 2017

@mrocklin Have you been happy with the implementation so far?

@mrocklin
Copy link
Contributor

mrocklin commented Jun 7, 2017

@charris very. My first attempt took about ten lines to reroute dask.array's existing functions to be called when the user uses the proper numpy equivalent. I suspect that with a bit more work I can remove dask's functions entirely so that da.add is np.add

@mhvk
Copy link
Contributor

mhvk commented Jun 8, 2017

@mrocklin -- Great that it seemed relatively easy! Do ping if there seem to be things that do not work quite as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet