Decide on what the resolution rules for __op__/__rop__/__numpy_ufunc__ actually are #5844
Hi, thanks for picking this up. I agree 100% this is an issue that has
The second point is: scipy.sparse does not need the new binop logic.
Right, the above writeup doesn't even attempt to describe the dispatch mechanism that currently exists in master (except briefly in the very last paragraph), just because I'm trying to focus on what we've previously released, and what we want to release next, and what's in master might or might not match either of those :-).
Right, we are 100% in agreement here.
I guess I don't understand why we should consider this hypothetical situation. We do have I'd prefer that
If you don't override how something works, then you get the default behavior, yes. I guess I don't see why that's a problem, or what you would expect to happen instead...? For now, anyone who has overridden

Not sure what you mean about ndarray subclasses -- subclasses are totally unaffected by all this stuff, b/c Python gives them total control over binop dispatch regardless of what we do.
We're not replacing Python's binop dispatch -- it still does what it does. And like you say, we have independent reasons to add

I guess the argument here is really that there's something inadequate about

The problem we run into in the first layer (Python's built-in dispatch) is that according to Python, what you're supposed to do is to try performing the operation, and then if you can't because you don't know what to do with the other object, you give up and let it try. But like you alluded to above, numpy has this "never say die" attitude where it will find some way to try and make

But, this problem doesn't arise for
As it appears to me, the problem is at the binop level, so it should be solved on that level, so that we don't throw away Python's binop mechanism along with the bathwater. That ndarrays directly call ufuncs, regardless of whether the operation can (for some meaning of "can") be done, renders the normal Python mechanism used to deal with binops inoperable. I don't think taking a "numpy exceptionalist" approach of using our own dispatch mechanism, working in pretty much the same way as the existing one, is completely necessary for a solution of the present problem.
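The contrast described here can be seen in a minimal pure-Python sketch (class names hypothetical): a left operand that greedily coerces everything never returns `NotImplemented`, so Python's fallback to the right operand's reflected method is never consulted, while a "polite" operand keeps the normal negotiation working.

```python
class Greedy:
    """Mimics the cast-anything-to-myself behavior: never defers."""
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        # Coerce blindly instead of returning NotImplemented.
        return Greedy(self.value + getattr(other, 'value', other))

class Polite:
    """Declines unknown operands so Python can try type(other).__radd__."""
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        if isinstance(other, Polite):
            return Polite(self.value + other.value)
        return NotImplemented  # hand control back to Python's binop mechanism
    def __radd__(self, other):
        return "Polite.__radd__ was consulted"

class Other:
    def __radd__(self, other):
        return "Other.__radd__ handled it"

# Greedy never defers, so Polite.__radd__ never runs:
print(type(Greedy(1) + Polite(2)).__name__)
# Polite defers for unknown right operands, so Other gets its turn:
print(Polite(1) + Other())
```

This is exactly the "inoperable" failure mode: as long as the left-hand class insists on coercing, no amount of care on the right-hand side restores Python's negotiation.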
The reason it does in the rules in master is to preserve legacy backward compatibility, exactly as you propose above. However, note that (in the above proposal) the presence of `__numpy_ufunc__` still has a rather important effect --- it turns off the default ndarray binop cast-other-to-ndarray behavior. I think being able to do this is the key element in solving the binop problem, as it's present in all of the solutions proposed.
Yes, but I don't think `__numpy_ufunc__` solves this particular problem. The scipy.sparse binop should return NotImplemented when it doesn't recognize the other object, and execution then falls back to the other object as per normal Python binop rules. This should work fine. However, if scipy.sparse has a cast-other-to-ndarray-and-gobble-it-up fallback behavior in the binop, it will run into the same problem that we currently have in numpy binops. However, exactly the same problem also arises in the `__numpy_ufunc__` hook, so it appears we have just shuffled the issue around. Your suggested solution of cast-self-to-ndarray-and-then-try-again should also work with Python binops. The difference is then between
Note that Python will not call the rhs op for binops between ndarray subclasses that are not subclasses of each other (the common situation). If they don't override all of the greedy behavior of ndarray default binop, interaction between them will again run into problems.
I think the thing is that the goal is also achievable with a suitably written `numpy.ndarray` binop, and I think this is the more desirable way to go, because it doesn't throw away Python's standard mechanism of binop negotiation. Of course, both approaches require that all classes that want to interoperate write their binop/`__numpy_ufunc__` mechanisms in a certain way, to avoid the cast-other-to-ndarray behavior that is at the root of the problems. It seems all of the solutions above, including `__array_priority__` (although it was not consistently implemented), boil down to selectively disabling the default cast-other-to-ndarray behavior of the ndarray binop. The proposal above does that when `__numpy_ufunc__` is present, in which case it continues the method negotiation in `__numpy_ufunc__`; Python's own mechanism is skipped in all cases.

tl;dr: I don't like the idea of throwing away Python's own binop mechanism and replacing it with our own, essentially identical system. If a simpler approach is necessary, I'd rather go for custom attribute(s) in
Yes, good to have! Suggestion: would it make sense to make a wiki page for this, so the documentation of the current and future API gets written as we go?

I will make one comment on arrays with added information like units, from my experience working on

More generally, for all those classes
Now I've read the thread more carefully: my sense would be (and I think we all agree) that we leave the present behaviour as unchanged as possible, and try to ensure

But we need to define more clearly what it means to "let the ufunc machinery take over". Could it be simpler than it is now? In particular, should the very first thing in a
(Here, The thing that this removes is that no longer (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/number.c#L121):
Right now, the attempt to implement the above prioritisation caused the problem in #4815 that
(i) The very first thing in a ufunc is already checking for the `__numpy_ufunc__` hook and dispatching to it.

(ii) As I write above, if ndarray dispatches immediately to the ufunc, Python's native binop mechanism just cannot be used to deal with ndarrays. We have then replaced the native mechanism in the language by our own mechanism that is pretty much identical, modulo the fact that the presence of the `__numpy_ufunc__` attribute turns off the greedy default ndarray binop behavior.
I am certainly inspired by @mhvk's vision of composable ndarray objects. I would love to be able to write new ndarray-like objects that add a single feature and all work well together, like lazy computation (e.g.,

I think

One thing that would be extremely helpful is some sort of "best practices" recipe for how one should implement binary operations on an ndarray-like object to make use of the numpy machinery. Let me give this a shot, just to see if I understand what I've read so far:

```python
class ArrayLike:
    """Our ndarray-like object will just be a simple wrapper"""
    def __init__(self, values):
        self.values = values

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # replace self with self.values in inputs before calling the ufunc again
        inputs = tuple(x.values if i == n else x for n, x in enumerate(inputs))
        # similarly replace ArrayLike instances in the out argument
        if 'out' in kwargs:
            out = kwargs['out']
            cls = type(self)
            if isinstance(out, tuple):
                out = tuple(o.values if isinstance(o, cls) else o for o in out)
            elif isinstance(out, cls):
                out = out.values
            kwargs['out'] = out
        # do the computation on unwrapped arrays
        result = getattr(ufunc, method)(*inputs, **kwargs)
        # now wrap the result
        return type(self)(result)

    # for consistency, binary ops should be defined by calling numpy ufuncs
    # we might even write a standard mixin class to add all these methods
    def __add__(self, other):
        return np.add(self, other)
```
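For anyone reading this thread later: the hook that eventually shipped (in NumPy 1.13) is spelled `__array_ufunc__` and receives the inputs directly via `*inputs` rather than an index argument. A minimal sketch of the same wrapper against the released API (`out` handling omitted for brevity):

```python
import numpy as np

class ArrayLike:
    """Simple wrapper using the __array_ufunc__ hook shipped in NumPy 1.13."""
    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any ArrayLike inputs, run the ufunc, then re-wrap the result.
        unwrapped = tuple(x.values if isinstance(x, ArrayLike) else x
                          for x in inputs)
        return ArrayLike(getattr(ufunc, method)(*unwrapped, **kwargs))

    # Binary ops defer to the corresponding ufunc, as discussed above.
    def __add__(self, other):
        return np.add(self, other)

a = ArrayLike([1, 2, 3])
b = ArrayLike([10, 20, 30])
print((a + b).values)  # [11 22 33]
```

Because `np.add` sees that an input defines `__array_ufunc__`, it hands control to the wrapper, so `a + b` and `np.add(a, b)` take the same path.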
Separately, on non-ndarray objects having Example:
Probably should document this wonderful behaviour!
@pv - I see your point that in @njsmith's most simple scheme, the

EDIT: I meant of course that the
@pv - Please don't remove it ;-) But above I omitted my first trial, which is silly, as it is more obvious it should work:
The logic in the simple scheme would be (for
@shoyer - I think your example is good, except that I would hope you would not need to define

And of course your
That isn't how `__numpy_ufunc__` dispatch works, though -- remember that as

(Also note that you want to support np.add(a, b) and for it to do the
@shoyer: yes, that ArrayLike class is exactly what I have in mind. (Except
@mhvk I need to define

@njsmith OK, let me add something for
@shoyer - OK, my mistake; yes, in your case you would obviously need

@njsmith - you're right that my list does not describe the current logic correctly. What happens currently is that

p.s. I definitely would like to have a way to tell
Re: NotImplemented. Consider numpy.sin(x) returning NotImplemented, which gets automatically turned into an object array later on, and then trying to debug WTH is going on (in some code someone else wrote five years ago). I think it's somewhat bad design if such special values start to leak out from ufuncs --- this is an error condition, so an exception should be raised.
"Don't even try" is written:
I have one more question for my example, related to this discussion about returning

```python
def __add__(self, other):
    try:
        return np.add(self, other)
    except TypeError:
        return NotImplemented
```

Otherwise, writing a compatible ndarray-like class will require using
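To make the trade-off concrete, here is a small pure-Python sketch (hypothetical classes) of what returning `NotImplemented` buys compared with raising directly: Python raises its uniform unsupported-operand `TypeError` only after both sides have declined, whereas an exception thrown from `__add__` preempts the right operand's `__radd__` entirely.

```python
class Deferring:
    def __add__(self, other):
        return NotImplemented  # decline; Python will try type(other).__radd__

class Raising:
    def __add__(self, other):
        raise TypeError("refused outright")  # no fallback is possible

class Handler:
    def __radd__(self, other):
        return "handled by __radd__"

print(Deferring() + Handler())   # fallback to Handler.__radd__ succeeds
try:
    Deferring() + object()       # both sides decline
except TypeError:
    print("Python's uniform unsupported-operand TypeError")
try:
    Raising() + Handler()        # Handler.__radd__ never gets a chance
except TypeError as exc:
    print(exc)
```

So wrapping the ufunc call in try/except TypeError, as sketched above, is what keeps an ndarray-like class a well-behaved participant in Python's negotiation.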
@mhvk For your latest example, I think you mean |
Just to be sure I understand correctly: is @shoyer's example essentially how
@pv - I fear I have somewhat turned this discussion off-topic. I'm still trying to understand what is wrong with the simplest |
@mhvk To clarify, if |
@shoyer - @pv and @njsmith convinced me that the appropriate thing for
This behaviour seems sensible to me. But an explicit error is better (and @njsmith's
@mhvk: If ndarray dispatches immediately to ufunc, Python's native binop
@mhvk I think it is still slightly better behaved to write:
NumPy translates this into
As per the discussion at numpygh-5844, and in particular numpy#5844 (comment) this commit switches binop dispatch to mostly defer to ufuncs, except in some specific cases elaborated in a long comment in number.c. The basic strategy is to define a single piece of C code that knows how to handle forward binop overrides, and we put it into private/binop_override.h so that it can be accessed by both the array code in multiarray.so and the scalar code in umath.so.
Is this resolved by the merge of #8247?
@eric-wieser Yes, I think so, though note my follow-up in #9014
I know it's been a long time, but @njsmith and others: I just realized we do binop stuff in our scalar code. Why is that? We should be able to assume that scalars always defer to any array-like, no? Any array-like should be able to handle our scalars. There could be some stranger things about when to defer to user-defined scalars as opposed to going to the ufunc directly. I am willing to bet we can just defer in that case as well, though? Removing all of "binop_override.h" from the scalar math code entirely does not cause any test failures, for one thing (although I suppose it might then go into ufuncs when it doesn't have to, necessarily).
I think the issue is speed. I wonder if there is a way to fast-path even more ufunc cases so this wouldn't matter.
Yeah, we have to fall back to the ufunc commonly, and before we fall back to the ufunc, we have to check
So I do think that we are just making scalars unnecessarily slow currently. We need to defer once we reach the "use the array method (even though this is a scalar)" stage. That stage already does the check in either case, though, so it seems we are doing the check up to 3 times if we end up going into the array branch. So I think we can remove the case completely from the scalar code. If scalars fall back to the generic scalar code paths, these already just fall back to the array code path. So the only danger would be a scalar subclass implementing
There is a complex set of questions around how to handle method resolution in the presence of `__numpy_ufunc__`. Currently in master is an extremely complicated set of rules that isn't documented and that I don't actually understand (see #5748 for the latest set of changes to this), so it's kinda hard to know whether they are correct, but I suspect not. And this is a blocker for 1.10, b/c whatever we release in 1.10 will be set in stone forever.

I strongly feel that we cannot include `__numpy_ufunc__` in a release without at least having a document somewhere describing what the actual dispatch rules are. I hope that doesn't mean we have to defer `__numpy_ufunc__` for another release, but if it does then it does.

AFAICT this is how `a op b` dispatch works for ndarrays, BEFORE `__numpy_ufunc__` (i.e., this is how 1.9 works):

- First, Python decides whether to invoke `a.__op__(b)` or `b.__rop__(a)`. So in the case where one of these objects is a proper subclass of the other, that object always gets to do absolutely anything, so that's fine. The interesting cases are the ones where neither is a proper subclass of the other (either because it's like `matrix + masked array`, or because it's like `ndarray + scipy.sparse`). So without loss of generality, let's focus on the case where Python calls `a.__op__(b)`, and `a` is either an instance of `ndarray` or else an instance of a subclass of `ndarray` which has not overridden `__op__`, i.e. we're getting `ndarray.__op__(a, b)`.
- `ndarray.__op__` has the following logic (see `PyArray_GenericBinaryFunction` in `number.c`):
  - If `b` is not an `ndarray` at all (even a subclass), and `b` has a higher `__array_priority__` than `a`, then we return `NotImplemented` and let control pass to `b.__rop__(a)`.
  - Otherwise, we call `np.op(a, b)` and let the ufunc machinery take over.
- `np.op(a, b)` does the following (see `PyUFunc_GenericFunction`, `PyUFunc_GeneralizedFunction` in `ufunc_object.c`, and also `ufunc_generic_call`, which converts `-2` return values from the previous into `NotImplemented`, so you have to audit their whole call stack):
  - If `b` is not an `ndarray`, and calling `np.array(b)` returns an object array (presumably because coercion failed... though I guess this could also be hit if `b.__array__()` returned an object array or something), AND `b` has a higher `__array_priority__` than `a`, and `b` has an `__rop__` method, then return `NotImplemented`.
  - If the operands have structured dtypes, return `NotImplemented`. (This is buried in `get_ufunc_arguments`, search for `return -2`.)

Now, my suggestion is that the way we would EVENTUALLY like this to look is:

- Python decides whether to invoke `a.__op__(b)` or `b.__rop__(a)`. As above, let's assume that it invokes `ndarray.__op__(a, b)`.
- `ndarray.__op__(a, b)` calls `np.op(a, b)` (which in turn invokes all the standard ufunc stuff, including `__numpy_ufunc__` resolution).

I submit that it is obvious that IF we can make this work, then it is obviously the ideal outcome, because it is the simplest possible solution. But is it too simple? To determine this we have to answer two questions: (1) Will it adequately address all the relevant use cases? (2) Can we get there from here?

So let's compare the current rules to my dream rules.

First, we observe that everything that currently happens inside the ufunc machinery looks like it's totally wrong. The first check can only be triggered if `b` is a non-`ndarray` that has a higher `__array_priority__` (among other things), but if we look above, we see that those conditions are sufficient to trigger the check in `ndarray.__op__`, so checking again at the ufunc level is redundant at best. And the second check is just incoherent nonsense AFAICT. The only reason to return `NotImplemented` is b/c you want to pass control to another `__(r)op__` method, and there's no reason arrays containing structured dtypes in particular should somehow magically have different `__(r)op__` methods available than other arrays. So we can just get rid of all the ufunc stuff immediately, great.

That leaves the `__array_priority__` stuff. We have two problems here: we can't just drop this immediately b/c of backcompat issues, and we need to have some way to continue to support all the use cases that this currently supports. The first problem is just a matter of having a deprecation period. For the second, observe that a class which defines a `__numpy_ufunc__` method gets complete control over what any ufunc call does, so it has almost as much power as a class that currently sets `__array_priority__`. The only additional power that `__array_priority__` currently gives you is that it lets you distinguish between e.g. a call to `ndarray.__add__(a, b)` versus a call to `np.add(a, b)`. So the only code that really loses out from my proposed change is code which wants `a + b` and `add(a, b)` to do different things.

AFAIK in the entire history of numpy there is only one situation where this power has been used on purpose: the definition of matrix classes where `a * b` is matmul, but `np.multiply(a, b)` is elmul. And we've all agreed that such classes should be deprecated and eventually phased out (cite: PEP 465).

So, I conclude that EVENTUALLY my dream rules should work great. The only problem is that we need some temporary compromises to get us from here to there. Therefore, I propose we use the following dispatch rules in numpy 1.10, with the goal of moving to my "dream rules" in some future version:

- Python decides whether to invoke `a.__op__(b)` or `b.__rop__(a)`. As above, let's assume that it invokes `ndarray.__op__(a, b)`.
- `ndarray.__op__(a, b)` does the following:
  - If `b` **does not define `__numpy_ufunc__` and** is not an `ndarray` at all (even a subclass), and `b` has a higher `__array_priority__` than `a`, then we **issue a deprecation warning and** return `NotImplemented` and let control pass to `b.__rop__(a)`. (Bolded parts are changes compared to the current behaviour.)
  - If `__op__` is `__mul__` and `b->tp_class->tp_name.startswith("scipy.sparse.")`, then return `NotImplemented`. (This rule is necessary in addition to the above, because `scipy.sparse` has already made a release containing `__numpy_ufunc__` methods, so the exception above doesn't apply.)
  - Otherwise, we call `np.op(a, b)` and let the ufunc machinery take over.

I believe that this is adequate to cover all practical use cases for the current dispatch machinery, and gives us a clean path to better dispatch machinery in the future.

The main alternative proposal is Pauli's, which involves a very complicated check (I won't try to summarize here, see this comment and following code). The goal of that approach is to continue supporting classes where `a + b` and `add(a, b)` do different things. I don't think that keeping substantial additional complexity around indefinitely is worth it in order to support functionality that no-one has ever found a use for except in one very specific case (overriding `__mul__`), and where we generally agree that that one specific case should be phased out as soon as possible.

I would very much appreciate feedback from scipy.sparse and astropy in particular on whether the above covers all their concerns.
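The proposed 1.10 dispatch can be sketched in Python-level pseudocode (the helper name `ndarray_op` and the keyword-argument defaults are hypothetical; the real implementation lives in C in `number.c`):

```python
import warnings
import numpy as np

def ndarray_op(a, b, op_name, ufunc):
    """Sketch of the proposed numpy 1.10 ndarray.__op__ dispatch."""
    # Legacy __array_priority__ escape hatch, now deprecation-warned, and only
    # honoured for non-ndarray objects that have NOT opted into __numpy_ufunc__:
    if (not hasattr(b, '__numpy_ufunc__')
            and not isinstance(b, np.ndarray)
            and getattr(b, '__array_priority__', 0) > getattr(a, '__array_priority__', 0)):
        warnings.warn("__array_priority__ binop override is deprecated",
                      DeprecationWarning)
        return NotImplemented  # lets Python fall back to b.__rop__(a)
    # Special case: released scipy.sparse versions define __numpy_ufunc__ but
    # still need a * b to mean matrix multiplication, not np.multiply:
    if op_name == '__mul__' and type(b).__module__.startswith('scipy.sparse'):
        return NotImplemented
    # Otherwise hand everything to the ufunc machinery
    # (which performs __numpy_ufunc__ resolution itself):
    return ufunc(a, b)
```

For example, a non-ndarray operand with a high `__array_priority__` and no `__numpy_ufunc__` still gets deferred to (with a warning), while ordinary operands go straight to the ufunc.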
(Partial) History: #4815, #5748
CC: @pv, @cowlicks, @mhvk