Decide on what the resolution rules for __op__/__rop__/__numpy_ufunc__ actually are #5844

Closed
njsmith opened this issue May 6, 2015 · 328 comments

Comments

@njsmith
Member

njsmith commented May 6, 2015

There is a complex set of questions around how to handle method resolution in the presence of __numpy_ufunc__. Currently in master is an extremely complicated set of rules that isn't documented and that I don't actually understand (see #5748 for the latest set of changes to this), so it's kinda hard to know whether they are correct, but I suspect not. And this is a blocker for 1.10, b/c whatever we release in 1.10 will be set in stone forever.

I strongly feel that we cannot include __numpy_ufunc__ in a release without at least having a document somewhere describing what the actual dispatch rules are. I hope that doesn't mean we have to defer __numpy_ufunc__ for another release, but if it does then it does.

AFAICT this is how a op b dispatch works for ndarrays, BEFORE __numpy_ufunc__ (i.e., this is how 1.9 works):

  • First Python uses the subclass rule to decide whether to invoke a.__op__(b) or b.__rop__(a). So in the case where one of these objects is a proper subclass of the other, that object always gets to do absolutely anything, so that's fine. The interesting cases are the ones where neither is a proper subclass of the other (either because it's like, matrix + masked array, or because it's like ndarray + scipy.sparse). So without loss of generality, let's focus on the case where Python calls a.__op__(b), and a is either an instance of ndarray or else an instance of a subclass of ndarray which has not overridden __op__, i.e. we're getting ndarray.__op__(a, b).
  • ndarray.__op__ has the following logic (see PyArray_GenericBinaryFunction in number.c):
    • If b is not an ndarray at all (even a subclass), and b has a higher __array_priority__ than a, then we return NotImplemented and let control pass to b.__rop__(a).
    • Otherwise, we call np.op(a, b) and let the ufunc machinery take over.
  • np.op(a, b) does the following (see PyUFunc_GenericFunction, PyUFunc_GeneralizedFunction, in ufunc_object.c, and also ufunc_generic_call which converts -2 return values from the previous into NotImplemented so you have to audit their whole call stack):
    • If b is not an ndarray, and calling np.array(b) returns an object array (presumably because coercion failed... though I guess this could also be hit if b.__array__() returns an object array or something), AND b has a higher __array_priority__ than a, and b has an __rop__ method, then return NotImplemented.
    • If any of our arrays contain structured dtypes or strings, and there are no special struct ufunc loops registered, but not if any of our arrays contain objects, then return NotImplemented. (This is buried in get_ufunc_arguments, search for return -2.)
    • Otherwise we return the actual ufunc result.
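The ndarray.__op__ step of the 1.9 rules above can be sketched in pure Python. This is only a toy model under stated assumptions: FakeArray, fake_add and HighPriority are hypothetical names, no NumPy code is involved, and the extra checks inside the ufunc machinery are left out.

```python
class FakeArray:
    """Toy stand-in for ndarray (hypothetical, not NumPy code)."""

    __array_priority__ = 0.0

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # ndarray.__op__ step: defer only to a non-array operand that
        # advertises a higher __array_priority__.
        if not isinstance(other, FakeArray):
            priority = getattr(other, "__array_priority__", None)
            if priority is not None and priority > self.__array_priority__:
                return NotImplemented
        # Otherwise hand over to the (toy) ufunc machinery.
        return fake_add(self, other)


def fake_add(a, b):
    """Toy ufunc: adds element-wise, treating a non-array b as a scalar."""
    if isinstance(b, FakeArray):
        return FakeArray(x + y for x, y in zip(a.data, b.data))
    return FakeArray(x + b for x in a.data)


class HighPriority:
    """Non-array type that wants its own __radd__ to run."""

    __array_priority__ = 100.0

    def __radd__(self, other):
        return "HighPriority.__radd__"


deferred = FakeArray([1, 2]) + HighPriority()  # Python falls back to __radd__
summed = FakeArray([1, 2]) + 10                # handled by the toy ufunc
```

Because the toy __add__ returns NotImplemented for the high-priority operand, Python's ordinary binop protocol then calls HighPriority.__radd__.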

Now, my suggestion is that the way we would EVENTUALLY like this to look is:

  • First, Python uses the subclass rule to decide whether to invoke a.__op__(b) or b.__rop__(a). As above, let's assume that it invokes ndarray.__op__(a, b).
  • ndarray.__op__(a, b) calls np.op(a, b) (which in turn invokes all the standard ufunc stuff, including __numpy_ufunc__ resolution).
  • There is no step 3.

I submit that it is obvious that IF we can make this work, then it is obviously the ideal outcome, because it is the simplest possible solution. But is it too simple? To determine this we have to answer two questions: (1) Will it adequately address all the relevant use cases? (2) Can we get there from here?

So let's compare the current rules to my dream rules.

First, we observe that everything that currently happens inside the ufunc machinery looks like it's totally wrong. The first check can only be triggered if b is a non-ndarray that has a higher __array_priority__ (among other things), but if we look above, we see that those conditions are sufficient to trigger the check in ndarray.__op__, so checking again at the ufunc level is redundant at best. And the second check is just incoherent nonsense AFAICT. The only reason to return NotImplemented is b/c you want to pass control to another __(r)op__ method, and there's no reason arrays containing structured dtypes in particular should somehow magically have different __(r)op__ methods available than other arrays. So we can just get rid of all the ufunc stuff immediately, great.

That leaves the __array_priority__ stuff. We have two problems here: we can't just drop this immediately b/c of backcompat issues, and we need to have some way to continue to support all the use cases that this currently supports. The first problem is just a matter of having a deprecation period. For the second, observe that a class which defines a __numpy_ufunc__ method gets complete control over what any ufunc call does, so it has almost as much power as a class that currently sets __array_priority__. The only additional power that __array_priority__ currently gives you is that it lets you distinguish between e.g. a call to ndarray.__add__(a, b) versus a call to np.add(a, b). So the only code that really loses out from my proposed change is code which wants a + b and add(a, b) to do different things.

AFAIK in the entire history of numpy there is only one situation where this power has been used on purpose: the definition of matrix classes where a * b is matmul, but np.multiply(a, b) is elmul. And we've all agreed that such classes should be deprecated and eventually phased out (cite: PEP 465).

So, I conclude that EVENTUALLY my dream rules should work great. The only problem is that we need some temporary compromises to get us from here to there. Therefore, I propose we use the following dispatch rules in numpy 1.10, with the goal of moving to my "dream rules" in some future version:

  • First, Python uses the subclass rule to decide whether to invoke a.__op__(b) or b.__rop__(a). As above, let's assume that it invokes ndarray.__op__(a, b).
  • ndarray.__op__(a, b) does the following:
    • If b does not define __numpy_ufunc__ and is not an ndarray at all (even a subclass), and b has a higher __array_priority__ than a, then we issue a deprecation warning and return NotImplemented and let control pass to b.__rop__(a). (The changes compared to the current behaviour are the __numpy_ufunc__ condition and the deprecation warning.)
    • If __op__ is __mul__ and b->tp_class->tp_name.startswith("scipy.sparse."), then return NotImplemented. (This rule is necessary in addition to the above, because scipy.sparse has already made a release containing __numpy_ufunc__ methods, so the exception above doesn't apply.)
    • Otherwise, we call np.op(a, b) and let the ufunc machinery take over.
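As a rough illustration, the three proposed rules can be written as one decision function. This is a sketch under assumptions, not NumPy's implementation: ToyArray, should_defer and the example classes are hypothetical, and the scipy.sparse test uses the class's __module__ as an approximation of the tp_name check described above.

```python
import warnings


class ToyArray:
    """Hypothetical stand-in for ndarray."""

    __array_priority__ = 0.0


def should_defer(a, b, op_name):
    """Return True if the toy ndarray.__op__ should return NotImplemented."""
    # Rule 1: legacy __array_priority__ deferral, now with a warning, and
    # only for operands that have not adopted __numpy_ufunc__.
    if (not hasattr(b, "__numpy_ufunc__")
            and not isinstance(b, ToyArray)
            and getattr(b, "__array_priority__", float("-inf"))
            > a.__array_priority__):
        warnings.warn("deferring via __array_priority__ is deprecated",
                      DeprecationWarning, stacklevel=2)
        return True
    # Rule 2: special case for scipy.sparse classes that were already
    # released with __numpy_ufunc__ (approximated here via __module__).
    if op_name == "__mul__" and type(b).__module__.startswith("scipy.sparse"):
        return True
    # Rule 3: otherwise call np.op(a, b).
    return False


class Legacy:
    """Relies on the old priority mechanism only."""
    __array_priority__ = 10.0


class Modern:
    """Has adopted __numpy_ufunc__, so rule 1 no longer applies."""
    __array_priority__ = 10.0

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return NotImplemented


class FakeSparse(Modern):
    """Pretends to live in scipy.sparse for the purpose of rule 2."""
    __module__ = "scipy.sparse.csr"
```

With these definitions, only Legacy triggers the deprecation warning, and FakeSparse is deferred to for __mul__ but not for __add__.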

I believe that this is adequate to cover all practical use cases for the current dispatch machinery, and gives us a clean path to better dispatch machinery in the future.

The main alternative proposal is Pauli's, which involves a very complicated check (I won't try to summarize here, see this comment and following code). The goal of that approach is to continue supporting classes where a + b and add(a, b) do different things. I don't think that keeping substantial additional complexity around indefinitely is worth it in order to support functionality that no-one has ever found a use for except in one very specific case (overriding __mul__), and where we generally agree that that one specific case should be phased out as soon as possible.

I would very much appreciate feedback from scipy.sparse and astropy in particular on whether the above covers all their concerns.

(Partial) History: #4815, #5748
CC: @pv, @cowlicks, @mhvk

@njsmith njsmith added this to the 1.10 blockers milestone May 6, 2015
@pv
Member

pv commented May 6, 2015

Hi, thanks for picking this up. I agree 100% this is an issue that has to be resolved.

A clarification on the above writeup: when __numpy_ufunc__ is present, the dispatch mechanism, as it is currently implemented, completely skips all of the legacy logic inside ufuncs that you outline above, and also disregards __array_priority__. The only logic that comes into play is the Python binop step, plus the co-operative behavior explicitly coded in number.c:needs_right_binop_forward. After that, control passes either to __numpy_ufunc__, or to the right-hand operation of the other object.

The purpose for scrapping the ufunc-returns-NotImplemented and __array_priority__ logic is that as shown by experience, it doesn't work so well, and moreover is already fairly complicated.

The motivation for the binop dispatch changes made together with __numpy_ufunc__ is the following: suppose we did not have __numpy_ufunc__ at all. How should the dispatch logic work so that things make sense? (The only role __numpy_ufunc__ currently has in the dispatch logic, is that its presence turns off the old logic, and switches on the new one.)

The criticism for the simpler dispatch logic above is the following: what happens if the other object does not have __numpy_ufunc__ present? Numpy ufuncs will happily turn many things into object arrays, and will unconditionally gobble up ndarray subclasses (potentially triggering __array_wrap__ etc.).

The second criticism is this: what is the point in replacing Python's binop dispatch mechanism by a second layer of dispatch, which works in the same way? Won't we run into the same problems in the second layer that we ran into in the first layer? (Note that __numpy_ufunc__ still has a logically independent purpose, as it enables overriding np.multiply and all other ufunc ops, not just binops.) The answer to the second question might be "no", but I don't see that off the top of my head.

@pv
Member

pv commented May 6, 2015

The second point is: scipy.sparse does not need the new binop logic. It does not have ndarray subclasses, and __array_priority__ will work fine for it (provided inplace ops are made to respect it). scipy.sparse has always used __array_priority__ to ensure its binops get a chance to run.

Note that the reason it currently works for inplace ops in the absence of __numpy_ufunc__ is the __array_priority__ check, which is done inside a ufunc called from the in-place binop invoked by ndarray. This check would need to be moved to occur earlier (inside the ndarray binop) to make __mul__ consistent with __imul__ even when __numpy_ufunc__ is present.

@njsmith
Member Author

njsmith commented May 6, 2015

A clarification on the above writeup: when __numpy_ufunc__ is present, the dispatch mechanism, as it is currently implemented [...]

Right, the above writeup doesn't even attempt to describe the dispatch mechanism that currently exists in master (except briefly in the very last paragraph), just because I'm trying to focus on what we've previously released, and what we want to release next, and what's in master might or might not match either of those :-).

The purpose for scrapping the ufunc-returns-NotImplemented and __array_priority__ logic is that as shown by experience, it doesn't work so well, and moreover is already fairly complicated.

Right, we are 100% in agreement here.

The motivation for the binop dispatch changes made together with __numpy_ufunc__ is the following: suppose we did not have __numpy_ufunc__ at all. How should the dispatch logic work so that things make sense? (The only role __numpy_ufunc__ currently has in the dispatch logic, is that its presence turns off the old logic, and switches on the new one.)

I guess I don't understand why we should consider this hypothetical situation. We do have __numpy_ufunc__, and we need it regardless of what happens at the binop layer, and in most cases binop dispatch will be followed by __numpy_ufunc__ dispatch regardless. So to me the question is: given that we have __numpy_ufunc__, what should happen at the binop layer? Having two complicated dispatch systems is strictly worse than having just one, and as I argue above, __numpy_ufunc__ is actually sufficient for everything you might want to do at the binop layer anyway. (Esp., keep in mind: whatever we do at the binop layer, will have to be replicated by every other array-like class, b/c otherwise consider what happens if I want to write some new, say, distributed array class that knows how to interoperate with both ndarrays and scipy.sparse arrays. Now whatever I need from ndarray's binop dispatch system, I also need to exist in scipy.sparse's binop dispatch system. So this is much much easier if we can tell people, when you implement an array-like class just do def __add__(self, other): return np.add(self, other), etc., and your array-like will interoperate in a predictable way with every other array-like.)

I'd prefer that __numpy_ufunc__ not play any role in the binop dispatch, and you can see in my "dream rules" that it doesn't. In my proposal, the only reason it affects the binop dispatch is because we need some signal to disable the __array_priority__ deprecation warning for classes that no longer depend on the deprecated functionality.

The criticism for the simpler dispatch logic above is the following: what happens if the other object does not have __numpy_ufunc__ present? Numpy ufuncs will happily turn many things into object arrays, and will unconditionally gobble up ndarray subclasses (potentially triggering __array_wrap__ etc.).

If you don't override how something works, then you get the default behavior, yes. I guess I don't see why that's a problem, or what you would expect to happen instead...? For now, anyone who has overridden __op__ methods will continue to see them called via the somewhat byzantine __array_priority__ route. In the future, they should override __numpy_ufunc__ instead.

Not sure what you mean about ndarray subclasses -- subclasses are totally unaffected by all this stuff, b/c Python gives them total control over binop dispatch regardless of what we do.

The second criticism is this: what is the point in replacing Python's binop dispatch mechanism by a second layer of dispatch, which works in the same way? Won't we run into the same problems in the second layer that we ran into in the first layer? (Note that __numpy_ufunc__ still has a logically independent purpose, as it enables overriding np.multiply and all other ufunc ops, not just binops.)

We're not replacing Python's binop dispatch -- it still does what it does. And like you say, we have independent reasons to add __numpy_ufunc__ dispatch, so we necessarily have at least 2 layers of dispatch. This discussion is about whether we want to have yet another custom binop dispatch system in between those -- a third layer.

I guess the argument here is really that there's something inadequate about __numpy_ufunc__? ("Won't we run into the same problems...")

The problem we run into in the first layer (Python's built-in dispatch) is that according to Python, what you're supposed to do is to try performing the operation, and then if you can't because you don't know what to do with the other object, you give up and let it try. But like you alluded to above, numpy has this "never say die" attitude where it will find some way to try and make ndarray + other_obj or add(ndarray, other_obj) work, even if it's massively suboptimal (like calling __array__ on a sparse array to convert it to dense form, or doing some weird object dtype thing). So we end up in this very sticky situation where we want to give the other object a chance to override the operation unconditionally, BUT if the other object doesn't want to then there's a good chance that we can go ahead and finish the operation after all via some fallback logic (coercing stuff to ndarray etc.). There's no way to express that in Python's built-in dispatch, b/c we have no good way to look at an object and know whether it can perform the operation ahead of time -- e.g., you can't just check whether there's an __add__ method, b/c list has an __add__ method but you don't want to use it when doing ndarray + list.
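For reference, here is a minimal pure-Python sketch of the built-in protocol being described: an operand that cannot handle the other object returns NotImplemented so that the reflected method gets a turn. Strict and Other are hypothetical classes introduced only for illustration. The last lines show why the mere presence of __add__ says nothing about whether an object can actually perform the operation.

```python
class Strict:
    """Follows the protocol: gives up on operands it does not recognise."""

    def __add__(self, other):
        if isinstance(other, Strict):
            return "Strict.__add__"
        return NotImplemented  # let the right-hand operand try


class Other:
    def __radd__(self, other):
        return "Other.__radd__"


# Strict.__add__ returns NotImplemented, so Python calls Other.__radd__.
negotiated = Strict() + Other()

# list defines __add__, but that does not mean it can add arbitrary objects.
list_has_add = hasattr([], "__add__")
try:
    [1, 2] + Strict()
    list_add_failed = False
except TypeError:
    list_add_failed = True
```

An ndarray-style operand that never returns NotImplemented would short-circuit this negotiation, which is the sticky situation described above.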

But, this problem doesn't arise for __numpy_ufunc__, b/c only objects that actually know how to deal with ndarrays have it defined :-). So we can get away with the rule that any object that has __numpy_ufunc__ defined has opted-out of receiving fallback handling. (And if you really do want fallback handling it's pretty easy for a __numpy_ufunc__ method to coerce self to an ndarray manually and then retry the operation, which basically gives you fallback handling after all.) So I think __numpy_ufunc__ does not run into the problems that we have with Python's built-in binop handling. Or am I missing something?

@pv
Member

pv commented May 6, 2015

I guess I don't understand why we should consider this hypothetical situation.

As it appears to me, the problem is at the binop level, so it should be solved on that level, so that we don't throw away Python's binop mechanism along with the bathwater. That ndarrays directly call ufuncs, regardless of whether the operation can (for some meaning of "can") be done, renders the normal Python mechanism used to deal with binops inoperable. I don't think taking a "numpy exceptionalist" approach of using our own dispatch mechanism, working in pretty much the same way as the existing one, is completely necessary for a solution of the present problem.

I'd prefer that __numpy_ufunc__ not play any role in the binop dispatch, and you can see in my "dream rules" that it doesn't.

The reason it does in the rules in master is to preserve legacy backward compatibility, exactly as you propose above.

However, note that (in the above proposal) the presence of __numpy_ufunc__ still has a rather important effect --- it turns off the default ndarray binop cast-other-to-ndarray behavior. I think being able to do this is the key element in solving the binop problem, as it's present in all of the solutions proposed.

Whatever we do at the binop layer, will have to be replicated by every other array-like class, b/c otherwise consider what happens if I want to write some new, say, distributed array class that knows how to interoperate with both ndarrays and scipy.sparse arrays. Now whatever I need from ndarray's binop dispatch system, I also need to exist in scipy.sparse's binop dispatch system.

Yes, but I don't think __numpy_ufunc__ solves this particular problem.

scipy.sparse.__binop__ should return NotImplemented when it doesn't recognize the other object, and the execution then falls back to the other object as per normal Python binop rules. This should work fine.

However, if scipy.sparse has a cast-other-to-ndarray-and-gobble-it-up fallback behavior in the binop, it will run against the same problem that we currently have in numpy binops. However, exactly the same problem also arises in the __numpy_ufunc__ hook, so it appears we have just shuffled the issue around.

Your suggested solution of cast-self-to-ndarray-and-then-try-again should also work with Python binops. The difference is then between np.multiply(np.asarray(self), other) or operator.mul(np.asarray(self), other) (modulo fallback to a cast-other-to-ndarray fallback if TypeError). In either case, the implementation needs to be written to be cooperative.

subclasses are totally unaffected by all this stuff, b/c Python gives them total control over binop dispatch regardless of what we do.

Note that Python will not call the rhs op for binops between ndarray subclasses that are not subclasses of each other (the common situation). If they don't override all of the greedy behavior of ndarray default binop, interaction between them will again run into problems.
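The point about sibling subclasses can be checked with plain Python classes. In this sketch Base plays the role of a greedy ndarray whose __add__ never returns NotImplemented; all class names are hypothetical.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"  # greedy: never returns NotImplemented

    def __radd__(self, other):
        return "Base.__radd__"


class ChildA(Base):
    def __radd__(self, other):
        return "ChildA.__radd__"


class ChildB(Base):
    def __radd__(self, other):
        return "ChildB.__radd__"


# Right operand is a proper subclass of the left operand's type and
# overrides __radd__, so Python tries the reflected method first.
parent_plus_child = Base() + ChildA()

# Siblings: ChildB is not a subclass of ChildA, so the inherited greedy
# Base.__add__ runs and ChildB.__radd__ is never consulted.
sibling_plus_sibling = ChildA() + ChildB()
```

So unless each sibling overrides the greedy forward method itself, the right-hand class gets no say in the result.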

Or am I missing something?

I think the thing is that the goal is also achievable with a suitably written numpy.ndarray.__binop__, and I think this is the more desirable way to go, because it doesn't throw away Python's standard mechanism of binop negotiation. Of course, both approaches require that all classes that want to interoperate have to write their __binop__/__numpy_ufunc__ mechanisms in a certain way, to avoid the cast-other-to-ndarray behavior that is at the root of the problems.

It seems all of the solutions above, including __array_priority__ (although it was not consistently implemented), boil down to selectively disabling the default cast-other-to-ndarray behavior of the ndarray binop. The proposal above does that when __numpy_ufunc__ is present, in which case it continues the method negotiation in __numpy_ufunc__; Python's own mechanism is skipped in all cases.

tl;dr; I don't like the idea of throwing away Python's own binop mechanism and replacing it with our own, essentially identical system. If a simpler approach is necessary, I'd rather go for custom attribute(s) in other for disabling the ndarray default fallback behavior.

@mhvk
Contributor

mhvk commented May 6, 2015

Yes, good to have! Suggestion: would it make sense to make a wiki page on this so the documentation of the current and future API gets written as we go?

I will make one comment on arrays with added information like units, from my experience working on Quantity, MaskedArray, and a new Variable class. I've found it most useful to think of all of those as container classes, which add a single aspect to a set of numbers (a unit, array of masks, and array of uncertainties, resp.). In particular, while I might have felt differently when I started on Quantity, I now no longer feel it makes much sense to bring units deeper inside (as in a dtype), though it may be useful to have ufuncs that do two calculations in one go (say scale*a + b).

More generally, for all those classes __numpy_ufunc__ is a great solution, as it allows one to strip off a given additional property, run the ufunc, allow it to call the __numpy_ufunc__ on the next operand, etc. So far, it seems fine if there is no additional ordering mechanism, as long as if an operand has a __numpy_ufunc__ it will always get called.

@mhvk
Contributor

mhvk commented May 6, 2015

Now read the thread more carefully: My sense would be (and I think we all agree) that we leave the present behaviour as unchanged as possible, and try to ensure __numpy_ufunc__ will get us to dream world.

But we need to define more clearly what it means to "let the ufunc machinery take over". Could it be simpler than it is now? In particular, should the very first thing in a ufunc just be the equivalent of:

def ufunc(LHS, RHS, *args, **kwargs):
    if hasattr(LHS, '__numpy_ufunc__'):
        return LHS.__numpy_ufunc__(<usual stuff>)
    elif hasattr(RHS, '__numpy_ufunc__'):
        return RHS.__numpy_ufunc__(<usual stuff>)
    else:
        return ufunc(LHS, RHS, *args, **kwargs)

(Here, ufunc is just the present machinery, i.e., one that does not look at __numpy_ufunc__ at all.)

What this removes is the prioritisation described in the following source comment (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/number.c#L121):

  This always prioritizes the __r*__ routines over __numpy_ufunc__, independent of whether the other object is an ndarray subclass or not.

Right now, the trial to implement the above prioritisation caused the problem in #4815 that <Q-subclass> + <Q> yields a TypeError (to understand why, see @pv's #4815 (comment)). In this respect, reverting #5748 and using instead the solution I proposed is closer to the above simpleminded scheme. But what exactly makes the above prioritisation necessary?

@pv
Member

pv commented May 6, 2015 via email

@shoyer
Member

shoyer commented May 6, 2015

I am certainly inspired by @mhvk's vision of composable ndarray objects. I would love to be able to write new ndarray-like objects that add a single feature and all work well together, like lazy computation (e.g., dask.array) or labeled arrays (e.g., pandas.Series or xray.DataArray).

I think __numpy_ufunc__ and @njsmith's proposal here is better than the current system using __array_priority__, but I won't be 100% sure this covers my use cases until I try it.

One thing that would be extremely helpful is some sort of "best practices" recipe for how one should implement binary operations on an ndarray-like object to make use of the numpy machinery. Let me give this a shot, just to see if I understand what I've read so far:

class ArrayLike:
    """Our ndarray-like object will just be a simple wrapper"""
    def __init__(self, values):
        self.values = values

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # replace self with self.values in input before calling the ufunc again
        inputs = tuple(x.values if i == n else x for n, x in enumerate(inputs))
        # similarly replace ArrayLike instances in the out argument
        if 'out' in kwargs:
            out = kwargs['out']
            cls = type(self)
            if isinstance(out, tuple):
                out = tuple(o.values if isinstance(o, cls) else o for o in out)
            elif isinstance(out, cls):
                out = out.values
            kwargs['out'] = out
        # do the computation on unwrapped arrays
        result = getattr(ufunc, method)(*inputs, **kwargs)
        # now wrap the result
        return type(self)(result)

    # for consistency, binary ops should be defined by calling numpy ufuncs
    # we might even write a standard mixin class to add all these methods
    def __add__(self, other):
        return np.add(self, other)
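The "standard mixin class" mentioned in the comments above could look roughly like the following. This is a sketch under assumptions: BinopMixin, Wrapper and the _apply hook are hypothetical names, and functions from the standard operator module stand in for NumPy ufuncs so that the example runs without NumPy.

```python
import operator


def _forward(func):
    """Build a forward binop method such as __add__."""
    def method(self, other):
        return self._apply(func, self, other)
    return method


def _reflected(func):
    """Build a reflected binop method such as __radd__."""
    def method(self, other):
        return self._apply(func, other, self)
    return method


class BinopMixin:
    """Defines operators in terms of one hook, _apply(func, a, b)."""

    __add__ = _forward(operator.add)
    __radd__ = _reflected(operator.add)
    __mul__ = _forward(operator.mul)
    __rmul__ = _reflected(operator.mul)


class Wrapper(BinopMixin):
    """Simple container, analogous to ArrayLike above."""

    def __init__(self, values):
        self.values = values

    def _apply(self, func, a, b):
        # Unwrap any Wrapper operands, compute, then wrap the result.
        def unwrap(x):
            return x.values if isinstance(x, Wrapper) else x
        return Wrapper(func(unwrap(a), unwrap(b)))


left = Wrapper(2) + 3            # forward method
right = 3 + Wrapper(2)           # int gives up, reflected method runs
both = Wrapper(2) * Wrapper(4)   # both operands unwrapped
```

The design choice is that every operator funnels through a single hook, so a class only has to get its unwrap-and-rewrap logic right once.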

@mhvk
Contributor

mhvk commented May 6, 2015

Separately, on non-ndarray objects having __array_priority__: the main reason I used this is to ensure ndarray ufuncs don't do their nefarious gobbling of anything into an object array. Indeed, this is an aspect of ndarray that I really do not like, to try to make everything into an array (even though it has obvious benefits for python numbers). In this respect, I do like very much that I can just define __numpy_ufunc__ = None and be sure that, e.g., <array> * <my-non-array> will eventually get to my __r<op>__ methods.

Example:

class A(object):
    def __radd__(self, other):
        return 42

class B(A):
    __numpy_ufunc__ = None

np.arange(3) + A()
# Yikes: array([42, 42, 42], dtype=object)
np.arange(3) + B()
# what I would hope: 42

Probably should document this wonderful behaviour!

@pv
Member

pv commented May 6, 2015

@shoyer: the system currently in master does not use __array_priority__ at all either (when __numpy_ufunc__ is present).

@mhvk: that does not look like the intended use of __numpy_ufunc__. In the proposal above, __radd__ would not get called.

@mhvk
Contributor

mhvk commented May 6, 2015

@pv - I see your point that in @njsmith's most simple scheme, the __r<op>__ functions will only be called by the CPython rule: if the RHS is a strict subclass of the LHS. But I guess I still do not understand why this would pose a general problem and/or why for specific cases where it does one cannot write a proper __numpy_ufunc__ that takes care.

EDIT: I meant of course that the __r<op>__ rules will only be called before the self.__<op>__ rules by the CPython logic.

@mhvk
Contributor

mhvk commented May 6, 2015

@pv - Please don't remove it ;-)

But above I omitted my first trial, which is silly, as it is more obvious it should work:

class B(A):
    def __numpy_ufunc__(self, *args, **kwargs):
        return NotImplemented

The logic in the simple scheme would be (for <ndarray> + B()):

  1. CPython calls self.__add__(other), which is ndarray.__add__(self, other);
  2. ndarray.__add__ calls np.add(self, other);
  3. np.add calls other.__numpy_ufunc__;
  4. other.__numpy_ufunc__ returns NotImplemented, which is passed down to CPython;
  5. CPython calls other.__radd__(self), just as one would have hoped.

@mhvk
Contributor

mhvk commented May 6, 2015

@shoyer - I think your example is good, except that I would hope you would not need to define __add__, etc., unless they do something more interesting than calling np.add -- the standard ndarray methods should "just" do the right thing...

And of course your __numpy_ufunc__ will have to do something with whatever it is that sets your class apart from an ndarray. With multiple container elements, this may not be so trivial, especially if perhaps one would like to expose all class properties (say, have <masked-array> * <quantity> return a MaskedQuantity object). But that is beyond the present discussion.

@njsmith
Member Author

njsmith commented May 6, 2015

That isn't how __numpy_ufunc__ dispatch works, though -- remember that as far as __numpy_ufunc__ is concerned, there is no binop, the user may have just done something like np.add(a, b), and np.add is not supposed to return NotImplemented under any circumstance. So the way __numpy_ufunc__ is defined, if you return NotImplemented from __numpy_ufunc__ then you still won't get any fallback behavior: the only two possibilities are that either some other argument's __numpy_ufunc__ will step in to handle the operation, or else the operation will fail b/c no-one knows how to handle it. (Just like what happens when __add__ and __radd__ both return NotImplemented.) I know working on ndarray subclasses for so long has warped your mind to the point where using magic attributes to tweak the behavior of complex fallback chains seems natural and obvious, but really we are trying to keep things simpler than that :-).

(Also note that you want to support np.add(a, b) and for it to do the same thing as +, so implementing a fancy __radd__ instead of a fancy __numpy_ufunc__ is a waste of time anyway -- the latter is strictly more useful.)
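The resolution behaviour described in this comment can be modelled with a small pure-Python dispatcher. It is a toy under stated assumptions: toy_ufunc_call, Refuses and Handles are hypothetical, overriding operands are simply tried left to right (the real ordering rules are not modelled), and operator.add stands in for np.add.

```python
import operator


def toy_ufunc_call(default_impl, *inputs):
    """Toy model of ufunc override resolution (not NumPy's implementation)."""
    overriders = [(i, x) for i, x in enumerate(inputs)
                  if hasattr(type(x), "__numpy_ufunc__")]
    if not overriders:
        # Nobody opted in: run the default machinery.
        return default_impl(*inputs)
    for i, x in overriders:
        result = x.__numpy_ufunc__(default_impl, "__call__", i, inputs)
        if result is not NotImplemented:
            return result
    # Every override declined. The ufunc itself never returns
    # NotImplemented and never falls back to coercion; it fails instead.
    raise TypeError("no operand knows how to handle this operation")


class Refuses:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return NotImplemented


class Handles:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return ("handled by Handles", i)


plain = toy_ufunc_call(operator.add, 1, 2)
rescued = toy_ufunc_call(operator.add, Refuses(), Handles())
try:
    toy_ufunc_call(operator.add, 1, Refuses())
    failed = False
except TypeError:
    failed = True
```

In this model, returning NotImplemented from the hook only hands the decision to another overriding operand; it does not reach Python's __radd__ fallback.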

@njsmith
Member Author

njsmith commented May 6, 2015

@shoyer: yes, that ArrayLike class is exactly what I have in mind. (Except that your __numpy_ufunc__ should also check for the presence of self in kwargs["out"]. And maybe also kwargs["where"], I forget if where= is a legal dispatch candidate.)
On May 6, 2015 11:34 AM, "Stephan Hoyer" notifications@github.com wrote:

I am certainly inspired by @mhvk https://github.com/mhvk's vision of
composable ndarray objects. I would love to be able to write new
ndarray-like objects that add a single feature and all work well together,
like lazy computation (e.g., dask.array) or labeled arrays (e.g.,
pandas.Series or xray.DataArray).

I think numpy_ufunc and @njsmith https://github.com/njsmith's
proposal here is better than the current system using array_priority,
but I won't be 100% sure this covers my use cases until I try it.

One thing that would be extremely helpful is some sort of "best practices"
recipe for how one should implement binary operations on an
ndarray-like object to make use of the numpy machinery. Let me give this a
shot, just to see if I understand what I've read so far:

import numpy as np

class ArrayLike:
    """Our ndarray-like object will just be a simple wrapper"""
    def __init__(self, values):
        self.values = values

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # replace self with self.values for calling the ufunc again
        inputs = tuple(x.values if i == n else x for n, x in enumerate(inputs))
        result = getattr(ufunc, method)(*inputs, **kwargs)
        # now wrap the result
        return type(self)(result)

    # for consistency, binary ops should be defined by calling numpy ufuncs
    # we might even write a standard mixin class to add all these methods
    def __add__(self, other):
        return np.add(self, other)
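
The point about also checking kwargs["out"] could be handled with a small helper along these lines. This is only a sketch in plain Python: unwrap_call is a hypothetical name, not anything NumPy provides, and it assumes that out may be either a single object or a tuple of objects.

```python
class ArrayLike:
    """Minimal stand-in for the wrapper class discussed in this thread."""
    def __init__(self, values):
        self.values = values


def unwrap_call(inputs, kwargs):
    """Hypothetical helper: replace ArrayLike wrappers with their .values,
    both in the positional inputs and in kwargs['out'] if present."""
    def unwrap(x):
        return x.values if isinstance(x, ArrayLike) else x

    new_inputs = tuple(unwrap(x) for x in inputs)
    new_kwargs = dict(kwargs)  # do not mutate the caller's dict
    if "out" in new_kwargs:
        out = new_kwargs["out"]
        if isinstance(out, tuple):
            # assumption: several outputs are passed as a tuple
            new_kwargs["out"] = tuple(unwrap(x) for x in out)
        else:
            new_kwargs["out"] = unwrap(out)
    return new_inputs, new_kwargs
```

A __numpy_ufunc__ implementation could call such a helper first and then pass the unwrapped inputs and kwargs on to getattr(ufunc, method).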



@shoyer
Copy link
Member

shoyer commented May 6, 2015

I think your example is good, except that I would hope you would not need to define __add__, etc., unless they do something more interesting than calling np.add -- the standard ndarray methods should "just" do the right thing...

@mhvk I need to define __add__ because I want operations like array_like + 1 to be well defined. I'm not subclassing numpy.ndarray.

@njsmith OK, let me add something for kwargs['out'].

@mhvk
Copy link
Contributor

mhvk commented May 6, 2015

@shoyer - OK, my mistake; yes, in your case you would obviously need __add__ and company.

@njsmith - you're right that my list does not describe the current logic correctly. What happens currently is that np.add(np.arange(3.,), B()) raises a TypeError (because __numpy_ufunc__ is not callable or returns NotImplemented). I guess the "magic" that happens is that ndarray.__add__ method then turns this into NotImplemented. But this does happen to do more or less what I would hope! I'm also not quite sure why one couldn't turn this into the list I gave, i.e., what is the problem with np.add returning NotImplemented (or perhaps raising NotImplementedError)?

p.s. I definitely would like to have a way to tell ufuncs that my object cannot interact with arrays inside a ufunc, i.e., a "don't even try" flag.

@pv
Copy link
Member

pv commented May 6, 2015 via email

@njsmith
Copy link
Member Author

njsmith commented May 6, 2015

"Don't even try" is written:

def __numpy_ufunc__(...):
    raise TypeError("stop that")


@mhvk
Copy link
Contributor

mhvk commented May 6, 2015

@pv, @njsmith - OK, agreed, the np.<ufunc> itself should raise an error; to make this explicit, I agree that one should write:

class B(A):
    def __numpy_ufunc__(self, *args, **kwargs):
        return NotImplementedError

@shoyer
Copy link
Member

shoyer commented May 6, 2015

I have one more question for my example, related to this discussion about returning NotImplemented. If my subclass is well behaved, then it should (per Python best practices) return NotImplemented from __add__ if the operation is not well defined. This would give the other object an opportunity to handle the operation with __radd__. If np.add never returns NotImplemented, then I think I should write something like the following for my __add__ special method:

    def __add__(self, other):
        try:
            return np.add(self, other)
        except TypeError:
            return NotImplemented

Otherwise, writing a compatible ndarray-like class will require using __numpy_ufunc__. Though I suppose that may not actually be so bad....

@shoyer
Copy link
Member

shoyer commented May 6, 2015

@mhvk For your latest example, I think you mean return NotImplemented?

@mhvk
Copy link
Contributor

mhvk commented May 6, 2015

Just to be sure I understand correctly: is @shoyer's example essentially how ndarray.__add__ is defined? I.e., in terms of my list:
The logic for <ndarray> + B() is:

  1. CPython calls self.__add__(other), which is ndarray.__add__(self, other);
  2. ndarray.__add__ calls np.add(self, other);
  3. np.add calls other.__numpy_ufunc__;
  4. other.__numpy_ufunc__ returns NotImplementedError which is passed on by np.add;
  5. ndarray.__add__ notices an exception and returns NotImplemented to CPython;
  6. CPython calls other.__radd__(self), just as one would have hoped.
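
The six steps above can be traced with a toy model in plain Python. This is a sketch only: ToyArray and toy_add are hypothetical stand-ins for ndarray and np.add, not NumPy's actual implementation, and B raises TypeError from its hook as suggested earlier in the thread.

```python
class ToyArray:
    """Hypothetical stand-in for ndarray."""
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # Steps 1-2: CPython lands here, and we hand off to the "ufunc".
        try:
            return toy_add(self, other)
        except TypeError:
            # Step 5: turn the ufunc's exception into NotImplemented.
            return NotImplemented


def toy_add(a, b):
    """Hypothetical stand-in for np.add."""
    # Step 3: an operand that defines the hook gets to handle the call.
    for index, arg in enumerate((a, b)):
        hook = getattr(arg, "__numpy_ufunc__", None)
        if hook is not None:
            return hook(toy_add, "__call__", index, (a, b))
    if isinstance(a, ToyArray) and isinstance(b, ToyArray):
        return ToyArray(x + y for x, y in zip(a.data, b.data))
    raise TypeError("toy_add cannot handle these operands")


class B:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # Step 4: refuse to take part in ufuncs.
        raise TypeError("B does not work inside ufuncs")

    def __radd__(self, other):
        # Step 6: CPython falls back to the reflected method.
        return "B.__radd__ handled it"


print(ToyArray([1, 2]) + B())
```

In this model, calling toy_add(ToyArray([1, 2]), B()) directly still raises TypeError, while the + operator ends up in B.__radd__.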

@mhvk
Copy link
Contributor

mhvk commented May 6, 2015

@pv - I fear I have somewhat turned this discussion off-topic. I'm still trying to understand what is wrong with the simplest __numpy_ufunc__ mechanism, i.e., why special care would need to be taken for other having both __numpy_ufunc__ and __r<op>__. Could one not just assign other the job of ensuring that its __numpy_ufunc__ raises an exception if it doesn't want to have a given ufunc be done without going through its __r<op>__ method?

@shoyer
Copy link
Member

shoyer commented May 6, 2015

@mhvk To clarify, if __add__ and __radd__ return NotImplemented, CPython will raise TypeError. NotImplementedError is a built-in error type that is entirely unrelated to the NotImplemented singleton.
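
A small plain-Python illustration of that distinction, using made-up classes (Left, Right and Helpful are names invented for this sketch):

```python
class Left:
    def __add__(self, other):
        # Declining: asks CPython to try the reflected method on `other`.
        return NotImplemented


class Right:
    def __radd__(self, other):
        # Also declining, so CPython itself raises TypeError.
        return NotImplemented


class Helpful:
    def __radd__(self, other):
        return "handled by Helpful.__radd__"


def try_add(a, b):
    """Return the result of a + b, or a description of the TypeError."""
    try:
        return a + b
    except TypeError as exc:
        return "TypeError: " + str(exc)


print(try_add(Left(), Right()))    # both sides declined
print(try_add(Left(), Helpful()))  # reflected method took over
# The singleton and the exception class are different objects.
print(NotImplemented is NotImplementedError)
```

Returning NotImplementedError (the exception class) from a binary method would not trigger the fallback, because CPython only checks for the NotImplemented singleton.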

@mhvk
Copy link
Contributor

mhvk commented May 6, 2015

@shoyer - @pv and @njsmith convinced me that the appropriate thing for __numpy_ufunc__ to do is to raise an actual exception; then the ndarray.__<op>__ method turns this into NotImplemented (as in your __add__ definition). Note that one does get an exception even in my initial example:

class B(A):
    __numpy_ufunc__ = None

np.arange(3) + B()
#-> 42
np.add(np.arange(3), B())
# -> TypeError: 'NoneType' object is not callable

This behaviour seems sensible to me. But an explicit error is better (and @njsmith's TypeError makes more sense than my NotImplementedError, as this is what happens for python operations as well):

class B(A):
    def __numpy_ufunc__(self, *args, **kwargs):
        raise TypeError("Cannot deal with ndarrays via ufuncs")

np.add(np.arange(3), B())
#-> TypeError(...)

@pv
Copy link
Member

pv commented May 6, 2015

@mhvk: If ndarray dispatches immediately to ufunc, Python's native binop
mechanism cannot be used to deal with ndarrays reasonably. Then one
must use __numpy_ufunc__. We have then replaced the native mechanism in
the language by our own mechanism that is pretty much identical in how
it works. Besides this, the main difference in the approaches is just
what exactly is the way in which ndarray.__binop__ lets some code in
other to run, instead of casting other to ndarray and doing its own thing.

Pragmatically, of course, you can deal with pretty much anything (also
__array_priority__, if it was consistently implemented), but I believe it is
not good design to break a fairly central part of the language, if there
is no really pressing need to do it.

If the logic in ndarray.__binop__ is slightly more complicated, I don't
see this causing problems for those who want to just use __numpy_ufunc__.
The user side code would be the same.

@shoyer
Copy link
Member

shoyer commented May 6, 2015

@mhvk I think it is still slightly better behaved to write:

class B(A):
    def __numpy_ufunc__(self, *args, **kwargs):
        return NotImplemented

NumPy translates this into TypeError: __numpy_ufunc__ not implemented for this type, which is similar to how CPython translates __radd__ returning NotImplemented into TypeError: unsupported operand type(s) for +: 'int' and 'C'. But this also leaves the door open to some other ndarray-like object still defining ufuncs when the other argument is of type B.
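
The "door left open" can be sketched with a toy dispatcher in plain Python. This is not NumPy's code: toy_ufunc and Lazy are hypothetical names, and asking the inputs in left-to-right order is an assumption made to keep the example short.

```python
def toy_ufunc(*inputs):
    """Toy dispatcher: ask each input that defines the hook, in order."""
    for index, arg in enumerate(inputs):
        hook = getattr(arg, "__numpy_ufunc__", None)
        if hook is None:
            continue
        result = hook(toy_ufunc, "__call__", index, inputs)
        if result is not NotImplemented:
            return result
    # Simplification: the toy has no default computation, so it also
    # ends up here when no input defines the hook at all.
    raise TypeError("__numpy_ufunc__ not implemented for this type")


class B:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # Decline, but let other operands try.
        return NotImplemented


class Lazy:
    """Some other ndarray-like object that knows how to deal with B."""
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return "Lazy handled %d inputs" % len(inputs)


print(toy_ufunc(B(), Lazy()))
```

If B raised TypeError from its hook instead, the Lazy operand would never be consulted in this model.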

mhvk pushed a commit to mhvk/numpy that referenced this issue Mar 12, 2017
As per the discussion at numpygh-5844, and in particular
   numpy#5844 (comment)
this commit switches binop dispatch to mostly defer to ufuncs, except
in some specific cases elaborated in a long comment in number.c.

The basic strategy is to define a single piece of C code that knows
how to handle forward binop overrides, and we put it into
private/binop_override.h so that it can be accessed by both the array
code in multiarray.so and the scalar code in umath.so.
@eric-wieser
Copy link
Member

Is this resolved by the merge of #8247?

@shoyer
Copy link
Member

shoyer commented Apr 28, 2017

@eric-wieser Yes, I think so, though note my follow-up in #9014

@shoyer shoyer closed this as completed Apr 28, 2017
@seberg
Copy link
Member

seberg commented Jul 14, 2020

I know it's been a long time, but @njsmith and others: I just realized we do binop stuff in our scalar code. Why is that? We should be able to assume that scalars always defer to any array-like, no? Any array-like should be able to score our scalars. There could be some stranger things about when to defer to user defined scalars as opposed to going to ufunc directly.

I am willing to bet we can just defer in that case as well though? Removing all "binop_override.h" from the scalar math code entirely does not cause any test failures for one thing (although I suppose it might then go into ufuncs when it doesn't have to necessarily).

@mattip
Copy link
Member

mattip commented Jul 14, 2020

I think the issue is speed. I wonder if there is a way to fast-path even more ufunc cases so this wouldn't matter.

@seberg
Copy link
Member

seberg commented Jul 14, 2020

Yeah, we commonly have to fall back to the ufunc, and before we do, we have to check __array_priority__. If scalars were true scalars and not array-scalars, that would not matter, but scalar + [1, 2, 3] actually works currently (and likely has to).
However, it does not really make sense that we do the check before doing the scalar logic. We only have to do that binop check before falling back to ufuncs.

@seberg
Copy link
Member

seberg commented Jul 14, 2020

So, I do think that we are just making scalars unnecessarily slow currently though. We need to defer once we reach the "use the array method (even though this is a scalar)" stage. That stage already does the check in either case though, so it seems we are doing the check up to 3 times if we end up going into the array branch. So I think we can remove the case completely from the scalar code.

If scalars fall back to the generic scalar code paths, these already just fall back to the array code path. So the only danger would be a scalar subclass implementing __array_ufunc__. I will be so bold as to claim that in any case where a non-NumPy scalar is involved, we should always defer, though (which may in turn cause a fallback to the full ufunc/ndarray slot path).
