BUG difference in behaviour for subclass output in ufuncs #4753

Merged
merged 1 commit into from Oct 5, 2015

Conversation

Projects
None yet
8 participants
Contributor

mhvk commented Jun 18, 2015

If one wants to have the output of a ufunc passed to a ndarray subclass, but the input is a regular array, the behaviour differs depending on whether one passes the output as a positional or as a keyword argument: if passed as positional argument, __numpy_ufunc__ is called, while if passed as a keyword argument, __array_prepare__ and __array_wrap__ are called. I would have expected for __numpy_ufunc__ to be called in both cases.

p.s. If the input is a subclass, it always works as expected.

Example script:

import numpy as np


class MyArray(np.ndarray):
    def __array_prepare__(self, obj, context=None):
        print('prepare:', self, obj, context)
        return super(MyArray, self).__array_prepare__(obj, context)

    def __array_wrap__(self, obj, context):
        print('wrap:', self, obj, context)
        return super(MyArray, self).__array_wrap__(obj, context)

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        print('ufunc:', self, ufunc, method, i, inputs, kwargs)
        values = [np.asarray(arg) for arg in inputs]
        if 'out' in kwargs:
            kwargs['out'] = kwargs['out'].view(np.ndarray)
        result = getattr(ufunc, method)(*values, **kwargs) / 2.
        return result.view(MyArray)

a = np.array([-1., -1.]).view(MyArray)
print(repr(np.sin(np.arange(2.), a)))
a = np.array([-1., -1.]).view(MyArray)
print(repr(np.sin(np.arange(2.), out=a)))

pv added the Defect label Jun 2, 2014

Owner

pv commented Jun 2, 2014

Is this something that was not addressed in gh-4626?

EDIT: apparently yes. this is an oversight

Owner

pv commented Jun 2, 2014

@cowlicks: overlooked case. Also, what should i be when self is one of the output arrays (probably past the inputs, which is likely ok...)?

juliantaylor added this to the 1.9 blockers milestone Jun 2, 2014

mhvk referenced this pull request in astropy/astropy Jul 9, 2014

Merged

Quantity.__array_ufunc__ #2583

Contributor

mhvk commented Jan 19, 2015

Checking 1.10 blockers brought up by my __numpy_ufunc__ trials: In numpy 1.10.0.dev-256d256, this is still an issue.

Contributor

cowlicks commented Jan 23, 2015

I'll take a look this weekend.

Contributor

cowlicks commented Jan 26, 2015

Okay so what currently happens is PyUFunc_CheckOverride is called at every ufunc call. It sets result as the result of the override if one exists, if no override is found result is NULL. Inside PyUFunc_CheckOverride we check positional arguments for __numpy_ufunc__ attribute. If it is not there, set result = NULL and return 0. Clearly we need to check out here, but it will add a little overhead to every ufunc call since we are now checking a dictionary too. I think we need to insert this logic here

Owner

njsmith commented Jan 26, 2015

Surely there is somewhere in the ufunc logic where the arguments are
normalized. (Or if not there ought to be ;-).) Can't we put this checking
after that happens?

...okay reading the code you linked to I see that the numpy_ufunc code
is apparently already has its own separate reimplementation of ufunc
argument handling. I can see how we might not want to run two separate
copies of the same argument normalization on every ufunc call :-(

On Mon, Jan 26, 2015 at 4:00 AM, Blake Griffith notifications@github.com
wrote:

Okay so what currently happens is PyUFunc_CheckOverride is called at
every ufunc call. It sets result as the result of the override if one
exists, if no override is found result is NULL. Inside
PyUFunc_CheckOverride we check positional arguments for numpy_ufunc
attribute. If it is not there, set result = NULL and return 0. Clearly we
need to check out here, but it will add a little overhead to every ufunc
call since we are now checking a dictionary too. I think we need to insert
this logic here
https://github.com/numpy/numpy/blob/master/numpy/core/src/private/ufunc_override.h#L224


Reply to this email directly or view it on GitHub
#4753 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Contributor

cowlicks commented Jan 26, 2015

They are only normalized (to what __numpy_ufunc__ needs) If there is a __numpy_ufunc__ attribute present. I was reluctant to normalize first because that would add even more overhead to every ufunc call. I'm testing my prototype (based on my last comment) now. I can try both approaches and see if the timing vs code complexity tradeoff is significant.

Owner

njsmith commented Jan 26, 2015

I know it's more work and may spill over into cleaning up other people's
messes, but I think it's worth looking to see if you can move the checks
for the numpy_ufunc attribute to somewhere after the regular ufunc
argument processing, which already has to deal with positional versus
kwargs.
On 26 Jan 2015 04:22, "Blake Griffith" notifications@github.com wrote:

They are only normalized (to what numpy_ufunc needs) If there is a
numpy_ufunc attribute present. I was reluctant to normalize first
because that would add even more overhead to every ufunc call. I'm testing
my prototype (based on my last comment) now. I can try both approaches and
see if the timing vs code complexity tradeoff is significant.


Reply to this email directly or view it on GitHub
#4753 (comment).

Contributor

cowlicks commented Jan 26, 2015

Yeah, that is definitely the right way to do this. I'm not sure why I did not do it that way originally. I'll try to find where that happens and check it out... After I get this working.

Contributor

cowlicks commented Jan 26, 2015

@njsmith I found some arg parsing stuff here, but it is just for multiarray stuff? I don't think it is used for ufuncs. I need to look more into this.

My prototype is here (based on my first comment). But it is breaking np.dot and idk why.

Contributor

mhvk commented Jan 26, 2015

Had a look at the prototype and while it looks good generally, I worry it would still break for ufuncs with two outputs (where out is a tuple; see your work in #4171)

Contributor

cowlicks commented Feb 1, 2015

Little update: I rebased on current master and the segfault went away... So it wasn't my fault I guess? That had me stumped for a while. Working on a fix taking @mhvk's comments into consideration.

Contributor

cowlicks commented Feb 1, 2015

This seems to fix it. But I have not refactored to use the general arg processing as suggested by @njsmith. I have not assessed the difficulty of that yet. If it seems hard I'll make an issue with a plan of action.

I think there is still cleaning up to do. And I have not checked that I did not create any memory leaks. I'm not confident in my C coding. So I'll double check this then make a PR.

Contributor

mhvk commented Feb 2, 2015

looks good to me, but am definitely even less of an expert.

Contributor

cowlicks commented Feb 2, 2015

There is a bit of a problem. The __numpy_ufunc__ protocol specifies it has the signature:

def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)

Where "i is the index of self in inputs" but output arguments get put into kwargs. So what should the i index be in this case? (See the NEP here).

Basically __numpy_ufunc__ does not account for the output arguments being able to override the ufunc. I'm still thinking about the best way to approach this. Suggestions? @pv

Contributor

mhvk commented Feb 2, 2015

FWIW: I think the approach you took, of just continuing to add to i, is the most logical one. It is important for an output to be able to have __numpy_ufunc__ and thus to tell that it cannot handle the result.

Contributor

cowlicks commented Feb 2, 2015

@mhvk But it is breaking this test.

Basically the test uses the "defective" behavior.

Contributor

mhvk commented Feb 2, 2015

Hmm, fortunately, __numpy_ufunc__ has not been part of a release, so still in time to get it right! It seems obvious the implementer will have to be able to deal with either i is None or i >= len(inputs). The latter would carry more information, so seems preferred.

Contributor

ewmoore commented Feb 2, 2015

If it hasn't been released perhaps the signature should be changed. Maybe add an o parameter or make i be input_index and add an output_index and require that one or the other always be None. This seems much clearer than indexing into a list of output arrays if i >= len(inputs).

Is it documented anywhere that passing an output array can effect the ufunc dispatch? Seems kind of surprising that it work this way.

Contributor

mhvk commented Feb 2, 2015

@ewmoore - but the output should not just be treated as an arbitrary ndarray either (indeed, that is what the current __array_prepare__ and __array_wrap__ do well).

Owner

pv commented Feb 2, 2015

Only the option i >= len(inputs) is compatible with code in Scipy.
That it's out there is somewhat non-ideal situation, but what is done is
done and needs to be considered now. If a different interface is
desired, then __numpy_ufunc__ needs to be renamed to
__numpy_override__ or something similar.

Contributor

mhvk commented Feb 2, 2015

I'd also think i>=len(inputs) is most logical conceptually in the sense that it now effectively indexes *args in the original call for the case that no out keyword was present.

Contributor

cowlicks commented Feb 5, 2015

Okay. I'll correct the test that is failing to check for i >= len(inputs), do some cleanup, and make a PR.

Contributor

mhvk commented Jun 11, 2015

@cowlicks - did you ever get around to fixing this? You had #5566, but you closed that intending to make a different solution...

Contributor

cowlicks commented Jun 11, 2015

No I never got around it. There is a lot of discussion around __numpy_ufunc__ and I have not had the bandwidth to follow it. It seemed like a rewrite was being proposed in #5844? But I don't really have time to read that whole thread.

Contributor

mhvk commented Jun 18, 2015

@cowlicks - I think this bug has little to do with the (long) discussion about what to do with ndarray binops, but rather affects __numpy_ufunc__ more generally. Since we're close to release, I went ahead and reworked your earlier pull request, in a way that I think will address all previous comments. Please have a look. (My own sense: OK, but needs a few tests that i is indeed what it should be.) cc @pv, @njsmith, @jaimefrio, @charris, since they all looked at your original PR (#5566).

@cowlicks @mhvk cowlicks Check `out` kwarg for __nump_ufunc__ override and set index appropria…
…tely

for the case where self is among outputs but not among inputs. Ensure it
works both out passed on as an argument and with out in a keyword argument.
288239d
Contributor

mhvk commented Jun 19, 2015

OK, I added further tests that the index passed on to __numpy_ufunc__ is in fact correct in all cases. I also rebased and merged all commits into one. I note that there was some discussion that it would have been more logical to do this test after normalising the arguments, but that doesn't really matter for the functionality, and at this late stage it seems best to just fix the bug rather than try to overhaul the order.

Owner

njsmith commented Jun 19, 2015

OK, going to say something unpopular. (Certainly I'm annoyed at myself for saying it, but nonetheless...) I've been debating this with myself for a while, and I've come to the conclusion that we should bite the bullet and drop the index argument from __numpy_ufunc__'s signature. This will require some care to avoid breaking already released versions of scipy.sparse (lesson learned: we should never have released scipy.sparse with __numpy_ufunc__ before it came out in numpy), but that's better than leaving it forever.

Rationale:

  • the index argument is never useful. The only time it's useful at all is for classes that want to cast self to ndarray and recurse, and even in those cases they can just as well do a loop and handle all objects of their type at once. It'll be faster too. None of the example classes we've written during the binop discussion have used it at all.
  • the impossibility of having a useful value for output arguments makes there index even more useless. (Sure we can set it to
    n > nin, but what's the use of this? Writing code to check for this and do something sensible is more complicated than just finding the array by hand.)
  • there are almost certainly other arguments we should dispatch on, like where=. What do we set the index to in this case?

Sorry. I'm the one who suggested adding it in the first place, but it was a bad idea :-/

Also while we're at it, we should make the out= argument unconditionally a tuple, rather than sometimes a single ndarray and sometimes a tuple of them. This would simplify argument processing (see: all the times we wrote if not isinstance(out, tuple): out = (out,) on the example binop implementations).

Contributor

mhvk commented Jun 19, 2015

@njsmith - I made your point a new issue #5986, as I think it deserves better discussion than it might get here. Also, this PR is really not so much about what to do with the index as about ensuring that out arguments are able to override __numpy_ufunc__, whether given as an argument or as a keyword argument.

Owner

charris commented Jun 27, 2015

@mhvk, @njsmith So where does this sit now?

Owner

njsmith commented Jun 28, 2015

I think the status is: this is an uncontroversial bug, and @mhvk has a plausible patch here, which hasn't been converted into a PR yet.

This isn't necessarily a 1.10 blocker (it would be annoying to release with a bug in __numpy_ufunc__, but it would just be a bug; we could fix it in 1.10.1 if we had to).

As a practical matter it would probably be best to get #6001 and #5986 sorted out first before addressing this, to avoid collisions.

Contributor

mhvk commented Jun 28, 2015

@njsmith - note that my patch did get converted to a PR -- this issue itself has been converted to a PR [1]. I actually don't think it impacts any other the other PR too much. If we remove index, it would effectively be removed again, but that's fine.

[1] See http://docs.astropy.org/en/latest/development/workflow/additional_git_topics.html#converting-a-github-issue-to-a-pull-request

Owner

charris commented Aug 3, 2015

Moved this to the 1.11.0 release with the rest of the __numpy_ufunc__ stuff.

Owner

charris commented Oct 3, 2015

@mhvk What need to be done to bring this to completion. If we need to change the interface, now would be a good time to get it decided. @pv IIRC, we are also going to need a rename for scipy sparse.

Contributor

mhvk commented Oct 5, 2015

@charris - I think this PR is all OK for the present __numpy_ufunc__. If we do decide to change that, including removing the index from the call signature, then most of the actual code will likely get removed, but some of the tests will guard against this problem recurring (specifically, https://github.com/numpy/numpy/pull/4753/files#diff-bc1e8d40b8b7758facc5e143be016f99R2515).

So, my feeling is to merge this, in order to let the test case rather than this open PR guard against the problem recurring.

@charris charris added a commit that referenced this pull request Oct 5, 2015

@charris charris Merge pull request #4753 from mhvk/bug-4753
BUG difference in behaviour for subclass output in ufuncs
68e61c2

@charris charris merged commit 68e61c2 into numpy:master Oct 5, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Owner

charris commented Oct 5, 2015

@mhvk OK, merging. Thanks.

mhvk deleted the mhvk:bug-4753 branch Oct 5, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment