Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG : np.sum silently drops keepdims for sub-classes of ndarray #4619

Merged
merged 5 commits into from
Feb 8, 2016

Conversation

tacaswell
Copy link
Contributor

change test from type(a) is not mu.ndarray to not isinstance(a, mu.ndarray)

This bug can be demonstrated by:

class test_array(ndarray):
    pass

base_ = np.arange(30).reshape(5, 6)
tt = base_.view(test_array)

def sum_test(a):
    print(np.sum(a))
    print(np.sum(a, keepdims=True))

sum_test(base_)
sum_test(tt)

@seberg
Copy link
Member

seberg commented Apr 14, 2014

Should probably replace this with a mechanism of if keepdims=None don't pass keepdims, otherwise get the error if the subclass doesn't support.

@tacaswell
Copy link
Contributor Author

Huh, seems I broke a bunch of matrix tests

@seberg
Copy link
Member

seberg commented Apr 14, 2014

You can't know if a subclass implements keepdims so you need to switch based on keepdims is None. If you do that, please also add a test for the new behaviour (i.e. error when keepdims is given for subclasses not supporting it).

@tacaswell
Copy link
Contributor Author

Are there existing sub-classes that don't support keepdims (sorry, use numpy all the time, but have not dived very deep and this is my first time really looking at the internals) to use for the test?

It looks like a whole bunch of similar functions (product, amax, ...) will also need to be changed.

Still having a bit of trouble squaring what you are saying with my (probably wrong) mental model of how this is working underneath, and the errors from the test suite.

@seberg
Copy link
Member

seberg commented Apr 14, 2014

@tacaswell, the thing is, we cannot know that since any third party library could subclass ndarray. What I mean is:

if keepdims is None:
    subclass.method(a, axis=axis, dtype=dtype, out=out)
else:
   subclass.method(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)

@njsmith
Copy link
Member

njsmith commented Apr 14, 2014

@seberg's snippet might be better written

kwargs = {}
if keepdims is not None:
kwargs["keepdims"] = keepdims
foo.method (..., **kwargs)
On 14 Apr 2014 17:39, "seberg" notifications@github.com wrote:

@tacaswell https://github.com/tacaswell, the thing is, we cannot know
that since any third party library could subclass ndarray. What I mean is:

if keepdims is None:
subclass.method(a, axis=axis, dtype=dtype, out=out)
else:
subclass.method(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)


Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-40381323
.

@juliantaylor
Copy link
Contributor

keepdims is a boolean so it should be if not keepdims

and I think we do need to add this to all functions were we are currently wrapping subclasses

@tacaswell
Copy link
Contributor Author

Slowly starting to understand.

This is going to involve touching a large number of the functions in fromnumeric and nanfunctions.

I am using @njsmith 's pattern.

When I have all the tests passing locally I will replace this branch.

@tacaswell
Copy link
Contributor Author

I think I have it working. If people are happy with this method of addressing the issue I will add tests for all of the functions I changed + update the documentation.

@charris
Copy link
Member

charris commented Apr 22, 2014

@tcaswell needs a rebase.

@tacaswell
Copy link
Contributor Author

I wish all rebases were that easy

@juliantaylor
Copy link
Contributor

why is keepdims changed to None?

if type(a) is not mu.ndarray:
try:
amax = a.max
except AttributeError:
return _methods._amax(a, axis=axis,
out=out, keepdims=keepdims)
out=out, **kwargs)
# NOTE: Dropping the keepdims parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is obsolete now

@juliantaylor
Copy link
Contributor

I wonder if we shoud inspect the subclass and check if it really supports keepdims
it seems it was not passed intentionally, probably for compatibility reasons

@tacaswell
Copy link
Contributor Author

The default value is changed to None so that this layer can tell if the user explicitly gave a value or not. If the user did not explicitly give keepdims, then drop it (not pass the value through to the sub class) and let the sub-class set it's own default (or not have it).

In either case now the result of np.sum(a) and a.sum match as do the results of np.sum(a, keepdims=True) and a.sum(keepdims=True).

@seberg
Copy link
Member

seberg commented Apr 23, 2014

Aside the fact that it might seem weird that a subclass should override the default. matrix already does it ;) (though it doesn't give the option to disable keepdims anyway). So while it feels weird, considering that I am tending to say that the None solution is better.

@juliantaylor
Copy link
Contributor

I don't really like changing an argument type for this, aren't array subclasses almost always python classes? then you can get the default the class uses for the function:

In [5]: inspect.getargspec(pandas.Series().sum)
Out[5]: ArgSpec(args=['self', 'axis', 'skipna', 'level', 'numeric_only'], varargs=None, keywords='kwargs', defaults=(None, None, None, None))

we can also use this to warn when a subclass does not support the argument but the user passed the non-default

@njsmith
Copy link
Member

njsmith commented Apr 23, 2014

I don't like trying to use introspection for this, but I agree that changing the public API is a bit nasty.

Maybe it would be better to use an internal _NotGiven sentinel object as the actual default, and document the default as being False?

@tacaswell
Copy link
Contributor Author

What is the advantage of using _NotGiven rather than None?

@njsmith
Copy link
Member

njsmith commented Apr 23, 2014

The idea is to make it an implementation detail and minimize the API
surface area - people can't intentionally pass _NotGiven, so we don't find
ourselves down the line having to specially support people explicitly
passing None.

Actually I'm not sure why we shouldn't just use **kwargs for keepdims, out,
etc.
On 23 Apr 2014 22:52, "Thomas A Caswell" notifications@github.com wrote:

What is the advantage of using _NotGiven rather than None?


Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-41219554
.

@tacaswell
Copy link
Contributor Author

I am still not convinced about _NotGiven (this being python people can get a hold of it and pass it in). I tend to interpret a default value of None as meaning 'not given', so maybe I am just being short sighted.

I do like the idea of changing to using **kwargs, but that seems like a much larger api change than False -> None.

@rgommers
Copy link
Member

Please no inspect or **kwargs.

False --> None seems perfectly acceptable if the documentation is clear. _NotGiven doesn't seem necessary to me, but if you prefer it then I'm fine with it.

@njsmith
Copy link
Member

njsmith commented Apr 24, 2014

What's wrong with kwargs? I agree that we should try to avoid it in
general, but the necessary functionality in this case is exactly that we
note whether or not the caller specified a kw argument, and if they did
specify it what its value is, and then pass on both parts of this when we
delegate to another function. In Python, kwargs is the One Obvious Way
designed to handle this case.

Note that passing keepdims=None here does not mean to call the delegatee
with keepdims=None, that's the weird bit...

On Thu, Apr 24, 2014 at 7:03 AM, Ralf Gommers notifications@github.comwrote:

Please no inspect or **kwargs.

False --> None seems perfectly acceptable if the documentation is clear.
_NotGiven doesn't seem necessary to me, but if you prefer it then I'm
fine with it.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-41245442
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@seberg
Copy link
Member

seberg commented Apr 24, 2014

How bad is dropping introspection support when using kwargs? With kwargs, I would want an error right away though if there is anything but keepdims inside there... unless we decide otherwise after some discussion.
If you don't like None since it basically makes it part of the API, a _NotGiven singleton (nobody is supposed to pass it, so I don't think we have to worry about any subtleties) may be simpler anyway, and does not hinder introspection.

EDIT: So to say, I would be for just doing _NotGiven = object() and if keepdims is not _NotGiven...

@tacaswell
Copy link
Contributor Author

It seems the consensus is to create a _NotGiven module singleton to replace None as the default and the same pattern for dropping keepdims if it is not given?

I will get to this sometime today.

@rgommers
Copy link
Member

@njsmith you know why we need to avoid kwargs in general. Adding it to 18 functions (if I counted correctly) is therefore 18x doing something that is best avoided.

keepdims=None/_NotGiven is perfectly adequate here, hence I prefer it.

@njsmith
Copy link
Member

njsmith commented Apr 24, 2014

I'm fine with _NotGiven, but it's basically just an ad hoc reimplementation
of kwargs, and shares most of the disadvantages.

The real problem is that we have 18 different functions that are just
not-quite-identical wrappers around other functions :-(.

Maybe at some point we can replace most of these by thin wrappers around
ufunc calls, and then the keepdims stuff will become numpy_ufunc's
problem. Deprecating np.matrix will also help...

On Thu, Apr 24, 2014 at 5:36 PM, Ralf Gommers notifications@github.comwrote:

@njsmith https://github.com/njsmith you know why we need to avoid
kwargs in general. Adding it to 18 functions (if I counted correctly) is
therefore 18x doing something that is best avoided.

keepdims=None/_NotGiven is perfectly adequate here, hence I prefer it.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-41302290
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@juliantaylor
Copy link
Contributor

whats wrong with inspect?

using None doesn't fix the issue that we are breaking the api when the user passes the argument but the subclass does not accept it.

can one check the type of an argument fast if it is a python object and no c-extension object? I'm thinking about improving the argument parsing speed of ufunc reductions which would require fast default checks. If not I would prefer None over _NotGiven unless the latter is a c-extension object.
This is why that is relevant:

In [1]: d = np.ones(1000)
In [2]: %timeit np.sum(d)
100000 loops, best of 3: 6.1 µs per loop
In [3]: %timeit np.add.reduce(d)
100000 loops, best of 3: 3.8 µs per loop

@rgommers
Copy link
Member

inspect is quite fragile and best avoided for production code. Here's a long discussion about it: http://thread.gmane.org/gmane.comp.python.scientific.devel/15860

@rgommers
Copy link
Member

@juliantaylor agree that the sum vs. add.reduce speed difference is very annoying.

@jakirkham
Copy link
Contributor

That's what I was thinking. There were a few other failures on AppVeyor and I have put them in this issue ( #6991 ).

@tacaswell
Copy link
Contributor Author

Commits squashed and made the special case in nanvar made a bit nicer, but I am still not sure exactly where to add the documentation.

@seberg
Copy link
Member

seberg commented Feb 1, 2016

I think the "documentation" was just adding a short mention in the compatibility section of the release notes probably. They are in doc/release/. Did not read the code, but seems a bit of a shame that we stalled this for so long.

@@ -1826,15 +1835,14 @@ def sum(a, axis=None, dtype=None, out=None, keepdims=False):
sum = a.sum
except AttributeError:
return _methods._sum(a, axis=axis, dtype=dtype,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to raise a stylistic point at this late stage, but since the return _methods._sum call is now identical to the one at the end, I would suggest rewriting this (as well as other occurrences further down) as:

# no need to elif here; always returned output above
if type(a) is not mu.ndarray:
    try:
        sum = a.sum
    except AttributeError:
        pass
    else:
        return sum(axis=axis, dtype=dtype, out=out, **kwargs)

return _methods._sum(a, axis=axis, dtype=dtype, out=out, **kwargs)

@tacaswell tacaswell force-pushed the ndarray_subclass_sum_bug branch 3 times, most recently from f0fac00 to a08f94c Compare February 5, 2016 18:45
@tacaswell
Copy link
Contributor Author

======================================================================
FAIL: test_scripts.test_f2py
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/numpy/testing/decorators.py", line 147, in skipper_func
    return f(*args, **kwargs)
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/numpy/tests/test_scripts.py", line 93, in test_f2py
    assert_(success, msg)
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/numpy/testing/utils.py", line 71, in assert_
    raise AssertionError(smsg)
AssertionError: Warning: neither f2py nor f2py3 nor f2py3.5 found in path
----------------------------------------------------------------------

Two tests are failing on the above which I suspect is un-related to these changes.

@seberg
Copy link
Member

seberg commented Feb 5, 2016

@charris do you want to backport this? While I like it, I think the default is probably no backport, which would mean that it should be in the 1.12 release notes.

Don't worry about those test failures, some setuptools stuff I don't even want to know about :).
EDIT: or maybe it was venv I guess, you see my point of not wanting to know ;)

@njsmith
Copy link
Member

njsmith commented Feb 5, 2016

Yeah, let's leave this for 1.12 -- it probably won't create any surprises
downstream, but it's the sort of thing that could.

On Fri, Feb 5, 2016 at 11:52 AM, seberg notifications@github.com wrote:

@charris https://github.com/charris do you want to backport this? While
I like it, I think the default is probably no backport, which would mean
that it should be in the 1.12 release notes.

Don't worry about those test failures, some setuptools stuff I don't even
want to know about :).
EDIT: or maybe it was venv I guess, you see my point of not wanting to
know ;)


Reply to this email directly or view it on GitHub
#4619 (comment).

Nathaniel J. Smith -- https://vorpus.org http://vorpus.org

@tacaswell
Copy link
Contributor Author

done

@homu
Copy link
Contributor

homu commented Feb 7, 2016

☔ The latest upstream changes (presumably #7181) made this pull request unmergeable. Please resolve the merge conflicts.

tacaswell and others added 5 commits February 7, 2016 14:26
change test from `type(a) is not mu.ndarray` to
`not isinstance(a, mu.ndarray)`

Because every sub-class of ndarray is not guaranteed to implement
`keepdims` as a kwarg, when wrapping these methods care must be taken.

The previous behavior was to silently eat the kwarg when dealing with
a sub-class of ndarray.

Now, if `keepdims=np._NoValue` (the new default) it is not passed
through to the underlying function call (so the default value of
`keepdims` is now controlled by the sub-class).  If `keepdims` is not
`np._NoValue` then it is passed through and will raise an exception if
the sub-class does not support the kwarg.

A special case in nanvar was required to deal with `matrix` that previously
relied on `fromnumeric` silently dropping `keepdims`.
Move the calls to user-provided versions of std, var, and mean on non
mu.ndarray objects out of the `try` block so that numpy will not mask
AttributeError raised during the execution of the function rather than
due to the object not having the required method.
In sum, amax, amin, and prod simplify the logic to remove an identical
return statement / call to `_methods._xxx`.  This removes several
elif/else pairs and reduces the number of exit points from the functions
but makes the code path a bit more complicated to trace.
@@ -163,9 +164,14 @@ def nanmin(a, axis=None, out=None, keepdims=False):

.. versionadded:: 1.8.0
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is due to line wrap changes in #7181

charris added a commit that referenced this pull request Feb 8, 2016
BUG : np.sum silently drops keepdims for sub-classes of ndarray
@charris charris merged commit 2ee7755 into numpy:master Feb 8, 2016
@tacaswell tacaswell deleted the ndarray_subclass_sum_bug branch February 8, 2016 05:27
@charris
Copy link
Member

charris commented Jun 12, 2016

Backported for 1.11.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants