BUG : np.sum silently drops keepdims for sub-classes of ndarray #4619

tacaswell · 2014-04-14T12:47:38Z

change test from type(a) is not mu.ndarray to not isinstance(a, mu.ndarray)

This bug can be demonstrated by:

class test_array(ndarray):
    pass

base_ = np.arange(30).reshape(5, 6)
tt = base_.view(test_array)

def sum_test(a):
    print(np.sum(a))
    print(np.sum(a, keepdims=True))

sum_test(base_)
sum_test(tt)

seberg · 2014-04-14T13:41:59Z

Should probably replace this with a mechanism of if keepdims=None don't pass keepdims, otherwise get the error if the subclass doesn't support.

tacaswell · 2014-04-14T13:52:30Z

Huh, seems I broke a bunch of matrix tests

seberg · 2014-04-14T15:06:38Z

You can't know if a subclass implements keepdims so you need to switch based on keepdims is None. If you do that, please also add a test for the new behaviour (i.e. error when keepdims is given for subclasses not supporting it).

tacaswell · 2014-04-14T15:30:59Z

Are there existing sub-classes that don't support keepdims (sorry, use numpy all the time, but have not dived very deep and this is my first time really looking at the internals) to use for the test?

It looks like a whole bunch of similar functions (product, amax, ...) will also need to be changed.

Still having a bit of trouble squaring what you are saying with my (probably wrong) mental model of how this is working underneath, and the errors from the test suite.

seberg · 2014-04-14T15:39:48Z

@tacaswell, the thing is, we cannot know that since any third party library could subclass ndarray. What I mean is:

if keepdims is None:
    subclass.method(a, axis=axis, dtype=dtype, out=out)
else:
   subclass.method(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)

njsmith · 2014-04-14T16:43:32Z

@seberg's snippet might be better written

kwargs = {}
if keepdims is not None:
kwargs["keepdims"] = keepdims
foo.method (..., **kwargs)
On 14 Apr 2014 17:39, "seberg" notifications@github.com wrote:

@tacaswell https://github.com/tacaswell, the thing is, we cannot know
that since any third party library could subclass ndarray. What I mean is:

if keepdims is None:
subclass.method(a, axis=axis, dtype=dtype, out=out)
else:
subclass.method(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-40381323
.

juliantaylor · 2014-04-14T16:53:48Z

keepdims is a boolean so it should be if not keepdims

and I think we do need to add this to all functions were we are currently wrapping subclasses

tacaswell · 2014-04-14T17:32:42Z

Slowly starting to understand.

This is going to involve touching a large number of the functions in fromnumeric and nanfunctions.

I am using @njsmith 's pattern.

When I have all the tests passing locally I will replace this branch.

tacaswell · 2014-04-14T18:25:14Z

I think I have it working. If people are happy with this method of addressing the issue I will add tests for all of the functions I changed + update the documentation.

charris · 2014-04-22T23:35:40Z

@tcaswell needs a rebase.

tacaswell · 2014-04-23T01:36:13Z

I wish all rebases were that easy

juliantaylor · 2014-04-23T08:15:58Z

why is keepdims changed to None?

juliantaylor · 2014-04-23T08:18:59Z

numpy/core/fromnumeric.py

    if type(a) is not mu.ndarray:
        try:
            amax = a.max
        except AttributeError:
            return _methods._amax(a, axis=axis,
-                                out=out, keepdims=keepdims)
+                                out=out, **kwargs)
        # NOTE: Dropping the keepdims parameter


this comment is obsolete now

juliantaylor · 2014-04-23T08:21:22Z

I wonder if we shoud inspect the subclass and check if it really supports keepdims
it seems it was not passed intentionally, probably for compatibility reasons

tacaswell · 2014-04-23T12:07:48Z

The default value is changed to None so that this layer can tell if the user explicitly gave a value or not. If the user did not explicitly give keepdims, then drop it (not pass the value through to the sub class) and let the sub-class set it's own default (or not have it).

In either case now the result of np.sum(a) and a.sum match as do the results of np.sum(a, keepdims=True) and a.sum(keepdims=True).

seberg · 2014-04-23T12:33:47Z

Aside the fact that it might seem weird that a subclass should override the default. matrix already does it ;) (though it doesn't give the option to disable keepdims anyway). So while it feels weird, considering that I am tending to say that the None solution is better.

juliantaylor · 2014-04-23T17:07:05Z

I don't really like changing an argument type for this, aren't array subclasses almost always python classes? then you can get the default the class uses for the function:

In [5]: inspect.getargspec(pandas.Series().sum)
Out[5]: ArgSpec(args=['self', 'axis', 'skipna', 'level', 'numeric_only'], varargs=None, keywords='kwargs', defaults=(None, None, None, None))

we can also use this to warn when a subclass does not support the argument but the user passed the non-default

njsmith · 2014-04-23T20:28:13Z

I don't like trying to use introspection for this, but I agree that changing the public API is a bit nasty.

Maybe it would be better to use an internal _NotGiven sentinel object as the actual default, and document the default as being False?

tacaswell · 2014-04-23T21:52:19Z

What is the advantage of using _NotGiven rather than None?

njsmith · 2014-04-23T22:05:04Z

The idea is to make it an implementation detail and minimize the API
surface area - people can't intentionally pass _NotGiven, so we don't find
ourselves down the line having to specially support people explicitly
passing None.

Actually I'm not sure why we shouldn't just use **kwargs for keepdims, out,
etc.
On 23 Apr 2014 22:52, "Thomas A Caswell" notifications@github.com wrote:

What is the advantage of using _NotGiven rather than None?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-41219554
.

tacaswell · 2014-04-23T22:43:24Z

I am still not convinced about _NotGiven (this being python people can get a hold of it and pass it in). I tend to interpret a default value of None as meaning 'not given', so maybe I am just being short sighted.

I do like the idea of changing to using **kwargs, but that seems like a much larger api change than False -> None.

rgommers · 2014-04-24T06:03:52Z

Please no inspect or **kwargs.

False --> None seems perfectly acceptable if the documentation is clear. _NotGiven doesn't seem necessary to me, but if you prefer it then I'm fine with it.

njsmith · 2014-04-24T11:54:36Z

What's wrong with kwargs? I agree that we should try to avoid it in
general, but the necessary functionality in this case is exactly that we
note whether or not the caller specified a kw argument, and if they did
specify it what its value is, and then pass on both parts of this when we
delegate to another function. In Python, kwargs is the One Obvious Way
designed to handle this case.

Note that passing keepdims=None here does not mean to call the delegatee
with keepdims=None, that's the weird bit...

On Thu, Apr 24, 2014 at 7:03 AM, Ralf Gommers notifications@github.comwrote:

Please no inspect or **kwargs.

False --> None seems perfectly acceptable if the documentation is clear.
_NotGiven doesn't seem necessary to me, but if you prefer it then I'm
fine with it.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-41245442
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

seberg · 2014-04-24T12:30:49Z

How bad is dropping introspection support when using kwargs? With kwargs, I would want an error right away though if there is anything but keepdims inside there... unless we decide otherwise after some discussion.
If you don't like None since it basically makes it part of the API, a _NotGiven singleton (nobody is supposed to pass it, so I don't think we have to worry about any subtleties) may be simpler anyway, and does not hinder introspection.

EDIT: So to say, I would be for just doing _NotGiven = object() and if keepdims is not _NotGiven...

tacaswell · 2014-04-24T15:39:37Z

It seems the consensus is to create a _NotGiven module singleton to replace None as the default and the same pattern for dropping keepdims if it is not given?

I will get to this sometime today.

rgommers · 2014-04-24T16:36:53Z

@njsmith you know why we need to avoid kwargs in general. Adding it to 18 functions (if I counted correctly) is therefore 18x doing something that is best avoided.

keepdims=None/_NotGiven is perfectly adequate here, hence I prefer it.

njsmith · 2014-04-24T16:40:14Z

I'm fine with _NotGiven, but it's basically just an ad hoc reimplementation
of kwargs, and shares most of the disadvantages.

The real problem is that we have 18 different functions that are just
not-quite-identical wrappers around other functions :-(.

Maybe at some point we can replace most of these by thin wrappers around
ufunc calls, and then the keepdims stuff will become numpy_ufunc's
problem. Deprecating np.matrix will also help...

On Thu, Apr 24, 2014 at 5:36 PM, Ralf Gommers notifications@github.comwrote:

@njsmith https://github.com/njsmith you know why we need to avoid
kwargs in general. Adding it to 18 functions (if I counted correctly) is
therefore 18x doing something that is best avoided.

keepdims=None/_NotGiven is perfectly adequate here, hence I prefer it.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4619#issuecomment-41302290
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

juliantaylor · 2014-04-24T17:47:36Z

whats wrong with inspect?

using None doesn't fix the issue that we are breaking the api when the user passes the argument but the subclass does not accept it.

can one check the type of an argument fast if it is a python object and no c-extension object? I'm thinking about improving the argument parsing speed of ufunc reductions which would require fast default checks. If not I would prefer None over _NotGiven unless the latter is a c-extension object.
This is why that is relevant:

In [1]: d = np.ones(1000)
In [2]: %timeit np.sum(d)
100000 loops, best of 3: 6.1 µs per loop
In [3]: %timeit np.add.reduce(d)
100000 loops, best of 3: 3.8 µs per loop

rgommers · 2014-04-24T18:34:36Z

inspect is quite fragile and best avoided for production code. Here's a long discussion about it: http://thread.gmane.org/gmane.comp.python.scientific.devel/15860

rgommers · 2014-04-24T18:37:56Z

@juliantaylor agree that the sum vs. add.reduce speed difference is very annoying.

jakirkham · 2016-01-11T14:45:58Z

That's what I was thinking. There were a few other failures on AppVeyor and I have put them in this issue ( #6991 ).

tacaswell · 2016-01-17T03:50:43Z

Commits squashed and made the special case in nanvar made a bit nicer, but I am still not sure exactly where to add the documentation.

seberg · 2016-02-01T09:44:34Z

I think the "documentation" was just adding a short mention in the compatibility section of the release notes probably. They are in doc/release/. Did not read the code, but seems a bit of a shame that we stalled this for so long.

mhvk · 2016-02-01T14:56:38Z

numpy/core/fromnumeric.py

@@ -1826,15 +1835,14 @@ def sum(a, axis=None, dtype=None, out=None, keepdims=False):
            sum = a.sum
        except AttributeError:
            return _methods._sum(a, axis=axis, dtype=dtype,


Sorry to raise a stylistic point at this late stage, but since the return _methods._sum call is now identical to the one at the end, I would suggest rewriting this (as well as other occurrences further down) as:

# no need to elif here; always returned output above if type(a) is not mu.ndarray: try: sum = a.sum except AttributeError: pass else: return sum(axis=axis, dtype=dtype, out=out, **kwargs) return _methods._sum(a, axis=axis, dtype=dtype, out=out, **kwargs)

tacaswell · 2016-02-05T19:45:30Z

======================================================================
FAIL: test_scripts.test_f2py
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/numpy/testing/decorators.py", line 147, in skipper_func
    return f(*args, **kwargs)
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/numpy/tests/test_scripts.py", line 93, in test_f2py
    assert_(success, msg)
  File "/home/travis/build/numpy/numpy/venv-for-wheel/lib/python3.5/site-packages/numpy/testing/utils.py", line 71, in assert_
    raise AssertionError(smsg)
AssertionError: Warning: neither f2py nor f2py3 nor f2py3.5 found in path
----------------------------------------------------------------------

Two tests are failing on the above which I suspect is un-related to these changes.

seberg · 2016-02-05T19:49:16Z

@charris do you want to backport this? While I like it, I think the default is probably no backport, which would mean that it should be in the 1.12 release notes.

Don't worry about those test failures, some setuptools stuff I don't even want to know about :).
EDIT: or maybe it was venv I guess, you see my point of not wanting to know ;)

njsmith · 2016-02-05T20:35:38Z

Yeah, let's leave this for 1.12 -- it probably won't create any surprises
downstream, but it's the sort of thing that could.

On Fri, Feb 5, 2016 at 11:52 AM, seberg notifications@github.com wrote:

@charris https://github.com/charris do you want to backport this? While
I like it, I think the default is probably no backport, which would mean
that it should be in the 1.12 release notes.

Don't worry about those test failures, some setuptools stuff I don't even
want to know about :).
EDIT: or maybe it was venv I guess, you see my point of not wanting to
know ;)

—
Reply to this email directly or view it on GitHub
#4619 (comment).

Nathaniel J. Smith -- https://vorpus.org http://vorpus.org

tacaswell · 2016-02-05T20:44:16Z

done

homu · 2016-02-07T17:39:13Z

☔ The latest upstream changes (presumably #7181) made this pull request unmergeable. Please resolve the merge conflicts.

change test from `type(a) is not mu.ndarray` to `not isinstance(a, mu.ndarray)` Because every sub-class of ndarray is not guaranteed to implement `keepdims` as a kwarg, when wrapping these methods care must be taken. The previous behavior was to silently eat the kwarg when dealing with a sub-class of ndarray. Now, if `keepdims=np._NoValue` (the new default) it is not passed through to the underlying function call (so the default value of `keepdims` is now controlled by the sub-class). If `keepdims` is not `np._NoValue` then it is passed through and will raise an exception if the sub-class does not support the kwarg. A special case in nanvar was required to deal with `matrix` that previously relied on `fromnumeric` silently dropping `keepdims`.

Move the calls to user-provided versions of std, var, and mean on non mu.ndarray objects out of the `try` block so that numpy will not mask AttributeError raised during the execution of the function rather than due to the object not having the required method.

In sum, amax, amin, and prod simplify the logic to remove an identical return statement / call to `_methods._xxx`. This removes several elif/else pairs and reduces the number of exit points from the functions but makes the code path a bit more complicated to trace.

tacaswell · 2016-02-07T19:31:02Z

numpy/lib/nanfunctions.py

@@ -163,9 +164,14 @@ def nanmin(a, axis=None, out=None, keepdims=False):

        .. versionadded:: 1.8.0
    keepdims : bool, optional
-        If this is set to True, the axes which are reduced are left in the


this is due to line wrap changes in #7181

BUG : np.sum silently drops keepdims for sub-classes of ndarray

charris · 2016-06-12T20:16:26Z

Backported for 1.11.1.

juliantaylor reviewed Apr 23, 2014
View reviewed changes

tacaswell force-pushed the ndarray_subclass_sum_bug branch from 1d80259 to 75db4e0 Compare January 17, 2016 03:49

mhvk reviewed Feb 1, 2016
View reviewed changes

tacaswell force-pushed the ndarray_subclass_sum_bug branch 3 times, most recently from f0fac00 to a08f94c Compare February 5, 2016 18:45

tacaswell force-pushed the ndarray_subclass_sum_bug branch from a08f94c to 7b6c143 Compare February 5, 2016 20:43

tacaswell and others added 5 commits February 7, 2016 14:26

STY: pep8 only

010d17e

DOC: add release notes

e1621a9

tacaswell force-pushed the ndarray_subclass_sum_bug branch from 7b6c143 to e1621a9 Compare February 7, 2016 19:28

tacaswell reviewed Feb 7, 2016
View reviewed changes

charris added a commit that referenced this pull request Feb 8, 2016

Merge pull request #4619 from tacaswell/ndarray_subclass_sum_bug

2ee7755

BUG : np.sum silently drops keepdims for sub-classes of ndarray

charris merged commit 2ee7755 into numpy:master Feb 8, 2016

This was referenced Feb 8, 2016

[MRG]: Fweight and aweight support for np.var #5930

Closed

TST: Suppressed warnings #7099

Merged

tacaswell deleted the ndarray_subclass_sum_bug branch February 8, 2016 05:27

charris mentioned this pull request Mar 7, 2016

BUG: np.sum with keepdims=True fails on masked_array #7312

Closed

charris mentioned this pull request Jun 12, 2016

Masked arrays: keepdims is not enforced in numpy.ma.amax #7720

Closed

cgohlke mentioned this pull request Jul 18, 2016

reload(numpy);numpy.max([]) raises TypeError #7844

Closed

BUG : np.sum silently drops keepdims for sub-classes of ndarray #4619

BUG : np.sum silently drops keepdims for sub-classes of ndarray #4619

Conversation

tacaswell commented Apr 14, 2014

seberg commented Apr 14, 2014

tacaswell commented Apr 14, 2014

seberg commented Apr 14, 2014

tacaswell commented Apr 14, 2014

seberg commented Apr 14, 2014

njsmith commented Apr 14, 2014

juliantaylor commented Apr 14, 2014

tacaswell commented Apr 14, 2014

tacaswell commented Apr 14, 2014

charris commented Apr 22, 2014

tacaswell commented Apr 23, 2014

juliantaylor commented Apr 23, 2014

juliantaylor Apr 23, 2014

Choose a reason for hiding this comment

juliantaylor commented Apr 23, 2014

tacaswell commented Apr 23, 2014

seberg commented Apr 23, 2014

juliantaylor commented Apr 23, 2014

njsmith commented Apr 23, 2014

tacaswell commented Apr 23, 2014

njsmith commented Apr 23, 2014

tacaswell commented Apr 23, 2014

rgommers commented Apr 24, 2014

njsmith commented Apr 24, 2014

seberg commented Apr 24, 2014

tacaswell commented Apr 24, 2014

rgommers commented Apr 24, 2014

njsmith commented Apr 24, 2014

juliantaylor commented Apr 24, 2014

rgommers commented Apr 24, 2014

rgommers commented Apr 24, 2014

jakirkham commented Jan 11, 2016

tacaswell commented Jan 17, 2016

seberg commented Feb 1, 2016

mhvk Feb 1, 2016

Choose a reason for hiding this comment

tacaswell commented Feb 5, 2016

seberg commented Feb 5, 2016

njsmith commented Feb 5, 2016

tacaswell commented Feb 5, 2016

homu commented Feb 7, 2016

tacaswell Feb 7, 2016

Choose a reason for hiding this comment

charris commented Jun 12, 2016