Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: add level kwarg for Series.any/.all #8302

Closed
immerrr opened this issue Sep 17, 2014 · 7 comments · Fixed by #8550
Closed

API: add level kwarg for Series.any/.all #8302

immerrr opened this issue Sep 17, 2014 · 7 comments · Fixed by #8550

Comments

@immerrr
Copy link
Contributor

immerrr commented Sep 17, 2014

It appears that Series' s.any and s.all methods miss level kwargs, unlike their statistical counterparts like s.sum:

In [4]: s = pd.Series([0,1,2], index=[0,0,1])

In [5]: s.sum(level=0)
Out[5]: 
0    1
1    2
dtype: int64

In [6]: s.prod(level=0)
Out[6]: 
0    0
1    2
dtype: int64

In [7]: s.any(level=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1d8c43752bc9> in <module>()
----> 1 s.any(level=0)

/home/immerrr/sources/pandas/pandas/core/series.pyc in f(self, *args, **kwargs)
     74     @Appender(func.__doc__)
     75     def f(self, *args, **kwargs):
---> 76         result = func(self.values, *args, **kwargs)
     77         if isinstance(result, (pa.Array, Series)) and result.ndim == 0:
     78             # return NumPy type

TypeError: _any() got an unexpected keyword argument 'level'

In [8]: s.all(level=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-bca0491001a6> in <module>()
----> 1 s.all(level=0)

/home/immerrr/sources/pandas/pandas/core/series.pyc in f(self, *args, **kwargs)
     74     @Appender(func.__doc__)
     75     def f(self, *args, **kwargs):
---> 76         result = func(self.values, *args, **kwargs)
     77         if isinstance(result, (pa.Array, Series)) and result.ndim == 0:
     78             # return NumPy type

TypeError: _all() got an unexpected keyword argument 'level'

Frames have those and I think so should series. Maybe, there are more reduction methods that I know not of that also miss those...

@staple
Copy link
Contributor

staple commented Oct 1, 2014

Hi, I'd like to take a crack at this one.

@staple
Copy link
Contributor

staple commented Oct 1, 2014

It looks like most aggregation functions are implemented in NDFrame and accept standardized arguments. All and any, however, are implemented separately in Series (via IndexOpsMixin) and DataFrame. The Series implementations forward to the corresponding numpy implementations, and so accept a different set of arguments.

Currently most of the aggregation functions accept something like:
axis, skipna, level, numeric_only, *kwargs
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html
(there are minor differences between functions)

While any/all accept:
axis, out, keepdims
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.any.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.any.html

The request in this ticket is for any/all to accept the level argument. It seems like we might want to implement any/all the same way the other aggregation functions are implemented in NDFrame rather than as thin wrappers around the numpy implementations. Would it make sense to:

  1. Make any/all support the union of argument accepted by numpy and the standard aggregation functions.
    or
  2. Deprecate support for the numpy only arguments, and only support the standard aggregation arguments. (This means deprecating support for ‘out’ and ‘keepdims’.) There seems to be some precedent for this option, as for example Series.sum does not accept 'out' and 'keepdims' while ndarray.sum does. In a very cursory scan I didn’t see any cases where pandas internally depends on using the out and keepdims arguments on Series’ any/all.

@jreback
Copy link
Contributor

jreback commented Oct 1, 2014

@staple any/all is just historical that they just forward to the numpy accessors. You can simply write them llike sum/mean/etc (eg.. use a function generator) and accept these level. no need to deprecate anything. (just accept **kwargs for compat). These also can be different (and should be)

skipna doesn't make any sense (these are boolean arrays ater all).

@immerrr
Copy link
Contributor Author

immerrr commented Oct 2, 2014

I'm also not sure if numeric_only makes sense for them.

@staple
Copy link
Contributor

staple commented Oct 5, 2014

I noticed that in DataFrame's any / all, there is an implementation for parameters skipna and numeric_only. My guess is it will make sense to generalize this implementation into NDFrame, preserving the skipna and numeric_only arguments and thereby making them available for Series as well.

Additionally, we can continue to support Series' existing special arguments from ndarray.any/all via kwargs, which will allow named arguments to be handled the way they were before. But unnamed arguments, for example, series.all(0, type(True), g, True) will not behave the same. This might not be a big deal, but I wanted to at least check to see if an api change like this requires any special actions.

@jreback
Copy link
Contributor

jreback commented Oct 5, 2014

@staple

ok, start by moving DataFrame.any/all to core/generic.py this exposes them generally (and I think they will simply work), but you will need to define them like .sum/.mean etc are defined (e.g. in make_stat_function(its possible you will need to define a new function, say make_stat_bool_function to handle this slightly differently), try to conform to how any/all work now.

Then, in core/base.py you can remove this _unbox stuff, and define any/all which will ONLY apply to Index (fyi, prob no tests for this). as the Series definition will be taken from core/generic.py (its defined after the core/base.py)

so this definitely needs cleanup as was never done originally.

@jreback jreback modified the milestones: 0.16, 0.15.1 Oct 7, 2014
@staple
Copy link
Contributor

staple commented Oct 13, 2014

Hi, please see my PR #8550

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants