fromnumeric.py compatibility with GroupBy, Window, and tslib functions #12811

Closed
gfyoung opened this Issue Apr 6, 2016 · 8 comments

3 participants
Member

gfyoung commented Apr 6, 2016

In #12810, it was decided that compatibility for groupby (including resample.py) and window functions would be left for a separate PR / discussion, which seems reasonable given how massive #12810 already is. This issue serves as a reminder to tackle this after landing #12810, as it seems like this can be easily addressed afterwards.

jreback added the Compat label Apr 6, 2016

jreback added this to the 0.18.1 milestone Apr 6, 2016

Member

gfyoung commented Apr 6, 2016

@jreback : Adding timestamps and timedeltas as well to this issue given my question about tslib.pyx (i.e. what sort of compatibility should we give to methods with numpy counterparts?)

jreback modified the milestone: 0.18.2, 0.18.1 Apr 27, 2016

Member

gfyoung commented May 7, 2016 edited

With my initial fromnumeric.py PR merged, it seems like a good idea to revisit this. The major files that I think merit examination for numpy compatibility are:

pandas/core/window/window.py
pandas/tseries/resample.py
pandas/core/groupby.py
pandas/tslib.pyx

gfyoung changed the title from fromnumeric.py compatibility with GroupBy and Window functions to fromnumeric.py compatibility with GroupBy, Window, and tslib functions May 7, 2016

Contributor

jreback commented May 7, 2016

how so, I don't really care to be compat with numpy for anything beyond very basic stuff. pls show an example.

Member

gfyoung commented May 7, 2016 edited

Examples:

tslib.round(self, freq) vs. np.round(a, decimals=0, out=None)

window.max(self, how=None, **kwargs) vs. np.max(a, axis=None, out=None, keepdims=False)

resample.var(self, ddof=1) vs. np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)

groupby.mean(self) vs. np.mean(a, axis=None, dtype=None, out=None, keepdims=False)

All I was thinking of doing was putting validation calls in the implementation, similar to what was done in my previous PR and nothing more than that. I'm also perfectly fine leaving them as is since numpy decoupling is also one of our objectives with pandas.
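
A minimal sketch of what a validation call of this kind could look like. The helper `validate_numpy_defaults`, the `MEAN_DEFAULTS` table, and the `ToyValues` class are hypothetical names made up for illustration, not the pandas implementation. The idea is to tolerate the extra keyword arguments numpy forwards, but only when they hold numpy's default values:

```python
# Hypothetical names throughout; this is an illustration, not pandas' API.
MEAN_DEFAULTS = {"axis": None, "dtype": None, "out": None}


def validate_numpy_defaults(fname, kwargs, defaults):
    """Raise if kwargs has unknown names or values that differ from defaults."""
    for name, value in kwargs.items():
        if name not in defaults:
            raise TypeError(
                "{}() got an unexpected keyword argument '{}'".format(fname, name)
            )
        if value != defaults[name]:
            raise ValueError(
                "the '{}' parameter is not supported in this "
                "implementation of {}()".format(name, fname)
            )


class ToyValues(object):
    """Stand-in for an array-like container of numbers."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self, **kwargs):
        # Accept numpy's extra arguments, but only at their default values.
        validate_numpy_defaults("mean", kwargs, MEAN_DEFAULTS)
        return sum(self.values) / float(len(self.values))
```

With this in place, a numpy-style call such as `ToyValues([1, 2, 3]).mean(axis=None, dtype=None, out=None)` behaves like the plain `mean()`, while `mean(axis=1)` raises instead of being silently ignored.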

Contributor

jreback commented May 7, 2016

In [1]: df = DataFrame({'A' : [1,2,1], 'B' : [1,2,3]})

In [2]: g = df.groupby('A')

In [3]: g.mean()
Out[3]: 
   B
A   
1  2
2  2

In [4]: np.mean(g)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-fdcd38489fcb> in <module>()
----> 1 np.mean(g)

/Users/jreback/miniconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in mean(a, axis, dtype, out, keepdims)
   2878         try:
   2879             mean = a.mean
-> 2880             return mean(axis=axis, dtype=dtype, out=out)
   2881         except AttributeError:
   2882             pass

TypeError: mean() got an unexpected keyword argument 'axis'
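
The failure above can be reproduced without numpy or pandas. The sketch below assumes a simplified stand-in, `numpy_style_mean`, that imitates only the two traceback frames shown (look up `a.mean`, then call it with numpy's keyword arguments); `ToyGroupBy` is likewise a hypothetical class whose `mean` takes no extra arguments:

```python
def numpy_style_mean(a, axis=None, dtype=None, out=None):
    """Simplified imitation of the dispatch visible in the traceback."""
    mean = a.mean
    # numpy forwards its own keyword arguments to the object's method.
    return mean(axis=axis, dtype=dtype, out=out)


class ToyGroupBy(object):
    """Hypothetical stand-in for a groupby object."""

    def __init__(self, groups):
        self.groups = groups  # maps group key -> list of numbers

    def mean(self):
        # No axis/dtype/out parameters, as with groupby.mean(self).
        return {k: sum(v) / float(len(v)) for k, v in self.groups.items()}


g = ToyGroupBy({1: [1, 3], 2: [2]})
direct = g.mean()  # works: the method is called with no arguments

try:
    numpy_style_mean(g)
    message = None
except TypeError as err:
    # The forwarded 'axis' keyword is rejected by ToyGroupBy.mean.
    message = str(err)
```

The direct call succeeds because nothing is forwarded; the numpy-style call fails only because of the keyword arguments added by the dispatching function.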

Contributor

jreback commented May 7, 2016

ok that doesn't seem unreasonable

Do we actually want something like np.mean(g) to work?
A groupby object is not an array-like such as a Series. IMO we shouldn't put effort into enabling such usage.

Member

gfyoung commented May 12, 2016 edited

@jorisvandenbossche : I'll leave that for you to debate with @jreback . To reiterate, I am perfectly fine either way. This is not as serious a compatibility issue as the previous one I raised in #12644.

gfyoung added a commit to gfyoung/pandas that referenced this issue May 19, 2016

gfyoung COMPAT: Expand compatibility with fromnumeric.py
Expands compatibility with fromnumeric.py in tslib.pyx and
puts checks in window.py, groupby.py, and resample.py to
ensure that pandas functions such as 'mean' are not called
via the numpy library.

Closes gh-12811.
eb4762c
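
One way a check like the one described in this commit message could be structured is sketched below. This is not the code from the commit: `UnsupportedNumpyCall`, `reject_numpy_call`, and `ToyGroupBy` are hypothetical names, and the sketch assumes that any extra positional or keyword argument signals a call routed through numpy:

```python
class UnsupportedNumpyCall(TypeError):
    """Hypothetical error raised when a method is reached via numpy."""


def reject_numpy_call(fname, args, kwargs):
    # Assumption: the method takes no extra arguments of its own, so any
    # that arrive were forwarded by a numpy function such as np.mean.
    if args or kwargs:
        raise UnsupportedNumpyCall(
            "numpy operations are not valid here; "
            "call .{}() on the object directly".format(fname)
        )


class ToyGroupBy(object):
    """Hypothetical stand-in for a groupby object."""

    def __init__(self, groups):
        self.groups = groups  # maps group key -> list of numbers

    def mean(self, *args, **kwargs):
        reject_numpy_call("mean", args, kwargs)
        return {k: sum(v) / float(len(v)) for k, v in self.groups.items()}
```

Compared with an "unexpected keyword argument" TypeError, this approach lets the error message tell the user what to call instead.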

jreback closed this in fecb2ca May 20, 2016

nps added a commit to nps/pandas that referenced this issue May 30, 2016

gfyoung + nps COMPAT: Further Expand Compatibility with fromnumeric.py
Follow-on to #12810 by expanding compatibility with fromnumeric.py
in the following modules:
  1) tslib.pyx
  2) window.py
  3) groupby.py and resample.py (shared classes)

Closes #12811.

Author: gfyoung <gfyoung17@gmail.com>

Closes #13148 from gfyoung/fromnumeric-compat-continued and squashes the following commits:

eb4762c [gfyoung] COMPAT: Expand compatibility with fromnumeric.py
cee5388