Compatibility issues with numpy's fromnumeric.py #12644

Closed
gfyoung opened this issue Mar 16, 2016 · 12 comments
Labels
API Design Compat pandas objects compatability with Numpy or Python functions
Milestone

Comments

@gfyoung
Member

gfyoung commented Mar 16, 2016

A recent spate of issues and PRs has stemmed from calling functions defined in numpy's fromnumeric.py module that have identically named but differently implemented methods/functions in pandas. This points to a much larger compatibility issue between the two libraries involving this module. A thorough review of all the functions in fromnumeric.py, cross-referenced against their implementations in pandas, is needed to avoid similar issues.

Relevant PRs:
#12413 (issue: #12238)
#12603 (issue: #12600)
#12638

#7325 (from numpy)

@gfyoung
Member Author

gfyoung commented Mar 16, 2016

As mentioned in #12600, I'll tackle this as a follow-up once these PRs have landed.

@jreback jreback added API Design Compat pandas objects compatability with Numpy or Python functions labels Mar 16, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 16, 2016
@jreback
Contributor

jreback commented Mar 16, 2016

as discussed we basically have 2 classes of issues:

  • like sorter: a seemingly innocuous argument that numpy needs but pandas does not. The solution for now is to pass it through with no checks (it's still a named argument, currently passed by position from numpy). We should note this behavior in the doc-string.
  • like .round, .idxmax, and the stat functions: mainly the out argument, which is not needed (and is confusing to pandas). The solution is to allow **kwargs but check them for invalid args (to avoid misspellings and such), and raise if this particular arg (in this case out) is not None.
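
The second approach could look roughly like the sketch below. This is only an illustration under stated assumptions: `validate_compat_kwargs` and `ToySeries` are hypothetical names made up for this example, not pandas API, and the real implementation may differ.

```python
def validate_compat_kwargs(fname, kwargs, allowed_defaults):
    """Hypothetical helper: reject unknown keywords, and reject known
    compatibility keywords that were set to anything but their default."""
    for key, value in kwargs.items():
        if key not in allowed_defaults:
            # Catches misspellings such as 'outt'.
            raise TypeError(
                f"{fname}() got an unexpected keyword argument '{key}'"
            )
        if value is not allowed_defaults[key]:
            # The keyword is known, but the container cannot honour it.
            raise ValueError(
                f"the '{key}' parameter is not supported in {fname}()"
            )


class ToySeries:
    """Stand-in for a pandas-like container; not the real Series."""

    def __init__(self, values):
        self.values = list(values)

    def round(self, decimals=0, **kwargs):
        # A numpy-style caller may pass out=None; accept that and nothing else.
        validate_compat_kwargs("round", kwargs, {"out": None})
        return ToySeries(round(v, decimals) for v in self.values)
```

With this in place, `ToySeries([1.26]).round(1, out=None)` succeeds, while a non-None `out` raises `ValueError` and a misspelled keyword raises `TypeError`.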

@jreback
Contributor

jreback commented Mar 16, 2016

@wesm
Member

wesm commented Mar 16, 2016

IMHO we should not be striving to make pandas API compatible with NumPy (except offering an __array__ API, of course), but we should avoid unnecessary / common conflicts if possible

@gfyoung
Member Author

gfyoung commented Mar 16, 2016

@wesm: Agreed. I think in this case though trying to "align" the API with numpy's makes sense because it should be perfectly legal for example to call either np.searchsorted or Series.searchsorted without Python blowing up on the user.
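
To make the failure mode concrete, here is a minimal pure-Python sketch. It assumes a module-level function that delegates to the object's own method and forwards every argument by position; `module_level_searchsorted`, `NarrowSeries`, and `CompatSeries` are hypothetical stand-ins, not numpy or pandas code.

```python
import bisect


def module_level_searchsorted(a, v, side="left", sorter=None):
    # Hypothetical stand-in for a library function that delegates to the
    # object's own method and forwards every argument by position.
    return a.searchsorted(v, side, sorter)


class NarrowSeries:
    """Toy container whose method lacks the 'sorter' parameter."""

    def __init__(self, values):
        self.values = list(values)

    def searchsorted(self, v, side="left"):
        find = bisect.bisect_left if side == "left" else bisect.bisect_right
        return find(self.values, v)


class CompatSeries(NarrowSeries):
    """Same container, but the method also accepts 'sorter'."""

    def searchsorted(self, v, side="left", sorter=None):
        # 'sorter' holds the indices that put the values in ascending order;
        # it is used as given, with no validation.
        values = self.values
        if sorter is not None:
            values = [values[i] for i in sorter]
        find = bisect.bisect_left if side == "left" else bisect.bisect_right
        return find(values, v)
```

Calling `module_level_searchsorted(NarrowSeries([1, 2, 3]), 2)` raises `TypeError` because of the extra positional argument, while the same call on `CompatSeries` works whether it goes through the module-level function or the method directly.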

@jreback
Contributor

jreback commented Mar 16, 2016

@gfyoung numpy's behavior is a bug really, in that it shouldn't just call _wrap_it and assume everything is a sub-class (like it does). I know you are fixing that, so this is really for compat.
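
One way the delegating side could be made more defensive is sketched below: try the object's method, and fall back to a generic implementation if the method is missing or its signature does not match. This is a sketch under assumptions, not the actual numpy fix; `defensive_searchsorted`, `_fallback_searchsorted`, and `Narrow` are hypothetical names.

```python
import bisect


def _fallback_searchsorted(obj, v, side="left", sorter=None):
    # Hypothetical fallback: coerce to a plain list and search it directly.
    values = list(obj)
    if sorter is not None:
        values = [values[i] for i in sorter]
    find = bisect.bisect_left if side == "left" else bisect.bisect_right
    return find(values, v)


def defensive_searchsorted(obj, v, side="left", sorter=None):
    try:
        return obj.searchsorted(v, side=side, sorter=sorter)
    except (AttributeError, TypeError):
        # AttributeError: the object has no such method.
        # TypeError: the method exists but its signature is incompatible.
        return _fallback_searchsorted(obj, v, side, sorter)


class Narrow:
    """Toy iterable whose searchsorted accepts neither 'side' nor 'sorter'."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)

    def searchsorted(self, v):
        return bisect.bisect_left(self.values, v)
```

The trade-off is that catching `TypeError` this broadly can also hide a genuine `TypeError` raised inside a compatible method, so the fallback may run when the caller would rather have seen the original error.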

@gfyoung
Member Author

gfyoung commented Mar 16, 2016

@jreback : Right, I guess "align" gives the connotation that pandas is doing something wrong, when we're really just trying to "accommodate" numpy's buggy API.

@gfyoung
Member Author

gfyoung commented Mar 21, 2016

Well, pandas is not alone now. numpy's close cousin scipy has these exact same compatibility issues, as I just reported.

@gfyoung
Member Author

gfyoung commented Mar 22, 2016

My numpy PR has been merged. So now (hopefully) we can just worry about backwards compatibility.

@jreback
Contributor

jreback commented Mar 22, 2016

maybe add order arg from np.argsort as well (IOW we could remove from pandas)

@jreback
Contributor

jreback commented Mar 25, 2016

Could probably start by seeing which functions call _wrap_it on the numpy side.

@gfyoung
Member Author

gfyoung commented Apr 6, 2016

A massive PR (#12810) addressing this issue is finally up. There were a lot more incompatibilities than I had expected. Hopefully this PR addresses most, if not all, of them.

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 27, 2016
@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 30, 2016
gfyoung added a commit to forking-repos/pandas that referenced this issue May 1, 2016
Augment pandas array-like methods with appropriate parameters
(generally, '*args' and '**kwargs') so that they can be called
via analogous functions in the numpy library they are defined in
'fromnumeric.py'.

Closes pandas-devgh-12638.
Closes pandas-devgh-12644.
Closes pandas-devgh-12687.
@jreback jreback closed this as completed in 23eb483 May 1, 2016