Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG, MAINT: Improve fromnumeric.py interface for downstream compatibility #7325

Merged
merged 2 commits into from
Mar 22, 2016
Merged

BUG, MAINT: Improve fromnumeric.py interface for downstream compatibility #7325

merged 2 commits into from
Mar 22, 2016

Conversation

gfyoung
Copy link
Contributor

@gfyoung gfyoung commented Feb 24, 2016

Motivation stemmed from PR in pandas in which I was trying to make searchsorted in pandas have a compatible signature with numpy without externalizing the sorter argument (it isn't used in the pandas implementation) by accepting a kwargs argument to swallow the sorter arg.

However, because np.searchsorted treats sorter like a positional argument in the internal call and NOT as a keyword argument as it should be based on the method signature, it was not possible to do so. This PR fixes that inconsistency as well as in all other np functions that use _wrapit in the implementation.

xref #12644 (pandas)

@seberg seberg closed this Feb 24, 2016
@seberg
Copy link
Member

seberg commented Feb 24, 2016

OOPs, accident :(. Though I am a bit dubious about whether this can't break a lot of subclasses.

@seberg seberg reopened this Feb 24, 2016
@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 24, 2016

I don't quite understand your comment with the double negative.

@seberg
Copy link
Member

seberg commented Feb 24, 2016

Sorry. Just wondering if all subclasses use the same argument names, and whether it is possible to do C-Api subclasses which might not use compatible parsing at all. Overall, this would be rare, but it would break them, so if it exists, this is probably a no-go.

@seberg
Copy link
Member

seberg commented Feb 24, 2016

I guess all of these are default arguments, so it makes it less likely. Not sure if it is possible or not though.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 24, 2016

Well at least internally, numpy is happy, so that is a good sign. I would sure hope people elsewhere have respected the external function signature and used keyword arguments as well. If there is no visible issue with this change, can it be merged?

@seberg
Copy link
Member

seberg commented Feb 24, 2016

Lets move the discussion to the mailing list first. But frankly I am doubtful.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 24, 2016

Sure thing. I am surprised to hear though that you are pessimistic about this being merge-able, as I'm not sure why people wouldn't want to copy the function signature (i.e. use keyword arguments when numpy does and vice-versa) that is presented externally to them if they are implementing their own version. I guess we shall see.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 24, 2016

@jreback: Wanted to add you here to the conversation since it was our discussion in the pandas library that led to this PR.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 29, 2016

Any suggestions on how to proceed on this one? I sent out an email to the numpy mailing list and haven't gotten much of a response. However, from a pandas perspective, it is considered a bug.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 29, 2016

@shoyer : I ran unit tests for scipy, pandas, dask, astropy, pint, and xarray. I did not see any test failures related to array subclassing with my changes.

@shoyer
Copy link
Member

shoyer commented Feb 29, 2016

I asked @gfyoung to try running the test suites of these external libraries to verify if this change breaks anything... if not, this is a good sign.

@seberg
Copy link
Member

seberg commented Feb 29, 2016

Yeah, sorry for sounding so worried. I have to admit it seems very unlikely anyone does not have identical signatures and using kwargs is always nicer then not. I think I am ok with trying this out and hope nobody finds it irritating. Should maybe have a note in the release notes?

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 29, 2016

@seberg : I don't see why not just to be safe! Added. Will ping as soon as tests pass.

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 6, 2016

@seberg , @shoyer : tests are passing. should be good to merge if there is nothing else.

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 12, 2016

Second breakage reported on the pandas, this time with np.round. I think this interface (and hence my PR) needs somewhat of a do-over to avoid such compatibility issues in the future.

@gfyoung gfyoung changed the title BUG, MAINT: Use keyword args internally in fromnumeric.py BUG, MAINT: Improve fromnumeric.py interface for downstream compatibility Mar 12, 2016
@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 16, 2016

PR has been updated, and Travis and Appveyor are happy. @seberg , @shoyer , is this good to land?

Internal method calls to array methods should be written
to reflect the actual signature of the argument. Thus,
keyword arguments for example should be written explicitly
as keyword arguments.

This inconsistency has caused libraries like pandas to
adopt somewhat awkward function signatures for functions
with name equivalents in numpy in which certain arguments had
to be externalized for compatibility purposes but would preferably
have been hidden in a kwargs argument since they weren't actually
used in the implementation. Such externalization would only cause
more confusion for the user.
@njsmith
Copy link
Member

njsmith commented Mar 21, 2016

I have somewhat mixed feelings about this. On the one hand, it seems harmless enough -- the idea of np.sum etc. is that they're supposed to be called on objects that are interface-compatible with ndarray.sum, so it doesn't matter whether the call uses positional or kwargs. On the other hand, it... shouldn't matter, so why is that change needed? :-)

IIUC what's really triggering the problem here is that users are insisting on calling np.sum and friends on objects that aren't and have no intention of being numpy-compatible (i.e., pandas objects), and it doesn't work? This is where the somewhat haphazard design of numpy's duck-array stuff is biting us :-(. Basically at some point in the distant past we decided that every Python object with a sum method will implement exactly the same interface for it, and ditto for lots of other methods like this. (And there's a similar situation for the object dtype ufuncs, that assume things like "hey, all methods named .log compute logarithms; surely there's nothing else in a software program that might be referred to as 'logging'".) If these functions went through a more well-marked and designed interface like __numpy_ufunc__, or even just __array_sum__, __array_sort__, etc., then this problem wouldn't arise.

Given that we're stuck with calling .sum and friends for now for backcompat reasons, I guess this PR is probably a good incremental step? I agree with @seberg that there's still some backcompat concerns, and it'd be good to double-check that this doesn't break at least astropy and scipy.sparse, but if we can get away with it then it seems like an improvement... but in the long run it would be good if we can find a way to move away from this kind of blind method call :-/

@njsmith
Copy link
Member

njsmith commented Mar 21, 2016

@gfyoung: do you have the time to look through astropy and scipy.sparse for methods like sum, take, etc., and check if there are any that have matching positional args but different kwargs (i.e., ones that would work before this PR but be broken afterwards)?

In downstream libraries like pandas, they implement methods like 'round'
or 'searchsorted' but with slightly different arguments. However, that
causes a call like `np.round` to crash because numpy assumes that the
object's implementation of the method has a signature that  matches that
of numpy's.
@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 21, 2016

@njsmith : Yes, you are correct. The issue is that pandas has its own implementation of methods like searchsorted for its duck-typed objects like DataFrame or Series, where we don't want to allow the passage of certain arguments that numpy does allow.

I have in fact checked for backwards compatibility already, as @shoyer made a similar request beforehand. Given that this change will alleviate issues in pandas without breaking other downstream libraries, I see no reason why this can't be landed once Travis and Appveyor give the green light.

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 21, 2016

@seberg, @shoyer, @njsmith : Tests are passing, and I've addressed all of the issues raised, including the compatibility issue. This should be good to merge.

@shoyer
Copy link
Member

shoyer commented Mar 21, 2016

It might be worth a quick manual spot check through the code for
scipy.sparse... Test coverage can be aspirational.
On Mon, Mar 21, 2016 at 4:21 AM gfyoung notifications@github.com wrote:

@seberg https://github.com/seberg, @shoyer https://github.com/shoyer,
@njsmith https://github.com/njsmith : Tests are passing, and I've
addressed all of the issues raised, including the compatibility issue. This
should be good to merge.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#7325 (comment)

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 21, 2016

Yet another reason why this PR should be merged. If anything, this PR is going to improve downstream compatibility because all signature-incompatibility errors are going to get caught without compromising current compatibility as I have confirmed already.

I wouldn't be surprised if other incompatibilities would be found in other downstream libraries if they can be found in a close cousin such as scipy. Having done a spot check as @shoyer requested and having found additional reasons to merge this, can it be merged now?

@jreback
Copy link

jreback commented Mar 21, 2016

+1 on this. and agree fully with @njsmith that __numpy_ufunc__ makes this much nicer.

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 21, 2016

I would also agree that __numpy_ufunc__ is a good thing to do as a subsequent step to this PR to bridge these compatibility issues in the future.

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 21, 2016

@seberg , @shoyer , @njsmith : In light of the recent discussion, can this be merged?

@njsmith
Copy link
Member

njsmith commented Mar 22, 2016

@gfyoung: So the conclusion of your audit of scipy.sparse is that merging this PR will on net improve compatibility between numpy and scipy.sparse? Did you catch any places that would be broken? (Asking because there's a difference between taking 5 steps forward versus taking 10 steps forward and 5 steps back... bug fixes don't necessarily "cancel out" with newly introduced regressions. I'm definitely leaning towards merging this based on the discussion so far, but just want to make sure we understand + have recorded what the consequences will be.)

@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 22, 2016

@njsmith : I was not able to locate any places where this PR would break compatibility. I only could find places where it would be improved, so there is only gain with merging this, no losses.

@njsmith
Copy link
Member

njsmith commented Mar 22, 2016

@gfyoung: Okay, thanks very much for your patience and thoroughness in addressing everyone's concerns :-). It sounds like everyone else's concerns have also been addressed, so let's give it a try...

njsmith added a commit that referenced this pull request Mar 22, 2016
BUG, MAINT: Improve fromnumeric.py interface for downstream compatibility
@njsmith njsmith merged commit 11c8b4d into numpy:master Mar 22, 2016
@gfyoung
Copy link
Contributor Author

gfyoung commented Mar 22, 2016

@njsmith : Certainly. This compatibility issue has been interesting to work on from all sides of the issue. Thanks for merging this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants