Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: make '+' and '-' for Index either do numeric operations of raise TypeError (instead of setops) #8226

Closed
jreback opened this issue Sep 9, 2014 · 25 comments · Fixed by #8227 or #10185
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Sep 9, 2014

see comments at the end of #8184

so this type of syntax

index1 + index2
would be replaced by
index1.union(index2)

and

index1 - index2
by
index1.diff(index2)

Their is some internal code to fix but not a lot.

This would go a long ways toward simplifying the user API for index ops. I think.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design labels Sep 9, 2014
@jreback jreback added this to the 0.15.0 milestone Sep 9, 2014
@jreback
Copy link
Contributor Author

jreback commented Sep 9, 2014

@jreback jreback changed the title API: make '+' and '-' for Index either do numeric operations of raise TypeError API: make '+' and '-' for Index either do numeric operations of raise TypeError (instead of setops) Sep 9, 2014
@jreback
Copy link
Contributor Author

jreback commented Sep 9, 2014

can easily just deprecate this in 0.15.0 (and then start raising in say 0.16.0), or could raise in 0.15.0.

@shoyer
Copy link
Member

shoyer commented Sep 9, 2014

+1 from me (obviously, this was sort of my idea).

There are a few other issues where this has come up (including a bug report/complaint) but I can't find them now.

@jreback
Copy link
Contributor Author

jreback commented Sep 9, 2014

I have a PR for this, its really a tiny change (internally was only used in 2 routines). I honestly don't think its a bit deal.

Still open to simply warn (rather than raise) though (or can just break it).

@immerrr
Copy link
Contributor

immerrr commented Sep 10, 2014

It makes sense to make set operations secondary to arithmetics, so +1 from me as well.

As for warn/raise, I'd say a graceful deprecation (a pre-change warning for one release and then a post-change for another one on a second thought, disregard that last part) would prevent some user rants. I recall using set difference somewhere in my code in the past so my own code would probably break after the change and I'd love a heads-up before that.

@jorisvandenbossche
Copy link
Member

I think this should be deprecated first (warning), as this really breaks your code and is used (I used it myself also), how strange it could be at first sight.

But I see a point in making this less confusing.

Some questions:

  • the idea is to disallow - and + (raise an error) in the future? Or do you want to allow it as arithmethic operation? (as you want to do for the datetime and timedelta index?)
  • to wrap up: compared to python's built-in set, the only difference after this change would be that - does not work as it does with sets? (as + does not work as union for sets as it did for index)
  • to make it even more consistent, we could also deprecate & and alike, as these also do something different when applied on a series (and then only leave the method versions)
  • maybe something not that related: is there some support for renaming diff to difference? (or at least start with having that alias and using that in the docs). That Series.diff and Index.diff do something completely different, that's something I have found very confusing in the past.

@TomAugspurger
Copy link
Contributor

Agree with the spirit of the PR and that we should warn for a release or two.

@jorisvandenbossche I think + and - will always do arithmetic ops. Then idx.union and idx.diff are for setops.

What does & do on series? And if we deprecate & do we get rid of ^ as well? I think that one is unambiguous.

@jreback
Copy link
Contributor Author

jreback commented Sep 10, 2014

  • now + and - for an Index and a scalar are add and subtract. This will now be consistent for indexes for which this makes sense (eg. DatetimeIndex + TimedeltaIndex and DatetimeIndex - DatetimeIndex but not DatetimeIndex + DatetimeIndex). Numeric like indexes can easily support this (it is disabled ATM, but just need tests and trivial to implement), for example Int64Index + Float64Index. These make everything more Series like and natural
  • will change to a warning - easy enough
  • difference from python set: well sets support both union (+) and difference (-) so Indexes will only by set-like for the boolean operators (&,|,^). I don't think this should change (you are right can reavaluate in the future)
  • I am ok with diff -> difference. Will make that change as well (and warn on the original).

@jorisvandenbossche
Copy link
Member

AFAIK python sets don't support + (union is |): " unsupported operand type(s) for +: 'set' and 'set' ". But ok, just an extra argument to not have this in pandas :-)

@jreback
Copy link
Contributor Author

jreback commented Sep 10, 2014

anyone negative on diff -> difference?

@jorisvandenbossche
Copy link
Member

It seems a clear rule that numerical operations on Index do arithmetic, and bitwise operations do set-like things (and comparison operations do elementwise comparison). But in numpy, bitwise operations also have gotten another meaning than in standard python (and do elementwise boolean operations), so in that regard you can still see it as a bit surprising.

@jreback
Copy link
Contributor Author

jreback commented Sep 10, 2014

I think least surprise principle is in effect. Some operations are SO compelling on indexes that they HAVE to be numeric (dti + tdi, dti - dti, int64index + int64index). and union/difference are just not in the thought train.

@jreback
Copy link
Contributor Author

jreback commented Sep 10, 2014

@jorisvandenbossche are we using FutureWarning? or DeprecationWarning I didn't follow that discussion

@jorisvandenbossche
Copy link
Member

FutureWarning (DeprecationWarning does not show up by default, so the point of warning in advance is missed then somewhat)

@immerrr
Copy link
Contributor

immerrr commented Sep 10, 2014

re: diff -> difference
+1

re: bitwise ops
I like the idea of bitwise ops behaving as arithmetical ones and being replaced with intersection, union and symmetric_difference, but I guess that's my C++ background talking -- when doing low-level you don't draw a line between say + and |, they are both frequently useful. And it seems that in pandas bitwise ops only apply to integer and boolean arrays which are not as common as container indices.

I wonder if anyone considered another option to solve this inconsistency in exactly the opposite way, given that Index is already separated from ndarray, so that operations on Index always have set semantics and if one is going for element-wise arithmetics they should use idx.values.

@jreback
Copy link
Contributor Author

jreback commented Sep 10, 2014

@immerrr In theory having set semantics is nice for Index. But as I said, once TimedeltaIndex is merged in it become really compelling to simply do

DatetimeIndex - DatetimeIndex -> TimedeltaIndex and DatetimeIndex + TimedeltaIndex -> DatetimIndex

thus the impetus for the change. And to avoid ambiguity on index types (e.g. Index - Index cannot then be set difference, too confusing).

can't use idx.values as they are unboxed (and defeat the point of using the Index proper).

To be honest, dropping | and & would not be a big deal either (and basically use .union/.intersection()

BTW in set ops (python), is + and | equivalent?

@immerrr
Copy link
Contributor

immerrr commented Sep 10, 2014

DatetimeIndex - DatetimeIndex -> TimedeltaIndex

Ok, got that, indeed Index(dti1.values - dti2.values) is both lengthier and may incur type detection penalties.

can't use idx.values as they are unboxed

Yup, a proper numpy dtype (or ndarray subclass) to do exactly that is still on my "someday" todo list :)

@shoyer
Copy link
Member

shoyer commented Sep 11, 2014

@jreback like @jorisvandenbossche said, python set ops don't even implement +, which makes its use for union on Index very confusing.

I was thinking about the binary operations issue. I agree that from a consistency perspective, |/^/& should do the ndarray operation. But from a practical pespective, I am not so concerned -- boolean indexes are nearly useless, and doing bitwise operations on non-boolean indexes seems like a very strange thing to do.

@shoyer
Copy link
Member

shoyer commented Sep 11, 2014

Here's a related API consistency issue -- what should the type of the object returned from comparisons and arithmetic operations with an index be?

index == 0 seems to return a boolean ndarray, whereas index + index (in this proposal) will return a new index.

I doubt the index functionality is actually desired or necessary most of the time, and if it is necessary, a ndarray could be coerced to an index anyways. What about making idx1 - idx2 equivalent to idx1.values - idx2.values? The advantage here would be consistency (especially considering comparison operators), and also avoiding the minimal overhead of creating a new index when it's not actually needed.

On the other hand, there would indeed be some small type detection penalty when casting to an index (as @immerrr points out), and from a practical perspective, TimedeltaIndex and DatetimeIndex are much more functional that np.datetime64 and np.timedelta64 ndarrays, so perhaps we are doing our users a favor by never returning those objects.

Thoughts?

@jorisvandenbossche
Copy link
Member

Personally I am -1 on returning an array from arithmetic operations on the Index, then you can better use idx.values which is much more explicit. The inconsistency is not that bad here I think. For comparison operations, this leads to an array of bools as you don't have a boolean index type. For arithmetic operations this does not hold.
So I would vote for or allow arithmetic on Indexes that return Index, or not allow it (or leave it as set operation)

@jreback
Copy link
Contributor Author

jreback commented Sep 11, 2014

currently Index comparison ops return a boolean array. We could have a BoolIndex, but to be honest it is just a wrapper on an ndarray, and IMHO not that useful, as you almost always simply index with it immediately.

ok, going to merge the PR unless objections? (I need to this to go before the tdi changes).

@jorisvandenbossche
Copy link
Member

To summarise what you did in the PR:

  • raise FutureWarning on +, - (except for datetime/timedelta index), leave &, | and ^ untouched
  • deprecate diff in favor of difference

That's right?

And what in the future? Error on + and -? Or let them do arithmetic?

@jreback
Copy link
Contributor Author

jreback commented Sep 11, 2014

@jorisvandenbossche yep

I would eventually raise on +/- on non-numeric Indexes. (e.g. Index + Index)

NotIMplemented right now is Int64Index + Int64Index

@shoyer
Copy link
Member

shoyer commented Sep 11, 2014

OK, I thought I would just raise the possibility of index + index -> ndarray. I agree that BoolIndex is not worth doing, and this inconsistency is not enough to be worth worrying about.

Thanks @jreback for resolving this!

@jreback
Copy link
Contributor Author

jreback commented Sep 11, 2014

@shoyer thanks for pushing on this!

IIRC this was something @wesm did mention quite a while ago

dvalters added a commit to GCEL/pySPECCHIO that referenced this issue Apr 12, 2018
The Index set operations + and - were deprecated in order to provide these for numeric type operations on certain index types. + can be replaced by .union() or |, and - by .difference(). Further the method name Index.diff() is deprecated and can be replaced by Index.difference() (GH8226) pandas-dev/pandas#8226
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
5 participants