API/DEPR: Remove +/- as setops for Index (GH8227) #14127

Merged
merged 1 commit into from Sep 6, 2016

Conversation

Projects
None yet
4 participants
Owner

jorisvandenbossche commented Aug 31, 2016 edited

xref #13777, deprecations put in place in #8227

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

jorisvandenbossche added this to the 0.19.0 milestone Aug 31, 2016

@jreback jreback and 1 other commented on an outdated diff Aug 31, 2016

pandas/indexes/base.py
return Index(other + np.array(self))
__iadd__ = __add__
- def __sub__(self, other):
@jreback

jreback Aug 31, 2016

Contributor

I think this is not defined in super? so this should raise NotImplementedError I think?

@jorisvandenbossche

jorisvandenbossche Aug 31, 2016 edited

Owner

yep, this is something that I still have to handle.

The problem is that sub/add are added for subclasses in either _add_numericlike_set_methods_disabled or _add_numeric_methods or both ovenwritten (always handled both), but in the base class we only want __add__ and not __sub__

If I leave it in here, I think it should rather raise a TypeError? (for usage of base Index class)

@jreback

jreback Aug 31, 2016

Contributor

yeah TypeError is prob ok too

the only reason to raise NotImplemmeted is that then subclasses could override - but we don't want that

@jreback jreback commented on the diff Aug 31, 2016

pandas/indexes/base.py
@@ -1745,29 +1745,13 @@ def argsort(self, *args, **kwargs):
return result.argsort(*args, **kwargs)
def __add__(self, other):
- if is_list_like(other):
- warnings.warn("using '+' to provide set union with Indexes is "
- "deprecated, use '|' or .union()", FutureWarning,
- stacklevel=2)
- if isinstance(other, Index):
@jreback

jreback Aug 31, 2016

Contributor

this needs to remain I think? (the isinstance part)

@jorisvandenbossche

jorisvandenbossche Aug 31, 2016

Owner

Why the isinstance part? (we don't only allow to add an Index, but also a scalar or list/array)

@jreback

jreback Aug 31, 2016

Contributor

I think this really should use _ensure_index I think

@jorisvandenbossche

jorisvandenbossche Aug 31, 2016

Owner

But scalars should also work?

In [13]: pd.Index(['a', 'b']) + '1'
Out[13]: Index(['a1', 'b1'], dtype='object')

BTW, on numeric indexes, using lists is not allowed:

In [12]: pd.Index([1, 2]) + [10, 20]
TypeError: can only perform ops with scalar values

In [13]: pd.Index([1, 2]) + np.array([10, 20])
Out[13]: Int64Index([11, 22], dtype='int64')

while on base Index it is currently allowed:

In [15]: pd.Index(['a', 'b']) + ['c', 'd']
Out[15]: Index(['ac', 'bd'], dtype='object')

Unifying that would also be nice. Using _ensure_index in this case would still allow lists

@jreback

jreback Aug 31, 2016

Contributor

I think we are expanding the api w/o any testing of __add__. Its ok, but should create an issue to add tests and/or limit it. (so ideally we don't expand it at all for Index and add back if needed later), esp NOT for MultiIndex and Categorical (I think you just renamed, but just to be sure).

@jorisvandenbossche

jorisvandenbossche Sep 1, 2016

Owner

Yes, for MultiIndex/Categorical nothing should be changed. I just renamed the method to disable add/sub for those, as there are no longer set ops.

For Index, I agree we should be more careful on what to allow, but I think that this is already the case in current master.
The 'old' __add__ did basically:

def __add__(self, other):
    if isinstance(other, Index):
        self.union(other)
    return Index(self.values + other)

So by removing the if instance part I am not broading the scope, previously everything that was not an Index was already handled to do addition.

But, as I said, I agree we should be more careful / do more careful testing (eg Int64Index does not allow to add with lists, base Index does).
I will see if I do that here, or leave it for another issue.

@jreback

jreback Sep 2, 2016 edited

Contributor

I would be a break in here (e.g. where it currently calls .union) and see if we actually have test code that hits this. It obviously works, so prob ok to remove, just checking.

@jorisvandenbossche

jorisvandenbossche Sep 4, 2016

Owner

Not sure I understand this comment. We had indeed tests that tested the union behaviour of + for Index, and I changed this test to check for concatenation. If there are other tests that would hit this path, they should now fail given that I removed it.

@jreback jreback commented on an outdated diff Aug 31, 2016

pandas/indexes/base.py
@@ -3575,6 +3560,7 @@ def invalid_op(self, other=None):
cls.any = _make_invalid_op('any')
+#Index._add_numeric_methods()

codecov-io commented Aug 31, 2016 edited

Current coverage is 85.25% (diff: 100%)

Merging #14127 into master will decrease coverage by <.01%

@@             master     #14127   diff @@
==========================================
  Files           139        139          
  Lines         50501      50495     -6   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43058      43051     -7   
- Misses         7443       7444     +1   
  Partials          0          0          

Powered by Codecov. Last update 4488f18...dbcc655

Contributor

jreback commented Aug 31, 2016

@jorisvandenbossche we should do this for the RC as well.

@jreback yep, as it is quite a substantial change (for people that ignored the warnings)

Is there a reason that we do not allow numeric ops for the base Index class? (eg pd.Index(['a', 'b'] could have a clear meaning, just letting the objects determine whether an operation can be handled)

Because eg the __add__ in the Index class could benefit from the machinery in _add_numeric_methods_binary (eg to ensure attributes like name are passed correctly. Something that is also not correct on current master)

Contributor

jreback commented Aug 31, 2016

I think we don't allow numeric operations on Index as only possible one is add for string concat
nothing else makes sense

in general these r not well defined so no reason to allow

Member

shoyer commented Aug 31, 2016

I think we should let the objects in the Index decide whether the numeric operation is valid or not. This is the approach NumPy uses for object dtype. For example, one might put decimal objects in a pandas.Index.

Contributor

jreback commented Aug 31, 2016

the numpy approach leads to lots of bugs and edge cases - constraining things somewhat to not allowing everything on everything is much more user friendly

decimal objects in an Index are a disaster - should not be encouraged

@jreback jreback and 1 other commented on an outdated diff Aug 31, 2016

doc/source/whatsnew/v0.19.0.txt
@@ -878,6 +878,31 @@ of ``int64`` (:issue:`13988`)
pi = pd.PeriodIndex(['2011-01', '2011-02'], freq='M')
pi.values
+.. _whatsnew_0190.api.setops:
+
+Index ``+`` / ``-`` no longer used for set operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Addition and subtraction of certain Index types performed set operations
+(set union and difference). This behaviour was already deprecated since 0.15.0,
+and is now disabled and, when possible, ``+``
+and ``-`` are used for numeric operations (:issue:`8227`, :issue:`14127`).
@jreback

jreback Aug 31, 2016

Contributor

I would be more specific that 'when possible' (maybe say for numeric & datetimelike)

@jorisvandenbossche

jorisvandenbossche Sep 1, 2016

Owner

datetimelike not yet (DatetimeIndex - DatetimeIndex does still set op, but this one is a bit more work to remove).

And + also works for strings, so not only numeric. Further, for numeric indices, + and - did already do numeric ops and not setops .. (to make it a bit more complex :-))

So I did not directly come up with a more accurate but simple wording, but if you have a suggestion all ears!

jorisvandenbossche changed the title from [WIP] API/DEPR: Remove +/- as setops for Index (GH8227) to API/DEPR: Remove +/- as setops for Index (GH8227) Sep 1, 2016

@jorisvandenbossche jorisvandenbossche modified the milestone: 0.19.0, 0.19.0rc Sep 1, 2016

jorisvandenbossche referenced this pull request Sep 1, 2016

Closed

RLS: 0.19.0 #13991

@jreback jreback commented on the diff Sep 2, 2016

doc/source/whatsnew/v0.19.0.txt
+Index ``+`` / ``-`` no longer used for set operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Addition and subtraction of certain Index types performed set operations
+(set union and difference). This behaviour was already deprecated since 0.15.0,
+and is now disabled and, when possible, ``+``
+and ``-`` are used for numeric operations (:issue:`8227`, :issue:`14127`).
+
+Previous Behavior:
+
+.. code-block:: ipython
+
+ In [1]: pd.Index(['a', 'b']) + pd.Index(['a', 'c'])
+ FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union()
+ Out[1]: Index(['a', 'b', 'c'], dtype='object')
+
@jreback

jreback Sep 2, 2016

Contributor

make this clear that this ONLY applies to Index itself and not-subclasses (maybe show that numeric/datetimelike ops are unchanged)

@jorisvandenbossche

jorisvandenbossche Sep 4, 2016

Owner

yes, will try to clarify

@jreback jreback commented on an outdated diff Sep 2, 2016

doc/source/whatsnew/v0.19.0.txt
@@ -878,6 +878,31 @@ of ``int64`` (:issue:`13988`)
pi = pd.PeriodIndex(['2011-01', '2011-02'], freq='M')
pi.values
+.. _whatsnew_0190.api.setops:
+
+Index ``+`` / ``-`` no longer used for set operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Addition and subtraction of certain Index types performed set operations
@jreback

jreback Sep 2, 2016

Contributor

This behaviour was deprecated since 0.15.0 (already is redundant)

Contributor

jreback commented Sep 3, 2016

@jorisvandenbossche some small comments, but otherwise lgtm. merge when ready.

@jorisvandenbossche jorisvandenbossche API/DEPR: Remove +/- as setops for Index (GH8227)
dbcc655

jreback referenced this pull request Sep 4, 2016

Open

DEPR: deprecations log for removed issues #13777

65 of 65 tasks complete

@jorisvandenbossche jorisvandenbossche merged commit 8023029 into pandas-dev:master Sep 6, 2016

3 checks passed

codecov/patch 100% of diff hit (target 50.00%)
Details
codecov/project 85.25% (target 82.00%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jorisvandenbossche jorisvandenbossche modified the milestone: 0.19.0rc, 0.19.0 Sep 7, 2016

@trbs trbs added a commit to trbs/pandas that referenced this pull request Sep 12, 2016

@trbs trbs Merge github.com:pydata/pandas into issue_9084_get_schema_index_param…
…eter

* github.com:pydata/pandas: (554 commits)
  BUG: compat with Stata ver 111
  Fix: F999 dictionary key '2000q4' repeated with different values (#14198)
  BLD: Test for Python 3.5 with C locale
  BUG: DatetimeTZBlock can't assign values near dst boundary
  BUG: union_categorical with Series and cat idx
  BUG: fix str.contains for series containing only nan values
  BUG: Categorical constructor not idempotent with ext dtype
  TST: Make encoded sep check more locale sensitive (#14161)
  DOC: minor typo in 0.19.0 whatsnew file (#14185)
  BUG: fix tz-aware datetime convert to DatetimeIndex (GH 14088)
  BUG : bug in setting a slice of a Series with a np.timedelta64
  RLS: v0.19.0rc1
  DOC: clean-up 0.19.0 whatsnew file (#14176)
  DOC: cleanup build warnings (#14172)
  Add steps to run gbq integration testing to the contributing docs (#14144)
  ENH: concat and append now can handle unordered categories (#13767)
  DEPR: Deprecate pandas.core.datetools (#14105)
  API/DEPR: Remove +/- as setops for DatetimeIndex/PeriodIndex (GH9630) (#14164)
  Fix trivial typo in comment (#14174)
  API/DEPR: Remove +/- as setops for Index (GH8227) (#14127)
  ...
8ab1250

@trbs trbs added a commit to trbs/pandas that referenced this pull request Sep 12, 2016

@trbs trbs Merge github.com:pydata/pandas into to_sql_secondary_indexes
* github.com:pydata/pandas: (554 commits)
  BUG: compat with Stata ver 111
  Fix: F999 dictionary key '2000q4' repeated with different values (#14198)
  BLD: Test for Python 3.5 with C locale
  BUG: DatetimeTZBlock can't assign values near dst boundary
  BUG: union_categorical with Series and cat idx
  BUG: fix str.contains for series containing only nan values
  BUG: Categorical constructor not idempotent with ext dtype
  TST: Make encoded sep check more locale sensitive (#14161)
  DOC: minor typo in 0.19.0 whatsnew file (#14185)
  BUG: fix tz-aware datetime convert to DatetimeIndex (GH 14088)
  BUG : bug in setting a slice of a Series with a np.timedelta64
  RLS: v0.19.0rc1
  DOC: clean-up 0.19.0 whatsnew file (#14176)
  DOC: cleanup build warnings (#14172)
  Add steps to run gbq integration testing to the contributing docs (#14144)
  ENH: concat and append now can handle unordered categories (#13767)
  DEPR: Deprecate pandas.core.datetools (#14105)
  API/DEPR: Remove +/- as setops for DatetimeIndex/PeriodIndex (GH9630) (#14164)
  Fix trivial typo in comment (#14174)
  API/DEPR: Remove +/- as setops for Index (GH8227) (#14127)
  ...
a092198
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment