Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: removal of deprecation warnings for float indexers #12246

Closed
wants to merge 1 commit into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Feb 6, 2016

raise a TypeError instead, xref #4892
closes #11836

similar to numpy in 1.11 here

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Deprecate Functionality to remove in pandas labels Feb 6, 2016
@jreback jreback added this to the 0.18.0 milestone Feb 6, 2016
@jreback jreback force-pushed the deprecate2 branch 3 times, most recently from 45596d0 to 6680f1f Compare February 6, 2016 22:55
@jreback
Copy link
Contributor Author

jreback commented Feb 6, 2016

there are no real changes here (aside from a bug fix), except for now raising TypeError where we should have been showing the FutureWarning previously

but if anyone would has comments great.

@jreback
Copy link
Contributor Author

jreback commented Feb 11, 2016

@TomAugspurger @jorisvandenbossche @shoyer any objections?

except TypeError:

# but we will allow setting
if is_setter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that series[1.0] will TypeError, but series[1.0] = 2 will succeed? Or does that raise later on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short answer is yes, though, series[1.0] = 2 works because its a label assignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this seems not great (on your branch)

In [10]: s = Series([1,2,3])

In [11]: s[1.] = 5

In [12]: s
Out[12]:
0    1
1    2
2    3
1    5
dtype: int64

In [13]: s.index
Out[13]: Int64Index([0, 1, 2, 1], dtype='int64')

So before this would have warned about the Floating scalar, cast to an int, and then overwritten the value at index=1. I guess that would make this an API change, but this seems like a pretty hairy edge-case

I think I see why you would want to allow setting of floats that could be cast as integers, but maybe you could explain your thinking a bit more?

EDIT: I accidentally left out a "not" in my first line, so now I sound like a sarcastic jerk 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was fixing this bug

In [1]: s = Series([1,2,3])

In [2]: s
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: s.index
Out[3]: RangeIndex(start=0, stop=3, step=1)

In [4]: s['0'] = 5

In [5]: s
Out[5]: 
0    1
1    2
2    3
0    5
dtype: int64

In [6]: s.index
Out[6]: Index([0, 1, 2, u'0'], dtype='object')

so s[1.0] = 1 should change this to a Float64Index

but I couldn't get it to work and it seemed innocuous enough. Let me have another look. This is a giant rabbit hole.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: s = Series([1,2,3])

This is on current master. This should work and coerce to a Float64Index.

In [2]: s[1.1] = 1
pandas/core/indexing.py:1095: FutureWarning: scalar indexers for index type RangeIndex should be integers and not floating point
IndexingError: 1.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I fixed everything, but had to change this:

What kind of index should this be:
Index([1., 2., 3.], dtype=int)
Index([1., 2, 3.], dtype=int)

w/o actually running it?

@jreback jreback force-pushed the deprecate2 branch 2 times, most recently from 4ee7073 to a702344 Compare February 11, 2016 04:43
@kawochen
Copy link
Contributor

If setting works and changes the index to Float64Index, what should s be after this?

In [17]: s = pandas.Series([1,2,3])

In [18]: s.index = pandas.Int64Index([2**53, 2**53+1, 2**53+2])

In [19]: s
Out[19]: 
9007199254740992    1
9007199254740993    2
9007199254740994    3
dtype: int64

In [20]: s[2**53+0.0] = 4

@shoyer
Copy link
Member

shoyer commented Feb 11, 2016

We should not make using floats instead of ints for label based indexing (e.g., with .loc) error. I think we need to following the precedence of the Python dictionary here:

In [16]: x = {1: 'one'}

In [17]: x[1.0]
Out[17]: 'one'

I'm totally on board with making this error in cases were it was previously doing positional indexing (e.g., .iloc), though. That would be consistent with NumPy and base Python:

In [18]: y = [1, 2]

In [19]: y[1.0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-d03a4798580f> in <module>()
----> 1 y[1.0]

TypeError: list indices must be integers or slices, not float

@TomAugspurger
Copy link
Contributor

This is a giant rabbit hole.

Sorry about sending you down it :/

I think I'm with @shoyer here. If two items compare as equal, label getting / setting go through (assuming that's possible to implement, while still raising on positional getting/setting).

The additional thing we have to decide on is what casting to do when setting. Do we cast the label to be like the rest of the Index or the Index to be like the new label? FWIW NumPy will cast the new item for arrays

In [40]: x = np.array([1, 2, 3], dtype='int64')

In [41]: x[1] = 2.0

In [42]: x.dtype
Out[42]: dtype('int64')

Oh, but they also do this even if the float can't be cast to an equal int. So setting x[1] = 2.5 also truncates to 2. I don't think we can do that.

@jreback
Copy link
Contributor Author

jreback commented Feb 11, 2016

So this comes down to: is 1.0 a key in an Int64Index/RangeIndex or is it not found, in which case we upcast the Index to either Float64Index or object

In [1]: s = Series(range(3), range(3))

In [2]: s
Out[2]: 
0    0
1    1
2    2
dtype: int64

what should
s[1.0] = 5 do

Option 1
In this PR (this looks kind of wrong to me, essentially

In [6]: s[1.0] = 5     

In [7]: s
Out[7]: 
0.0    0
1.0    1
2.0    2
1.0    5
dtype: int64

In [8]: s.index
Out[8]: Float64Index([0.0, 1.0, 2.0, 1.0], dtype='float64')

Option 2
and alternative is to upcast this

In [10]: s2 = Series(range(4),index=Index([0,1,2,1.0],dtype=object))  

In [11]: s2.index
Out[11]: Index([0, 1, 2, 1.0], dtype='object')

In [12]: s
Out[12]: 
0.0    0
1.0    1
2.0    2
1.0    5
dtype: int64

Option 3
Third we could make 1.0 match 1 (remember this is for .loc/[], e.g. ONLY for label indexing), though .ix actually would NOT match this for positional so .ix would do the same

and simply have

In [2]: s
Out[2]: 
0    0
1    5
2    2
dtype: int64

Then there is the case where we are doing s[0.1] = 5 (and option 3 is off the table).

@jreback jreback force-pushed the deprecate2 branch 2 times, most recently from 538be0d to ad78ce2 Compare February 12, 2016 16:28
@jreback
Copy link
Contributor Author

jreback commented Feb 12, 2016

Ok here's the revised setting behavior, I think its pretty natural

the value is coerced (gently) to the index type and regular indexing is performed
this only actually matters in the case of a float with an integer index (e.g. 0.0 -> 0)
strings and non-coercible floats (e.g. 4.1) will force conversion to object or Float64Index as appropriate.

In [2]: s = Series(range(3))

In [3]: s2 = s.copy()

In [4]: s2[0] = 10

In [5]: s2.index
Out[5]: RangeIndex(start=0, stop=3, step=1)

In [6]: s2 = s.copy()

In [7]: s2['0'] = 10

In [8]: s2.index
Out[8]: Index([0, 1, 2, u'0'], dtype='object')

In [9]: s2 = s.copy()

In [10]: s2[0.0] = 10

In [11]: s2.index
Out[11]: RangeIndex(start=0, stop=3, step=1)

In [12]: s2 = s.copy()

In [13]: s2[4] = 10

In [14]: s2.index
Out[14]: Int64Index([0, 1, 2, 4], dtype='int64')

In [15]: s2 = s.copy()

In [16]: s2[4.0] = 10

In [17]: s2.index
Out[17]: Int64Index([0, 1, 2, 4], dtype='int64')

In [18]: s2 = s.copy()

In [19]: s2[4.1] = 10

In [20]: s2.index
Out[20]: Float64Index([0.0, 1.0, 2.0, 4.1], dtype='float64')

Comparison to 0.17.1
differences are a 2 bug fixes
and no warning now issued (though I could do that)

In [3]: s = Series(range(3))

In [4]: s2 = s.copy()

In [5]: s2[0] = 10

In [6]: s2.index
Out[6]: Int64Index([0, 1, 2], dtype='int64')

In [7]: s2 = s.copy()

# this is wrong and a bug fix
In [8]: s2['0'] = 10

In [9]: s2.index
Out[9]: Int64Index([0, 1, 2, 0], dtype='int64')

In [10]: s2 = s.copy()

# warning issued in 0.17.1
In [11]: s2[0.0] = 10
/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py:1063: FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
  obj = self._convert_scalar_indexer(obj, axis)

In [12]: s2.index
Out[12]: Int64Index([0, 1, 2], dtype='int64')

In [13]: s2 = s.copy()

In [14]: s2[4] = 10

In [15]: s2.index
Out[15]: Int64Index([0, 1, 2, 4], dtype='int64')

In [16]: s2 = s.copy()

In [17]: s2[4.0] = 10

In [18]: s2.index
Out[18]: Int64Index([0, 1, 2, 4], dtype='int64')

In [19]: s2 = s.copy()

# this was buggy in 0.17.1
In [20]: s2[4.1] = 10
IndexingError: 4.1

@jreback jreback force-pushed the deprecate2 branch 3 times, most recently from 6da1949 to 1584661 Compare February 12, 2016 18:29
@jreback
Copy link
Contributor Author

jreback commented Feb 12, 2016

@TomAugspurger
Copy link
Contributor

At a glance those all look correct. Haven't had a chance to look through the code.

👍 to your general approach of coercing the new member to the Index type if possible, otherwise coerce the index.

# we will not create a duplicate index, rather
# index to that element
# e.g. 0.0 -> 0
# GH12245
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12246

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

…nal setting,

and raise a TypeError, xref pandas-dev#4892

BUG: index type coercion when setting with an integer-like

closes pandas-dev#11836
@jreback jreback closed this in a8be55c Feb 13, 2016
@jorisvandenbossche
Copy link
Member

Some after merge feedback:

I took a look and I also like the logic now as how you explained it in your last post (for label based indexing: if the label evaluates as equal, then it is interpreted as the existing label)

A few things:

  • The whatsnew needs to be updated (as it still says it raises now for all cases, as you started the PR)

  • (minor) When using iloc with a float (so the case it still should raise an error), the error message is not fully correct:

    In [76]: s2 = s.copy()
    
    In [77]: s2.iloc[2.0] = 10
    TypeError: cannot do label indexing on <class 'pandas.indexes.range.RangeIndex'>
    with these indexers [2.0] of <type 'float'>
    

    This is not 'label' indexing, but 'integer' or 'positional' indexing

  • The behaviour of using a float indexer with ix on a non-numerical index has changed:

    In [78]: s3 = pd.Series(range(3), index=list('abc'))
    
    In [79]: s4 = s3.copy()
    
    In [80]: s4.ix[1.0] = 10
    
    In [81]: s4
    Out[81]:
    a       0
    b       1
    c       2
    1.0    10
    dtype: int64
    
    In [82]: s4.index
    Out[82]: Index([u'a', u'b', u'c', 1.0], dtype='object')
    
    In [83]: pd.__version__
    Out[83]: '0.17.1+350.g5f7c9e9'
    

    and with 0.16.2

    In [1]: pd.__version__
    Out[1]: '0.16.2'
    
    In [2]: s3 = pd.Series(range(3), index=list('abc'))
    
    In [3]: s4 = s3.copy()
    
    In [4]: s4.ix[1.0] = 10
    C:\Anaconda\lib\site-packages\pandas\core\index.py:805: FutureWarning: scalar i
    dexers for index type Index should be integers and not floating point
    type(self).__name__),FutureWarning)
    
    In [5]: s4
    Out[5]:
    a     0
    b    10
    c     2
    dtype: int64
    
    In [6]: s4.index
    Out[6]: Index([u'a', u'b', u'c'], dtype='object')
    
    

    The change is logical, as before the float was interpreted as a positional indexer (and for this the warning was raised). But now a float cannot be a positional indexer anymore, so it is interpreted as a new label.
    To be clear, I think this change is OK, but just wanted to point out a case where this change will not raise an error but alter your results (worth mentioning in the whatsnew docs?)

@@ -52,6 +52,7 @@ Highlights include:
- ``pd.test()`` top-level nose test runner is available (:issue:`4327`)
- Adding support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`.
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.breaking.resample>`.
- Removal of support for deprecated float indexers; these will now raise a ``TypeError``, see :ref:`here <whatsnew_0180.enhancements.float_indexers>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mention that it only is about positional indexing? Otherwise people could think FloatIndex is removed.

@jorisvandenbossche
Copy link
Member

Actually, I don't yet agree with the behaviour of how it is now merged in this PR (but now only saw this by looking at your other PR for the doc issues #12330).

We had a discussion about what to do if a float (eg 1.0) evaluates as equal to a label (eg 1). Initially this raised, but after the discussion the behaviour was changed for setting:

In [7]: s = pd.Series([1,2,3])

In [8]: s[0.0] = 10

In [9]: s
Out[9]:
0    10
1     2
2     3
dtype: int64

In [10]: s.index
Out[10]: RangeIndex(start=0, stop=3, step=1)

This comment above gives an overview of this: #12246 (comment)

But, the behaviour was not changed for getting, so the getting variant of the example above raises:

In [11]: s[0.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.range.RangeIndex'>
 with these indexers [0.0] of <type 'float'>

Which feels a bit inconsistent. And it is also, if I understand it correctly, what @shoyer opposed to:

We should not make using floats instead of ints for label based indexing (e.g., with .loc) error. I think we need to following the precedence of the Python dictionary here:

In [16]: x = {1: 'one'}

In [17]: x[1.0]
Out[17]: 'one'

@jreback That is actually the reason I gave the comment to update the whatsnew notes, as I thought they were incorrect (assuming that not only setting was changed, but also getting)

@jreback
Copy link
Contributor Author

jreback commented Feb 15, 2016

hmm, for []/.ix I could agree with you (certainly not for .loc/.iloc though) as these are quite strict.

It comes down to whether we are actually indexing with a label or a position (and that of course is the ambiguity here).

@shoyer comment is too simple for many of these cases. We have different indexers for this reason (and some legacy w.r.t. []).

@jorisvandenbossche can you enumerate when you think we ought to coerce by indexer?

maybe open a new issue.

@jorisvandenbossche
Copy link
Member

This just looks very odd:

In [11]: s = pd.Series([1,2,3])

In [12]: s.loc[0.0] = 10

In [13]: s.loc[0.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.range.RangeIndex'>
 with these indexers [0.0] of <type 'float'>

And I don't think we should introduce even more differences between loc/ix/[] here.

But indeed, will open a new issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

int64 Index in 0.17 typcasts '0' string to integer
5 participants