
df.ix[] inconsistency between axes for MultiIndex #2904

Closed
lodagro opened this Issue Feb 20, 2013 · 11 comments

3 participants

lodagro commented Feb 20, 2013

In [42]: from itertools import product

In [43]: import pandas as pd

In [44]: import numpy as np

In [45]: index = pd.MultiIndex.from_tuples([t for t in product([10, 20, 30], ['a', 'b'])])

In [46]: df = pd.DataFrame(np.random.randn(6, 6), index, index)

In [47]: df
Out[47]:
            10                  20                  30
             a         b         a         b         a         b
10 a  0.077368  0.360018  0.649403 -0.221877 -1.527411  0.485647
   b  0.890805 -2.142297  0.758411 -1.650710  0.041276 -0.040894
20 a -0.401678  0.481390 -1.080735  0.621861  1.410940 -1.106015
   b -0.504422 -1.555415 -0.023859  0.211287 -0.321643  0.140895
30 a -0.118969 -0.432082 -0.888786  1.167191 -1.642356 -0.281661
   b -0.580182  2.920769 -0.685617  1.327784  0.691514 -0.692361
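
The outer level on both axes holds integers, which is what makes `.ix` ambiguous: an integer key such as `10` could be read as a label or as a position. A minimal sketch, assuming a current pandas release (1.0 or later, where `.ix` has been removed) and a fixed random seed that the original does not use, rebuilds the same frame and inspects the outer level values:

```python
from itertools import product

import numpy as np
import pandas as pd

# Same 6x6 frame as above; the seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

# Both axes share the same integer outer level, so 10 is a valid label on
# either axis while 0..5 are valid positions.
outer_rows = df.index.levels[0]
outer_cols = df.columns.levels[0]
print(outer_rows.tolist(), outer_cols.tolist())
print(pd.api.types.is_integer_dtype(outer_rows))
```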

Slicing ranges works consistently on both axes.

In [48]: df.ix[10:20, :]
Out[48]:
            10                  20                  30
             a         b         a         b         a         b
10 a  0.077368  0.360018  0.649403 -0.221877 -1.527411  0.485647
   b  0.890805 -2.142297  0.758411 -1.650710  0.041276 -0.040894
20 a -0.401678  0.481390 -1.080735  0.621861  1.410940 -1.106015
   b -0.504422 -1.555415 -0.023859  0.211287 -0.321643  0.140895

In [49]: df.ix[:, 10:20]
Out[49]:
            10                  20
             a         b         a         b
10 a  0.077368  0.360018  0.649403 -0.221877
   b  0.890805 -2.142297  0.758411 -1.650710
20 a -0.401678  0.481390 -1.080735  0.621861
   b -0.504422 -1.555415 -0.023859  0.211287
30 a -0.118969 -0.432082 -0.888786  1.167191
   b -0.580182  2.920769 -0.685617  1.327784
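
For comparison, a sketch of the same range slices with the label-based `.loc` indexer, assuming pandas 1.0 or later (no `.ix`) and seeded random data in place of the values shown above. Both end labels are included, and the row and column selections mirror each other:

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

# Label slices on the outer level; the end label 20 is included.
rows = df.loc[10:20, :]
cols = df.loc[:, 10:20]
print(rows.shape, cols.shape)

# Which outer labels survived on the sliced axis.
print(rows.index.get_level_values(0).unique().tolist())
print(cols.columns.get_level_values(0).unique().tolist())
```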

This, however, looks inconsistent to me:

In [50]: df.ix[10, :]
Out[50]:
         10                  20                  30
          a         b         a         b         a         b
a  0.077368  0.360018  0.649403 -0.221877 -1.527411  0.485647
b  0.890805 -2.142297  0.758411 -1.650710  0.041276 -0.040894

In [51]: df.ix[:, 10]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
...
IndexError: index out of bounds
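
Here `.ix` reads `10` as a label on the rows but as a position on the columns. A sketch of the label-only alternative, assuming pandas 1.0 or later and seeded random data: `.loc` treats `10` as a label on either axis and drops the level it selected on:

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

row_block = df.loc[10, :]  # rows with outer label 10; outer row level dropped
col_block = df.loc[:, 10]  # columns with outer label 10; outer column level dropped
print(row_block.shape, col_block.shape)
print(row_block.index.tolist(), col_block.columns.tolist())
```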

And this as well:

In [52]: df.ix[0, :]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
KeyError: 0

In [53]: df.ix[:, 0]
Out[53]:
10  a    0.077368
    b    0.890805
20  a   -0.401678
    b   -0.504422
30  a   -0.118969
    b   -0.580182
Name: (10, a), Dtype: float64
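
The mirror case is positional: `0` is a position, not a label. A sketch with the position-only `.iloc` indexer, assuming pandas 1.0 or later and seeded random data, shows both axes behaving the same way:

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

first_row = df.iloc[0, :]  # first row by position, a Series over all columns
first_col = df.iloc[:, 0]  # first column by position, a Series over all rows
print(first_row.name, first_col.name)

# 0 is not a label on this index, which is why a label lookup for it fails.
zero_is_label = 0 in df.index
print(zero_is_label)
```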

jreback commented Mar 6, 2013

I think with #2922 these make more sense (.ix is obviously unchanged, but users now have the choice of using non-ambiguous selectors instead).

For example:

In [8]: df
Out[8]: 
            10                  20                  30          
             a         b         a         b         a         b
10 a -0.969812  0.892646 -0.098479  1.416564 -0.415579 -1.863745
   b  0.757901  0.441437  0.049145  0.903316 -0.606608  1.106424
20 a -1.025012 -0.588379 -0.011694  0.005748  1.149368 -1.557020
   b -0.527607  0.897994 -1.043933 -1.200322  0.056026 -2.562151
30 a -0.361386  0.172049  0.663303  0.545051 -1.071491 -0.144815
   b  1.339875 -0.831864  0.742964  1.297208  0.719399 -0.488385

# this treats the 10 like a label
In [9]: df.loc[10,:]
Out[9]: 
         10                  20                  30          
          a         b         a         b         a         b
a -0.969812  0.892646 -0.098479  1.416564 -0.415579 -1.863745
b  0.757901  0.441437  0.049145  0.903316 -0.606608  1.106424

# this treats the 10 like a label
In [10]: df.loc[:,10]
Out[10]: 
             a         b
10 a -0.969812  0.892646
   b  0.757901  0.441437
20 a -1.025012 -0.588379
   b -0.527607  0.897994
30 a -0.361386  0.172049
   b  1.339875 -0.831864

# slices are INCLUSIVE since these are labels
In [11]: df.loc[10:20,:]
Out[11]: 
            10                  20                  30          
             a         b         a         b         a         b
10 a -0.969812  0.892646 -0.098479  1.416564 -0.415579 -1.863745
   b  0.757901  0.441437  0.049145  0.903316 -0.606608  1.106424
20 a -1.025012 -0.588379 -0.011694  0.005748  1.149368 -1.557020
   b -0.527607  0.897994 -1.043933 -1.200322  0.056026 -2.562151

# same here
In [12]: df.loc[:,10:20]
Out[12]: 
            10                  20          
             a         b         a         b
10 a -0.969812  0.892646 -0.098479  1.416564
   b  0.757901  0.441437  0.049145  0.903316
20 a -1.025012 -0.588379 -0.011694  0.005748
   b -0.527607  0.897994 -1.043933 -1.200322
30 a -0.361386  0.172049  0.663303  0.545051
   b  1.339875 -0.831864  0.742964  1.297208

# positional selection
In [13]: df.iloc[0,:]
Out[13]: 
10  a   -0.969812
    b    0.892646
20  a   -0.098479
    b    1.416564
30  a   -0.415579
    b   -1.863745
Name: (10, a), dtype: float64

# same
In [14]: df.iloc[:,0]
Out[14]: 
10  a   -0.969812
    b    0.757901
20  a   -1.025012
    b   -0.527607
30  a   -0.361386
    b    1.339875
Name: (10, a), dtype: float64
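
One way to check that `.loc` and `.iloc` treat both axes alike is to compare a selection on the rows with the same selection on the columns of the transposed frame. A minimal sketch, assuming pandas 1.0 or later and seeded random data:

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

# Selecting rows of df must match selecting columns of df.T, transposed back.
same_scalar = df.loc[10, :].equals(df.T.loc[:, 10].T)
same_slice = df.loc[10:20, :].equals(df.T.loc[:, 10:20].T)
same_pos = df.iloc[0, :].equals(df.T.iloc[:, 0])
print(same_scalar, same_slice, same_pos)
```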

lodagro commented Mar 6, 2013

Hmm, I clearly did not follow the thread in #2922 closely enough, since I am surprised by the failure of df.loc[10:20, :]. Some catching up to do :-)


jreback commented Mar 14, 2013

Close this? I'm going to add it to the cookbook in any event.


lodagro commented Mar 14, 2013

You prefer to close this since .ix is old stuff now and there are no plans to change it?

On the above DataFrame, df.loc[10, :] and df.loc[:, 10] work fine (in contrast to ix); however, slicing on an integer MultiIndex level does not, as you already indicated (would that require a separate issue?).


jreback commented Mar 14, 2013

Your example probably SHOULD work, but ix is quite tricky and I am not sure there are plans to change/fix it. I could certainly bump this to 0.12 if you would like.

Slicing does work on an integer MultiIndex; it just respects labels or positions depending on which indexer you choose. Your example in this issue is good at showing the ambiguity!

Am I missing something?
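
The label and positional views can be connected explicitly: `MultiIndex.get_loc` turns an outer label into the positions it occupies, which `.iloc` can then use. A sketch, assuming pandas 1.0 or later and seeded random data:

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

pos = df.index.get_loc(20)   # positions of the rows whose outer label is 20
by_position = df.iloc[pos, :]
by_label = df.loc[20:20, :]  # a one-label slice keeps the full MultiIndex
print(np.arange(len(df))[pos].tolist())
print(by_position.equals(by_label))
```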


lodagro commented Mar 14, 2013

OK, we agree that both df.ix[10, :] and df.ix[:, 10] should work. For me it is even fine to bump this to some future release; I can work around it just fine. It is just something I noticed and thought could be improved.

The label slicing with loc is something else. I don't think I am mixing up labels and positions here, it is all labels ... however:

In [38]: df
Out[38]: 
            10                  20                  30          
             a         b         a         b         a         b
10 a -0.799097 -0.450663 -0.003029  0.340621 -1.248213 -0.900263
   b -0.049115 -1.540385 -0.299996 -3.520201 -0.631406  1.036550
20 a -1.051028 -0.952631  2.114734 -0.285703 -1.346419  0.791299
   b -1.225570  1.063159  0.731514 -0.153996  0.382094  0.797084
30 a -1.176216  1.235405 -0.226777  0.852648  2.481304  0.587310
   b  1.786893 -0.042711  0.742734 -0.041659  2.544889  0.558397

In [40]: df.loc[20:30, :]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
KeyError: 'stop bound [29] is not in the [index]'

Compare this to:

In [43]: df
Out[43]: 
            5                   6                   7          
            a         b         a         b         a         b
5 a -1.312814 -0.839775  0.812328  0.041647  0.231441  0.439760
  b -0.102015  2.163313 -0.489461  0.931466  1.168450  1.134386
6 a -0.173297 -0.319528  0.546089 -0.392548  1.034875  1.825187
  b  1.201444 -0.195438  0.762748 -0.880005 -0.247503 -0.589713
7 a  0.310798 -0.556815  0.355492 -1.554151  0.677812 -1.798690
  b -0.871106 -0.932847  0.678469 -1.226688  0.595985 -0.738877

In [44]: df.loc[6:7, :]
Out[44]: 
            5                   6                   7          
            a         b         a         b         a         b
6 a -0.173297 -0.319528  0.546089 -0.392548  1.034875  1.825187
  b  1.201444 -0.195438  0.762748 -0.880005 -0.247503 -0.589713
7 a  0.310798 -0.556815  0.355492 -1.554151  0.677812 -1.798690
  b -0.871106 -0.932847  0.678469 -1.226688  0.595985 -0.738877
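
For reference, a sketch of both slices in a current pandas release (an assumption: 1.0 or later), with seeded random data instead of the values above. The helper `make_frame` is hypothetical and only builds the two frames; both label slices return the rows for both end labels:

```python
from itertools import product

import numpy as np
import pandas as pd


def make_frame(outer, seed):
    """Hypothetical helper: 6x6 frame with an int/letter MultiIndex on both axes."""
    rng = np.random.default_rng(seed)
    index = pd.MultiIndex.from_tuples(list(product(outer, ["a", "b"])))
    return pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)


df = make_frame([10, 20, 30], seed=0)
small = make_frame([5, 6, 7], seed=1)

# Label slices on the outer level; each includes its end label.
tail = df.loc[20:30, :]
small_tail = small.loc[6:7, :]
print(tail.shape, small_tail.shape)
print(tail.equals(df.iloc[2:6, :]))
```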

jreback commented Mar 14, 2013

I think you are right: I am treating the integer slice by expanding it to the integers in the range rather than to the labels, so your first example should work. I will file a bug on this. Note that it will be an INCLUSIVE range because these are labels.
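
The inclusive/exclusive difference is easiest to see with full tuple labels. A sketch, assuming pandas 1.0 or later and seeded random data: the label slice keeps both endpoints, while the positional slice starting at the same row drops its stop position:

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(rng.standard_normal((6, 6)), index=index, columns=index)

# Label slice: both (10, "b") and (20, "a") are included.
by_label = df.loc[(10, "b"):(20, "a"), :]
# Positional slice: position 2 is the stop and is excluded.
by_position = df.iloc[1:2, :]
print(by_label.index.tolist())
print(by_position.index.tolist())
```

To get the same rows positionally, the stop has to be one past the last wanted row, i.e. `df.iloc[1:3, :]`.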


jreback commented Mar 14, 2013

@lodagro I updated the example...thanks for the catch!


jreback commented Dec 18, 2013

@lodagro this should be closed by #3055, right?


lodagro commented Dec 19, 2013

@jreback We actually discussed two issues here: the ix inconsistency (the reason this issue was opened) and the loc failure on integer slices (which are labels, not positions). The loc one is resolved; the ix one seems unchanged.


TomAugspurger commented Jan 19, 2017

Closing due to the deprecation of .ix in #15113.
