Indexing Regression in 0.13.0 #6394

dhirschfeld · 2014-02-18T11:46:19Z

In pandas 0.12, the order in which you indexed a DataFrame didn't matter, which I think is the correct behaviour:

In [6]: df = pd.DataFrame({'A': 5*[np.zeros(3)], 'B':5*[np.ones(3)]})

In [7]: df
Out[7]: 

                 A                B
0  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
1  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
2  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
3  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
4  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]

In [8]: df['A'].iloc[2]
Out[8]: array([ 0., 0., 0.])

In [9]: df.iloc[2]['A']
Out[9]: array([ 0., 0., 0.])

In [10]: pd.__version__
Out[10]: '0.12.0'

In [11]: assert type(df.ix[2, 'A']) == type(df['A'].iloc[2]) == type(df.iloc[2]['A'])

In [12]:

In pandas 0.13, indexing in a different order can return a different type (a Series instead of an array in the example below). That is a problem for code expecting an array, especially because of the difference between array (positional) indexing and label indexing.

In [1]: df = pd.DataFrame({'A': 5*[np.zeros(3)], 'B':5*[np.ones(3)]})

In [2]: df
Out[2]: 

                 A                B
0  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
1  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
2  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
3  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
4  [0.0, 0.0, 0.0]  [1.0, 1.0, 1.0]
5 rows × 2 columns 

In [3]: df['A'].iloc[2]
Out[3]: array([ 0., 0., 0.])

In [4]: df.iloc[2]['A']
Out[4]: 
A 0
A 0
A 0
Name: 2, dtype: float64

In [5]: pd.__version__
Out[5]: '0.13.1'

In [6]: assert type(df.ix[2, 'A']) == type(df['A'].iloc[2]) == type(df.iloc[2]['A'])
Traceback (most recent call last):

  File "<ipython-input-11-946e15564ee1>", line 1, in <module>
    assert type(df.ix[2, 'A']) == type(df['A'].iloc[2]) == type(df.iloc[2]['A'])

AssertionError
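
For readers on a later pandas release where `.ix` is no longer available, the same consistency check can be sketched with `.loc` in its place. This is only an illustrative rewrite of the assertion above, not part of the original report; the dictionary keys and the `consistent` variable are made-up names, and the check is expected to pass only on versions where the behaviour described here has been fixed.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: every cell holds a length-3 numpy array.
df = pd.DataFrame({'A': 5 * [np.zeros(3)], 'B': 5 * [np.ones(3)]})

# Three routes to the same cell (names are illustrative only).
results = {
    "column_then_row": df['A'].iloc[2],   # chained: column first, then position
    "row_then_column": df.iloc[2]['A'],   # chained: row first, then label
    "single_step": df.loc[2, 'A'],        # one indexing call, stands in for .ix
}

# Collect the type returned by each route and check that they all agree.
types = {name: type(value) for name, value in results.items()}
consistent = len(set(types.values())) == 1
print(types, consistent)
```

If `consistent` is false, the printed `types` mapping shows which access path returned something other than the stored array.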

jreback · 2014-02-18T12:10:33Z

Storing lists of numpy arrays is neither efficient nor really supported.
Chained indexing is to blame, see here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

which exposes how numpy creates / does not create a view

don't do it

dhirschfeld · 2014-02-18T13:02:56Z

I know it is inefficient, I said as much in my post on the mailing list.

I don't care whether I am returned a view or a copy - I'm not trying to assign to the data.

Returning a different type depending on the order of chaining is never a desirable outcome and hence is a bug. It's certainly a regression, since the example shown above worked perfectly well in pandas 0.12.

jorisvandenbossche · 2014-02-18T14:15:22Z

BTW, maybe related, there is also a difference between iloc and loc:

In [22]: df['A'].loc[2]
Out[22]:
2    0
2    0
2    0
Name: A, dtype: float64

In [23]: df['A'].iloc[2]
Out[23]: array([ 0.,  0.,  0.])
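
The `loc` / `iloc` discrepancy above can be turned into a small check. The following is a rough sketch, assuming the frame from the original report and a default integer index (so label 2 and position 2 refer to the same row); the variable names are illustrative, and both flags are expected to be true only on a pandas version where the two accessors agree.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: every cell holds a length-3 numpy array.
df = pd.DataFrame({'A': 5 * [np.zeros(3)], 'B': 5 * [np.ones(3)]})
column = df['A']

by_label = column.loc[2]      # label-based lookup
by_position = column.iloc[2]  # position-based lookup

# With a default integer index both lookups address the same element,
# so they should agree in type and in contents.
same_type = type(by_label) is type(by_position)
same_values = np.array_equal(np.asarray(by_label), np.asarray(by_position))
print(same_type, same_values)
```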

cpcloud · 2014-02-18T14:27:11Z

Is this really a regression? It seems strange that unsupported behavior would carry that label. I think the current 0.13 behavior makes more sense. A DataFrame isn't a generic blob to hold anything and everything. If you're relying on unsupported behavior, then that isn't pandas' issue. I can see that @jorisvandenbossche's example looks like a bug. Happy to help get around the need to store arrays inside of pandas objects.

jreback · 2014-02-18T14:30:04Z

I fixed this in #6396. It's a bit of an odd use case, and the code has to 'infer' from the results whether the container is actually holding a list/ndarray, but it's not too difficult.

But to @cpcloud's point: in general, storing lists/np.arrays INSIDE of a frame is just asking for trouble, and there is no real reason to do it.

We have talked about this from time to time; what you are probably looking for is either a Panel or, really, a 'collection of DataFrames' that has, say, aligning ability. But that's not implemented. If you would like to show a realistic use case, maybe we can take some ideas from it.
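
One possible way to avoid array-valued cells, sketched here purely as an illustration and not something proposed in this thread, is to expand each array column into a block of scalar columns and combine the blocks under a two-level column index. The names `nested`, `blocks`, and `flat` are hypothetical, and the sketch assumes every array in a column has the same length.

```python
import numpy as np
import pandas as pd

# Frame from the report, with a numpy array stored in every cell.
nested = pd.DataFrame({'A': 5 * [np.zeros(3)], 'B': 5 * [np.ones(3)]})

# Expand each array-valued column into its own frame of scalar columns
# (assumes all arrays within a column share one length).
blocks = {
    name: pd.DataFrame(np.vstack(nested[name].tolist()), index=nested.index)
    for name in nested.columns
}

# Concatenating a dict along axis=1 uses the keys as the outer column level.
flat = pd.concat(blocks, axis=1)

# Selecting row 2 of block 'A' gives a Series of plain floats;
# .to_numpy() recovers the original array.
row_a = flat.loc[2, 'A'].to_numpy()
print(flat.shape, row_a)
```

In this layout every cell is a plain float, so the element type no longer depends on whether the row or the column is selected first.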

jreback · 2014-02-18T14:41:34Z

fixed in master

jreback closed this as completed Feb 18, 2014

jreback reopened this Feb 18, 2014

jreback added Bug labels Feb 18, 2014

jreback added this to the 0.14.0 milestone Feb 18, 2014

jreback mentioned this issue Feb 18, 2014

BUG: Regression in chained getitem indexing with embedded list-like from 0.12 (GH6394) #6396

Merged

jreback closed this as completed in #6396 Feb 18, 2014

jreback mentioned this issue Feb 21, 2014

indexing using iterator (Frame.ix[#]) changed - .ix does not work, .iloc does #6426

Closed
