Indexing Regression in 0.13.0 #6394

Closed
dhirschfeld opened this issue Feb 18, 2014 · 6 comments · Fixed by #6396
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

@dhirschfeld
Contributor

In pandas 0.12 the order in which you indexed a DataFrame didn't matter, which I think is the correct behaviour:

In [6]: df = pd.DataFrame({'A': 5*[np.zeros(3)], 'B':5*[np.ones(3)]})

In [7]: df
Out[7]: 

    A   B
0   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
1   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
2   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
3   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
4   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]

In [8]: df['A'].iloc[2]
Out[8]: array([ 0., 0., 0.])

In [9]: df.iloc[2]['A']
Out[9]: array([ 0., 0., 0.])

In [10]: pd.__version__
Out[10]: '0.12.0'

In [11]: assert type(df.ix[2, 'A']) == type(df['A'].iloc[2]) == type(df.iloc[2]['A'])


In pandas 0.13, if you index in a different order you can get a different type out, which can be problematic for code expecting an array, especially because of the difference between array indexing and label indexing.

In [1]: df = pd.DataFrame({'A': 5*[np.zeros(3)], 'B':5*[np.ones(3)]})

In [2]: df
Out[2]: 

    A   B
0   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
1   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
2   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
3   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
4   [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
5 rows × 2 columns 

In [3]: df['A'].iloc[2]
Out[3]: array([ 0., 0., 0.])

In [4]: df.iloc[2]['A']
Out[4]: 
A 0
A 0
A 0
Name: 2, dtype: float64

In [5]: pd.__version__
Out[5]: '0.13.1'

In [6]: assert type(df.ix[2, 'A']) == type(df['A'].iloc[2]) == type(df.iloc[2]['A'])
Traceback (most recent call last):

  File "<ipython-input-11-946e15564ee1>", line 1, in <module>
    assert type(df.ix[2, 'A']) == type(df['A'].iloc[2]) == type(df.iloc[2]['A'])

AssertionError
@jreback jreback closed this as completed Feb 18, 2014
@jreback
Contributor

jreback commented Feb 18, 2014

Storing lists of numpy arrays is neither efficient nor really supported.
Chained indexing is to blame, see here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

which exposes how numpy does or does not create a view

don't do it
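
For readers who only need to read a single cell, a minimal sketch of the non-chained alternative follows. It assumes a reasonably recent pandas release (the `.at` and `.iat` accessors exist there; `.ix`, used in the report, was removed in later versions) and rebuilds the frame from the report. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Object-dtype frame whose cells each hold a NumPy array (the layout from the report).
df = pd.DataFrame({"A": [np.zeros(3) for _ in range(5)],
                   "B": [np.ones(3) for _ in range(5)]})

# Chained indexing: two separate lookups, so the intermediate object
# is a column Series here.
chained_col_first = df["A"].iloc[2]

# Single-step access: one lookup addressed by (row, column).
by_label = df.at[2, "A"]                           # label-based
by_position = df.iat[2, df.columns.get_loc("A")]   # position-based
```

Because the single-step accessors never build an intermediate row or column object, the result does not depend on the order in which the two axes are indexed.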

@dhirschfeld
Contributor Author

I know it is inefficient, I said as much in my post on the mailing list.

I don't care whether I am returned a view or a copy - I'm not trying to assign to the data.

Returning a different type dependent on the order of chaining is never a desirable outcome and hence is a bug. It's certainly a regression since the example shown above worked perfectly well in pandas 0.12.
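
The expectation stated here can be written as a small check. The following is only a sketch: `access_types` is a hypothetical helper, not part of pandas, and `.iat` stands in for the `.ix` call in the original assertion because `.ix` was removed from later pandas releases.

```python
import numpy as np
import pandas as pd

def access_types(frame, row, col):
    """Return the result type of each access path to the cell at
    positional row `row` and column label `col` (hypothetical helper)."""
    return {
        "column_then_row": type(frame[col].iloc[row]),
        "row_then_column": type(frame.iloc[row][col]),
        "single_step": type(frame.iat[row, frame.columns.get_loc(col)]),
    }

df = pd.DataFrame({"A": [np.zeros(3) for _ in range(5)],
                   "B": [np.ones(3) for _ in range(5)]})

types = access_types(df, 2, "A")
# True when every access path returns the same type.
consistent = len(set(types.values())) == 1
```

On the 0.13.1 session shown in the report, the `row_then_column` entry would be a Series rather than an array, so `consistent` would be `False` there.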

@jorisvandenbossche
Member

BTW, maybe related, there is also a difference between iloc and loc:

In [22]: df['A'].loc[2]
Out[22]:
2    0
2    0
2    0
Name: A, dtype: float64

In [23]: df['A'].iloc[2]
Out[23]: array([ 0.,  0.,  0.])

@cpcloud
Member

cpcloud commented Feb 18, 2014

Is this really a regression? Seems strange that unsupported behavior would carry that label. I think the current 0.13 behavior makes more sense. A DataFrame isn't a generic blob to hold anything and everything. If you're relying on unsupported behavior then that isn't pandas' issue. I can see that @jorisvandenbossche's example looks like a bug. Happy to help get around the need to store arrays inside of pandas objects.

@jreback
Contributor

jreback commented Feb 18, 2014

I fixed this in #6396; it's a bit of an odd use case, and I have to 'infer' a bit from the results whether the container is actually holding a list/ndarray, but it's not too difficult.

But to @cpcloud's point: in general, storing lists/np.arrays INSIDE of a frame is just asking for trouble, and there is no real reason to do it.

We have talked about this from time to time; probably what you are looking for is either a Panel, or really a 'collection of DataFrames' that have, say, aligning ability. But that's not implemented. If you would like to show a realistic use case, maybe we can take some ideas from it.
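
One possible way to avoid arrays inside cells, sketched here under assumptions and not taken from the thread, is to spread each array across a second column level so that every cell holds a plain float. Panel itself was later deprecated and removed from pandas, so the sketch uses MultiIndex columns built with `pd.concat`; the shapes and names are illustrative.

```python
import numpy as np
import pandas as pd

n_rows, n_elems = 5, 3
a = np.zeros((n_rows, n_elems))   # what was column 'A': one row per array
b = np.ones((n_rows, n_elems))    # what was column 'B'

# Wide frame with two-level columns: (original column name, element position).
wide = pd.concat({"A": pd.DataFrame(a), "B": pd.DataFrame(b)}, axis=1)

# One row of 'A' comes back as a float64 Series; to_numpy() recovers the array.
row_a = wide.loc[2, "A"].to_numpy()
```

With this layout the data stays in float64 blocks instead of object dtype, and the array for a given row is recovered with a single `.loc` lookup.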

@jreback
Contributor

jreback commented Feb 18, 2014

fixed in master
