Dataframe.unstack().stack(0) erroneously changes data if indices not initially sorted #17225

Closed
joseortiz3 opened this issue Aug 11, 2017 · 5 comments
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@joseortiz3
Contributor

joseortiz3 commented Aug 11, 2017

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> # Notice that the columns are not sorted below.
>>> df = pd.DataFrame(
...     data=[[0, 1], [2, 3], [4, 5], [6, 7]],
...     index=pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']]),
...     columns=['d', 'c'],
... )
>>> # The value of the element with indices 'b', 'B', and 'd'
>>> df.loc[('b','B'),'d']
6
>>> # The value of that *same* element now.
>>> df.unstack().stack(0).loc[('b','d'),'B']
7
>>> # What went wrong?
>>> df
     d  c
a A  0  1
  B  2  3
b A  4  5
  B  6  7
>>> # During some step, the indices got sorted but the values did not follow.
>>> df.unstack().stack(0)
     A  B
a d  1  3
  c  0  2
b d  5  7
  c  4  6

Problem description

With MultiIndexed DataFrames, it becomes convenient to unstack(level) and stack(level) your DataFrame until it has the indices you need to do what you want to do. These methods will sort your indices or levels if they were not sorted to begin with.

However, I appear to have found a case where the index labels were reordered but the values did not follow, which produces the "shuffling" shown above.

Expected Output

The expected behavior is that these operations should not result in data scrambling / shuffling; a complete set of indices (like {'b','B','d'}) should always refer to the same value (in this case, 6).
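
The expectation above can be sketched as a small label-by-label check. This is only an illustration: `round_trip_mismatches` is a hypothetical helper name, not part of pandas, and the frame is the one from the code sample. It compares every original cell with the cell reached through the same three labels after `unstack().stack(0)`.

```python
import pandas as pd

# Same frame as in the code sample: the columns are deliberately unsorted.
df = pd.DataFrame(
    data=[[0, 1], [2, 3], [4, 5], [6, 7]],
    index=pd.MultiIndex.from_product([["a", "b"], ["A", "B"]]),
    columns=["d", "c"],
)


def round_trip_mismatches(frame):
    """Hypothetical helper: list cells whose value changes after unstack().stack(0).

    Each entry is ((outer, inner, column), original_value, reshaped_value).
    """
    reshaped = frame.unstack().stack(0)
    mismatches = []
    for (outer, inner), row in frame.iterrows():
        for col, value in row.items():
            # After the round trip the column label sits in the row index
            # and the inner row label has become a column.
            got = reshaped.loc[(outer, col), inner]
            if got != value:
                mismatches.append(((outer, inner, col), value, got))
    return mismatches


print(round_trip_mismatches(df))
```

On a release where the behavior is correct, the printed list should be empty. On the version reported here (0.20.1), it would presumably contain entries such as the `('b', 'B', 'd')` cell described above. Newer pandas releases may emit a warning about the `stack` implementation; that does not affect the check.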

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@gfyoung added the "Reshaping" (Concat, Merge/Join, Stack/Unstack, Explode) and "Can't Repro" labels Aug 11, 2017
@gfyoung
Member

gfyoung commented Aug 11, 2017

@joseortiz3 : Thanks for reporting! We suggest that users upgrade to the latest version if possible, as we may have already resolved the issue.

I can't reproduce this in 0.20.3 (latest). Can you upgrade and see if you can still reproduce?
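
For readers who want to see whether their environment predates the version mentioned above, here is a minimal sketch. The names `version_tuple` and `FIRST_UNAFFECTED` are made up for illustration, and 0.20.3 is used only because it is the version reported as unaffected in this thread; the thread does not say which release actually contained the fix.

```python
import re

import pandas as pd

# Assumption for illustration: 0.20.3 is the first version reported as
# unaffected in this thread, not necessarily the release with the fix.
FIRST_UNAFFECTED = (0, 20, 3)


def version_tuple(version):
    """Return the leading numeric components of a version string.

    For example, '0.20.1' becomes (0, 20, 1).
    """
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


installed = version_tuple(pd.__version__)
if installed < FIRST_UNAFFECTED:
    print("pandas", pd.__version__, "may be affected; consider upgrading")
else:
    print("pandas", pd.__version__, "is at or beyond the version reported as unaffected")
```

The parser is deliberately crude: it only looks at the first three numeric components and ignores pre-release suffixes.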

@joseortiz3
Contributor Author

joseortiz3 commented Aug 11, 2017

Oh jeez my bad. One sec.

Yes, it has already been fixed. My bad.

@gfyoung
Member

gfyoung commented Aug 11, 2017

No worries! This was only added to the issue template recently. Just trying to help users help themselves if possible 😀

@joseortiz3
Contributor Author

Yes, it has been fixed. Great, thanks.

Just FYI, it is still an issue in the default Anaconda distribution, which currently ships 0.20.1.

@gfyoung
Member

gfyoung commented Aug 11, 2017

Absolutely, which is why we recommend upgrading if possible. Anaconda doesn't come with the most up-to-date versions generally.

Closing.

@gfyoung closed this as completed Aug 11, 2017
@gfyoung modified the milestones: 0.21.0, No action Aug 11, 2017