Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>> # Notice that the columns are not sorted below.
>>> df = pd.DataFrame(data=[[0, 1], [2, 3], [4, 5], [6, 7]],
...                   index=pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']]),
...                   columns=['d', 'c'])
>>> # The value of the element with indices 'b', 'B', and 'd'
>>> df.loc[('b', 'B'), 'd']
6
>>> # The value of that *same* element now.
>>> df.unstack().stack(0).loc[('b', 'd'), 'B']
7
>>> # What went wrong?
>>> df
     d  c
a A  0  1
  B  2  3
b A  4  5
  B  6  7
>>> # During some step, the indices got sorted but the values did not follow.
>>> df.unstack().stack(0)
     A  B
a d  1  3
  c  0  2
b d  5  7
  c  4  6
Problem description
With MultiIndexed DataFrames, it is often convenient to call unstack(level) and stack(level) repeatedly until the DataFrame has the index layout you need. These methods will sort your indices or levels if they were not sorted to begin with.
However, I appear to have found a case where the indices got sorted but the values did not follow, resulting in the "shuffling" you see above.
Expected Output
The expected behavior is that these operations should not result in data scrambling / shuffling; a complete set of indices (like {'b','B','d'}) should always refer to the same value (in this case, 6).
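A minimal sketch of how this expectation could be checked mechanically, rather than by eyeballing single elements. `labelled_values` is a hypothetical helper written for this illustration, not part of pandas, and it assumes that no label is reused across levels (true here: 'a'/'b', 'A'/'B', 'c'/'d'), so a set of labels identifies exactly one cell.

```python
import pandas as pd


def labelled_values(frame):
    """Map each complete, order-independent set of labels to its cell value.

    Hypothetical helper for illustration only. Assumes labels are unique
    across all index and column levels.
    """
    out = {}
    for row_key in frame.index:
        for col_key in frame.columns:
            # Normalise scalar labels to 1-tuples so both axes can be merged.
            row = row_key if isinstance(row_key, tuple) else (row_key,)
            col = col_key if isinstance(col_key, tuple) else (col_key,)
            out[frozenset(row + col)] = frame.loc[row_key, col_key]
    return out


# Same frame as in the report: note the unsorted columns ['d', 'c'].
df = pd.DataFrame(
    data=[[0, 1], [2, 3], [4, 5], [6, 7]],
    index=pd.MultiIndex.from_product([["a", "b"], ["A", "B"]]),
    columns=["d", "c"],
)

before = labelled_values(df)
after = labelled_values(df.unstack().stack(0))

# Collect every label set whose value changed across the round trip.
mismatches = {
    tuple(sorted(key)): (before[key], after.get(key))
    for key in before
    if before[key] != after.get(key)
}
print(mismatches or "round trip preserved every labelled value")
```

On a pandas build without the problem, `mismatches` should come out empty; any entry in it is a label set whose value moved during the unstack/stack round trip.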
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@joseortiz3 : Thanks for reporting! One thing that we suggest users do is upgrade if possible to the latest version, as we may have already resolved the issue.
I can't reproduce this in 0.20.3 (latest). Can you upgrade and see if you can still reproduce?