Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data is mismatched with labels after stack with MultiIndex columns #20945

Closed
pauleikis opened this issue May 3, 2018 · 4 comments
Closed

Data is mismatched with labels after stack with MultiIndex columns #20945

pauleikis opened this issue May 3, 2018 · 4 comments

Comments

@pauleikis
Copy link

@pauleikis pauleikis commented May 3, 2018

df = pd.DataFrame([
    ['DIM', 'A', 1, 2, 3, 4],
    ['DIM', 'B', 11, 22, 33, 44],
])
df.columns = ["dim1", "dim2", 'c', 'b', 'a', 'd']
df = df.set_index(["dim1", "dim2"])
df.columns.name = 'metrics'

print(df)
metrics     c   b   a   d
dim1 dim2                
DIM  A      1   2   3   4
     B     11  22  33  44

print(df.unstack('dim2').stack('metrics'))

dim2          A   B
dim1 metrics       
DIM  a        1  11
     b        2  22
     c        3  33
     d        4  44

Problem description

After stacking data by one level of MultiIndex columns, the output is unexpected. 'metrics' labels are sorted alphabetically, but data is show as is - without being matched with labels.

Expected Output

Either option shown below would be fine and expected:

dim2          A   B
dim1 metrics       
DIM  a        3  33
     b        2  22
     c        1  11
     d        4  44

dim2          A   B
dim1 metrics       
DIM  c        1  11
     b        2  22
     a        3  33
     d        4  44

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 38.2.3
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: 0.7.11.None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@pauleikis
Copy link
Author

@pauleikis pauleikis commented May 3, 2018

I just wanted to add that there is a workaround for this issue, but it looks rather ugly:

df = df.stack('metrics').to_frame().unstack('dim2')
df.columns = df.columns.droplevel()
print(df)

dim2          A   B
dim1 metrics       
DIM  c        1  11
     b        2  22
     a        3  33
     d        4  44

As you can see it produces one of the expected outputs mentioned above.

@WillAyd
Copy link
Member

@WillAyd WillAyd commented May 4, 2018

Hmm interesting. Strangely enough if you change the order in which you apply the methods it gets you to the right answer

In [7]: df.stack('metrics').unstack('dim2')
Out[7]: 
dim2          A   B
dim1 metrics       
DIM  c        1  11
     b        2  22
     a        3  33
     d        4  44
@gfyoung
Copy link
Member

@gfyoung gfyoung commented May 8, 2018

This is indeed a little weird! PR to patch is welcome!

@WillAyd
Copy link
Member

@WillAyd WillAyd commented May 8, 2018

I believe the problem stems from the block below:

if not this.columns.is_lexsorted():

This appears related to #8809 and essentially is causing a disconnect as it sorts the data but the subsequent constructor does not use the sorted labels for the level that gets pushed to the row index.

Not terribly familiar with this code so investigating further but lmk if anyone has insights there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants