Data is mismatched with labels after stack with MultiIndex columns #20945
```python
import pandas as pd

df = pd.DataFrame([
    ['DIM', 'A', 1, 2, 3, 4],
    ['DIM', 'B', 11, 22, 33, 44],
])
df.columns = ["dim1", "dim2", 'c', 'b', 'a', 'd']
df = df.set_index(["dim1", "dim2"])
df.columns.name = 'metrics'
print(df)
```

```
metrics     c   b   a   d
dim1 dim2
DIM  A      1   2   3   4
     B     11  22  33  44
```

```python
print(df.unstack('dim2').stack('metrics'))
```

```
dim2          A   B
dim1 metrics
DIM  a        1  11
     b        2  22
     c        3  33
     d        4  44
```

After stacking by one level of MultiIndex columns, the output is unexpected: the 'metrics' labels are sorted alphabetically, but the data is shown as is, without being matched to the labels.
Either option shown below would be fine and expected:
```
dim2          A   B
dim1 metrics
DIM  a        3  33
     b        2  22
     c        1  11
     d        4  44
```

```
dim2          A   B
dim1 metrics
DIM  c        1  11
     b        2  22
     a        3  33
     d        4  44
```
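To make the mismatch concrete, here is a minimal sketch that checks every cell of a stacked result against a label-based lookup in the original frame. The helper names `build_frame` and `find_mismatches` are hypothetical, not part of pandas. The `reported` frame is typed in by hand from the output in the report, so the check does not depend on which pandas version is installed.

```python
import pandas as pd


def build_frame():
    # Same input as in the report (hypothetical helper, for illustration only).
    df = pd.DataFrame(
        [["DIM", "A", 1, 2, 3, 4], ["DIM", "B", 11, 22, 33, 44]],
        columns=["dim1", "dim2", "c", "b", "a", "d"],
    ).set_index(["dim1", "dim2"])
    df.columns.name = "metrics"
    return df


def find_mismatches(original, stacked):
    """List cells of `stacked` that disagree with a label lookup in `original`."""
    mismatches = []
    for (dim1, metric), row in stacked.iterrows():
        for dim2, actual in row.items():
            expected = original.loc[(dim1, dim2), metric]
            if expected != actual:
                mismatches.append((dim1, metric, dim2, int(expected), int(actual)))
    return mismatches


# The output from the report, reconstructed by hand.
reported = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [11, 22, 33, 44]},
    index=pd.MultiIndex.from_product(
        [["DIM"], ["a", "b", "c", "d"]], names=["dim1", "metrics"]
    ),
)
reported.columns.name = "dim2"

# Each tuple is (dim1, metric, dim2, expected, actual).
for item in find_mismatches(build_frame(), reported):
    print(item)
```

For the reported output this flags the `a` and `c` rows in both columns, while `b` and `d` happen to land in the right place.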
I just wanted to add that there is a workaround for this issue, but it looks rather ugly:
```python
df = df.stack('metrics').to_frame().unstack('dim2')
df.columns = df.columns.droplevel()
print(df)
```

```
dim2          A   B
dim1 metrics
DIM  c        1  11
     b        2  22
     a        3  33
     d        4  44
```
As you can see it produces one of the expected outputs mentioned above.
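A slightly shorter variant of the same idea, offered as a sketch rather than a recommended fix: stacking single-level columns returns a Series, and `Series.unstack` returns a DataFrame directly, so the `to_frame()` and `droplevel()` steps may not be needed. The check at the end compares by label, so it does not depend on the row order of the result.

```python
import pandas as pd

df = pd.DataFrame(
    [["DIM", "A", 1, 2, 3, 4], ["DIM", "B", 11, 22, 33, 44]],
    columns=["dim1", "dim2", "c", "b", "a", "d"],
).set_index(["dim1", "dim2"])
df.columns.name = "metrics"

# Stack first (Series indexed by dim1, dim2, metrics), then move dim2
# back out to the columns.
result = df.stack("metrics").unstack("dim2")

# Verify by label rather than by position.
for metric in df.columns:
    for dim2 in ["A", "B"]:
        assert result.loc[("DIM", metric), dim2] == df.loc[("DIM", dim2), metric]

print(result)
```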
I believe the problem stems from the block below:
This appears related to #8809. It essentially causes a disconnect: the data is sorted, but the subsequent constructor does not use the sorted labels for the level that gets pushed to the row index.
I'm not terribly familiar with this code, so I'm investigating further, but lmk if anyone has insights there.
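As a toy illustration of the kind of disconnect described above (this is not pandas' internal code, just plain Python with made-up variable names): when a reordering is applied to only one of the labels and the data, the pairs no longer line up, whereas applying one permutation to both keeps them aligned. Sorting only the labels here reproduces column `A` of the reported output.

```python
labels = ["c", "b", "a", "d"]
values = [1, 2, 3, 4]

# Sorting only the labels leaves the values in their original positions.
broken = dict(zip(sorted(labels), values))

# Applying one permutation to both keeps each value with its label.
order = sorted(range(len(labels)), key=labels.__getitem__)
aligned = {labels[i]: values[i] for i in order}

print(broken)   # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
print(aligned)  # {'a': 3, 'b': 2, 'c': 1, 'd': 4}
```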