Data is mismatched with labels after stack with MultiIndex columns #20945
    import pandas as pd

    df = pd.DataFrame([
        ['DIM', 'A', 1, 2, 3, 4],
        ['DIM', 'B', 11, 22, 33, 44],
    ])
    df.columns = ["dim1", "dim2", 'c', 'b', 'a', 'd']
    df = df.set_index(["dim1", "dim2"])
    df.columns.name = 'metrics'
    print(df)

    metrics     c   b   a   d
    dim1 dim2
    DIM  A      1   2   3   4
         B     11  22  33  44

    print(df.unstack('dim2').stack('metrics'))

    dim2          A   B
    dim1 metrics
    DIM  a        1  11
         b        2  22
         c        3  33
         d        4  44

After stacking one level of the MultiIndex columns, the output is unexpected: the 'metrics' labels are sorted alphabetically, but the data is shown as is, without being matched to the labels.
Either option shown below would be fine and expected:
    dim2          A   B
    dim1 metrics
    DIM  a        3  33
         b        2  22
         c        1  11
         d        4  44

    dim2          A   B
    dim1 metrics
    DIM  c        1  11
         b        2  22
         a        3  33
         d        4  44
I just wanted to add that there is a workaround for this issue, but it looks rather ugly:
    df = df.stack('metrics').to_frame().unstack('dim2')
    df.columns = df.columns.droplevel()
    print(df)

    dim2          A   B
    dim1 metrics
    DIM  c        1  11
         b        2  22
         a        3  33
         d        4  44
As you can see, it produces one of the expected outputs mentioned above.
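To check this without relying on eyeballing the printout, here is a minimal sketch of a label-based comparison. The helper name `find_mismatches` is hypothetical (it is not part of pandas), and the `reported` frame is typed in by hand from the output in the original report rather than produced by `unstack('dim2').stack('metrics')`, since what that call returns depends on the installed pandas version.

```python
import pandas as pd


def find_mismatches(original, reshaped):
    """List cells in `reshaped` whose value differs from the one stored
    under the same (dim1, dim2, metric) labels in `original`."""
    mismatches = []
    for (dim1, metric), row in reshaped.iterrows():
        for dim2, actual in row.items():
            expected = original.loc[(dim1, dim2), metric]
            if expected != actual:
                mismatches.append((dim1, dim2, metric, expected, actual))
    return mismatches


# Input frame from the report.
df = pd.DataFrame(
    [["DIM", "A", 1, 2, 3, 4], ["DIM", "B", 11, 22, 33, 44]],
    columns=["dim1", "dim2", "c", "b", "a", "d"],
).set_index(["dim1", "dim2"])
df.columns.name = "metrics"

# Output as reported in this issue, reconstructed by hand (assumption:
# it mirrors the printed table exactly).
reported = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [11, 22, 33, 44]},
    index=pd.MultiIndex.from_product(
        [["DIM"], ["a", "b", "c", "d"]], names=["dim1", "metrics"]
    ),
)
reported.columns.name = "dim2"

# The workaround from the comment above.
workaround = df.stack("metrics").to_frame().unstack("dim2")
workaround.columns = workaround.columns.droplevel()

print(find_mismatches(df, reported))    # rows 'a' and 'c' are swapped
print(find_mismatches(df, workaround))  # no cells should be listed
```

Because the comparison looks values up by label, it does not care whether the 'metrics' rows come back sorted or in their original order, so it accepts either of the expected outputs.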
I believe the problem stems from the block below:
This appears related to #8809. The code sorts the data, but the subsequent constructor does not use the sorted labels for the level that gets pushed to the row index, which causes the disconnect.
I'm not terribly familiar with this code, so I'm investigating further, but let me know if anyone has insights.