Pandas unstack() unexpected behavior with multiindex row and column #28306

fmmirzaei · 2019-09-05T22:43:48Z

Code Sample, a copy-pastable example if possible

import pandas as pd

data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52,
        ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53,
        ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2,
        ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0,
        ('m2', 'A5', '111'): -0.52,
    }
}
df = pd.DataFrame(data)
df.index.rename(['metric', 'bar', 'foo'], inplace=True)
df.unstack(['foo', 'bar'])

Problem description

The df looks like this before unstacking:

               effect_size wilcoxon
                   cohen_d  z_score
                      mean stouffer
metric bar foo                     
m1     A5  111       -0.07    -0.92
       P3  222        0.52     2.20
m2     A5  111        0.05    -0.52
       P3  222       -0.53    -2.00

By unstacking bar and foo, I expected both to become column levels, but that is not what happens. Instead, foo and metric are unstacked, and bar is left as the row index:

> df.unstack(['foo', 'bar'])

       effect_size                   wilcoxon                
           cohen_d                    z_score                
              mean                   stouffer                
foo            111         222            111        222     
metric          m1    m2    m1    m2       m1    m2   m1   m2
bar                                                          
A5           -0.07  0.05   NaN   NaN    -0.92 -0.52  NaN  NaN
P3             NaN   NaN  0.52 -0.53      NaN   NaN  2.2 -2.0

I got around the problem by doing the following, but I think the above behavior might be a bug.

Here's my workaround:

> print(df.stack([0, 1, 2]).unstack(0).transpose())

bar             A5                   P3         
foo            111                  222         
       effect_size wilcoxon effect_size wilcoxon
           cohen_d  z_score     cohen_d  z_score
              mean stouffer        mean stouffer
metric                                          
m1           -0.07    -0.92        0.52      2.2
m2            0.05    -0.52       -0.53     -2.0

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.37-5+deb10u1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: None
setuptools: unknown
Cython: None
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 2.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: 3.5.2
numexpr: 2.6.10dev0
feather: None
matplotlib: 1.5.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: 0+unknown
pandas_datareader: None
gcsfs: None


MatthieuLoustaunau · 2019-09-06T16:48:19Z

To be more precise, I would say this occurs because the order of the levels in the call (foo, bar) does not match their order in the dataframe's index (bar, foo). If you call df.unstack(['bar', 'foo']), you get the expected behaviour.

In your case, another workaround is to chain calls to unstack: df.unstack(['foo']).unstack(['bar'])
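
As a minimal sketch of the two workarounds above, the snippet below rebuilds the dataframe from the original report and checks which levels end up in the rows and columns. The variable names in_order and chained are only illustrative, and the index names are assigned directly via df.index.names instead of the rename call used in the report.

```python
import pandas as pd

# Same data as in the original report.
data = {
    ("effect_size", "cohen_d", "mean"): {
        ("m1", "P3", "222"): 0.52,
        ("m1", "A5", "111"): -0.07,
        ("m2", "P3", "222"): -0.53,
        ("m2", "A5", "111"): 0.05,
    },
    ("wilcoxon", "z_score", "stouffer"): {
        ("m1", "P3", "222"): 2.2,
        ("m1", "A5", "111"): -0.92,
        ("m2", "P3", "222"): -2.0,
        ("m2", "A5", "111"): -0.52,
    },
}
df = pd.DataFrame(data)
df.index.names = ["metric", "bar", "foo"]

# Workaround 1: list the levels in the same order as they appear in the index.
in_order = df.unstack(["bar", "foo"])

# Workaround 2: unstack one level at a time.
chained = df.unstack("foo").unstack("bar")

# In both cases only metric should remain as the row index,
# with bar and foo appended as the innermost column levels.
print(list(in_order.index.names))
print(list(in_order.columns.names))
print(list(chained.index.names))
print(list(chained.columns.names))
```

Note that the two workarounds append the new column levels in different orders (bar then foo for the first, foo then bar for the second), so a column key valid for one result needs its last two entries swapped for the other.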


jbrockmendel added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Oct 16, 2019

simonjayhawkins mentioned this issue Mar 25, 2020

BUG: Multiple unstack using row index level labels and multi level columns DataFrame #32990

Merged

7 tasks

phofl added a commit to phofl/pandas that referenced this issue Mar 25, 2020

Add unittest for pandas-dev#28306 and pandas-dev#24729 and revert typ…

d617caa

…o. Add whats new

jreback added this to the 1.1 milestone Mar 26, 2020

TomAugspurger closed this as completed in #32990 Mar 26, 2020
