Pandas unstack() unexpected behavior with multiindex row and column #28306

Closed
fmmirzaei opened this issue Sep 5, 2019 · 1 comment · Fixed by #32990

Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@fmmirzaei

Code Sample, a copy-pastable example if possible

import pandas as pd

data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52,
        ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53,
        ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2,
        ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0,
        ('m2', 'A5', '111'): -0.52,
    }
}
df = pd.DataFrame(data)
df.index.rename(['metric', 'bar', 'foo'], inplace=True)
df.unstack(['foo', 'bar'])

Problem description

The df looks like this before unstacking:

               effect_size wilcoxon
                   cohen_d  z_score
                      mean stouffer
metric bar foo                     
m1     A5  111       -0.07    -0.92
       P3  222        0.52     2.20
m2     A5  111        0.05    -0.52
       P3  222       -0.53    -2.00

By unstacking bar and foo, I expected both of them to become column levels, but that's not what happens. Instead, foo and metric are unstacked, and bar is left as the row index:

> df.unstack(['foo', 'bar'])

       effect_size                   wilcoxon                
           cohen_d                    z_score                
              mean                   stouffer                
foo            111         222            111        222     
metric          m1    m2    m1    m2       m1    m2   m1   m2
bar                                                          
A5           -0.07  0.05   NaN   NaN    -0.92 -0.52  NaN  NaN
P3             NaN   NaN  0.52 -0.53      NaN   NaN  2.2 -2.0

I got around the problem by doing the following, but I think the above behavior might be a bug.

Here's my workaround:

> print(df.stack([0, 1, 2]).unstack(0).transpose())

bar             A5                   P3         
foo            111                  222         
       effect_size wilcoxon effect_size wilcoxon
           cohen_d  z_score     cohen_d  z_score
              mean stouffer        mean stouffer
metric                                          
m1           -0.07    -0.92        0.52      2.2
m2            0.05    -0.52       -0.53     -2.0
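
For readers following along, here is the same workaround broken into its three steps. It is only a sketch that rebuilds the frame from the report; the intermediate names (stacked, wide, result) are mine and not part of the original code. Note that, as in the output above, bar and foo end up as the outermost column levels here, whereas unstack() appends them after the existing column levels.

```python
import pandas as pd

# Same data as in the report above.
data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52,
        ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53,
        ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2,
        ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0,
        ('m2', 'A5', '111'): -0.52,
    },
}
df = pd.DataFrame(data)
df.index.names = ['metric', 'bar', 'foo']

# Step 1: move all three column levels into the row index.
# Newer pandas versions may emit a FutureWarning about stack() here.
stacked = df.stack([0, 1, 2])

# Step 2: move row level 0 ('metric') into the columns.
wide = stacked.unstack(0)

# Step 3: swap rows and columns so that 'metric' is the row index again.
result = wide.transpose()

print(list(result.index.names))
print(list(result.columns.names))
```

Each step touches only one axis at a time, which is presumably why it sidesteps the problem with passing several levels to unstack() in one call.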

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.37-5+deb10u1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: None
setuptools: unknown
Cython: None
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 2.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: 3.5.2
numexpr: 2.6.10dev0
feather: None
matplotlib: 1.5.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: 0+unknown
pandas_datareader: None
gcsfs: None

@MatthieuLoustaunau

To be more precise, I would say this happens because the order of the levels in the call (foo, bar) does not match their order in the DataFrame's index (bar, foo). If you call df.unstack(['bar', 'foo']) you will get the expected behaviour.

In your case, another workaround is to chain the calls to unstack: df.unstack(['foo']).unstack(['bar'])
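
A minimal sketch of both workarounds side by side, rebuilding the frame from the report. The variable names (in_index_order, chained) are made up for illustration. It only checks which levels end up on which axis; on a pandas release that includes the fix referenced in this issue, the original df.unstack(['foo', 'bar']) call should presumably leave metric in the index as well.

```python
import pandas as pd

# Same data as in the original report.
data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52,
        ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53,
        ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2,
        ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0,
        ('m2', 'A5', '111'): -0.52,
    },
}
df = pd.DataFrame(data)
df.index.names = ['metric', 'bar', 'foo']

# Workaround 1: list the levels in the same order as they appear in the index.
in_index_order = df.unstack(['bar', 'foo'])

# Workaround 2: unstack one level per call.
chained = df.unstack('foo').unstack('bar')

# In both cases 'metric' should be the only remaining row level,
# and 'bar' and 'foo' should be the two innermost column levels.
for label, result in [('in index order', in_index_order), ('chained', chained)]:
    print(label, list(result.index.names), list(result.columns.names))
```

The two results differ in the order of the last two column levels (bar, foo versus foo, bar), so a swaplevel on the columns may be needed if the order matters downstream.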

@jbrockmendel jbrockmendel added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Oct 16, 2019
phofl added a commit to phofl/pandas that referenced this issue Mar 25, 2020
@jreback jreback added this to the 1.1 milestone Mar 26, 2020