df.stack() behaving differently between 0.19.2 and 0.20.1 #16323

dpsugasa · 2017-05-10T23:04:18Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np


index = pd.DatetimeIndex(['2017-05-04', '2017-05-05', '2017-05-08', '2017-05-09',
               '2017-05-10'],
              dtype='datetime64[ns]',name = 'date', freq='B')
columns = pd.MultiIndex(levels=[['HSBA LN Equity', 'UCG IM Equity', 'ISP IM Equity'], ['LAST PRICE', 'HIGH', 'LOW']],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
data = np.array([[ 663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.76, 2.768, 2.694],
              [ 658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.77 ],
              [ 660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
              [ 664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
              [ 670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.85,  2.888, 2.802]])
df = pd.DataFrame(data, columns=columns, index = index)

Problem description

Since switching to 0.20.1, when using df.stack(0), the output looks like this:

                              HIGH  LAST PRICE      LOW
date                                                   
2017-05-04 HSBA LN Equity  672.500     663.800  661.100
           UCG IM Equity     2.768       2.760    2.694
           ISP IM Equity    16.020      15.970   15.490
2017-05-05 HSBA LN Equity  663.900     658.600  656.000
           UCG IM Equity     2.868       2.842    2.770
           ISP IM Equity    16.480      16.220   15.770
2017-05-08 HSBA LN Equity  664.100     660.600  658.900
           UCG IM Equity     2.898       2.852    2.826
           ISP IM Equity    16.490      16.010   15.940
2017-05-09 HSBA LN Equity  669.200     664.900  662.500
           UCG IM Equity     2.898       2.848    2.842
           ISP IM Equity    16.410      15.900   15.900
2017-05-10 HSBA LN Equity  673.400     670.900  663.800
           UCG IM Equity     2.888       2.850    2.802
           ISP IM Equity    16.150      16.090   15.590

The columns change order and the tickers no longer correspond to the correct prices.

Expected Output

0.19.2 maintains the correct hierarchy.
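For readers on a newer pandas, the spot check below may help confirm whether a given install is affected. It is only a sketch: it keeps the first two rows of the frame above, and it assumes pandas 0.24 or later, where the `labels` argument of `pd.MultiIndex` is named `codes`. The names `day` and `ucg_last` are illustrative and not from the original report.

```python
import warnings

import numpy as np
import pandas as pd

# First two rows of the reported frame; the ticker level is deliberately unsorted.
index = pd.DatetimeIndex(["2017-05-04", "2017-05-05"], name="date")
columns = pd.MultiIndex(
    levels=[["HSBA LN Equity", "UCG IM Equity", "ISP IM Equity"],
            ["LAST PRICE", "HIGH", "LOW"]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
)
data = np.array([[663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.76, 2.768, 2.694],
                 [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.77]])
df = pd.DataFrame(data, columns=columns, index=index)

# Some pandas releases may warn about the stack implementation; ignore that here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    stacked = df.stack(0)

# After stacking, UCG's last price should still be the value stored under UCG.
day = pd.Timestamp("2017-05-04")
ucg_last = stacked.loc[(day, "UCG IM Equity"), "LAST PRICE"]
print(ucg_last)  # 15.97 when the hierarchy is preserved; 0.20.1 showed 2.76
```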

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 35.0.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.1
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1

dsm054 · 2017-05-10T23:34:41Z

Simpler case:

columns = pd.MultiIndex(levels=[['B', 'A'],
                                ['C', 'D']],
                        labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = pd.DataFrame(columns=columns, data=[range(4)])
assert df.loc[0, ("A","C")] == df.stack(0).loc[(0, "A"), "C"]

passes under 0.19.2 for me and fails under 0.20.1.
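The same check can be extended to every cell rather than a single one. The helper below is a hypothetical sketch (the name `stack_matches_source` is not part of pandas); it assumes two-level columns, no missing values, and pandas 0.24 or later, where `labels` is spelled `codes`.

```python
import warnings

import pandas as pd


def stack_matches_source(df, level=0):
    """Return True if every cell keeps its labels after df.stack(level).

    Hypothetical helper: handles two-level columns without NaN only.
    """
    # Some pandas releases may warn about the stack implementation; ignore that here.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        stacked = df.stack(level)
    other = 1 - level  # the column level that stays in the columns
    for row in df.index:
        for col in df.columns:
            # col[level] moved into the index, col[other] is still a column label.
            if stacked.loc[(row, col[level]), col[other]] != df.loc[row, col]:
                return False
    return True


columns = pd.MultiIndex(levels=[["B", "A"], ["C", "D"]],
                        codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = pd.DataFrame([[0, 1, 2, 3]], columns=columns)
print(stack_matches_source(df, level=0))
```

On a release without the regression this should print `True`; on 0.20.1 the equivalent single-cell assertion above fails.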

dsm054 · 2017-05-11T01:14:57Z

In 0.19.2, with the above dataframe, we have

In [8]: df.sortlevel(level=0, axis=1)
Out[8]: 
   B     A   
   C  D  C  D
0  0  1  2  3

In 0.20.1, we have

In [378]: df.sortlevel(level=0, axis=1)
/home/dsm/sys/miniconda3/envs/py36/bin/ipython:1: FutureWarning: sortlevel is deprecated, use sort_index(level= ...)
  #!/home/dsm/sys/miniconda3/envs/py36/bin/python
Out[378]: 
   A     B   
   C  D  C  D
0  2  3  0  1

or, to use the new syntax,

In [380]: df.sort_index(level=0, axis=1)
Out[380]: 
   A     B   
   C  D  C  D
0  2  3  0  1

That is, in 0.20.1, the sort actually affects the frame, and frame.columns and this.columns differ in _stack_multi_columns. As a result, I think this line:

    new_levels.append(frame.columns.levels[level_num])

is no longer justified, because the level order can have changed. I think that just using new_levels.append(level_vals) (taken from this after the sort) should suffice to repair it. Anyone else want to call dibs, or should I take a run at it?
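To make the ordering mismatch concrete, here is a small sketch that compares the order stored in the original level with the order of the labels after the sort. It assumes pandas 0.24 or later (`codes` instead of `labels`); `stored_order` and `sorted_order` are illustrative names, not pandas internals.

```python
import pandas as pd

columns = pd.MultiIndex(levels=[["B", "A"], ["C", "D"]],
                        codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = pd.DataFrame([[0, 1, 2, 3]], columns=columns)

# Order in which the first level was declared on the original frame.
stored_order = list(df.columns.levels[0])

# Order of the first-level labels once the columns have been sorted.
sorted_df = df.sort_index(level=0, axis=1)
sorted_order = list(sorted_df.columns.get_level_values(0).unique())

print(stored_order)                 # ['B', 'A']
print(sorted_order)                 # ['A', 'B']
print(sorted_df.iloc[0].tolist())   # [2, 3, 0, 1]
```

Pairing the data of the sorted frame with `stored_order` would label the `A` values as `B` and vice versa, which is the swap seen in the report.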

sinhrks · 2017-05-11T01:20:30Z

Thanks for the report. It looks like a regression. I think @dsm054 's fix looks good.

ilmioalias · 2017-07-14T16:02:31Z

Hi, I think there is still a problem.

Code Sample, a copy-pastable example if possible

# Your code here
import pandas as pd

# We create a MultiIndex
PAE = ['ITA', 'FRA']
VAR = ['A1', 'A2']
TYP = ['CRT', 'DBT', 'NET']
MI = pd.MultiIndex.from_product([PAE, VAR, TYP], names=['PAE', 'VAR', 'TYP'])

# We create a dataframe with MultiIndex MI
V = [20, 10, 10, 40, 10, 30, 120, 110, 10, 140, 110, 30]
DF = pd.DataFrame(data=V, index=MI, columns=['VALUE'])

# We unstack the dataframe and drop level 0
DF = DF.unstack(['VAR', 'TYP'])
DF.columns = DF.columns.droplevel(0)
DF[('A0', 'NET')] = 9999

# We stack the dataframe
DF0 = DF.stack(['VAR', 'TYP'])
# DF0 is wrong
DF1 = DF.sort_index(axis=1).stack(['VAR', 'TYP'])
# DF1 is right

Problem description

The data in DF0 does not correspond to the original data from before the unstack and droplevel. Sorting the columns first, as in DF1, gives the right result.

Expected Output

PAE  VAR  TYP
FRA  A0   NET    9999.0
     A1   CRT     120.0
          DBT     110.0
          NET      10.0
     A2   CRT     140.0
          DBT     110.0
          NET      30.0
ITA  A0   NET    9999.0
     A1   CRT      20.0
          DBT      10.0
          NET      10.0
     A2   CRT      40.0
          DBT      10.0
          NET      30.0
dtype: float64
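One way to test for this failure mode without reading the output by eye is to compare every stacked value against the wide frame it came from. This is a sketch only: it rebuilds the frame above under the illustrative name `wide`, and `mismatches` is a hypothetical variable that collects any cell whose value moved.

```python
import warnings

import pandas as pd

PAE = ["ITA", "FRA"]
VAR = ["A1", "A2"]
TYP = ["CRT", "DBT", "NET"]
MI = pd.MultiIndex.from_product([PAE, VAR, TYP], names=["PAE", "VAR", "TYP"])
V = [20, 10, 10, 40, 10, 30, 120, 110, 10, 140, 110, 30]

wide = pd.DataFrame(data=V, index=MI, columns=["VALUE"]).unstack(["VAR", "TYP"])
wide.columns = wide.columns.droplevel(0)
wide[("A0", "NET")] = 9999  # appended last, so the columns are no longer sorted

# Some pandas releases may warn about the stack implementation; ignore that here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    stacked = wide.stack(["VAR", "TYP"])

# Every (PAE, VAR, TYP) cell should hold the same value before and after stacking.
mismatches = [
    (pae, var, typ)
    for pae in wide.index
    for (var, typ) in wide.columns
    if stacked.loc[(pae, var, typ)] != wide.loc[pae, (var, typ)]
]
print(mismatches)  # an empty list means no value changed its labels
```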

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-61-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.18.1
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4.6
feather: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: 2.4.5 (dt dec mx pq3 ext)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

gfyoung · 2017-07-14T16:28:13Z

@ilmioalias : Can you try installing 0.20.2 and report back if the output matches for you now?

dsm054 · 2017-07-14T16:31:49Z

I can confirm that it looks like @ilmioalias has found a different failure mode not solved by the first fix, and obviously not caught by the original tests, which I thought were pretty deep. :-/

gfyoung · 2017-07-14T16:33:45Z

@dsm054 : Are you using 0.20.3 ? If so, could you also try 0.20.2 and see what you get?

dsm054 · 2017-07-14T16:35:32Z

Same under both, more's the pity.

gfyoung · 2017-07-14T16:38:13Z

Okay, could you file a separate issue then and cross-reference this one? Afterwards, feel free to patch this since you were the one who spearheaded the initial fix.

dsm054 · 2017-07-14T16:43:07Z

Sure, I'll take it. I'm annoyed that something managed to sneak through. 😒

Thanks for the report, @ilmioalias!!

gfyoung · 2017-07-14T16:44:38Z

> I'm annoyed that something managed to sneak through.

Well, if we could write tests that ALWAYS covered EVERY use-case, then we wouldn't need to take contributions from anyone because our tests would be perfect. 😉

ilmioalias · 2017-07-17T08:45:42Z

Hi,
I tried pandas 0.20.2 and it doesn't work either; I get the same result as with 0.20.3.

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-61-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.2.0
Cython: 0.25.2
numpy: 1.11.1
scipy: 0.18.1
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4.6
feather: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: 2.4.5 (dt dec mx pq3 ext)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

sinhrks added the Reshaping, Bug, and Regression labels May 11, 2017

dsm054 added a commit to dsm054/pandas that referenced this issue May 11, 2017

BUG: Preserve data order when stacking unsorted levels (pandas-dev#16323)

5fcb44d

dsm054 mentioned this issue May 11, 2017

BUG: Preserve data order when stacking unsorted levels (#16323) #16325

Merged

4 tasks

dsm054 added a commit to dsm054/pandas that referenced this issue May 11, 2017

BUG: Preserve data order when stacking unsorted levels (pandas-dev#16323)

bd0eda2

jreback added this to the 0.20.2 milestone May 11, 2017

jreback closed this as completed in #16325 May 11, 2017

jreback pushed a commit that referenced this issue May 11, 2017

BUG: Preserve data order when stacking unsorted levels (#16323) (#16325)

b1ff291

pcluo pushed a commit to pcluo/pandas that referenced this issue May 22, 2017

BUG: Preserve data order when stacking unsorted levels (pandas-dev#16323) (pandas-dev#16325)

3170871

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017

BUG: Preserve data order when stacking unsorted levels (pandas-dev#16323) (pandas-dev#16325)

3a25bb9

(cherry picked from commit b1ff291)

TomAugspurger pushed a commit that referenced this issue May 30, 2017

BUG: Preserve data order when stacking unsorted levels (#16323) (#16325)

b7c5e3b

(cherry picked from commit b1ff291)

stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017

BUG: Preserve data order when stacking unsorted levels (pandas-dev#16323) (pandas-dev#16325)

f1b03f6

dsm054 mentioned this issue Jul 14, 2017

df.stack() still misbehaving in unsorted case #16925

Closed
